Steven Lewis, President
Access Consulting Ltd., Saskatoon SK Canada
When performance measurement fails, bad things happen. Figure skating learned that lesson the hard way. Its performance measurement system has evolved from an arbitrary and often corrupt gong show to an increasingly reliable (though still imperfect) science. That it has made progress despite its rigid and hidebound culture is all the more reason for health care to learn from its travails.
Here is a brief history of figure skating fiascos. First there was the obsession with compulsory figures – the ability to trace patterns in the ice in slow motion. This arcane talent accounted for 60% of the competition score until 1968, dropping to 20% by 1990. Figure skating is entertainment; it would not exist without paying customers. The figure skating people care about takes place at high speed. The ability to trace figures is like learning to play scales on the piano: essential work done in private. Assigning more than half the total score to compulsory figures is like awarding a literary prize for grammar and syntax, or a research grant on the basis of where the applicant went to school.
It takes about 17 years for RCT-quality evidence to become standard practice in health care. It took figure skating a quarter of a century to dump compulsory figures in favour of a new balanced scorecard: technical merit and artistic impression, each judged on a scale of 6. The new metrics shifted the focus from calligraphy to content.
When you solve one problem, new challenges come into sharper relief. In skating the dilemma surrounded artistic impression. The very term suggests that merit is in the eye of the beholder, an aesthetic response that defies analysis and quantification. Yet for all that, far from producing wildly varying scores – you like Rembrandt, I like Pollock – the new system generated highly correlated technical and artistic scores. Judges could not bring themselves to give great jumpmeisters like Canadian Elvis Stojko a 5.9 for technical merit and 5.2 for clunky artistic impression. This was tacit recognition that audiences came to see triples and quads and throws and lifts, and booed when high artistic impression scores catapulted lesser athletes to the top. Rather than change the system, the judges fudged it.
It might have remained to this day but for the sport’s Harold Shipman moment. One of the marquee events of the 2002 Salt Lake City Olympics was the Canada-Russia battle for gold in the pairs competition. The Russians blinked – a clear stumble on a side-by-side double axel. Otherwise, both pairs skated flawlessly. By 5 judges to 4, the Russians got the gold, to howls of derision. Ironically, the decision was hardly an unambiguous travesty of justice. Despite their error, the Russians’ program was arguably more difficult, and Russian skaters always look like they would be equally at home at the Bolshoi.
But it turned out that the head of the French skating federation had pressured a French judge to place the Russians ahead of the Canadians in pairs, in return for which the Russian judge would tilt the scales towards the French duo in ice dancing. Not even giving the Canadians a matching gold (the Russians kept theirs) could make the stink go away. The debacle forced the International Skating Union to change its scoring system again.
One change was to replace the vague and elastic notion of artistic impression with 5 program component scores that define the overall narrative and coherence of the program. The other was to assign a technical score to every jump, spin, and footwork sequence. In the new scoring system the spread is much greater. Previously, the effective range for the elite skaters was 5.5 to 5.9, a compression ripe for monkey business. By contrast 2011 world champion Patrick Chan scored a record 281, 22 points ahead of his nearest competitor. There is a lot more rank-order validity in a system that uses a wider range of scores.
Again though, progress hatched a new set of problems. The new system is mysterious to the public. It lacks the elegant if corruptible simplicity of the 6.0 system and the face validity of systems with maximum scores, like 4.0 grade point averages or 10 points for gymnastics. Over time, aficionados figured out what the new scores mean. For everyone else the numbers are meaningful only in relation to each other as proxies for rank-ordering.
The root cause of figure skating performance measurement problems is that unlike the high jump or 400 meters, it requires judging. Judging is always subject to error and second-guessing, especially among international sports governing bodies that have long been havens for Nazi sympathizers (longtime International Olympic Committee head Avery Brundage), fascists (his successor, the Falangist Juan Antonio Samaranch), and pompous elites. They rarely respond to public opinion and sell indulgences like medieval popes. Yet even these sleazy bastions of unaccountability have cleaned up their performance measurement systems. Where there is doubt about the acumen or integrity of a judge, the paper trail is much more transparent than it used to be.
Most of health care is about where figure skating was 30 years ago. It overvalues some metrics: Paying extra for testing HbA1c levels or Pap test screening is akin to handing out medals for compulsory figures. In disciplines like ice dancing, there was a clear hierarchy among the teams, a pecking order largely impervious to actual performance. Sounds a lot like the assumption that the high-profile hospital with good food and valet parking is also safe. Declaring the “art of medicine” to be inaccessible to reliable measurement is like legitimizing scoring anarchy for artistic impression.
Vague and subjective impressions, bias, and reliance on reputation are enemies of transparency, improvement, and fairness in skating and in health care. Health care does a reasonable job of scoring the discrete “jumps” – severity-adjusted surgical mortality rates, 30-day readmissions, infection and other complication rates. It has yet to produce good metrics for choreography and transitions, two component scores in skating of obvious relevance to patients with chronic diseases and the frail elderly.
Great figure skating combines sheer athleticism and technical prowess with program coherence and seamless flow. For a time it appeared as though nothing mattered except the jumps, but in a four-and-a-half-minute program, all the jumps combined consume maybe 30 seconds. There is great if subtler genius in the remainder, and rather than accepting arbitrary subjectivity or narrowing its focus to the easy-to-measure indicators, skating has worked hard to define and measure it. Good for them for trying.
In health care we are understandably dazzled by the quad bypass and the transplant program, but we should be just as impressed by beautiful, everyday achievements such as the avoidance of polypharmacy, the prevention of falls, the transition of patients to self-management, and the responsiveness of the same-day appointment. Many are hard to measure and cause-and-effect may be elusive. But the stakes are much higher than giving someone the wrong colour Olympic medal. All the more reason to try harder, faster.
On the “Harold Shipman moment”: OK, this is an exaggeration – inept or corrupt skating judges don’t kill people, just careers. Shipman was the British doctor who killed 300 of his patients before anyone noticed. His case unmasked the naïve folly of conferring life-long licensure on the basis of entry-to-practice credentials, with no subsequent practice profiling or other form of quality assurance or risk management.
On the 10-point maximum in gymnastics: ah, but gymnastics abandoned the 10-point system in favour of more nuanced, open-ended scoring that assigns more points to more difficult routines. Perhaps the sport that has most successfully incorporated two performance dimensions is diving. The score for each dive is the degree of difficulty (maximum 4.8) times the execution score (maximum 10). The former is objective, and cannot be altered by judges.