March 21, 2009

The Bureaucratic Dream of Quantifying Research Results

After the American Century

I can see the attraction for bureaucrats and politicians of giving a numerical score to every book and article that every academic produces. If one could find a way to do this accurately, then individuals, departments, universities, and whole nations could be ranked, and money handed out to the most productive. It seems so logical and easy. Of course, university researchers will resist, but the effort surely would be worth it.

This fantasy has been pursued in different nations, and for the last year has been a key project of the Danish Ministry of Research. As it happens, I was dragooned (not asked) to serve as one of 300 experts charged with drawing up the lists of all scholarly journals and academic publishers, and then dividing them into groups based on quality. More points would be given to work published in the "best" journals. The Ministry considered this task to be so easy that it provided no release time or extra funding for it, and the work was to be done in just a few months. Each sub-committee would send in its lists and the Ministry would combine them into a complete overview.

This reminds me of a story I heard about the Spanish king (centuries ago) deciding to produce a map of his empire by asking each region to prepare a map of itself, the idea being to combine them all into a map of the realm. Each governor had a map drawn, but of course the scale employed and the methods of representation were by no means the same. The King tried to put the pieces together, but instead of a map he had a misshapen patch-work quilt of no value.

Yet making a map of Spain is easy compared to making a map of academic knowledge production. Land, surveyed according to a single system, can be mapped pretty accurately - even if it is not as easy as it might appear, for one must take account of the curvature of the earth and of slight deviations in measurement due to equipment that reacts to changing temperatures, and so on.

But a numerical system to measure knowledge production? Here are some of the problems. First, some fields are intensive, others extensive. In philosophy, for example, the closely reasoned article is the central form of publication, and even a very fine philosopher may not produce so very many in a decade. In my field of history, articles are more frequent, which makes a certain sense, since its subject matter is extremely extensive, with every nation, organization, and institution providing ample areas for study.

Second, in some fields, books are the most important units, in others, articles. Scientists mostly write articles, often of less than 10 pages. For historians, the most significant unit of production is the book. The typical academic book runs 250-400 pages, more than ten times the length of the typical history article. How does one compare the two forms? Some university departments in the United States establish "conversion tables," ranging from five articles per book to as many as eight. There is no consensus.

The subcommittee of five persons on which I served developed a list of more than 700 English-language journals from Britain, Ireland, the United States, Canada, Australia, and New Zealand. The same committee was also responsible for the Spanish and French journals, about most of which I cannot offer a qualified opinion. Imagine that we spent only ten minutes considering the ranking of each of the 700 English-language journals. Ten minutes is far too little time to judge a journal, yet even at that pace the task would take 7,000 minutes, or roughly 116 hours. The deeper problem is that even a committee of five will not know all 700 journals.

Nor was this all. We also had to compile a list of academic publishers, a formidable task in itself, for many universities in the English-speaking world have presses, including some of the most prestigious. We were provided a list to start with, but it was rather useless, as it omitted many of the finest publishers and was not drawn up according to any principles that I could discern. We were told it was a Norwegian list, but I think the Norwegians are far more clever.

Well, we did our best, as did the other sub-committees, but our map of academic knowledge production could not possibly become coherent. And to make matters worse, unidentified persons in the Ministry (none of them with even a Ph.D., so far as I can tell) tried to adjust the rankings without consulting the specialists involved. They made the mess worse and called their own intelligence into question. One example: a physics article published in Science is considered a great achievement at any university. Unfortunately, the Danish Ministry of Research did not know this and assigned Science a low ranking. That should have been a no-brainer. Readers in Denmark will know that this fiasco became part of an on-going news story about the attempt to create what is called (in rough translation) a "bibliometric measurement system."

For the record, let me say that from day one I felt this was a misguided enterprise, whose real purpose was to take decision-making about quality out of the hands of professors and give it to bean-counters in the Ministry. Furthermore, such experiments in other nations, notably the UK, have shown that this approach does not foster world-class research. Rather, it encourages a calculated response to whatever point system is established. For example, several short articles suddenly count for more than one long one, and several articles accepted by mediocre journals are "worth more" than one really great article that took years to write and place in a top journal. A book that can be researched quickly is worthwhile, but scholars are, in effect, punished for attempting anything that takes more than a few years. Textbooks are not worth any points, so no one wants to write them. Book reviews are also worth little or nothing, so this essential and very public part of the peer review system is weakened.

Worst of all, academics may come to believe that every article published in a "top" journal is automatically better than one appearing in a "lesser" journal. In fact, innovative work often finds a home in new journals or new publication series, created by upstarts or dissenters. Judging and rewarding academic research based on a point system reifies the present hierarchy and punishes innovators. The goal may be stimulating research, but the result can be ossification.

It may seem astonishing to bureaucrats, but the best judges of what is great research are the specialists themselves - the peers in peer review. Why judge the content of an idea by the venue where it appears? Why suppose that quantity can make up for quality? Why imagine that knowledge is quantifiable in the first place?