Saturday, August 11, 2018

Will THE do something about the citations indicator?


International university rankings can be a bit boring sometimes. It is difficult to get excited about the Shanghai rankings, especially at the upper end: Chicago down two places, Peking up one. There was a bit of excitement in 2014 when there was a switch to a new list of highly cited researchers and some universities went up or down a few places, or even a few dozen, but that now seems to be over.

The Times Higher Education (THE) world rankings are always fun to read, especially the citations indicator, which since 2010 has proclaimed a succession of unlikely places as having an outsize influence on the world of research: Alexandria University, Hong Kong Baptist University, Bilkent University, Royal Holloway University of London, National Research University MEPhI Moscow, Tokyo Metropolitan University, Federico Santa Maria Technical University Chile, St George's University of London, Anglia Ruskin University Cambridge, Babol Noshirvani University of Technology Iran.

I wonder if the good and the great of the academic world ever feel uncomfortable about going to those prestigious THE summits while places like the above are deemed the equal or the superior of Chicago or Melbourne or Tsinghua for research impact. Do they even look at the indicator scores?

These remarkable results are the result not of deliberate cheating but of THE's methodology. First, research documents are divided into 300 plus fields, five types of documents, and five years of publication, and then the world average number of citations (the mean) is calculated for each type of publication in each field and in each year. Altogether there are 8000 "cells" against which the average of each university in the THE rankings is compared.

This means that if a university manages to get a few publications in a field where citations are typically low it could easily get a very high citations score. 
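A minimal sketch of how this kind of cell-based normalisation can behave, assuming it works like standard item-oriented normalisation: each paper's citations are divided by the world mean for its cell (field, document type, publication year), and the university's score is the average of those ratios. The cell means and papers below are invented for illustration.

```python
# Hypothetical world mean citations per cell (field, doc type, year)
world_means = {
    ("particle physics", "article", 2016): 12.0,
    ("regional history", "article", 2016): 0.4,   # a low-cited field
}

# A small university's output: (cell, citations received) -- invented
papers = [
    (("particle physics", "article", 2016), 10),  # roughly average paper
    (("regional history", "article", 2016), 4),   # 10x its cell's mean
]

def normalised_impact(papers, world_means):
    """Average of paper citations divided by the world mean of each cell."""
    ratios = [cites / world_means[cell] for cell, cites in papers]
    return sum(ratios) / len(ratios)

# One modestly cited paper in a low-cited field dominates the score:
print(normalised_impact(papers, world_means))  # (10/12 + 4/0.4) / 2 ≈ 5.42
```

A single paper with four citations contributes a ratio of 10, so the university looks five times as influential as the world average.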

Added to this is a "regional modification" where the final citation impact score is divided by the square root of the score of the country in which the university is located. This results in most universities receiving an increased score, which is very small for those in productive countries and very large for those in countries that generate few citations. The modification is now applied to half of the citations indicator score.
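A rough sketch of the regional modification as described above: the score is divided by the square root of the country's own score, and since the modification now applies to only half of the indicator, the adjusted and unadjusted scores are blended 50/50. The country scores here are invented.

```python
import math

def regional_modification(university_score, country_score):
    """Blend the raw score 50/50 with the country-adjusted score."""
    adjusted = university_score / math.sqrt(country_score)
    return 0.5 * university_score + 0.5 * adjusted

# Same raw impact score, different countries (country score on a 0-1 scale):
print(regional_modification(0.8, country_score=0.9))  # strong country: ~0.82
print(regional_modification(0.8, country_score=0.1))  # weak country:   ~1.66
```

Dividing by a square root that is less than one always inflates the score, and the lower the country score, the bigger the boost.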

Then we have the problems of those famous kilo-author mega-cited papers. These are papers with dozens, scores, or hundreds of participating institutions and similar numbers of authors and citations. Until 2015 THE treated every author as though they were the sole author of a paper, including those with thousands of authors. Then in 2015 they stopped counting papers with over a thousand authors, and in 2016 they introduced a modified fractional counting of citations for papers with over a thousand authors. Citations were distributed proportionally among the authors, with a minimum allotment of five per cent.

There are problems with all of these procedures. Treating every author as the sole author meant that a few places could get massive citation counts from taking part in one or two projects such as the CERN project or the global burden of disease study. On the other hand, excluding mega papers is also not helpful since it omits some of the most significant current research.

The simplest solution would be fractional counting all around, simply dividing the number of citations of each paper by the number of contributors or contributing institutions. This is the default option of the Leiden Ranking, and there seems no compelling reason why THE could not do so.
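The three counting schemes above can be contrasted in a few lines. The figures are invented; the five per cent floor follows the description of THE's post-2016 modified fractional counting.

```python
citations = 3000      # citations to one hypothetical mega-paper
n_institutions = 500  # participating institutions

# Pre-2015: every participant gets full credit for the paper
full = citations

# Plain fractional counting (Leiden default): credit split evenly
fractional = citations / n_institutions

# THE's modified fractional counting: proportional share, floor of 5%
the_modified = citations * max(1 / n_institutions, 0.05)

print(full, fractional, the_modified)  # 3000 6.0 150.0
```

With a five per cent floor, one mega-paper still delivers 25 times the citations that plain fractional counting would allow.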

There are some other issues that should be dealt with. One is the question of self-citation. This is probably not a widespread issue but it has caused problems on a couple of occasions.

Something else that THE might want to think about is the effect of the rise in the number of authors with multiple affiliations. So far only one university has recruited large numbers of adjunct staff whose main function seems to be listing the university as a secondary affiliation at the top of published papers, but there could be more in the future.

Of course, none of this would matter very much if the citations indicator were given a reasonable weighting of, say, five or ten percent, but it has more weight than any other indicator -- the next is the research reputation survey with 18%. A single mega-paper, or even a few strategically placed citations in a low-cited field, can have a huge impact on a university's overall score.

There are signs that THE is getting embarrassed by the bizarre effects of this indicator. Last year Phil Baty, THE's ranking editor, spoke about its quirky results.

Recently, Duncan Ross, data director at THE, has written about the possibility of a methodological change. He notes that currently the benchmark world score for the 8000 plus cells is determined by the mean. He speculates about using the median instead. The problem with this is that a majority of papers are never cited, so the median for many of the cells is going to be zero. So he proposes, based on an analysis from the recent THE Latin American rankings, that the 75th percentile be used.
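A quick illustration of why the median fails as a benchmark for a citation cell while the 75th percentile does not. The distribution below is invented, but it has the typical shape: most papers uncited, a long tail pulling the mean upwards.

```python
import statistics

# Citations per paper in one hypothetical cell: mostly zero, one outlier
cell_citations = [0, 0, 0, 0, 0, 0, 1, 2, 5, 40]

mean = statistics.mean(cell_citations)              # 4.8, dragged up by one paper
median = statistics.median(cell_citations)          # 0 -- useless as a benchmark
p75 = statistics.quantiles(cell_citations, n=4)[2]  # 75th percentile, nonzero

print(mean, median, p75)
```

Dividing by a zero median is undefined, and dividing by the mean rewards anyone who lands above one heavily cited outlier; the 75th percentile at least yields a nonzero, less outlier-sensitive benchmark.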

Ross suggests that this would make the THE rankings more stable, especially the Latin American rankings where the threshold number of articles is quite low. 

It would also allow the inclusion of more universities that currently fall below the threshold. This, I suspect, is something that is likely to appeal to the THE management.

It is very good that THE appears willing to think about reforming the citations indicator. But a bit of tweaking will not be enough. 




