The gender gap and the travesty of assumed linear ordering

There were quite a number of interesting points raised at the Heidelberg Laureate Forum panel debate about the Gender Gap in Science on September 27. But the statement I keep coming back to is the following one by Margo Seltzer (University of British Columbia), one of the panelists: It is a myth that you can take researchers and sort them into a linear order. Choosing between candidates is a multidimensional problem.

Once you’re aware of the myth of linear order, you’ll find examples everywhere: entities in a multidimensional space, forced, more or less arbitrarily, into a linear order. The best universities. The best cities to live in. Commonly with that ultimate weasel word, “best,” suggesting that there is some common linear direction along which the list is ordered. It’s so common that we no longer even object to this travesty.

Linear order is not a given

Gender Gap Panel at 7th HLF: Marie-Francoise Roy, Jessica Carter, Fernando Seabra Chirigati, Anna Vasilchenko, Ragni Piene (standing) and Anna Wienhard (via Skype, not pictured)

Mathematicians should know better. A natural linear order is a special property that only some sets have. For ordinary (integer, rational, real) numbers, no problem. But whenever more than one dimension is involved, things get complicated. That part is easy to see in everyday examples, whenever there is more than one criterion for evaluation. Quantifying any criterion – assigning a number, in a meaningful way – is hard enough for more complicated properties. For assigning such numbers in a way that allows us to compare different criteria, there is no unique, objective solution.
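
To make the point concrete: with several criteria, the most defensible way to compare two candidates is componentwise, saying that one is better than the other only if it is at least as good on every criterion and strictly better on at least one (so-called Pareto dominance). That yields only a partial order. The following Python sketch is purely illustrative; the three candidates and their scores on two criteria are made up, and the helper names are hypothetical.

```python
from itertools import combinations

# Hypothetical scores on two criteria (higher is better); illustrative only.
candidates = {
    "A": (9, 3),
    "B": (5, 6),
    "C": (4, 5),
}

def dominates(x, y):
    """True if x is at least as good as y on every criterion
    and strictly better on at least one (Pareto dominance)."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))

def compare(name1, name2):
    """Describe how two candidates relate under Pareto dominance."""
    s1, s2 = candidates[name1], candidates[name2]
    if dominates(s1, s2):
        return f"{name1} > {name2}"
    if dominates(s2, s1):
        return f"{name2} > {name1}"
    return f"{name1} and {name2} are incomparable"

for n1, n2 in combinations(sorted(candidates), 2):
    print(compare(n1, n2))
# → A and B are incomparable
# → A and C are incomparable
# → B > C
```

Only one of the three pairs can be ordered at all; any ranking that puts A above B, or B above A, has to bring in an additional choice from outside the data.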

In practice, it is common to just add up the numbers, giving each criterion a characteristic coefficient: a weight. But even that involves more of an arbitrary choice than merely choosing the coefficients. If we add up those numbers directly, we (tacitly or explicitly) assume that the relation is linear. But it’s quite possible that the number best suited (if there is such a thing) for defining the generalized property we intend to describe goes with the square root, or with some power, or with a more complicated function of the numbers specifying the separate criteria.
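
A toy illustration of how much both choices matter: two hypothetical candidates, A and B, with made-up scores on two criteria. The weights and the square-root transform below are arbitrary assumptions chosen for illustration, not taken from any real ranking or hiring procedure.

```python
import math

# Hypothetical scores on two criteria, each on a 0-10 scale; illustrative only.
scores = {
    "A": (9.0, 2.0),
    "B": (5.0, 5.0),
}

def rank(aggregate):
    """Sort candidates by an aggregate score, best first."""
    return sorted(scores, key=lambda name: aggregate(scores[name]), reverse=True)

def weighted_sum(w1, w2):
    """Linear aggregation: add up the weighted criterion scores."""
    return lambda s: w1 * s[0] + w2 * s[1]

def weighted_sqrt_sum(w1, w2):
    """Same weights, but each criterion enters through a square root
    (diminishing returns), one of many possible nonlinear choices."""
    return lambda s: w1 * math.sqrt(s[0]) + w2 * math.sqrt(s[1])

print(rank(weighted_sum(0.5, 0.5)))       # → ['A', 'B']  (A: 5.5, B: 5.0)
print(rank(weighted_sum(0.3, 0.7)))       # → ['B', 'A']  (A: 4.1, B: 5.0)
print(rank(weighted_sqrt_sum(0.5, 0.5)))  # → ['B', 'A']  (A: ~2.21, B: ~2.24)
```

Same data, three defensible-looking formulas, and the “best” candidate changes twice: once by shifting the weights, once by keeping the weights and merely replacing the linear sum with square roots.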

Whenever you define such a number, you should be aware of the choices involved, and of the different choices possible. Who, in such a situation, could be arrogant enough to take such an arbitrary construct and slap on labels which, from their meaning in everyday life, bring with them associations of an absolute ordering: better than, worse than, best, second-best, third-best?

How come anyone still puts any stock in university rankings, for instance? Why should we even expect some combination of “Research”, “Citations”, “Teaching”, “Industry Income” and “International Outlook,” each difficult to measure on its own, to yield a meaningful number for which it makes sense to attach the label “the best,” and from which to derive a ranking? (Interestingly, this blog post here recently popped up in my Twitter timeline.)

The answer, of course, is: Many people do it that way. And some of the resulting rankings have become quite influential.

Ranking job candidates

Take a situation that is even more difficult: scientists applying for faculty positions. How do you rank the applicants, and decide whom to offer the job? I don’t know what is more error-prone: Emulating the university rankings, that is, coming up with some overall number that suggests objectivity, and going by that – or what appears to be the more common way, with individuals on the selection committee looking through candidates’ applications with an eye towards the given criteria, and making their own choices based on that.

I’ve been in hiring discussions like that, where differences in weighting suddenly become important. I have had discussions with a colleague who favored one candidate, while I favored another, and quite naturally, the discussion shifted to the different weights given. The colleague was arguing to give one criterion more weight (which favored their preferred candidate); I was arguing for assigning more weight to a criterion that gave my preferred candidate the advantage. There need not have been anything sinister about all of this. I genuinely believed, and still believe, my argument was sound, and I have no reason to assume my colleague did not think the same of their argument. But it does open up the process to the influence of biases, even while all those involved consider themselves to be acting objectively, and with a view towards choosing the best (oops, there is that weasel word again) candidate.


Margo Seltzer at the Gender Gap in Science panel during the 7th HLF.

Some of the biases are conscious – there is nothing subtle about being told that girls cannot do physics anyway, or that the male students in the lecture hall are there to enrich science, while the women are there to make it more beautiful (to quote recent examples from my Twitter timeline). Other biases are unconscious. During the HLF panel debate, Seltzer gave us some homework: Complete two of the implicit bias tests at – something I can only recommend, even though the results tend to be disconcerting.

Studies of artificial situations in which professors were asked to make hiring decisions paint a mixed picture – some of the studies (such as this and this) found clear bias against women, while another one found the opposite. But all such experiments have one definite disadvantage: People may react differently when there’s nothing really at stake for them. Here, on the other hand, is a real-life study from my own field, astronomy: When NASA changed the rules for applications for observing time on the Hubble Space Telescope, introducing a double-blind review that forced reviewers to judge each proposal on its direct scientific merits, instead of adding an evaluation of applicants’ previous track records and other factors to the mix, there was a flip: In the previous 18 years, proposals led by men had always had higher acceptance rates than those led by women. With the new double-blind evaluation, female-led proposals did (slightly) better than male-led proposals.

Combine biases with an artificial linear ordering, and you are likely to end up with people who are convinced that they have made an objective choice (“the best candidate”). But if a sufficient number of the people making those choices are biased against female applicants, that could well be an important part of why we have a gender gap for senior positions in science, which is typically larger than for junior positions or for student numbers. Helped by the travesty of insisting that there is a linear ordering for the multidimensional problem of evaluating applicants.

No silver bullet

Anonymization, as with the Hubble proposals, will not be possible in the case of a faculty search. There’s no silver bullet that will solve the whole problem at once, although some tweaks can help. One step is becoming aware of the biases involved – and of the multidimensionality of the problem, and the problems with forcing multidimensional entities into a linear order.

Sometimes, one can change the procedure in a way that avoids some of the biases. To this end, Seltzer told an anecdote about a colleague at a major university who is involved in hiring new faculty. That colleague had developed an interesting strategy: In the search phase, they would call a number of suitable experts and ask them to name what the experts saw as the three top candidates for the position. Almost invariably, the three names that came back would be those of white males. Then, the colleague would mention the need for diversity in the department, and ask for the names of three strong candidates who would make the department more diverse. The key was in the last question: The colleague would ask their respondents to rank all six names – and usually, the new order would not have the three original names in the first three places.

Sometimes, even a simple thing like that can produce a significant change in the results. Which is appalling, and should be sobering to those who insist that they are merely going by candidates’ merits to fill their positions. But it also gives hope that change is possible – if we think about it, act on what we see, and try to curb the influence of biases.



While still studying physics at the University of Hamburg, Markus Pössel realized that the challenge of preparing and presenting physics topics so that non-physicists can understand them was at least as interesting to him as the research itself. After his doctorate at the Max Planck Institute for Gravitational Physics (Albert Einstein Institute) in Potsdam, he stayed on at the institute as an "outreach scientist", was involved in various exhibition projects during Einstein Year 2005, and created the web portal Einstein Online. At the end of 2007 he moved to the World Science Festival in New York for a year. Since the beginning of 2009 he has been a staff scientist at the Max Planck Institute for Astronomy in Heidelberg, where he directs the Haus der Astronomie, a center for astronomy outreach and education. Since 2010 he has also headed public relations at the Max Planck Institute for Astronomy, and since 2019 he has been director of the International Astronomical Union's Office of Astronomy for Education, which is based at the Haus der Astronomie. Beyond his "day job", Pössel is active as a science writer and science journalist: here on the SciLogs, as author or co-author of several books and occasional newspaper articles (most recently in FAZ and Tagesspiegel), and with contributions to the magazine Sterne und Weltraum.

1 comment

  1. The gender gap can of course be closed by giving preference to women when filling positions. That is what this contribution is ultimately about.
    However, the greater the proportion of women in research, the easier it should be to get women into important professional positions.
    Countries with many female researchers should therefore be countries with many female researchers in decision-making positions.
    The majority of researchers in Azerbaijan, Thailand, Kazakhstan, Georgia, Armenia and Kuwait are indeed women. Also in Bolivia, Venezuela, Trinidad & Tobago, Guatemala, Argentina and Panama more than 50% of all researchers are women.
    In Germany, on the other hand, only one third of all researchers are women, and Angela Merkel could well be typical of the career of a German physicist and researcher: she does not pursue the profession for which she was trained.

    In several developing countries there are more women in science and engineering than the European average. There are various explanations for this. One of them goes like this:

    Women in developing countries often choose engineering professions because they can earn (a lot of) money with such a profession and thus support themselves and their families. In the affluent European countries, on the other hand, women often choose occupations regardless of career opportunities – simply because they feel attracted to the professional environment. Unfortunately, few European women feel attracted to the scientific/technical environment.

    The article “The STEM Paradox: Why are Muslim-Majority Countries Producing So Many Female Engineers?” ( ) reports on this phenomenon:
    “A large percentage of girls aren’t driven by passion for engineering but by performance,” says Raja Ghozi, a Tunisian engineering professor at the National Engineering School of Tunis who has also studied in the U.S. Though Tunisian women can change their field of study to the humanities, they tend to stick with engineering because it’s something that’s been encouraged by their parents – often their fathers, Ghozi says – and because they know they’re more likely to find jobs in engineering in a country with a 15 percent unemployment rate.
