Describing Applicants in Gendered Language Might Influence Academic Science Hiring

Wendy M. Williams, Stephen J. Ceci
May 7, 2015

In 2012, we wrote a feature article for American Scientist titled “When Scientists Choose Motherhood.” That feature focused on how women academics could have children without being penalized on their way to tenured positions. In our subsequent work, one area we have studied is the hiring practices academics face when attempting to obtain a professorship.

We recently published an article about the results of a 4.5-year program of research on gender’s influence on faculty hiring preferences for tenure-track STEM assistant professorships. (For additional discussion, see this post.) This research comprised five national experiments involving 873 faculty at 371 universities and colleges. It found a 2:1 faculty preference for hiring women on the STEM tenure track. In four of the five experiments, ratings were based on narrative summaries, authored by a search-committee chair, which reviewed applicants’ credentials, interviews, and job talks. Our methods raised an interesting issue about the types of adjectives used to describe job applicants, one that we did not have space to address in the paper.

We asked faculty to evaluate three hypothetical applicants for a professorship in their department, informing them that their search committee had rated two of the applicants as “extremely strong” (9.5/10) and the third as slightly weaker (9.3/10), but still “very strong.” Faculty were informed that these ratings came from their search committee’s evaluation of the applicants, based on their CVs and letters of recommendation, hearing their job talks, and meeting with them. In four of the five experiments, faculty were not given CVs to examine directly but instead received these narrative summaries.

Because this method differs from how professorial hiring is usually done, some have argued that if we had used CVs and set up actual applicant interviews, we might not have found a preference for hiring women. However, this was not the goal of our experiments. Based on eight published real-world hiring audits—not experiments but actual hiring data about who applies for professorships and who gets hired—it had been documented long before we did our study that actual hiring decisions in the real world show a preference for women. These audit studies, which were cited in our article, have shown that although fewer women apply for tenure-track professorships, those who do apply have a higher chance of being hired than their male competitors. (Of course, of the large number of applicants competing for these positions, the total percentage of applicants who are hired is small, so the vast majority of both women and men applicants are not hired). The goal of our recent experiments was to determine whether women are preferentially hired because the women who apply are stronger than their male competitors (as has been alleged). Thus, we needed to make the man and woman candidates identically strong—impossible to accomplish in an actual hiring context.

CVs are important when comparing applicants in a single, narrow field, but they are problematic when evaluating applicants across multiple fields and different types of institutions—both of which were important considerations because we wanted to generalize our findings to a broad swath of American universities. In the online supplement to our article we discussed why CVs are problematic. Academic fields and institutions differ substantially from one another in what they consider “excellent” in terms of number of publications and type of scholarship. The same applicant’s CV viewed as excellent at a doctoral-intensive institution may be viewed differently at a small teaching-intensive college, and vice versa. Even within a single field there can be large differences in how the same CV is rated, which we discussed in our article (e.g., some subfields within a discipline are more positive about conference proceedings than are other subfields; in our field, some subfields expect 2-3 times more publications than others). Thus, we converted CVs to summaries that did not explicitly state the number or type of publications, but instead used phrases such as: “Based on her vita, letters of recommendation, and their own reading of her work, the search committee rated X’s research record as extremely strong…She was rated 9.5 on a 10-point scale.”

In one experiment, we did give 35 engineering faculty from a single subfield (mechanical) real CVs, and just like in the other four experiments in which we used summaries, they preferred the woman’s CV over the identical CV with a man’s name on it. Indeed, the pro-female preference of these 35 mechanical engineering professors went from 2-to-1 to nearly 3-to-1 when actual CVs were used, which is trending higher than the pro-female preference exhibited by mechanical engineering professors who received summaries rather than CVs. So, yes, our experiments did not mimic the hiring context that departments use to select professors but, as we noted, it was not our intent to find out if faculty preferred women applicants because we knew from the published audit studies that they did. Instead, we set out to test the claim that the preference was due to women applicants being stronger than men. We discovered that this preference extended to situations in which the male and female applicants were equivalently strong—thus, women did not need to be stronger than men to be preferred.

Article-length constraints prevented us from describing in detail some issues not central to our study. We discuss one such issue here, because it is interesting in its own right and because numerous commentators have raised it in an attempt to erroneously dismiss our findings. We refer to the use of gendered personalities to disguise the central purpose of the experiments. We did not want faculty to think, “I know what this study is about—they want to know if I am a sexist when it comes to hiring.” Such awareness could prompt faculty to make politically correct responses not resembling their true feelings. Our strategy was to describe each of the three finalists differently, while holding constant the academic accomplishments of the top two.

So, we systematically portrayed candidates of both sexes in various ways, differing in personality and lifestyle, just as candidates differ in the real world. The lifestyles included single with no children, married with children and stay-at-home spouse, married with children and a spouse who works outside the home, and divorced with children. In every contest between the three finalists, these lifestyles were varied, but, importantly, they were completely counterbalanced across faculty raters so that for every case of a woman depicted in one lifestyle, there was a man depicted in the same lifestyle who was evaluated by a different faculty member. We also varied personality attributes to disguise the purpose of the experiments. The personalities were built around adjectives found in past research to characterize traditionally “female” versus “male” personalities: “imaginative, highly creative, socially skilled, kind, likeable” for women versus “analytical, ambitious, independent, stands up under pressure, powerhouse” for men. (For a list of male and female adjectives, including those used in our experiments, see Appendix A in this paper.)

We also counterbalanced these adjective descriptors so that in half of our materials, the female profile was used for a female candidate and in the other half, the female profile was used for a male candidate, and vice versa. By counterbalancing, we ensured that effects of each version of our materials were countered by effects of the opposite version. This meant that our results, which described overall average hirability for each gender, were not influenced unfairly by the gender of adjectives used to disguise the purpose of the study. We checked statistically to ensure that this was the case: There were no interactions between faculty-rater gender and adjective gender. Had such interactions existed, they would have created effects in different directions for different subgroups and thus complicated the interpretation of the data.
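The counterbalancing logic can be illustrated with a minimal sketch. This is not the materials or analysis code used in the study; the function names and labels are hypothetical. It only enumerates the two contest versions and confirms that each adjective profile is paired equally often with each gender.

```python
from collections import Counter

# Hypothetical labels for the two adjective profiles described in the text.
PROFILES = ("female_adjectives", "male_adjectives")


def build_contests():
    """Return the two contest versions.

    Within each version the woman and the man receive opposite adjective
    profiles, so no contest pits two identically described candidates.
    """
    contests = []
    for woman_profile in PROFILES:
        man_profile = next(p for p in PROFILES if p != woman_profile)
        contests.append({"woman": woman_profile, "man": man_profile})
    return contests


def pairing_counts(contests):
    """Count how often each (gender, profile) pairing appears."""
    counts = Counter()
    for contest in contests:
        for gender, profile in contest.items():
            counts[(gender, profile)] += 1
    return counts


contests = build_contests()
counts = pairing_counts(contests)
for pairing in sorted(counts):
    print(pairing, counts[pairing])
```

If the two versions are shown equally often, every pairing of gender and profile occurs the same number of times, so an average across versions does not favor either gender through the adjectives alone.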

Most people reading our study are not psychologists, so they may not be familiar with the statistical predictions that follow from our “nested” experimental design, which portrayed two main contests: man/male adjectives vs. woman/female adjectives, and man/female adjectives vs. woman/male adjectives. (The design was by necessity nested, because it makes no conceptual sense to have a contest between two applicants described with identical personalities.) Our study revealed an overall 2:1 preference for women candidates over men, averaged across experimental conditions. In the real world, candidates may have some traits that are preferred during hiring and others that are not. Our experiment reflected this reality.

Imagine you are hiring a recent high school graduate and there are two people on your shortlist—a graduate from East High and one from West High. East High is a better school that produces generally more skilled graduates; West High is not nearly as good. So, if the candidate from East happens also to be highly motivated and enthusiastic with the personality you seek, and the candidate from West happens to have a personality you do not like (i.e., not particularly motivated or enthusiastic), the contest is between a person with a double dose of what you seek versus a person with a double dose of what you do not want. (Note that being motivated and enthusiastic helps all candidates of both genders in this analogy.) The choice of whom to hire is simple. If, however, the candidate from East happens to have a personality you do not like, and the candidate from West has a personality you really do like, the contest can be difficult to resolve—neither choice is ideal, because each candidate has something you do not want combined with something you do want. It will depend on the weight you attach to each.

The statistical prediction in this case is for a strong preference for candidates from East High with the preferred personality, and a strong preference against candidates from West with the disliked personality. The graduates from East with the disliked personality and those from West with the desired personality each have one positive and one negative attribute influencing the hiring decision; thus, they come out in the middle with regard to hirability.
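The prediction above amounts to two additive main effects. A minimal sketch, assuming purely hypothetical and equal weights for school and personality (the numbers are illustrative and are not data from any study), shows how the four kinds of candidates line up.

```python
from itertools import product

# Hypothetical, equal weights chosen only for illustration.
SCHOOL_EFFECT = {"East": 1.0, "West": -1.0}
PERSONALITY_EFFECT = {"liked": 1.0, "disliked": -1.0}


def appeal(school, personality):
    """Additive model: overall appeal is the sum of the two main effects."""
    return SCHOOL_EFFECT[school] + PERSONALITY_EFFECT[personality]


# Score every combination of school and personality.
scores = {
    (school, personality): appeal(school, personality)
    for school, personality in product(SCHOOL_EFFECT, PERSONALITY_EFFECT)
}

# List candidates from most to least appealing.
for candidate, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(candidate, score)
```

With equal weights the two mixed candidates tie in the middle. Unequal weights would break that tie in one direction, which is the sense in which the outcome depends on the weight attached to each attribute.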

This situation is exactly what we found in our hiring-preference studies. There were two main effects, each strongly favoring women candidates and female traits. When a candidate had both of these attributes in the same contest, that is, when a woman with a female persona (such as socially skilled, creative) competed against a man with a male persona (such as powerhouse, analytical), the woman with the female persona was picked 80.4 percent of the time, and the man only 19.6 percent of the time. This preference for women depicted with female adjectives may be seen by some as surprising in view of the finding that women depicted with female adjectives fare poorly when competing for leadership positions or in traditional male domains (see, for instance, this paper or this one). Clearly more research needs to be done on this issue, as it was not a focus of our experiments; we used gendered adjectives simply as a disguise for the real purpose of the study, which was to examine whether faculty rated women higher than identically qualified men. The use of other adjectives might have resulted in different outcomes.

In the intermediate conditions of the opposite contest, in which a woman with one “positive” attribute (female gender) and one “negative” attribute (male personality) competed against a man (“negative” attribute) with a female personality (“positive” attribute), women and men applicants were at parity. Overall, women’s average chance of being picked was 67 percent, which ranged in specific matched-lifestyle conditions from 80.4 percent to statistical parity (50.7 percent). Men’s chance of being picked ranged from 19.6 percent to statistical parity (49.3 percent). Men never had an advantage over women in any matched-lifestyle contrast in our experiment. As expected, a candidate with both the female gender and female personality that respondents preferred was overwhelmingly chosen by faculty in our study, whereas female candidates with male personalities that respondents found undesirable were downgraded when they competed against men with female personalities, who were upgraded. Again, the counterbalancing of conditions throughout our studies meant that the reported results were averaged across these two hiring contests, as well as across different lifestyles, and thus that the adjectives and lifestyles did not account for the overall results. (For more information, see our website.)

In the real world, candidates for tenure-track jobs have both a gender and a personality. Our results reveal that the female gender is a substantial advantage at 67 percent overall. In our study, the male gender coupled with the male personality—at least as captured by the handful of male adjectives that were used—is a huge impediment (preferred only 19.6 percent of the time) when the alternative is a woman with female traits—at least those we used (preferred 80.4 percent of the time). To interpret our results, one must remember that there are two types of contests, described above. The data match the interpretations of two strong and highly preferred female traits—actually being a woman, and being described with adjectives connoting a “female” persona. Note also that in one of our experiments, we asked faculty to rate just one candidate, male or female, depicted always with the female personality. Thus, we held adjective-gender constant, and as we might have expected from our other data, the substantial main effect of female gender resulted in a significant preference for hiring the woman over the man, who was identical in credentials except for the use of pronouns “he” versus “she.” And, in a validation, the same female preference again emerged when both of these candidates were described with male personalities.

Traditional sex roles have given rise to different types of adjectives for describing women and men. The prevailing wisdom is that female profiles limit women's success in STEM fields. But new research adds some complexity to that story.

Readers should remember that gendered personalities (adjectives) were used to disguise the real purpose of the study and were fully counterbalanced and thus had no bearing on our overall results, which were expressed as averages across multiple systematically varied conditions. Our national study was not designed explicitly to evaluate gendered adjectives or gender congruity versus incongruity, which are interesting topics in their own right—we used samples of adjectives representing some of many that might be used in further work.

What do our findings mean in practical terms? Most male PhD-holders on the job market do not plan to switch to the female gender. But, to the extent that new PhDs of either gender can use traditionally “female” adjectives in their job applications, this preliminary work suggests that such traits may confer some advantage. We hesitate to over-interpret this conclusion because it is based on a single study, and it was not the purpose of our study to test adjectives but merely to use a sample of them, shown in past research to have gender connotations, to disguise the real purpose of our experiments. For faculty writing letters of recommendation, the message might be to include female adjectives when describing candidates of either gender. But first, research dedicated to thoroughly exploring gendered adjectives as influences on faculty hiring should be undertaken. In ongoing research, we are continuing to explore the role of personality in greater detail to provide additional insight into the various real-world contexts influencing hiring preferences.

Editor’s Note: American Scientist welcomes all responses that contribute constructively to the conversation about this research.

