In 1975, the psychologists Stephen West and T. Jan Brown investigated the factors that make people more likely to help a stranger. What made their study unusual was that they ran the experiment twice, using two different methods.

In the first study, they staged a crisis. Sixty men walking on a college campus were stopped by a woman who made the following request:

“Excuse me, I was working with a rat for a laboratory class and it bit me. Rats carry so many germs – I need to get a tetanus shot right away but I don’t have any money with me. So I’m trying to collect $1.75 to pay for the shot.”

In some conditions, the woman held her hand as if it had been bitten; in other conditions, her fist was wrapped in gauze that had been soaked in artificial blood. Sometimes she wore an “attractive pant outfit and was tastefully made up” and sometimes she wore a blonde wig, white face powder and dark lipstick, “all of which were inappropriate for her natural complexion.”

Not surprisingly, the men were most generous when the woman was attractive and in urgent need, giving her an average of 43 cents. (Every man stopped to help in this condition.) In contrast, an “unattractive” woman with a bloody bandage received 26.5 cents on average, and only 80 percent of men offered help. The less severe conditions led to even less assistance: the men donated approximately 13.5 cents, with two-thirds providing some amount of money.

So far, so obvious: when deciding whether or not to help a stranger, the most important variable is the severity of the situation. We might stop for a head-on collision, but not for a fender bender. If you're asking for money, it’s better to be good-looking.

But the most intriguing part of the paper came when the scientists tried to replicate their field study in a lab. Instead of faking an emergency on the street, they read sixty male subjects a description of the injury (severe or not severe) and showed them a photograph of the woman (attractive or unattractive). Then the men were asked how much money they would be willing to give her.

In this “interpersonal simulation,” the men were far more generous. They offered the woman the most money in the unattractive/severe condition, an average of $1.20, or roughly four and a half times the 26.5 cents their peers gave her in real life. The same inflation held across every condition, with the men promising far larger sums when she was only a hypothetical. The lab subjects also claimed that her appearance wouldn’t win them over; if anything, they said they’d give more when she was less attractive, even though the field test strongly suggested otherwise. West and Brown conclude their 1975 paper with a warning: “The comparison of the results of the field experiment and the interpersonal simulation raise serious questions concerning the validity of the latter approach as a strategy for investigating human social behavior.”

I first learned about this study from a fascinating critique of modern psychology, published in 2007 by the psychologists Roy Baumeister of Florida State University, Kathleen Vohs of the Carlson School of Management at the University of Minnesota, and David Funder of the University of California, Riverside. In “Psychology as the Science of Self-Reports and Finger Movements,” Baumeister et al. hold up the West/Brown study as an example of the unsettling discrepancy between what we think we’ll do and what we actually do. Such discrepancies, it turns out, are a recurring theme in the literature. For instance, Baumeister et al. note that “affective forecasting studies” (research in which people are asked how they will feel if x happens) “systematically show the inaccuracies of people’s predictions” about their own future emotions. Meanwhile, research on financial decision-making shows that people are “moderately risk averse” when dealing with pretend money but become far more risk averse when large amounts of real cash are involved. Other experiments show that merely asking people about their preferences can alter those preferences; the act of introspection has a distorting effect. As the psychologist Timothy Wilson famously argued, we are all “strangers to ourselves.”

And yet, despite this wealth of evidence, Baumeister and colleagues document a steady decline, across roughly forty years of research published in the elite Journal of Personality and Social Psychology, in the percentage of studies that actually look at behavior rather than just our predictions of it.

As the psychologists note, this is a troubling situation for a science that is typically described as the study of human behavior. Instead of observing humans in vivo, the vast majority of these papers rely on questionnaires, tests and stimuli flashed on computer screens. Subjects predict their actions rather than act them out. But Baumeister et al. point out that such methods leave out much of the complexity that makes people so interesting. In fact, many of the canonical studies of modern psychology, such as Milgram’s obedience experiments, the Stanford Prison Experiment and Mischel’s marshmallow task, derive their power from the contradiction between predicted behavior (I wouldn’t do that!) and actual behavior. What’s more, the “eclipse” of behavioral studies is inevitably shrinking the range of subjects psychology can study, since much of human nature cannot be easily reduced to a self-report. Here are the scientists, getting frisky:

“Whatever happened to helping, hurting, playing, working, taking, eating, risking, waiting, flirting, goofing off, showing off, giving up, screwing up, compromising, selling, persevering, pleading, tricking, outhustling, sandbagging, refusing, and the rest? Can’t psychology find ways to observe and explain these acts, at least once in a while?”

There are, of course, a number of factors behind this shift away from behavior. Field studies are riskier and more expensive; institutional review boards are more likely to object to behavioral experiments, since they might upset subjects; and in the 1970s, peer-reviewed journals began explicitly favoring psychology articles with multiple studies. As Baumeister et al. note, “it is far easier to do many studies by seating groups in front of computers…than to measure behavior over and over.”

To be clear, there is nothing wrong with self-reports. In their paper, Baumeister, Vohs and Funder repeatedly emphasize the value of non-behavioral research, especially in certain subject areas. However, the shortcomings of this approach have also been clearly established: when we talk about ourselves, we often don’t know what we’re talking about.

Baumeister et al. don’t sound very optimistic that this trend can be reversed. (They call for “affirmative action for action,” with journals and funding agencies giving “a little extra preference” to papers and proposals that measure behavior.) In the meantime, perhaps we should all just remember the intrinsic limitations of studies that rely exclusively on self-reports, both when reading the papers themselves and when reading blog posts about such papers.

West, Stephen G., and T. Jan Brown. "Physical attractiveness, the severity of the emergency and helping: A field experiment and interpersonal simulation." Journal of Experimental Social Psychology 11.6 (1975): 531-538.

Baumeister, Roy F., Kathleen D. Vohs, and David C. Funder. "Psychology as the science of self-reports and finger movements: Whatever happened to actual behavior?" Perspectives on Psychological Science 2.4 (2007): 396-403.