Age and Gender Stereotypes
February 9th, 2009 by Lukas Biewald
A while back we built the website FaceStat, where you can upload a picture of yourself to find out what kind of first impression you would make on a stranger on the internet, and judge others in kind.
To date, we’ve collected more than ten million judgments on over one hundred thousand faces. On a lazy Saturday afternoon, we finally dumped the data and played around with it.
Aggregating millions of these snap decisions tells us a lot about our own biases in surprising ways.
For example, you might think that 20-year-olds would be judged most attractive. In this data, however, babies are the most attractive, with another peak around 26. After a dip between 40 and 50, attractiveness starts to increase again.

We have far more data on people aged 18 to 40 on our website, which explains the tighter error bars in that range.
Women are judged as much more trustworthy than men, with the lowest scores for adolescent males. Interestingly, there is a large jump in trustworthiness for both men and women between 20 and 30, and between 50 and 60:

Children and old people are judged as more intelligent, with males in their twenties getting the lowest scores.

As men and women get older, they are perceived as more and more conservative. Interestingly, young women are perceived as more liberal than young men, but the gap disappears after 25.

A few more details on the above.
- FaceStat has more female than male users. Users both upload faces and judge others. Judgments were collected over the last eight months. Every face has at least 100 judgments.
- We grouped faces by perceived age, one bin per year, and plotted one point per bin for the attribute's average value. The y-axis is normalized: it is centered on the average face rating, with tick marks at +/- 1 standard deviation across faces.
- Error bars are 95% confidence intervals, omitted for small bins where they would be extremely large. In some sense they are too wide (too conservative), since each age year is treated separately. The line is a loess fit.
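Here is a minimal sketch of the binning and normalization described in these notes. It is written in Python, though the authors mention working in R. It assumes each face has already been reduced to a (perceived age, average attribute rating) pair; the function name binned_normalized_means is hypothetical, and the loess fit and error bars are left out. The sketch normalizes over whatever faces you pass in (for example, one gender at a time); whether the original graphs normalized over all faces or per gender isn't stated.

```python
from collections import defaultdict
from statistics import mean, stdev


def binned_normalized_means(faces):
    """faces: iterable of (perceived_age, rating) pairs, one per face, where
    rating is that face's average score on one attribute.

    Returns {age_bin: normalized mean}, with one bin per year of perceived
    age and each bin mean expressed in standard deviations from the
    average face rating (a z-score across faces).
    """
    faces = list(faces)
    ratings = [rating for _, rating in faces]
    center = mean(ratings)   # average face rating -> 0 on the y-axis
    scale = stdev(ratings)   # 1 SD across faces -> one tick mark
    bins = defaultdict(list)
    for age, rating in faces:
        bins[int(age)].append(rating)  # e.g. 20.3 and 20.9 both go to bin 20
    return {age: (mean(rs) - center) / scale for age, rs in sorted(bins.items())}
```

Each returned value is one plotted point; bins holding only one or two faces are exactly the ones where the post drops the error bars.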
Finally, here’s a scatterplot matrix of the attributes. Every pair of attributes has two graph panels. The bottom-left panels are smoothed scatterplots that show the density of faces in that attribute pair’s space. The top-right corrgram panels show Pearson correlations: blue means the two attributes are positively correlated, and red means negatively correlated. Unlike the above graphs, genders are not separated, and values are not normalized.
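A minimal sketch of the Pearson correlations behind the corrgram panels. It assumes ratings are stored as one list per attribute, with None where a face was never rated on that attribute (the comments below note that not every image carries every tag). The function names are hypothetical. Using pairwise-complete faces for each attribute pair is an assumption about how missing labels were handled.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 each")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(attrs):
    """attrs maps attribute name -> list of ratings (None = not rated).

    Returns {(a, b): r} for every unordered pair, using only faces that
    were rated on both attributes (pairwise-complete).
    """
    names = list(attrs)
    result = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            pairs = [(x, y) for x, y in zip(attrs[a], attrs[b])
                     if x is not None and y is not None]
            xs = [x for x, _ in pairs]
            ys = [y for _, y in pairs]
            result[(a, b)] = pearson(xs, ys)
    return result


def corrgram_color(r):
    """Color convention from the post: blue = positive, red = negative."""
    if r > 0:
        return "blue"
    if r < 0:
        return "red"
    return "white"  # exactly zero: hypothetical neutral color
```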
There’s a story in each panel. Looking at the attributes in the middle, we see that conservativeness, wealth, intelligence, and trustworthiness all seem to go together. Intoxication has lots of red panels: it’s anticorrelated with all of them. Age would also be correlated with all of these, except that extreme youth gets high intelligence and trustworthiness marks. Attractiveness is more complex too: it sometimes goes down at the extremes. Perceived political moderates look more attractive than liberals and conservatives; similarly, you’re hot if you look moderately smart or rich, but hideously high wealth and intelligence are a little less attractive.
(On the age-attractiveness scatterplot, note the “old beautiful people” effect seems to weaken compared to the gender breakdown graphs earlier in this post. Simpson’s Paradox?)




February 10th, 2009 at 4:00 am
How are you calculating the confidence intervals? They seem awfully large if you have 100,000 faces in total.
February 10th, 2009 at 4:09 am
The data is extremely biased towards people aged 20-30. Also, not every image is labeled with every tag, so each graph plots a subset of the data where both variables were labeled by at least one person.
To answer your question explicitly, we use a two-sided t-test for the error bars.
February 10th, 2009 at 1:36 pm
A two-sided t-test? Do you mean that the error bars are for the difference of the means, not the means themselves? That would be non-standard, but OK. What multiplier are you using? Ideally you should be using about 1.4 (sqrt(1^2 + 1^2)), because then non-overlapping confidence intervals would correspond to a t-test p-value of 0.05 or less.
And you must have _really_ small numbers (or very heavy-tailed distributions) to get such wide intervals. Some plots of sample sizes would be informative.
You might also want to plot the differences directly. There is a well-known visual problem: our visual system compares the distance between curves by the shortest distance between them, not the vertical distance.
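To see where a multiplier near 1.4 comes from, assume two independent group means with equal standard errors SE and a normal approximation. Intervals of mean ± k·SE fail to overlap exactly when the means differ by more than 2k·SE. A two-sided test at the 0.05 level rejects when the difference exceeds 1.96 times the standard error of the difference, which is sqrt(SE^2 + SE^2) = sqrt(2)·SE. Setting the two thresholds equal:

```latex
$$
2k\,\mathrm{SE} = 1.96\sqrt{\mathrm{SE}^2 + \mathrm{SE}^2} = 1.96\sqrt{2}\,\mathrm{SE}
\quad\Longrightarrow\quad
k = \frac{1.96\sqrt{2}}{2} = \frac{1.96}{\sqrt{2}} \approx 1.39
$$
```

So under these assumptions the multiplier is about 1.39 rather than the usual 1.96. With unequal standard errors the exact value shifts somewhat.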
February 13th, 2009 at 10:39 pm
How about showing how perceived intelligence correlates with race? Or with gender?
February 20th, 2009 at 10:14 pm
@Hadley: no, not a two-sample t-test. The bars are 95% CIs from a one-sample t-test with a two-sided alternative: what you get by calling R’s t.test() on a single vector of numbers. Many of the extreme buckets have just a few instances, and some have only singletons. Giving the counts is probably more useful than all this t-test stuff.
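The one-sample interval described here is mean ± t·s/sqrt(n), where t is the Student t quantile with n - 1 degrees of freedom. R’s t.test() defaults to a 95% level and a two-sided alternative. This sketch reproduces that interval in Python using only the standard library, so it includes a small numerical t quantile (Simpson integration of the density plus bisection) in place of R’s qt(). In practice scipy.stats.t.ppf would do the same job. The function names are hypothetical. Returning None for a single observation mirrors the post dropping bars on tiny bins, since no interval can be formed from one value.

```python
import math


def _t_pdf(x, df):
    # Student t density; log-gamma avoids overflow for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def _t_cdf(x, df, steps=1000):
    # CDF for x >= 0: 0.5 plus Simpson's-rule integral of the density on [0, x]
    h = x / steps
    total = _t_pdf(0.0, df) + _t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * _t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    # Upper quantile (p > 0.5) found by bracketing, then bisection on the CDF
    lo, hi = 0.0, 1.0
    while _t_cdf(hi, df) < p:
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if _t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t_ci(xs, level=0.95):
    """Two-sided one-sample t confidence interval for the mean of xs."""
    n = len(xs)
    if n < 2:
        return None  # singleton bucket: no interval
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    half = t_quantile(1 - (1 - level) / 2, n - 1) * s / math.sqrt(n)
    return (m - half, m + half)
```

For example, one_sample_t_ci([1, 2, 3, 4, 5]) gives roughly (1.037, 4.963), matching t.test(1:5). With only a handful of faces the t quantile is large (about 12.7 for two faces), which is why the sparse extreme buckets get such wide bars.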
When the CIs are this big, it’s silly to have single-year buckets. This was just a first attempt. Since making this one, we’ve built a better graph that uses bigger age buckets at the extremes. With plyr :)
@Bruce: yeah, we definitely need to get on that. I figure binning by race will get people as riled up as possible…
February 23rd, 2009 at 2:50 pm
Thanks for the explanation - and I’m glad to hear that you’re finding plyr helpful :)
March 25th, 2009 at 11:04 pm
Just to echo Hadley’s point: since age is a continuous variable and so is your response, you should be using simultaneous error bands. Also, which smoother are you using? For correlation coefficients, ellipses are better; Deepayan has an example at
http://lmdvr.r-forge.r-project.org/figures/figures.html
(look at figure 13.5).
Lastly, I don’t love the density plots based on kernel smoothing: the singletons are not treated very well, and you end up with blurred dots around the edges of the density.
Plotting the singletons separately sometimes helps.
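The comment doesn’t say which construction of simultaneous bands it has in mind. One simple, conservative way to get joint coverage across k age bins is a Bonferroni adjustment. That is a swapped-in technique, not necessarily what the commenter meant; bands for the smoother itself are another option. This sketch only compares critical values, and it uses a normal approximation that is reasonable only for bins with many faces. The function names are hypothetical.

```python
from statistics import NormalDist


def pointwise_multiplier(level=0.95):
    # Two-sided normal critical value for a single interval on its own
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


def bonferroni_multiplier(k, level=0.95):
    # Split the error rate across k bins so that all k intervals cover
    # their true means jointly with probability at least `level`
    return NormalDist().inv_cdf(1 - (1 - level) / (2 * k))
```

With 10 bins the multiplier rises from about 1.96 to about 2.81, and it keeps growing as the bins get finer. That is one reason simultaneous bands across many one-year bins come out wide.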
April 2nd, 2009 at 7:41 pm
Interesting study!
It’d be interesting to understand how the age of the viewer impacts this. I’m guessing that your population is skewed in some way (perhaps overrepresented around that 20-25 age group), and my (untested) hypothesis is that people are better at discerning differences in people closer to their own age. Any chance of incorporating viewer age data?
Drawing envelopes on your error bars might make the data more “fair”; as it is, your solid lines visually overstate their own accuracy, just because they’re so prominent.
April 2nd, 2009 at 8:53 pm
Nicholas: we ended up hacking around the problem by making a new graph that uses variable-sized buckets, larger at the ends. Even with the kind of dumb one-sample, independent intervals, the pattern in the attractiveness plot emerges. Here’s the progression of graphs:
http://assets.doloreslabs.com/blog/aag_buckets_all.pdf
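The reply doesn’t say exactly how the variable-sized buckets were chosen. One plausible rule, shown here as a hypothetical sketch, walks the one-year bins left to right and merges adjacent years until each bucket holds at least min_count faces. Dense ages (roughly 20 to 30 here) then keep their own bucket, and the sparse ends get merged. Any sparse leftover at the old end is folded into the last bucket.

```python
def variable_width_buckets(counts, min_count):
    """Merge adjacent one-year bins until each bucket has >= min_count faces.

    counts maps integer age -> number of faces at that age.
    Returns a list of (first_age, last_age, total_count) tuples.
    """
    buckets = []
    start, total = None, 0
    for age in sorted(counts):
        if start is None:
            start = age
        total += counts[age]
        if total >= min_count:
            buckets.append((start, age, total))
            start, total = None, 0
    if start is not None:
        # Sparse tail that never reached min_count
        if buckets:
            first, _, prev_total = buckets.pop()
            buckets.append((first, max(counts), prev_total + total))
        else:
            buckets.append((start, max(counts), total))
    return buckets
```

Because the merge is greedy, a long empty stretch between ages can end up inside one bucket. A real implementation might also cap the bucket width in years.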
Yes, smoothed density plots are problematic. We’re using whatever smoother “smoothScatter” uses; Lukas (not me) looked into the details there. All kernel smoothers are kinda weird; I’d have a hard time believing there are any good reasons to futz with them beyond changing the bandwidth/span parameter. At least, not any reasons that are both (1) principled and (2) based on assumptions that apply to real-world data analysis scenarios.
I like the ellipses.
Alex: Ooh, interesting hypothesis about the age of viewers. That’s something to look into…
April 13th, 2009 at 7:54 pm
Hi,
Variable-sized buckets are just like varying the bandwidth. The problem with the binning procedure is that, since every point gets equal weight, the standard error decreases as you increase the bucket size. If you want to show the variability inherent in the data, the raw data plotted with alpha blending would show that; if you want to show the variability in the means, use simultaneous intervals for the smoother. If all you care about is the population trend, then the bin-level intervals don’t matter.
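The point about standard errors follows from the usual formula for the standard error of a mean of n equally weighted points. Assuming the within-bucket spread s stays roughly the same as a bucket widens, quadrupling its size halves the interval width:

```latex
$$
\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}},
\qquad
\frac{\mathrm{SE}_{4n}}{\mathrm{SE}_{n}} = \frac{s/\sqrt{4n}}{s/\sqrt{n}} = \frac{1}{2}
$$
```

So wider buckets at the extremes shrink the bars even though the individual faces there are no less variable.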
Futzing with the kernel density estimates is not a big deal; it is just that at the edges they give an erroneous picture. One of these days I will send Deepayan a patch with some more sensible options. It is a well-known phenomenon that kernel smoothers have really bad problems at the edges of the observed data.
Glad you liked the ellipses.
December 31st, 2009 at 6:20 am
It seems like it is all pretty much downhill from your 30s, as far as the attractiveness stakes go! I do find it quite odd that we then get prettier again once old age has truly set in, however. It is amazing what stats like these tell us about the mindset and headspace of people. Elderly people tend to be quite romantically inclined, especially when their spouses have departed, with lots of marriages and partnerships forming, and this actually gets reflected quite well in the graph.
January 21st, 2010 at 2:53 pm
wooottt!!!