What makes a culture unique? What tastes, interests, and concepts define an ethnicity? These are big questions, and here's how we answered them.
We selected 526,000 OkCupid users at random and divided them into groups by their (self-stated) race. We then took all these people's profile essays (280 million words in total!) and isolated the words and phrases that made each racial group's essays statistically distinct from the others'.
For instance, it turns out that all kinds of people list sushi as one of their favorite foods. But Asians are the only group who also list sashimi; it's a racial outlier. Similarly, as we shall see, black people are 20 times more likely than everyone else to mention soul food, whereas no foods are distinct for white people, unless you count diet coke.
Using this kind of analysis, we were able to find the interests, hobbies, tastes, and self-descriptions that are especially important to each racial group, as determined by the words of the group itself. The information in this article is not our opinion. It's data, aggregated from the essays of half a million real people.
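For the curious, here is roughly what "statistically distinct" could look like in code. This is a minimal sketch under assumptions, not the pipeline behind this post: it splits essays on whitespace, scores single words rather than phrases, and uses a simple smoothed ratio of usage rates. The names `essay_counts`, `distinctiveness`, and `smoothing` are made up for illustration.

```python
from collections import Counter


def essay_counts(essays):
    """Count how many essays mention each word at least once."""
    counts = Counter()
    for essay in essays:
        # set() so a word repeated within one essay is only counted once
        counts.update(set(essay.lower().split()))
    return counts


def distinctiveness(group_essays, other_essays, smoothing=1.0):
    """Ratio of each word's usage rate in the group vs. everyone else.

    The smoothing term (an assumption of this sketch) keeps the ratio
    finite for words that nobody outside the group ever uses.
    """
    group_counts = essay_counts(group_essays)
    other_counts = essay_counts(other_essays)
    ratios = {}
    for word, count in group_counts.items():
        group_rate = (count + smoothing) / (len(group_essays) + smoothing)
        other_rate = (other_counts[word] + smoothing) / (len(other_essays) + smoothing)
        ratios[word] = group_rate / other_rate
    return ratios


# Toy data, invented for this example.
group = ["sushi sashimi", "sushi sashimi ramen", "sushi"]
others = ["sushi pizza", "sushi burgers", "pizza"]
scores = distinctiveness(group, others)
print(sorted(scores, key=scores.get, reverse=True))
```

A ratio near 1 means everybody mentions the word about equally (sushi); a large ratio marks an outlier for the group (sashimi).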
In general, I won't comment too much on these lists, because the whole point of this piece is to let the groups speak for themselves, but I have to say that the mind of the white man is the world's greatest sausagefest. Unless you're counting Queens of the Stone Age, there is not even one vaguely feminine thing on his list, and as far as broad categories go we have: sweaty guitar rock, bro-on-bro comedies, things with engines, and dystopias.
As for the interests of a white woman, you have romance novels, some country music, and, weirdly, the Red Sox. It's also amazing the extent to which her list shows a pastoral or rural self-mythology: bonfires, boating, horseback riding, thunderstorms. I remind you that OkCupid's user base is almost all in large cities, where to one degree or another, if you find yourself doing much of any of these things, civilization has come to an end.
If I had to choose over-arching themes for white people's lists, for men, I'd go with "frat house" and for women, "escapism." Whether one begot the other is a question I'll leave to the reader.
Hopefully it's been obvious that the font size of a phrase indicates the relative frequency with which it appears. So, toggling between black men and black women above, you can see that while soul food is important to both, it's really, really important to the women. In fact, soul food and black women is the single strongest phrase/group pair we found.
The above lists also make it clear that, regardless of whether Jesus himself was black, his most vocal followers definitely are. Religious expressions weren't among the top phrases for any of the other races, but they're all over the place for black men and (especially) black women, for whom 13 of the top 50 phrases are religious. Black people are over 100% more likely than average to mention their faith in their profiles.
Finally, it's worth noting that of the four lists we've seen so far, black women's is the only one to explicitly include someone of another race: Justin Timberlake.
Double finally, how bold is it that "I am cool" is the second most typical phrase for black men?
In the course of researching this article and, in particular, comparing white guys to black guys, a handy shortcut occurred to me:
If you're trying to figure out if white dudes like something, put "fucking" in the middle, and say it out loud. If it sounds totally badass, white dudes probably love it. Let's see this principle in practice:
Music and dancing (merengue, bachata, reggaeton, salsa) are obviously very important to Latinos of both genders. The men have two other fascinating things going on: an interest in telling you about their sense of humor (i'm a funny guy, very funny, outgoing and funny, etc.) and an interest in industrial strength ass-kicking (mma, ufc, boxing, marines, etc.). Basically, if a Latin dude tells you a joke, you should laugh.
Latinas' interests are fairly typical for a dating site: you got friends, career, education, movies, music, a few physical details, and, oh yeah...morbid fear. We dug further into "I'm terrified of" (on their list at #42) and found which words typically came next. It's mostly insects and "the dark", though one expert tautologist is "terrified of being scared" and another woman is "terrified of Martians."
I feel obligated to state, on behalf of white men everywhere: That woman should get a grip. Martians are nothing compared to the Sardaukar.
As you can see, both Asian men and women choose "I am simple" as a prominent self-description. Contrast this to black men's "I am cool" and Latinos' "I'm a funny guy". It's also interesting that Asian men very often mention their specific heritage (taiwan, korea, singapore, vietnam, china) while Asian women don't.
Combing through these lists, you can see the different ways women use cosmetics:
- White women show off their eyes (mascara is #5 on their list).
- Black women show off their lips (lip gloss, #7).
- Latinas show off both (mascara, #18 / lip gloss, #22).
- Asian women show off their practical mindset (lip balm, #48).
. . .
So far, I've gone through racial groups in order of their prominence on OkCupid. For brevity (I know this is the internet), I'll present the remaining lists without foolish commentary. You can click any of the links to reveal them inline.
Stuff Indians like...
Stuff Middle Easterners like...
Stuff Pacific Islanders like...
The first part of our research was to look for differences: we collected phrases that were highly likely to be used by one particular race. But we also wanted to see how similar the races were. By comparing the frequency of each word within the full vocabulary of a racial group to that same word's frequency in the vocabulary of another group, we were able to establish a metric of similarity. We almost felt like real scientists.
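If you want to play real scientist at home, one way to turn "compare each word's frequency in two vocabularies" into a single number is cosine similarity between the two groups' relative word frequencies. To be clear, that choice is an assumption of this sketch; the exact metric used for this post isn't specified here. The helper names `word_frequencies` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def word_frequencies(essays):
    """Relative frequency of each word across all of a group's essays."""
    counts = Counter(word for essay in essays for word in essay.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def cosine_similarity(freq_a, freq_b):
    """Cosine of the angle between two word-frequency vectors.

    1.0 means identical word proportions; 0.0 means no shared words.
    """
    shared = set(freq_a) & set(freq_b)
    dot = sum(freq_a[word] * freq_b[word] for word in shared)
    norm_a = math.sqrt(sum(value * value for value in freq_a.values()))
    norm_b = math.sqrt(sum(value * value for value in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vocabularies, invented for this example.
group_a = word_frequencies(["salsa dancing music", "music movies"])
group_b = word_frequencies(["music movies camping", "camping bonfires"])
print(cosine_similarity(group_a, group_b))
```

To get a plot like the one described next, you would compute this for every pair of groups and treat one minus the similarity as a distance.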
Below is a graphical picture of the sameness of the human race, or at least that slice that uses OkCupid. The distance between the groups indicates how alike or dissimilar their interest vocabularies are. Closer is more alike. To help you visualize the situation, we used lines to connect a racial group to its nearest "textual cousins."
Huddled in the lower left, you have what I'd call the "American mainstream": the country's largest ethnic groups sharing roughly similar interests. As you go up and to the right, you move farther from that until, in the opposite corner, you have groups with little in common except their dissimilarity to everyone else. Indians are the furthest outliers, which you'd expect from a group that's almost entirely recent immigrants and their first-generation children. It would be very interesting to see how this plot looks in 50 years.
We also looked at how internally similar each group is; for example, we compared white people to white people, to gauge how alike they all are. Here's what we found.
As you can see, whites and mixed-race people have the most diverse set of essays: the words two white people use to describe themselves will typically be 70% different. On the other hand, in terms of verbiage, you can expect the essays of two Pacific Islanders to be almost 70% the same.
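A "percent the same" figure for two essays can be approximated by the share of words they have in common. The sketch below averages that overlap over every pair of essays in a group. It assumes Jaccard overlap of word sets, which is a stand-in chosen for illustration and not necessarily the measure behind the numbers above; `vocabulary` and `mean_pairwise_overlap` are hypothetical names.

```python
from itertools import combinations


def vocabulary(essay):
    """The set of distinct lowercase words in one essay."""
    return set(essay.lower().split())


def mean_pairwise_overlap(essays):
    """Average Jaccard overlap (shared words / all words) over essay pairs."""
    vocabularies = [vocabulary(essay) for essay in essays]
    overlaps = []
    for first, second in combinations(vocabularies, 2):
        union = first | second
        if union:  # skip pairs where both essays are empty
            overlaps.append(len(first & second) / len(union))
    if not overlaps:
        raise ValueError("need at least two non-empty essays")
    return sum(overlaps) / len(overlaps)


# Toy essays, invented for this example.
print(mean_pairwise_overlap(["bonfires boating horses", "bonfires boating storms"]))
```

With half a million essays you would sample pairs rather than compare all of them, since the number of pairs grows with the square of the group size.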
Since we were parsing all this text anyway, we thought it would be cool to do some basic reading-level analysis on what people had written about themselves. We used the Coleman-Liau Index, and when we partitioned the essays by the race of the writers, we found this:
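For reference, the Coleman-Liau Index estimates a U.S. grade level as 0.0588 × L − 0.296 × S − 15.8, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words. Here is a minimal sketch of that formula. The tokenizing is an assumption of this example (words are runs of letters and apostrophes, sentences end at `.`, `!`, or `?`), so scores from a different tokenizer will differ somewhat.

```python
import re


def coleman_liau(text):
    """Approximate Coleman-Liau grade level of a piece of text."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    letters = sum(character.isalpha() for word in words for character in word)
    # Treat text with no terminal punctuation as a single sentence.
    sentences = len(re.findall(r"[.!?]+", text)) or 1
    letters_per_100_words = letters / len(words) * 100
    sentences_per_100_words = sentences / len(words) * 100
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


# Short words and short sentences score low; long ones score high.
print(coleman_liau("I am cool. I like my dog. We go to the park."))
print(coleman_liau("Existential philosophy interrogates fundamental assumptions regarding consciousness."))
```

Because the index only counts letters and sentence breaks, it rewards long words and long sentences, not necessarily good writing.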
Before anyone gets too charged-up about this, we also ran reading level by religion and found this:
Is there a Comic Sans version of the Bible? There really should be. We subdivided this chart further, by how serious each person was about their beliefs:
It's interesting to note that for each of the faith-based belief systems I've listed, the people who are the least serious about them write at the highest level. On the other hand, the people who are most serious about not having faith (i.e. the "very serious" agnostics and atheists) score higher than any religious group.
. . .
See you soon,
OkCupid data scientists Max Shron and Aditya Mukerjee contributed additional research to this post.