This is a guest post by Orin Hargraves, an independent lexicographer, language researcher, and past president of the Dictionary Society of North America. Orin is the author of several language reference books, including It’s Been Said Before: A Guide to the Use and Abuse of Clichés (Oxford) and Slang Rules!: A Practical Guide for English Learners (Merriam-Webster).
*
A few years ago I wrote about how collocations in fiction skew the statistics of collocations in a corpus because of their extremely frequent use; Ben Zimmer expanded on the idea in a later New York Times piece. In summary, the point is that a number of collocations would not be statistically significant were it not for their appearance in fiction. This is because writers of fiction—particularly writers of the amateur, unedited fiction that appears online—tend to reuse the same tropes and phrases so much that these effectively become clichés, formulaic ways of expressing the same (rather tired) ideas and events.
All of that came to light when I was working with the Oxford English Corpus, a well-balanced and carefully curated corpus that, at the time, had about two billion words of English. These days I’m working with the enTenTen13 corpus, a web-crawled corpus of nearly 20 billion words, owned and made available by Sketch Engine. Sketch Engine’s web-crawler roves the Internet indiscriminately, pulling text from wherever it can be found. Like some grandmother aghast in Greenville, the web-crawler regularly comes upon sites with pornographic content. The difference between the grandmother and the web-crawler is that while she may avert her gaze in shock and dismay, the web-crawler grabs the text, parses and tags it, and adds it to the corpus. The result is that enTenTen13 houses a steaming, pulsating trove of pornographic writing.
As with collocations in fiction generally, the ones recurring in porn narratives skew the statistical digest of numerous high-frequency words in a way that is immediately obvious to any lexicographer working with the enTenTen13 corpus. That is to say, various words for genitalia appear with significant frequency in collocation with a large number of less blushworthy words. You can’t look at a sample of sentences containing the verb fill in proximity to the noun hole without getting an extremely blue eyeful. But a lexicographer on the clock, working on a family-friendly dictionary, is not at liberty to exemplify this byway of language. Strong Language provides a more suitable outlet for examining these patterns, which are a telling index of the subjects that obsess the hearts of men.
I say “men” because one thing is obvious from the get-go: online porn stories are written largely by men, for the titillation of men—men of every persuasion, as the findings below will illustrate. Online pornographic narrative is essentially a subgenre of fantasy, wherein content providers (or should we call them—writers?) give us a glimpse into the features of the sex act that we can assume appeal most strongly to some men’s desires. What the corpus tells us is largely in line with the findings in a paper by Alon Lischinsky et al., published earlier this year, “Doing the naughty or having it done to you? Agent roles in erotic writing,” which Lischinsky talked about in his recent Strong Language post. The authors of that study found (unsurprisingly) that in online pornographic writing the narratives “tend to represent sexual intercourse as an asymmetric engagement between an agent and a patient, rather than as a joint collaborative activity.” In other words, someone’s doing it (usually a man), and someone is getting done (usually a woman). But what are the tools of this trade? What penetrative insight does the corpus provide? The meat of the matter is laid out below.
First, the male member. Cock is the preferred term, far outnumbering erection, dick, prick, shaft, and rod, though each of these shows up in appreciable numbers, as do the rather clinical penis and, occasionally, the rather affectionate schlong. What are the organ’s characteristics? Well, hard, and often rock-hard—that’s the sine qua non of maledom in the fantasy sex world. Other manly modifiers, though quite narrow in semantic range, do not flag in robustness: thick, stiff, erect, fat, monster, meaty, huge, swollen. As if this weren’t enough to suggest the glory of manhood, the dynamism of the verbs in the vicinity of cock and its synonyms suggests an atmosphere in which there is no respite from exertion (not that internet dicks would ever need any). Verbs before and after cock (it seems to play agent and patient in nearly equal measure) include stroke, shove, jerk, slide, drill, rub, ram, pound, thrust, and bury.

With so much bristling energy, the pulsating, throbbing, and engorged manhood needs somewhere to go, something to do. Suck is top of the list of collocating verbs whose object is the male member, but when push comes to shove (also to pound, drill, ram, or thrust) and the stiffy is the grammatical subject, it generally has only two rather predictable destinations: there’s ass, and there’s pussy. Asses get pounded (drilled, rammed, etc.) somewhat more often in the corpus, and this is surely because everyone has one—the corpus data in fact show that his ass is getting pounded slightly more often than her ass. But if we take into account the less frequent synonyms of pussy (slit, cunt, honey hole), the numbers come out about even across verbs aimed at the front and rear receptacles of penetrative action.
All of this ass-pounding is perhaps mainly performed in the dark, because the modifiers that accompany that part of the anatomy with statistical significance are few and predictable: tight, round, bare. Pussy, by contrast, is appreciated for a far richer range of qualities than the male member enjoys, perhaps reflecting the broader variety of ways that the senses may engage with it: tight, hairy, juicy, pink, bald, sweet, moist. It is often described as dripping, soaking wet, even glistening.
The average person would find an activity monitor for the tongue unwieldy, but such a device would probably reveal that we use it a lot for talking and a lot for eating, and surely somewhat less for other activities. Not so in the fantasy world online, where tongues get no letup from licking. Two-thirds of the most salient objects of lick in the corpus are things that you would not lick on, or even in the presence of, a second-degree relative: pussy, clit, nipple, asshole, cunt, ass, cock, tit, shaft.
I have come this far without even mentioning the main verb fuck and its partner shag, which permeate the world of porn writing, and there is no need to go there: you’ve already read about them in Alon Lischinsky’s recent post, and the corpus he constructed shows patterns analogous to those found in enTenTen13. What baffles this lexicographer is why there is so much of this stuff out there, and who reads it when it is so numbingly predictable. I suspect that a lot of online porn may find its most attentive reader in the web-crawler that collects, tags, and parses it, and that it provides its greatest thrill for those who create it rather than those who read about it. Not so different from the sex act itself.
One thought on “Collocations of ‘cock’: What corpus linguistics tells us about porn writing”