A new online archive of Civil War correspondence promises to shed light on historical varieties of nonstandard American English. Two linguists, Michael Ellis (Missouri State University) and Michael Montgomery (University of South Carolina), have teamed up with historian Stephen Berry (University of Georgia) to create “Private Voices,” an archive of letters from Civil War soldiers. Based on correspondence collected by Ellis and Montgomery as part of the Corpus of American Civil War Letters, the Private Voices archive focuses on the writing of soldiers who were “untrained in spelling, punctuation, or the use of capital letters,” according to the press release announcing the launch of the site (which you can read here).
Soon after news of the archive was shared on the American Dialect Society mailing list, Jonathan Lighter (author of the Historical Dictionary of American Slang) began looking for hidden treasures. He swiftly turned up a letter from 1862 in which the author, an infantryman from Virginia, appears to express a violent sentiment: “I want to kick ass.”
This is a guest post by Orin Hargraves, an independent lexicographer, language researcher, and past president of the Dictionary Society of North America. Orin is the author of several language reference books, including It’s Been Said Before: A Guide to the Use and Abuse of Clichés (Oxford) and Slang Rules!: A Practical Guide for English Learners (Merriam-Webster).
A few years ago I wrote about how collocations in fiction skew the statistics of collocations in a corpus because of their extremely frequent use; Ben Zimmer expanded on the idea in a later New York Times piece. In summary, the point is that a number of collocations would not be statistically significant were it not for their appearance in fiction. This is because writers of fiction—particularly writers of the amateur, unedited fiction that appears online—tend to reuse the same tropes and phrases so much that these effectively become clichés, formulaic ways of expressing the same (rather tired) ideas and events.
All of that came to light when I was working with the Oxford English Corpus, a well-balanced and carefully curated corpus that, at the time, had about two billion words of English. These days I’m working with the enTenTen13 corpus, a web-crawled corpus of nearly 20 billion words, owned and made available by Sketch Engine. Sketch Engine’s web-crawler roves the Internet indiscriminately, pulling text from wherever it can be found. Like some grandmother aghast in Greenville, the web-crawler regularly comes upon sites with pornographic content. The difference between the grandmother and the web-crawler is that while she may avert her gaze in shock and dismay, the web-crawler grabs the text, parses and tags it, and adds it to the corpus. The result is that enTenTen13 houses a steaming, pulsating trove of pornographic writing.
Speaking on MSNBC earlier today, Georgia Republican congressman Buddy Carter used a colorful expression to vent his frustration over the Senate’s lack of progress in overhauling the Affordable Care Act: “Somebody needs to go over there to that Senate and snatch a knot in their ass.”
Have those creepy clowns been terrorizing your neighborhood this autumn? Kick ’em in the seat of their oversized, particolored pants with this choice insult: assclown. To be sure, we can all conjure up some far stronger words for those evil motherfuckers, but let’s have a closer look at this jester jibe.
It’s wink-wink-nudge-nudge all the way down with these new ads, one circulating in San Francisco, the others in U.S.-wide distribution.
The San Francisco ad, which I spotted on the side of a Muni bus, is for CUESA, the Center for Urban Education about Sustainable Agriculture, which operates several huge farmers’ markets each week in San Francisco and Oakland. The ads are meant to persuade shoppers to embrace less-than-supermarket-perfect fruits and vegetables.