What pleas they may fuck out of such books: Google Ngrams vs Long-S

Ever seen an old printed book with a letter S that looks like an F? This character, ſ, is not an F at all; it’s called the ‘long s’, and it has almost entirely fallen out of use in modern typography. John Bell is widely credited with the demise of the long s, which is why we rarely see it any more, but it is common in European books printed between the 1400s and the 1790s.

The Google Ngram Viewer relies heavily on optical character recognition (OCR) software to make its books searchable; OCR software strives to match each printed character in a text to a recognized typographic character. Even human readers can have difficulty reading text that makes heavy use of the ſ, as this 1739 printing of Ben Jonson’s The Alchemist shows:

Ben Jonson’s The Alchemist: A Comedy, first performed in 1610 and published in 1739, from the Internet Archive
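The character-matching problem is easy to reproduce in a few lines of Python. What follows is only a rough sketch, not how Google's OCR actually works: `to_long_s` applies a deliberately simplified typesetting rule (every lowercase s except a word's final letter becomes ſ, whereas real printers followed more detailed conventions), and `naive_ocr` is a hypothetical stand-in for software that reads every ſ as an f. The last function shows that the ſ character itself, if recognized correctly, maps cleanly back to a modern s through standard Unicode normalization.

```python
import unicodedata

LONG_S = "\u017f"  # ſ, LATIN SMALL LETTER LONG S


def to_long_s(word: str) -> str:
    """Typeset a word with long s (simplified rule, an assumption:
    every lowercase s except the word's final letter becomes ſ)."""
    if not word:
        return word
    body, last = word[:-1], word[-1]
    return body.replace("s", LONG_S) + last


def naive_ocr(text: str) -> str:
    """Hypothetical OCR step that mistakes every ſ for the letter f."""
    return text.replace(LONG_S, "f")


def normalize(text: str) -> str:
    """Fold ſ to a plain s using Unicode NFKC compatibility normalization."""
    return unicodedata.normalize("NFKC", text)


modern = "such sense suck"
printed = " ".join(to_long_s(w) for w in modern.split())

print(printed)             # text as an early printer might have set it
print(naive_ocr(printed))  # what a confused OCR pass might store
print(normalize(printed))  # what a long-s-aware pipeline could recover
```

Once the OCR pass has stored an f, the original s is gone: a search index built from that output has no way to tell a misread ſ from a genuine f.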

The Google Books Ngram project is a thoroughly imperfect resource for studying linguistic change in English-language print, mostly because out-of-fashion typographic conventions such as the long s completely throw off searches. To the untrained eye, or to a computer doing its very best to apply modern rules to anachronistic text, the word ‘suck’ set with a long s looks an awful lot like ‘fuck’. Google seems to know about it, too: the default search range is 1800-2000, but you can easily change that to 1500-2000 and compare how often ‘suck’ and ‘fuck’ appear. The main difference is that between 1650 and 1790, ‘fuck’ appears to be printed far more often than ‘suck’, with a noticeable switch around 1665.
