6° of Aberration

Looking for my alter ego...I'm sure I left it someplace around here...

Location: California, United States

Tuesday, November 30, 2004

Corpus Blogus

What began as a small experiment in blogging has lasted far longer than I ever expected. Yesterday's entry was my 100th post to this weblog (excluding the original test case, "Shirley, You Jest") and it concludes nearly six months of writing.

You might now expect me to editorialize upon what I've learned, or look back and reflect upon what I've written, or speculate upon what, if anything, comes next. Not so. This ain't your momma's weblog, baby. Instead, I chose to commemorate this inexplicable pastime with a similarly irrelevant and inconclusive analysis.

I wondered what one might discover by running a word count utility against the first 100 entries. I'm sure you've all been wondering the same. Fear not, I have the data.

Those 100 entries exceed 53,000 words, the equivalent of a short novel. English grammar being what it is, much of that word count consists of only a few common words. In fact, the top five words alone account for 15% of the total word count:

the (2664)   and (1451)   to (1417)   of (1310)   a (1264)

Nearly one third of the word count can be attributed to a mere 30 of the approximately 8,600 unique words tabulated. The remaining 25 of the top 30 are:

I (1059)   in (694)   that (664)   it (627)   for (518)

was (481)   is (392)   my (361)   with (358)   but (340)

as (322)   on (318)   one (312)   you (299)   at (269)

he (269)   his (228)   so (228)   me (224)   this (224)

from (223)   have (221)   not (219)   by (216)   first (214)
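For the curious, a tally like the one above could be sketched in Python roughly as follows. This is not the utility behind my numbers; the tokenizing rule (lowercase runs of letters, digits, and apostrophes), the function name `count_words`, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

def count_words(text):
    """Return a Counter of case-insensitive word frequencies."""
    # Assumed rule: a "word" is a run of letters, digits, or apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)

# Hypothetical sample text, standing in for the 100 entries.
sample = "The cat and the hat and the bat."
counts = count_words(sample)

# The two most frequent words and their share of the total word count.
top = counts.most_common(2)
share = sum(n for _, n in top) / sum(counts.values())
print(top, f"{share:.1%}")
```

A different tokenizing rule (keeping capitals, say, or splitting contractions) would shift the totals a little, which is one reason two word counters rarely agree exactly.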

I keep a list of approximately 300 words (articles, conjunctions, pronouns, prepositions, contractions, etc.) that are the connective tissue of English sentences.

I ran the utility with instructions to strip out those words, and it deflated the total word count by 50% to roughly 27,000 words. Nearly 8,300 unique words remained, but these were the more interesting nouns, verbs, adjectives, etc. that reveal clues about the hidden semantic mysteries of a passage to the psycholinguistically astute and the numerologically obsessed.
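Stripping out the connective tissue amounts to filtering the tally against a stop-word list. Here is a minimal sketch under stated assumptions: `STOP_WORDS` is a tiny hypothetical stand-in for the real list of roughly 300 words, and the counts are invented for the example.

```python
from collections import Counter

# Illustrative stand-in only; the actual list has about 300 entries.
STOP_WORDS = {"the", "and", "to", "of", "a", "i", "in", "that", "it"}

def strip_stop_words(counts, stop_words=STOP_WORDS):
    """Return a new Counter without the words found in stop_words."""
    return Counter({w: n for w, n in counts.items() if w not in stop_words})

# Invented counts, just to show the effect of the filter.
counts = Counter({"the": 5, "book": 3, "and": 2, "read": 2})
kept = strip_stop_words(counts)

# Fraction of the total word count removed by the filter.
reduction = 1 - sum(kept.values()) / sum(counts.values())
print(kept.most_common(), f"{reduction:.0%}")
```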

Here are the top 30 words once the file had been deflated by 50%:

first (214)   more (147)   read (131)   time (128)   boys (109)

2004 (105)   book (105)   years (91)   get (90)   books (86)

Justin (77)   two (75)   day (73)   know (72)   line (71)

movie (71)   back (69)   year (65)   Andrew (64)   reading (64)

John (63)   Kevin (63)   three (63)   last (60)   good (58)

little (58)   story (58)   made (57)   make (57)   novel (57)

Anyone following this blog is unlikely to see any surprises in that list. Perhaps something can be learned by examining groups of words that have the same number of appearances. For example, here are the 30 words which each appear exactly 15 times in the 100 entries of this blog to date:

call   class   com   consider   entry
especially   expected   fans   feel   final
free   friend   including   internet   June
keep   leave   letter   looked   material
Michael   minutes   mom   note   rejection
script   stranger   trying   turn   weeks
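Gathering the words that share a count is a matter of inverting the tally. A rough sketch, assuming the counts already sit in a `Counter`; the helper name `words_by_count` and the sample counts are hypothetical.

```python
from collections import Counter, defaultdict

def words_by_count(counts):
    """Map each frequency to the alphabetized list of words that have it."""
    groups = defaultdict(list)
    for word, n in counts.items():
        groups[n].append(word)
    return {n: sorted(words) for n, words in groups.items()}

# A few entries borrowed from the lists above, for illustration.
counts = Counter({"call": 15, "class": 15, "book": 105, "weeks": 15})
groups = words_by_count(counts)
print(groups[15])
```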

One begins to feel like the John Nash character in A Beautiful Mind trying to make sense of the cryptological messages he believes to be hidden in the newspapers. Speaking of mathematicians, that reminds me: numbers, too, are counted by this utility. Perhaps something can be divined from the following:

1 (29)   2 (18)   3 (27)   4 (17)   5 (17)   6 (20)   7 (20)   8 (12)   9 (18)   10 (21)
11 (12)   12 (23)   13 (14)   14 (16)   15 (19)   16 (11)   17 (11)   18 (10)   19 (12)   20 (18)
21 (6)   22 (11)   23 (6)   24 (7)   25 (9)   26 (6)   27 (2)   28 (4)   29 (5)   30 (12)

I'll leave that as an exercise for the avid reader. You may also need to know:

one (312)   two (75)   three (63)   four (21)   five (31)
six (18)   seven (22)   eight (7)   nine (7)   ten (17)
eleven (2)   twelve (6)   thirteen (1)   fourteen (2)   fifteen (5)
sixteen (3)   seventeen (1)   eighteen (2)   nineteen (20)   twenty (21)
won (7)   to (1,417)   too (45)   for (518)   ate (4)

Maybe scholars dedicated to analyzing this site will be better served by a complete concordance. (Don't worry: I'm not going to post the damn thing.)

Since we've determined that 19 is a mystical number, let's choose one word with 19 occurrences and see what that might yield. For illustrative purposes, let's choose the word "point," since (22) you (299) all (196) are (194) beginning (20) to (1,417) wonder (13) what (112) mine (8) might (21) conceivably (0) be (200):

physical linkage to a fellow mammal seems a plus at this point. Damien is a friend.

My words begin plucking at threads nervously, seeking purchase, a weak point,

And don't even get me started on all the random point awarding by the professors.

unfamiliar dispute or the conclusion to a negotiation point that I had not yet

haul them back by their hind legs to the starting point. Ribbons would be awarded

Marx Brothers faking a mirrored reflection in a doorway, at which point we would

the hints; or 1 point for each correct title or author when you got only one right.

entwined with circular references, let me point out that the writer called Terrance

hundred times at least I've had someone point at Andrew and ask, "Is he the oldest?"

goes out of his way to point out Meursault’s use of the child’s word “Maman” when

writer alive who can match [Irving's] control of the omniscient point of view."

Drop City, maybe even The Tipping Point or Fast Food Nation is scheduled to appear

what woke them up." At one point Justin said, "This is starting to get scary,"

neither parent appears in the book and that at one point Tom apparently ventures

Anglo-Saxon hero myth of Beowulf from the point of view of the monster the hero

killed, rather than from the hero's vantage point. In so doing, he scored numerous

worse than use those twenty authors as the starting point for creating a reading list

But that would be missing the point. It's hard for any Red Sox fan who has lived

waving their "I believe" placards, a turning point to forever remember in a dream
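A listing like the one above is a keyword-in-context concordance: every occurrence of a word shown with a little of the text on either side. The sketch below is one way it might be done, not necessarily how my utility does it. The function name `concordance`, the `width` parameter, and the whole-word, case-insensitive matching rule are assumptions.

```python
import re

def concordance(text, word, width=30):
    """Return each whole-word, case-insensitive hit with surrounding text."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        # Collapse line breaks and runs of spaces inside the excerpt.
        lines.append(" ".join(text[start:end].split()))
    return lines

# Two fragments borrowed from the excerpts above.
sample = "At one point Justin spoke. But that would be missing the point."
hits = concordance(sample, "point", width=10)
for line in hits:
    print(line)
```

Because of the word-boundary match, "points" and "pointed" would be left out, while "The Tipping Point" would be counted despite its capital letter.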


Whatever is revealed in hindsight by reviewing my first one hundred blog entries depends, finally, upon the reader. I've written far more than I ever planned or expected, often on topics I never intended. What began as a whim grew into a pleasant diversion. Public omphaloskepsis. Was it ever supposed to be relevant?

Now with this, the 101st entry, I've proven that every word counts, or at least, that every word can be counted.