Have you seen the old New Yorker cartoon in which two dogs are
sitting at a computer, and one is telling the other, "On the
Internet, nobody knows you're a dog"?
Hmm. Let me share some reasons why online anonymity is a little more complicated than that.
First, let's dispense with the obvious. If you write a large body of information drawn from your personal experiences, then anyone reading with a reasonably analytical turn of mind will be able to deduce a great deal about you (unless you're deliberately deceptive). Gender, profession, hobbies, and location are the obvious things. This information may give a reader a shrewd guess as to who you are; if not, it may at least narrow you down to a small group of people.
Now, let's discuss the more interesting piece: personal style.
If you are reading this as an academic, you will appreciate that
writing style, together with a few contextual clues, is often enough
to render blind review a polite fiction. You send me a review in
which you say "not enough attention is paid to the following
references" and list twelve references with a common co-author; I
may well guess who you are. Or I write a review of your paper after
we have communicated verbally or in writing; you may well recognize
me simply from the way that I choose my words. The blind review is
really only hazy, but that's fine. I may
still have doubts about my guess, as might you. Similarly, if I
were to read an anonymous blog written by someone I knew, I might
well be able to guess that person's identity, but it would be hard
for me to be sure.
What is it that you identify as a particular writer's style?
My guess is that a large fraction of what you identify as style has
to do with statistical patterns in language use. The most obvious
things involve sentence length and complexity, the sort of metrics
that lead one to declare that some author writes at an nth grade
level. But many of us have our own personal catch-phrases, too. In
music, people have discovered that by looking at the probabilities
associated with very short sequences of notes, one can distinguish
the music of Bach, say, from the music of Mozart. On Amazon, you
may have noticed the introduction of Statistically Improbable
Phrases (SIPs) on the book description pages. Many of the best spam
filters work by training a classifier to distinguish spam from
legitimate mail on the basis of automatically determined statistical
properties in the text. A blog is a large volume of text, which is
exactly what's needed for such statistical text classifiers; given
the text of a blog and a set of texts (e-mail messages or class
papers, perhaps) written by possible authors of said blog, I dare
say it would take me about a week to put together a good tool to
decide which, if any, of my suspects was actually responsible for
the writing. Given that there are people out there who know much
more about this sort of thing than I do, I would be surprised to
find that such a tool does not already exist. And I haven't even
said anything about tracing the link structure!
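To make the idea concrete, here is a rough sketch of the kind of tool I have in mind, not a finished one. It builds a word-pair (bigram) model with add-one smoothing from each suspect's known writing, then ranks the suspects by how probable each model finds the anonymous text. The names `AuthorModel` and `rank_suspects` are my own inventions for illustration, and the smoothing choice is an assumption; a serious tool would want far more text, better features, and some way to answer "none of the above."

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigrams(tokens):
    """Return the list of adjacent word pairs."""
    return list(zip(tokens, tokens[1:]))


class AuthorModel:
    """Bigram counts taken from one author's known writing (hypothetical helper)."""

    def __init__(self, text):
        tokens = tokenize(text)
        pairs = bigrams(tokens)
        self.pair_counts = Counter(pairs)
        # How often each word appears as the first word of a pair.
        self.context_counts = Counter(first for first, _ in pairs)
        self.vocab = set(tokens)

    def log_likelihood(self, tokens, vocab_size):
        """Sum of log P(next word | previous word), with add-one smoothing."""
        total = 0.0
        for first, second in bigrams(tokens):
            numerator = self.pair_counts[(first, second)] + 1
            denominator = self.context_counts[first] + vocab_size
            total += math.log(numerator / denominator)
        return total


def rank_suspects(anonymous_text, known_texts):
    """Return (name, score) pairs, most likely author first.

    known_texts maps each suspect's name to a sample of their writing.
    """
    tokens = tokenize(anonymous_text)
    models = {name: AuthorModel(text) for name, text in known_texts.items()}
    # Share one vocabulary across all models so the scores are comparable.
    vocab = set(tokens)
    for model in models.values():
        vocab |= model.vocab
    scores = {
        name: model.log_likelihood(tokens, len(vocab))
        for name, model in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_suspects(blog_text, {"suspect one": sample_one, "suspect two": sample_two})` returns the suspects in order of fit. Note that it always names a best match, so the top score says only who is the least unlikely of the candidates given, not that any of them actually wrote the text.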
I think it would be interesting, actually, to try to build classifiers in this way to see if there are any two distinct authors who write in such a similar manner that it's impossible to distinguish their writing based only on the probabilities of two- or three-word sequences. Or could such a classifier be trained to automatically recognize regional dialects, identifying from what part of the US a particular writer might hail? How reliable would such a classification be? But I digress.
I suspect that most people would be shocked at how easily their
privacy and anonymity could be removed if anyone cared to take the
effort (though anyone who has stood in line at a grocery store knows
full well that the private lives of the rich and the famous are
often anything but private). But usually nobody cares to
make the effort. If you wish to be anonymous, the polite thing
for me to do when I recognize you is to pretend that I know
nothing. But this assumes that I've paid enough attention to
recognize you in the first place. It's a little like an inexpensive
lock: a deterrent that tells most people that it would take some
bother to steal the protected item, and tells the few people who
might be able to pick the lock with no difficulty that you'd really
consider it rude for them to run off with your stuff.
- Currently drinking: Green tea