There are some characteristics of language use that can be very hard to deliberately alter -- particularly with the (generally low) amount of effort someone is likely to put into writing Usenet posts. Things like the frequency with which specific words are used over a statistically significant text sample, for instance. I don't know about off-the-shelf software for doing this sort of analysis, but techniques like these have been used to help determine whether to ascribe particular writings to particular historical authors, e.g. the "Were all of Shakespeare's plays really written by one man?" question.
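To make the word-frequency idea concrete, here is a minimal sketch in Python. It is not the software anyone in this thread used. The word list, the function names, and the choice of cosine similarity are all assumptions made for illustration, and a real analysis would need far longer samples and a much larger word list.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny list of common "function words".
# Real stylometric work would use hundreds of them.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it",
                  "is", "was", "i", "for", "but", "not", "with")


def profile(text):
    """Return the relative frequency of each function word in text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return [0.0] * len(FUNCTION_WORDS)
    counts = Counter(words)
    return [counts[w] / len(words) for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two frequency profiles (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Calling `cosine_similarity(profile(posts_a), profile(posts_b))` on two large bodies of text gives a number between 0 and 1. A high value is only a hint that the two samples share habits, not proof of common authorship.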
FromTheRafters stated in post ifu205$vst$-september.org on 1/3/11 7:51 PM: While there are tools to try to determine if one set of text is written by the same author as another set, the tools are far from perfect and require a fairly long sample. I was working with a professor who had access to (and was improving) such tools. I used it to confirm my suspicions with Steve Carroll and some of his socks, but *alone* I would not have considered it enough evidence to be certain (though after I used about 100 of his posts and about 50 of his socks, the confidence level reported by the software was over 90%). Steve also has tried to mimic my style with other socks of his - and the best I found he could do was about a 60-65% level of confidence by the software, and much lower when his exact quotes of mine were removed (around 40%). Still, the fact he was able to get above the baseline 5% or so was telling - it indicated that he could "capture" some of my style. I suspect others could do better. I also assume there might be better software out there to do this task, but I do not know of any.
Yes, but if they continually misspell a certain set of words, that can be enough for a non-authority "fingerprint". Also, use of a certain phrase can be a clue. I recall a poster using "quiet" for "quite" among other things and some thrown-in 'as it were' that convinced me it was a person already known to me under another nym - I didn't need an IP.
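The habit-spotting described above can be roughed out in a few lines of Python. This is only a sketch: the quirk names and regular expressions below are hypothetical stand-ins for the two examples mentioned ("quiet" for "quite", and 'as it were'), and the "quiet" pattern is a crude guess that will produce false positives.

```python
import re

# Hypothetical quirk patterns, for illustration only.
# "quiet a/an/often/well" is a rough proxy for "quiet" typed in place of "quite".
QUIRKS = {
    "quiet-for-quite": re.compile(r"\bquiet (?:a|an|often|well)\b", re.IGNORECASE),
    "as-it-were": re.compile(r"\bas it were\b", re.IGNORECASE),
}


def quirks_in(posts):
    """Return the names of all quirks seen in any of the given posts."""
    found = set()
    for post in posts:
        for name, pattern in QUIRKS.items():
            if pattern.search(post):
                found.add(name)
    return found


def shared_quirks(posts_a, posts_b):
    """Quirks that show up under both nyms."""
    return quirks_in(posts_a) & quirks_in(posts_b)
```

A shared quirk or two proves little by itself, since plenty of people make the same mistakes. It is the accumulation of several rare habits that starts to look like a fingerprint.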
Yes, good point. The authorities are concerned with finding the contactable person, whereas non-authorities are concerned with 'nym-shifting' and finding what nyms a poster is using. As to the program, I'm pretty sure it was a custom written program and not commercially available. Remember reading about it around 7 years ago in a major magazine. RL
[On software that can detect a person's writing style] Really? I am cross-posting this to humanities.lit.authors.shakespeare in the hope that they can tell us what the name of this software is, or whether they've heard of it. RL
RayLopez99 stated in post on 1/4/11 2:01 AM: I had access to a version of it a while back... and found it to be only moderately useful unless you had a fairly large sample. Even then it was not certain.
An Old Friend stated in post on 1/4/11 9:15 AM: Agreed. Esp. when used with Usenet, a medium it was not designed for. At least the tool I used was meant for longer pieces of text - mostly to see which parts of the bible were written by the same person (though with all the edits that was seen as unlikely to be of much use) and of some historical works which are assumed to not be (or be) from the claimed authors - such as the example of Shakespeare talked about before. There was also quite a bit of focus on Dickens and Twain. But, again, these were novels or at least short stories being looked at - not the type of text generally seen in Usenet. Still, looking at large numbers of posts with the assumption that headers were enough to tell if they were at least from the same person was enough to give evidence of whose sock was whose. But, no, not strong enough where it could (or should!) be used in court. I no longer have access to the program. Was fun to play with when I did.
That's not the point, I labeled no-one, but I "knew" that posters with different nyms were the same because of an inability to change ingrained habits. Your 'if things were different, they wouldn't be the same' approach doesn't give me much faith in your competence either. I said "among other things" so can't be offended by your lame attempt at an ad hom remark.
I would guess it would be absolutely useless - in fact, prejudicial. It seems to me that even so, it can be a good investigative tool.
Don't mix websites with email with Usenet. They are different protocols with different characteristics. Every modern NNTP server, or Usenet server if you wish, supports the use of the "NNTP-Posting-Host" header, described in RFC 2980 and other RFCs. This was finally implemented widely because of the history of forged cancellation messages by the cult of Scientology. (No, I'm not kidding, look up the history of alt.religion.scientology and forged cancel messages and Usenet spew by cult members trying to bury a newsgroup.) This is *NOT* the IP address of the sender. It is the IP address of the NNTP posting host, which may be connected to by any client by any means that server accepts and may display no record whatsoever of the connecting client. But it is the host that first submitted it to Usenet, according to the handling by all other NNTP servers. But it is enough to do a lot of backtracking to the site that is hosting the abusing spammer or canceller or troll, and it's been helpful. You can't backtrack material, even with voodoo tools, if the intervening hosts didn't record the data in the message or in their own system logs where you can access it. Few sites bother to keep such logs, or react kindly to requests for such information, especially without a warrant. Of course, if you're the NSA, you can just place illegal but federally forgiven taps on the nation's fiber-optic backbones. (Look up the AT&T fiber-optic tapping case: it was nasty.) Bulletin boards are not NNTP. Like wikis, they typically have logs of the incoming connections and their IP addresses which can be read, or if necessary their traffic can be sniffed. Once a Usenet message gets to you, though, those connections have been broken and may be very awkward to track.
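For anyone who wants to look at the header in question, here is a small Python sketch that pulls "NNTP-Posting-Host" out of a raw article using the standard library's email parser. The sample article is made up for illustration (the address is from a documentation-only range and the domains are invalid on purpose). Keep in mind the caveats above: the header may be absent, obscured by the server, or forged, and the function simply returns whatever value it carries.

```python
from email import message_from_string


def posting_host(raw_article):
    """Return the NNTP-Posting-Host header value, or None if it is missing."""
    msg = message_from_string(raw_article)
    value = msg.get("NNTP-Posting-Host")  # header lookup is case-insensitive
    return value.strip() if value is not None else None


# Made-up sample article, for illustration only.
SAMPLE = (
    "From: someone@example.invalid\n"
    "Newsgroups: alt.test\n"
    "Subject: test\n"
    "Message-ID: <abc123@example.invalid>\n"
    "NNTP-Posting-Host: 192.0.2.10\n"
    "\n"
    "Body text.\n"
)

if __name__ == "__main__":
    print(posting_host(SAMPLE))
```

Anything beyond reading this value, such as finding out which client connected to that host, depends on logs that only the server operator has.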
NNTP does suffer from header forgery, but the NNTP-Posting-Host has been very helpful in reducing abuse: it allows tracking back to the host that accepted the message, or at which the header was forged, pretty effectively. Getting such a subpoena is pretty awkward: I've tried, and was told not to waste the time of the otherwise friendly law enforcement if I was not the person suffering demonstrable monetary loss over a pretty generous limit. (It was $30,000 over 10 years ago; I'm sure it's increased since then.) They wouldn't be able to justify the manpower and the subpoena.