In the legal jungles of Calcutta

larsmagne23

2 months ago

This post is about me being sued in India, but mostly about using LLMs.

In 2022 I got the pile above in the mail. It turned out that this was a further iteration of the Indian lawsuit that named me as a codefendant (along with, er, Google and Yahoo) because I ran a mailing list archive called Gmane, and somebody had posted something possibly defamatory there.

I’ve been clearing out the loft (because I want to put more stuff up there), so I was just going to throw this bundle out…

But then it occurred to me that if I scanned the pile (I’ve got a scanner and I know how to use it), I could just run it through my OCR pipeline and then ask an LLM to summarise what it’s all about. I’ve been moderately curious, but not sufficiently to actually try to poke at the pile before.

And even people who say that they’re AI sceptics add “but you can use AI to do research, for instance to summarise documents”, and… I’ve been sceptical about those sceptics, because whenever I try to use an LLM, it mostly spits out something wrong at me.

So… it’s an experiment!

This is perhaps the most interesting bit for me personally (and I’ve actually read some of it myself):

It’s a summons from the Calcutta High Court saying that I have a couple months to reply, or I’m in deep shit. Or something.

(I don’t think Norway has an extradition treaty with India, and I certainly don’t plan on ever going there, so I didn’t really care much.)

I scanned the 1,200 pages (while watching The Four Seasons (Tina Fey edition) series, which is pretty amusing), and then ran it through my OCR pipeline. There I am:

(If you want to read these documents, here’s the resulting PDFs.)

But the OCR result was basically a 3.6MB text file, and I uploaded it to Claude.

I used the default settings, which apparently is Sonnet 4.6 Auto. And I apologise in advance: Friends don’t make friends read AI-generated text, but I can’t write this blog post without showing you the Claude-generated material:

What? 1982-2008? The lawsuit is from 2011… but it’s true, the papers include copies of previous legal things, when Milap Choraria sued Sanjay Jhunjhunwala, alleging that he’s stolen his land and stuff.

Claude spits out a few pages of this stuff, and doesn’t mention apparently not having read the documents beyond sampling a few here and there, I guess? Apparently it has a 1M character window, but… 1) it doesn’t say so anywhere, and 2) that’s not a good use of a 1M character window.

I gave it some pushback, but already here we see the problem: If you’re using an LLM to give you a summary, it will be as lazy as possible, and do some sampling of the text and output something. Anything. But the problem is that what it outputs sounds totally plausible — if you truly don’t know what the document is about, you can’t give pushback like this.

And then pages and pages of this.

“Good catch”, but still no go.

So now it has resorted to gaslighting me.

“Looping”? What? It’s a text file with the scans — so here it isn’t just gaslighting me, but it’s… what.

Finally!

Heh heh heh. Do I need to do that, though? Are you sure?

So: If I were to rate this interaction, I’d rate it “Not Just A Total Waste Of Time, But Actually Unhelpful And Misleading”. And this is the stuff I see lawyers on Twitter champion all the time — “it’s so good for doing research, but of course you have to check the output”.

They’ve been saying this for years! And Sonnet 4.6 was released in February of this year!

But to be fair, I then started a new session with Opus 4.8 on “Max”:

Hey! There I am!

This is much better — it says I’ve uploaded “Volume 1”, but I’ve uploaded both volumes (in the same file), so this obviously means that Opus has just sampled bits of the text file, too, but it’s landed on more relevant parts than Sonnet.

But it doesn’t say anything about the summons:

OK, so Opus needed prodding from somebody who knew what they were looking for, too — the simple “summarise this” didn’t work with Opus either: You actually need to know what the papers are about for Opus to cough up the interesting bit. Because (I’m guessing) it didn’t actually read the entire 3.6MB of text, either, but just did some samples here and there, and then faked a plausible answer.

I vaguely wonder what the case status is, but Claude is apparently blocked from accessing those sites, so I guess I’ll never know, unless somebody wants to check for me… (Hint hint!)

Anyway, if nothing else, this has given me something original to decorate my walls with. Not many people have a Calcutta High Court summons to proudly display, I betcha!

But the real takeaway here is that if you have a lawyer that says “sure, I use AI to do case research” or you have an administrator going “if carefully handled, AI is so useful for doing research into large document collections”, don’t argue, back slowly away, and then run away as fast as you can. Those people are insane and are living deep inside some kind of delusion.