How does AI Generation even work?

Thank you, lostlovefairy, for your question on AI tokens, and emilypool977, for pointing out that even simple tweaks made with the help of grammar tools like Quillbot or Grammarly get flagged as AI-generated.

What is AI Generation?

AI generation is where you feed some data (a prompt) to an AI bot like ChatGPT or Gemini (or a toaster; thank you, SeraDrake, for the term, it's now stuck with me) and it produces text in response.

How does it even work?

The thing is, ever since the 1950s, when the question "Can machines think?" was first asked, people have been trying, by all means, to figure out how to feed data to computers and make things easier for us. Take COMPASS, for example, or bar code scanners, the easiest of machine learning examples, for that matter... everything has, in some way or form, been programmed to imitate human intelligence.

Back to the question of how AI generation works and why it is a problem: I found this example the other day, and I'll try to break it down from there.

Question: "What is the capital of France?"

Binary representation: 01010111 01101000 01100001 01110100 00100000 01101001 01110011 00100000 01110100 01101000 01100101 00100000 01100011 01100001 01110000 01101001 01110100 01100001 01101100 00100000 01101111 01100110 00100000 01000110 01110010 01100001 01101110 01100011 01100101 00111111

AI-generated response: "The capital of France is Paris."

I didn't read half the 0s and 1s in it, and I know you didn't either. That binary representation is only there to show how a machine stores the text it reads before it processes it and gives back a response: each group of eight digits is one character. This has been the case for as long as we have been able to code our own programs on computers.
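If you'd rather check the binary line than squint at it, here is a minimal sketch in Python. It assumes plain 8-bit text encoding (UTF-8, which matches ASCII for English letters), and the function names to_binary and from_binary are made up for this example, not taken from any library.

```python
def to_binary(text: str) -> str:
    """Turn text into space-separated groups of eight 0s and 1s, one group per byte."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_binary(bits: str) -> str:
    """Reverse of to_binary: read each group of eight digits back into a character."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


question = "What is the capital of France?"
encoded = to_binary(question)

print(to_binary("What"))     # 01010111 01101000 01100001 01110100
print(from_binary(encoded))  # What is the capital of France?
```

Note that this only shows how the characters are stored. It says nothing about how the model picks its answer.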

Some of the 0s and 1s get lost, or other characters like "\n", "\s" and "$" get generated along with the text. We don't usually see them, because computer-generated or AI-generated text is able to hide them.

The simplest example: open Microsoft Word, regardless of how old your version might be, and look for this:

This has always been there, but it isn't what gets your normal text flagged as AI-generated. What AI checkers usually pick up on are characters like "\t", "\n" or others that are generated by GPTs and LLMs, or even by Quillbot or Grammarly while they are trying to help you. But as long as you can prove that, ultimately, you wrote the text, you should be fine (and please try to write on a site where you can check your revision history).
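For the curious, invisible characters like these can be listed with a few lines of Python. This is only a sketch for viewing them: the helper names reveal_hidden and strip_hidden are made up for this example, the choice of which Unicode categories count as "hidden" is my assumption, and nothing here confirms what any particular AI checker actually looks at.

```python
import unicodedata


def reveal_hidden(text):
    """List (position, code point, name) for control, format and odd-space characters."""
    found = []
    for position, ch in enumerate(text):
        category = unicodedata.category(ch)
        # Cc = control (tab, newline), Cf = format (zero-width space),
        # Zs other than a normal space = unusual spaces (non-breaking space).
        if category in ("Cc", "Cf") or (category == "Zs" and ch != " "):
            name = unicodedata.name(ch, "<control>")  # control chars have no name
            found.append((position, f"U+{ord(ch):04X}", name))
    return found


def strip_hidden(text, keep="\n\t"):
    """Drop control and format characters, keeping ordinary newlines and tabs."""
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) not in ("Cc", "Cf")
    )


sample = "Hello\u200b world\tend\n"
for entry in reveal_hidden(sample):
    print(entry)
print(repr(strip_hidden(sample)))  # 'Hello world\tend\n'
```

Printing with repr() is the quick version of the same idea: it shows "\t" and "\n" as visible escape codes instead of blank space.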

How do I escape it?

In the case of GPTs or LLMs, it's tough to remove the garbage tokens that may be generated. There are ways to view them with code, and possibly remove them, but I'd urge you to do your own research on that. I haven't found a workable solution, but here's what I'll say.

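AI text detectors are largely garbage. They work by measuring perplexity: they run your text through a language model and look at how probable it is that the model would have chosen the same words as you. The more predictable your text is to the model, the lower the perplexity and the more likely it is to get flagged.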
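To make perplexity less abstract, here is a toy sketch in Python. Real detectors use large neural language models; this one swaps in a tiny word-pair (bigram) counter with add-one smoothing, purely as an assumption for illustration. The names train_bigram and perplexity and the sample sentences are made up.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count single words and word pairs in the training text."""
    words = corpus.lower().split()
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    return unigrams, bigrams, len(unigrams)


def perplexity(text, model):
    """exp of the average negative log-probability of each word given the previous one."""
    unigrams, bigrams, vocab_size = model
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        raise ValueError("need at least two words")
    log_prob = 0.0
    for prev, cur in pairs:
        # Add-one smoothing; the extra +1 leaves room for words never seen in training.
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size + 1)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(pairs))


model = train_bigram(
    "we the people of the united states in order to form a more perfect union"
)
familiar = perplexity("we the people of the united states", model)
novel = perplexity("purple toasters sing quietly under seven moons", model)
print(familiar < novel)  # True: text the model has seen is more predictable
```

Lower perplexity just means "the model finds this unsurprising". A detector built on it would flag the familiar sentence, whoever typed it.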

You run into situations where the AI detector will flag things like the US Constitution as being AI-generated, which is the first sign that these things aren't measuring what people think they are measuring. The US Constitution will have appeared many times in the training data, so it is extremely easy for a language model to spit it out verbatim.

The disconnect in reasoning, then, is that these detectors aren't telling you whether a language model wrote the text, but whether a language model could have written the text.

To properly analyze whether a human or an AI wrote the text, you'd need to apply Bayesian probability. The text of the US Constitution should come out at around 50%, because an AI is just as likely to repeat it verbatim as a human is.
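The 50% figure falls straight out of Bayes' rule. A minimal sketch in Python, where the function name posterior_ai, the 50/50 prior and all the probabilities are made-up illustration values, not measurements:

```python
def posterior_ai(p_text_given_ai, p_text_given_human, prior_ai=0.5):
    """Bayes' rule: probability that an AI wrote the text, given the text."""
    evidence = p_text_given_ai * prior_ai + p_text_given_human * (1.0 - prior_ai)
    if evidence == 0:
        raise ValueError("the text is impossible under both hypotheses")
    return p_text_given_ai * prior_ai / evidence


# A famous text that an AI and a human are equally likely to reproduce word for word:
print(posterior_ai(0.9, 0.9))  # about 0.5, no evidence either way

# A text an AI is far more likely to produce than a human:
print(posterior_ai(0.9, 0.1))  # about 0.9
```

The formula needs both numbers. A perplexity-based detector only estimates the first one, how likely the AI is to produce the text, and quietly ignores the second.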

Okay, we could fix this with Bayesian probability, but how do we calculate the likelihood that a human wrote a given piece of text? Well, you can't, really. The easiest thing to do would be to train a language model on a bunch of human writing and... oh, wait, we'd just be comparing the language model against itself. It would be impossible to tell the two apart.

So, in conclusion, the task of differentiating human-written text from AI-written text is pretty much unsolvable. If your text is getting flagged as AI-generated, then congratulations. All that means is that you are able to express your ideas in a clear and concise manner.
