AI Detection Tools: Not Ready for Prime Time Yet

With all the buzz about AI tools like ChatGPT generating content, I want to offer a word of warning for people using AI detection tools.

They don’t work.

It’s as simple as that.

I’ve now tested more than 100 different articles with several different AI checkers, including the most popular ones on the market to date. All of them gave me wildly different, often conflicting results.

One of the ones getting a lot of run right now is Originality.AI. Originality.AI says it’s 94% accurate. That’s below the confidence threshold most scientific research studies have to meet to be accepted. As a comparison, facial recognition AI has been under fire for failing to accurately identify people of color despite a claimed 99.7% accuracy rate. While I’m sure AI detection tools will improve over time, I think there’s serious reason to be cautious.
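
To put that 94% in perspective, here’s some quick back-of-the-envelope math. The submission volume and the all-human pool below are my assumptions, and I’m treating the entire 6% error as false flags, which real tools would split between false positives and false negatives:

```python
# Rough illustration of what a "94% accurate" claim can mean for writers.
# Assumed numbers: a pool of 10,000 genuinely human-written articles,
# with the full 6% error rate landing as false "AI" flags. Real error
# rates split between false positives and false negatives, so read this
# as an illustration of scale, not a measured figure.
human_articles = 10_000
claimed_accuracy = 0.94
false_flags = human_articles * (1 - claimed_accuracy)
print(f"Human articles wrongly flagged as AI: {false_flags:.0f}")
# -> 600 honest writers flagged for every 10,000 human submissions
```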

If an AI tool says something I wrote is 87% AI and only 13% human, how do I prove I wrote it? I’m not choosing those numbers at random. That’s the score I got on a recent article that took me two days to write for a very complex technical assignment where the wording had to be precise. 0% Copyscape matches, 0 Grammarly matches… and an 87% “written by AI” score.

By the way, when I went back in and added complex sentences, 50-cent words, slang, sentence fragments, and informal language (none of which are acceptable to the client or appropriate for the subject matter), guess what? 100% human! So crappy writing scored as “Fantastic!” according to the AI detection tool, but well-researched, formal, technical writing did not.

FYI, most of what triggered the AI rating was the H2 headers, which the client had provided after using another AI tool to generate “best practices” for writers.

To make things more confusing, the Copyleaks AI detector said both versions were human-written, while GLTR (the Giant Language model Test Room, developed by the MIT-IBM Watson AI Lab and Harvard) said both versions were primarily AI-written.
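
For context on that last one: GLTR doesn’t return a simple yes/no. It checks how predictable each word is to a language model (GPT-2), on the theory that machine-generated text leans heavily on the model’s top-ranked words. Here’s a minimal sketch of that statistic – my own illustration of the idea using the Hugging Face transformers library, not GLTR’s actual code:

```python
# A minimal sketch of the statistic GLTR visualizes: how often each word
# in a text was among a language model's top guesses. The model choice
# and the single top-10 threshold are simplifications of GLTR's buckets.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_ranks(text: str) -> list[int]:
    """For each token, count how many tokens the model scored higher.
    Rank 0 means the model's single most likely next token."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # shape: (1, seq_len, vocab_size)
    ranks = []
    for pos in range(ids.size(1) - 1):
        actual_next = ids[0, pos + 1]
        # Rank = number of vocabulary tokens scored above the real one
        rank = (logits[0, pos] > logits[0, pos, actual_next]).sum().item()
        ranks.append(rank)
    return ranks

ranks = token_ranks("AI detection tools are not ready for prime time yet.")
top10 = sum(r < 10 for r in ranks) / len(ranks)
print(f"{top10:.0%} of tokens were in the model's top-10 predictions")
# GLTR's heuristic: the higher that share, the more "machine-like" the text.
```

That heuristic is exactly why formal, precise writing can score as “AI” – predictable word choices are a feature of good technical prose, not proof of a machine.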

What does all this mean? The landscape is shifting rapidly. Before ChatGPT, the quality of AI writing was poor. ChatGPT made dramatic strides forward. I’ve tested it thoroughly and can honestly say some of the writing it produces is very good… although not always accurate. The next generation (coming soon) is said to be light-years ahead. I can’t wait to play with it.

But the AI detection tools are waaaaay behind. (I spelled it that way because an AI wouldn’t write it like that – oops, it still flagged it as AI!)

FYI: This post was written by a human. Some AI detection tools agreed. Others, not so much.

How an AI Tool Wrote This Same Post

Just for fun, here’s what an AI writing tool wrote on this subject:

Artificial Intelligence (AI) detection tools, such as Originality.AI, Copyleaks, and the GLTR test room, have become increasingly popular in recent years as a way to identify plagiarism and detect original content. However, these tools are not always accurate and companies should be cautious about using them to evaluate writers. Here are five reasons why these tests may fail to detect human writing accurately:

  • Language complexity: AI detection tools rely on algorithms to identify patterns and similarities in text. However, these algorithms may struggle to understand the complexity of human language and may not be able to accurately identify plagiarism in more advanced or nuanced writing.
  • False positives: AI detection tools can often produce false positives, identifying plagiarism where none exists. This is particularly problematic when evaluating writers, as it can lead to unjust accusations of plagiarism and damage to a writer’s reputation.
  • Lack of context: AI detection tools may not take into account the context in which text is written, making it difficult to accurately identify plagiarism. For example, paraphrasing or using common phrases or idioms may be flagged as plagiarism when they are used in a different context.
  • Limited database: Many AI detection tools only check against a limited database of sources, meaning that they may not be able to detect plagiarism from sources that are not included in their database.
  • No real understanding of the text: AI detection tools are not able to understand the meaning of the text, so it can’t tell if the text is original, if it’s a copy but with a different meaning or if it’s a copy but with a different context.

In conclusion, while AI detection tools can be a useful tool for identifying plagiarism, they should not be relied upon exclusively. Companies should be aware of the limitations of these tools and should use a combination of methods, such as manual review, to evaluate writers and detect plagiarism. With the advancements in technology, the AI detection tools will become more accurate, but until then, companies should be cautious in their use of these tools to evaluate writers.

Have to admit, it’s pretty well written and hits the high points. What it lacks? Originality and experience.

The AI Score

99% human-generated content… for text completely written by AI. So, the AI scored better as “human-generated content” than my original, researched, and nuanced writing did. Great.