This Is How to Tell if Writing Was Made by AI
Odd Lots Podcast Recap
Published:
Duration: 48 min
Guests: Max Spero
Summary
Tracy Alloway and Joe Weisenthal explore the challenges of identifying AI-generated writing, with insights from Max Spero of Pangram Labs. The episode covers the subtle differences between human and AI writing and what the spread of AI-generated content means for the internet.
What Happened
Tracy Alloway and Joe Weisenthal discuss why AI-generated writing is hard to identify: it is often grammatically precise and capable of remarkable turns of phrase, yet it typically lacks a distinct style. Some books now include disclaimers assuring readers that they were written without AI assistance, a sign of growing concern about AI-authored content.
Max Spero, CEO of Pangram Labs, joins the conversation to explain how his company's service identifies AI-generated text. Joe Weisenthal describes testing Pangram's service with translations and finding it notably effective. Pangram says its detection model has a false positive rate of about 1 in 10,000 (human text wrongly flagged as AI) and a false negative rate of about 1% (AI text that goes undetected). The model uses deep learning trained on millions of examples to pick up subtle differences between human and AI writing.
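To make those two error rates concrete, here is a minimal arithmetic sketch. It assumes the approximate rates quoted in the episode and applies them to a hypothetical batch of documents; the function name `expected_errors` and the batch sizes are illustrative, not part of Pangram's product.

```python
def expected_errors(n_human: int, n_ai: int,
                    false_positive_rate: float = 1 / 10_000,
                    false_negative_rate: float = 0.01) -> dict:
    """Expected misclassifications for a batch of documents.

    Hypothetical helper: the default rates are the approximate figures
    quoted in the episode; the batch sizes passed in are made up.
    """
    false_positives = n_human * false_positive_rate  # human texts flagged as AI
    false_negatives = n_ai * false_negative_rate     # AI texts that slip through
    true_positives = n_ai - false_negatives          # AI texts correctly flagged
    flagged = true_positives + false_positives
    # Precision: of everything flagged as AI, the share that really is AI.
    precision = true_positives / flagged if flagged else 0.0
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "precision": precision,
    }


# Hypothetical batch: 10,000 essays, 1,000 of them AI-generated.
print(expected_errors(n_human=9_000, n_ai=1_000))
```

With these made-up numbers the detector would be expected to wrongly flag about one human essay (0.9 on average) and miss about ten AI essays, which is why a very low false positive rate matters most in settings like education, where a false accusation is costly.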
The model can detect writing patterns whose statistical distribution differs from that of human-authored text, even when the differences are subtle. Pangram Labs estimates that around 40% of internet content is now AI-generated, and that more than half of new Medium articles were already AI-generated a year and a half ago. On Reddit, the share of AI-produced content rose from 7% to over 10% in the past year.
Some companies use AI-generated posts strategically to create "organic" mentions of themselves on platforms like Reddit, so that those posts enter training data and influence what language models later say. Pangram Labs uses active learning to keep improving its model's accuracy: it identifies the cases the model gets wrong and corrects them. Meanwhile, AI-assisted writing is becoming more common as services like Google Docs and Grammarly build AI features into their products.
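The error-driven loop described above can be sketched roughly as follows. This is not Pangram's pipeline, which the episode does not detail; it is a minimal sketch assuming a labeled pool of examples and a deliberately toy "detector" (`fit_marker_words` and `predict_marker_words` are hypothetical names) that just remembers words seen only in AI-labeled text. The loop finds the model's mistakes, adds them to the training set, and retrains.

```python
from typing import Callable, List, Set, Tuple

Example = Tuple[str, int]  # (text, label): 1 = AI-generated, 0 = human-written


def fit_marker_words(train: List[Example]) -> Set[str]:
    """Toy stand-in for a real detector: keep words that appear only in AI-labeled texts."""
    ai_words: Set[str] = set()
    human_words: Set[str] = set()
    for text, label in train:
        (ai_words if label == 1 else human_words).update(text.lower().split())
    return ai_words - human_words


def predict_marker_words(model: Set[str], text: str) -> int:
    """Flag a text as AI-generated (1) if it contains any remembered marker word."""
    return int(any(word in model for word in text.lower().split()))


def active_learning_round(
    train: List[Example],
    pool: List[Example],
    fit: Callable[[List[Example]], Set[str]],
    predict: Callable[[Set[str], str], int],
    budget: int,
) -> Tuple[Set[str], List[Example], List[Example]]:
    """One round: find the current model's mistakes on a labeled pool,
    move up to `budget` of them into the training set, and retrain."""
    model = fit(train)
    mistakes = [ex for ex in pool if predict(model, ex[0]) != ex[1]]
    selected = mistakes[:budget]
    train = train + selected
    pool = [ex for ex in pool if ex not in selected]
    return fit(train), train, pool


# Hypothetical data for illustration only.
train = [("delve into the tapestry", 1), ("grabbed lunch with joe", 0)]
pool = [
    ("a rich tapestry of ideas", 1),
    ("let us delve deeper", 1),
    ("seamless synergy unlocks value", 1),  # missed by the initial model
    ("lunch was late again", 0),
]
model, train, pool = active_learning_round(
    train, pool, fit_marker_words, predict_marker_words, budget=2
)
```

A production system would use a neural classifier and typically send the selected examples for human review before retraining; the toy model only keeps the loop easy to follow.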
Pangram Labs distinguishes AI-assisted from AI-generated writing by measuring cosine distance in a high-dimensional embedding space. The company's mission is to detect AI-generated content, which it sees as valuable to sectors including education, law, publishing, and individual internet users. Major internet platforms such as Quora use Pangram's services to moderate content and identify bad actors posting AI-generated material.
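The episode does not spell out exactly which texts are compared, so the following is only a minimal sketch under stated assumptions: it compares an original draft with its edited version and treats a small cosine distance as light AI assistance and a large one as AI generation. As hypothetical stand-ins, it uses word-count vectors instead of learned embeddings and an arbitrary 0.5 threshold; `classify_edit` is an illustrative name.

```python
import math
from collections import Counter
from typing import Dict


def bag_of_words(text: str) -> Dict[str, int]:
    """Toy embedding (assumption): word counts. A real system would use a learned neural embedding."""
    return Counter(text.lower().split())


def cosine_distance(a: Dict[str, int], b: Dict[str, int]) -> float:
    """1 - cosine similarity between two sparse vectors; 0 means same direction."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty text as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def classify_edit(original: str, edited: str, threshold: float = 0.5) -> str:
    """Label an edit by how far it moved the text. The 0.5 threshold is hypothetical."""
    d = cosine_distance(bag_of_words(original), bag_of_words(edited))
    return "AI-assisted" if d < threshold else "AI-generated"


original = "the quarterly results were weaker than we expected"
light_edit = "the quarterly results were weaker than we had expected"
rewrite = "navigating a challenging landscape our performance fell short of projections"
print(classify_edit(original, light_edit), classify_edit(original, rewrite))
```

Cosine distance ignores vector length and measures only direction, so a short polish and a full rewrite of the same draft can be compared on one scale regardless of how long each text is.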
As AI models become more sophisticated, their output becomes more complex, and detecting it accurately requires larger detection models. Perplexity measures how expected a piece of text is to a language model: low perplexity means the text is predictable, high perplexity means it is surprising. Early AI detectors that relied on this signal struggled with non-native English speakers, whose writing tends to use more common, predictable phrasing and therefore has low perplexity, so their human-written text was wrongly flagged as AI-generated.
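Perplexity is conventionally computed as the exponential of the average negative log-probability a language model assigns to each actual next token. The minimal sketch below assumes the per-token probabilities are already available; the two probability lists are made-up numbers, not output from any real model.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-probability per token).

    `token_probs` are the probabilities a language model assigned to each
    actual next token in the text (values in (0, 1]).
    """
    if not token_probs:
        raise ValueError("need at least one token")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


predictable = [0.9, 0.8, 0.85, 0.9]  # hypothetical: model found every token likely
surprising = [0.2, 0.05, 0.3, 0.1]   # hypothetical: model was often surprised
print(perplexity(predictable), perplexity(surprising))
```

A naive detector simply flags text whose perplexity falls below some cutoff. That is the weakness described above: plain, conventional human prose, including much writing by non-native speakers, can score as low as machine output.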
Proposed responses to undisclosed AI-generated content include social norms against sending people AI output without saying so. Internet giants like Google face a dual role: promoting AI-generated content while filtering what is termed "AI slop" out of search results. Because AI has severed craft from output, well-written content no longer reliably indicates the intelligence of its author.
Key Insights
- Pangram Labs estimates that around 40% of internet content is AI-generated, a measure of how much online information is now produced by machines rather than people.
- Pangram Labs' AI detection model, a deep learning model trained on millions of examples, has a reported false positive rate of about 1 in 10,000 and a false negative rate of around 1%. Reaching that accuracy requires a model that can pick up subtle statistical differences between AI and human writing.
- AI-generated content is growing quickly on major platforms: more than half of new Medium articles were AI-generated a year and a half ago, and the AI share of Reddit content rose from 7% to over 10% in the past year. This points to a growing reliance on AI for content creation across the internet.
- Pangram Labs uses active learning, identifying and correcting its model's errors, so the detector adapts to new writing patterns and keeps its accuracy high as AI models change. This is key to keeping pace with a rapidly changing AI landscape.