OpenAI Bot Allegedly Disrupts E-commerce Sites
Overview of the Incident
Recently, Oleksandr Tomchuk, CEO of Triplegangers, reported a significant disruption to his company's e-commerce website, which hosts over 65,000 products. Investigations revealed that an OpenAI bot was aggressively scraping the site, leading to its temporary shutdown. The bot allegedly sent tens of thousands of server requests in an attempt to download all of the site's content, including its extensive photo libraries and product descriptions.
Nature of the Attack
Tomchuk characterized the bot's actions as effectively a DDoS attack, since the flood of requests incapacitated the website. Triplegangers specializes in selling 3D object files and photos to 3D artists and video game developers who need realistic digital reproductions of human features.
The website is crucial to the company's operations, representing over a decade of work to compile what it describes as the largest database of digital human avatars online, built from 3D scans of real human models.
Protective Measures and Challenges
Despite having a terms-of-service page that prohibits unauthorized bot scraping, Tomchuk noted that such terms alone have proven ineffective. A properly configured robots.txt file is needed to instruct OpenAI's crawler, GPTBot, not to access the site.
The robots.txt file, defined by the Robots Exclusion Protocol, lets website owners signal which content should not be crawled. OpenAI has committed to respecting these directives, but acknowledges that its bots may take up to 24 hours to recognize changes to a robots.txt file.
Tomchuk emphasized the importance of configuring robots.txt correctly, asserting that without it, companies like OpenAI may presume they can scrape data freely.
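To illustrate how such a directive works, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The `GPTBot` user-agent token is the one OpenAI documents for its crawler; the rules and URL below are illustrative assumptions, not Triplegangers' actual configuration.

```python
from urllib import robotparser

# A hypothetical robots.txt a site might serve to opt out of GPTBot
# while leaving other crawlers unaffected.
rules = """
User-agent: GPTBot
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# GPTBot is blocked from every path on the (example) site...
print(parser.can_fetch("GPTBot", "https://example.com/products/scan-001"))
# ...while crawlers with no matching rule remain allowed.
print(parser.can_fetch("Googlebot", "https://example.com/products/scan-001"))
```

Note that this check only models what a *compliant* crawler would do; as the article points out, nothing technically forces a bot to consult the file at all.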
Impact on Business Operations
The bot's activities not only took Triplegangers offline during U.S. business hours but also raised concerns about rising AWS bills driven by the excess CPU and bandwidth usage.
Moreover, the robots.txt system is not foolproof: compliance is voluntary, as highlighted by a previous incident in which another AI startup, Perplexity, was criticized for ignoring robots.txt directives.
Seeking Accountability
Tomchuk expressed frustration over the lack of communication channels to address the situation with OpenAI, which has not responded to inquiries from TechCrunch. Additionally, OpenAI has yet to release its anticipated opt-out tool, which would allow businesses to protect their content more effectively.
The implications of such scraping practices are particularly severe for Triplegangers, which must navigate complex rights issues related to the real human images it scans. Under laws such as the European GDPR, using individuals' photos without authorization is prohibited.
The Exposed Vulnerability
Ironically, the aggressive scraping by OpenAI's bot has exposed the vulnerabilities Triplegangers faces. Tomchuk noted that if the bot had operated more subtly, the extent of the scraping might have gone unnoticed.
He criticized the current approach, stating, "These companies exploit a loophole to scrape data, claiming that if you update your robots.txt with our tags, you can opt out." This places the burden on business owners to understand how to effectively block unwanted scraping.
Tomchuk urged other small online businesses to actively monitor for AI bots that could be infringing on their copyrighted assets. Reports from other website owners indicate similar disturbances caused by OpenAI bots, leading to increased operational costs.
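The kind of monitoring Tomchuk recommends can start with something as simple as scanning server access logs for known AI-crawler user-agent tokens. A minimal sketch, assuming Apache/NGINX combined log format; the log lines are fabricated for illustration, and the token list is an assumption (GPTBot is OpenAI's documented crawler, the others are commonly cited examples):

```python
import re
from collections import Counter

# User-agent tokens associated with well-known AI crawlers (illustrative list).
AI_BOT_TOKENS = ("GPTBot", "CCBot", "PerplexityBot")

# Fabricated sample lines in combined log format.
log_lines = [
    '1.2.3.4 - - [09/Jan/2025:10:00:01 +0000] "GET /products/1 HTTP/1.1" 200 512 "-" "Mozilla/5.0; GPTBot/1.1"',
    '5.6.7.8 - - [09/Jan/2025:10:00:02 +0000] "GET /products/2 HTTP/1.1" 200 734 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '1.2.3.4 - - [09/Jan/2025:10:00:03 +0000] "GET /products/3 HTTP/1.1" 200 498 "-" "Mozilla/5.0; GPTBot/1.1"',
]

def count_ai_bot_hits(lines):
    """Tally requests whose user-agent string contains a known AI-bot token."""
    hits = Counter()
    for line in lines:
        # The user-agent is the last quoted field in combined log format.
        quoted = re.findall(r'"([^"]*)"', line)
        user_agent = quoted[-1] if quoted else ""
        for token in AI_BOT_TOKENS:
            if token in user_agent:
                hits[token] += 1
    return hits

print(count_ai_bot_hits(log_lines))  # Counter({'GPTBot': 2})
```

A tally like this makes it easy to spot the request volumes and bandwidth costs the article describes, and to decide whether rate limiting or blocking is warranted. Note that user-agent strings can be spoofed, so this is a first-pass signal rather than proof of a bot's identity.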
Future Outlook
Looking ahead, the challenge presented by AI bots is expected to escalate. A recent study by DoubleVerify found an 86% increase in general invalid traffic in 2024, largely attributed to AI crawlers and scraping tools.
Key Points
- An OpenAI bot allegedly caused a DDoS-like outage of Triplegangers' e-commerce site.
- The incident highlights vulnerabilities in website protection against AI crawlers.
- The robots.txt protocol is not foolproof; compliance by AI companies is voluntary.
- Tomchuk emphasizes the need for better communication with OpenAI regarding scraping practices.