Open-Source Projects Struggling With AI Crawlers Overloading Their Systems

Image by Matt Wildbore, from Unsplash

Reading time: 3 min

AI-powered web crawlers have emerged as a major threat to open-source software communities, causing widespread disruption to their infrastructure.

In a rush? Here are the quick facts:

  • AI-powered web crawlers are overwhelming open-source software communities, causing severe disruptions.
  • Some open-source projects report that up to 97% of traffic comes from AI bots.
  • Projects are deploying AI-specific blocklists, but bots quickly adapt, continuing disruptions.

Popular repositories are straining under the load of these bots, which AI companies deploy to collect training data for language models, slowing down development, as first reported by Ars Technica.

Drew DeVault of SourceHut described the crawlers' destructive effects in a blog post. The bots ignore the instructions in robots.txt files, which direct crawlers to avoid certain pages, and have caused major outages on the SourceHut platform.
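For context, this is what honoring robots.txt looks like. The sketch below, using Python's standard `urllib.robotparser`, shows how a compliant crawler is expected to check a site's rules before fetching a page; the URLs and bot name are placeholders, not SourceHut's actual configuration, and the bots described in the article simply skip this step.

```python
# Illustrative sketch: how a well-behaved crawler consults robots.txt
# before fetching a page. URLs and the bot name are placeholders.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.org/robots.txt")
rp.read()

# A compliant bot checks each URL against the site's rules first.
if rp.can_fetch("ExampleBot/1.0", "https://example.org/repo/log"):
    print("Allowed to crawl")
else:
    print("Disallowed by robots.txt -- a compliant crawler stops here")
```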

The crawlers hammer expensive endpoints such as git logs and commit pages from constantly changing IP addresses, disguising their activity as normal user traffic. These tactics make effective blocking nearly impossible and have led to prolonged delays in project work and disrupted service for users.

KDE's GitLab infrastructure suffered a temporary outage caused by bots originating from Alibaba's IP ranges. GNOME and other open-source projects faced similar attacks and deployed Anubis, a system that requires clients to complete a computational challenge before being granted access to the site, as reported by LibreNews.

This “nuclear option” has increased wait times for real users, particularly during heavy traffic on GNOME's merge request pages, LibreNews reported.
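To make the trade-off concrete, here is a minimal sketch of the proof-of-work idea behind challenge systems of this kind: the visitor must find a nonce whose hash meets a difficulty target before the page is served, which is cheap for one human visit but expensive at bot scale. This is an illustration of the concept only, not Anubis's actual implementation, and the difficulty value is an arbitrary example.

```python
# Conceptual proof-of-work sketch (not Anubis's real code).
import hashlib
import secrets

def solve_challenge(seed: str, difficulty: int = 4) -> int:
    """Brute-force a nonce so sha256(seed + nonce) starts with
    `difficulty` hex zeros -- the work done by the visitor."""
    nonce = 0
    target = "0" * difficulty
    while True:
        digest = hashlib.sha256(f"{seed}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

def verify(seed: str, nonce: int, difficulty: int = 4) -> bool:
    """Server-side check: a single hash, essentially free to verify."""
    digest = hashlib.sha256(f"{seed}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

seed = secrets.token_hex(8)      # challenge issued by the server
nonce = solve_challenge(seed)    # work performed by the visitor
print(verify(seed, nonce))       # True once the work checks out
```

The asymmetry is the point: verification costs one hash, while solving costs thousands, which is why the same mechanism that deters mass scraping also adds noticeable wait time for legitimate visitors.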

Ben, a KDE sysadmin, observed that the bots disguised themselves with Microsoft Edge user agents to mimic real users and blend in with legitimate traffic. The Fedora team went as far as blocking all web traffic from Brazil to stop further disruption, according to LibreNews.

According to LibreNews, some open-source projects now see up to 97% of their web traffic coming from AI companies’ crawlers. The result is growing bandwidth costs and mounting pressure on maintainers to keep their systems running smoothly.

Open-source projects currently rely on blocklists and AI-specific user-agent filtering as stopgap measures, yet the bots adapt quickly enough to render these defenses ineffective.
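The sketch below illustrates the kind of user-agent filtering the article describes. The bot names listed are a small sample of publicly documented AI crawler agents, not any project's actual blocklist, and, as the article notes, bots that spoof a browser user agent slip straight past this check.

```python
# Illustrative user-agent blocklist filter (sample list, not a real deployment).
AI_BOT_AGENTS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider", "Amazonbot")

def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header matches a known AI crawler."""
    return any(bot.lower() in user_agent.lower() for bot in AI_BOT_AGENTS)

print(is_ai_crawler("Mozilla/5.0 (compatible; GPTBot/1.2)"))   # True: declared bot
print(is_ai_crawler("Mozilla/5.0 ... Edg/122.0"))               # False: spoofed Edge agent evades the filter
```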

The growing problem of AI crawlers highlights how exposed open-source projects are, given their reliance on public infrastructure and volunteer labor.

AI companies benefit from open data, yet their aggressive scraping practices are damaging the very systems that keep the open internet accessible.
