
Image by Oberon Copeland, from Unsplash
AI Bots Are Overloading Wikipedia’s Servers
The Wikimedia Foundation has raised alarms over the growing pressure on its servers due to automated bots scraping data to train artificial intelligence models.
In a rush? Here are the quick facts:
- AI bots are scraping Wikimedia content at record levels.
- Bots caused a 50% rise in multimedia bandwidth use.
- 65% of high-cost traffic now comes from crawlers.
The Foundation reported in a recent post that machine-generated traffic continues to grow at an unprecedented rate, while human readers account for only a small share of it.
“Since January 2024, we have seen the bandwidth used for downloading multimedia content grow by 50%,” the post states.
“This increase is not coming from human readers, but largely from automated programs that scrape the Wikimedia Commons image catalog of openly licensed images to feed images to AI models,” the post added.
The bots, known as crawlers, pull large amounts of data from Wikimedia projects, including Wikipedia and Wikimedia Commons, without attribution and without using the official access tools. The practice makes it harder for new readers to discover Wikimedia and puts excessive strain on its technical systems.
For example, the post notes that Jimmy Carter's Wikipedia page received more than 2.8 million views on the day he died in December 2024, and a video of his 1980 presidential debate spiked traffic further. Wikimedia handled the surge, but just barely. The real problem, according to engineers, is the continuous stream of bot traffic.
“65% of our most expensive traffic comes from bots,” the Foundation wrote. Bots “bulk read” content, especially less popular pages, which triggers expensive requests to Wikimedia’s core datacenters.
While Wikimedia’s content is free to use, its servers are not. “Our content is free, our infrastructure is not,” the Foundation said. The team continues to develop methods for promoting “responsible use of infrastructure” by urging developers to use the API instead of scraping the entire site.
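For developers, the difference the Foundation is asking for can be sketched in a few lines: instead of scraping rendered pages, a bot can request structured data from the MediaWiki Action API and identify itself with a User-Agent header so operators can reach out rather than block it. The endpoint and query parameters below are the standard Action API ones; the bot name and contact address are placeholders.

```python
from urllib.parse import urlencode

# Standard MediaWiki Action API endpoint for English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def build_extract_request(title: str) -> tuple[str, dict]:
    """Return a request URL and headers for fetching a page's plain-text intro."""
    params = {
        "action": "query",
        "prop": "extracts",   # TextExtracts: plain-text page extracts
        "exintro": 1,         # intro section only
        "explaintext": 1,     # strip HTML from the extract
        "titles": title,
        "format": "json",
    }
    # Wikimedia's API etiquette asks automated clients to identify
    # themselves; the name and contact here are placeholders.
    headers = {"User-Agent": "ExampleBot/0.1 (contact@example.org)"}
    return f"{API_ENDPOINT}?{urlencode(params)}", headers

url, headers = build_extract_request("Jimmy Carter")
print(url)
```

Fetching one structured JSON response per page this way is far cheaper for Wikimedia's datacenters than bulk-downloading rendered HTML and media.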
The problem is not unique to Wikimedia; numerous other websites and publishers face the same pressure. But for the world's largest open knowledge platform, it threatens the stability of services millions rely on.