He founded the Internet Archive with a utopian vision. That hasn’t changed, but the Internet has

In his library, Brewster Kahle dances. He smiles as he sways on the spot, an antique Victrola filling the entrance hall of the building, a former church, with the raucous tunes of old-time jazz.

He raises the needle and the music stops, but just for now. Soon, his staff will convert the aging record into a string of ones and zeros that will live forever in cyberspace. This is the Internet Archive, and this is why it and Kahle are here: to make every bit of digital or physical information that exists available for free, online.

To walk with Kahle through his columned temple of knowledge in the Richmond District of San Francisco is to understand the magnitude of what he and his staff, now numbering more than 100, have worked toward for nearly 25 years. In a loading area, piles of donated books await their turn on a specialized scanning machine where, hidden behind a black curtain, a technician meticulously copies endless pages.

On the ground floor, reels of microfiche are being converted into computer images that will join the staggering amount of data the archive has collected over the years.

Its servers contain more than 70 petabytes of unique data – 70 million gigabytes – including 65 million texts, movies, audio files, images, books and more.

Kahle’s quest to build what he calls “An Alexandria Library for the Internet” began in the 1990s, when he started sending out programs called crawlers to take digital snapshots of every page on the web. Hundreds of billions of those snapshots are now available to everyone through the archive’s Wayback Machine.

This vision of free and open access to information is deeply tied to the early ideals of Silicon Valley and the origins of the Internet itself.

“The whole point of the Internet and specifically the World Wide Web was to make sure everyone was a publisher and everyone could go out there and have a voice,” Kahle said. For him, the need for a new type of library for this new publishing system, the Internet, was obvious.

But while Kahle’s goals haven’t changed, the Internet has. That early utopian vision of the positive power of digital interconnection is increasingly at odds with the troves of copyrighted and pay-to-view material online, which grow every day.

Left: A 1947 Albany (NY) Times newspaper in the offices of the Internet Archive.  Right: Book scanner Eliza Zhang opens a box containing the Albany Times newspapers.

Photos by Constanza Hevia H./Special for The Chronicle

When the archive began collecting, most people online went to a few major homepages such as Yahoo.com, said University of Washington professor and Silicon Valley historian Margaret O’Mara.

“Now not only is there so much more information, but also a lot of that information is proprietary,” O’Mara said. “There are questions about how the Internet works and how the Internet economy works that cannot be answered by capturing web pages or capturing documents or scanning a magazine.”

Despite this, she said the archive is an invaluable resource for researchers like her and reflects the idealism behind Silicon Valley’s dream of a more open, connected and accessible world.

“They hold on to the past in a way that’s a rare thing to see in the industry and a community that’s always so focused on the future and focused on what the next thing is,” O’Mara said.

This shifting online landscape is on Kahle’s mind as he makes his way to the beating heart of the archive, its cavernous main hall. The space is calm. Suffused with a golden light that filters through the windows, the old nave of the church still seems sacred. Few people are in the building because of the pandemic, but this room is never truly empty: its pews are populated by miniature statues of past and present employees and volunteers, including a bespectacled one of Kahle himself.

Here, server banks hum and flash with every upload and download as Kahle explains how libraries, even in cyberspace, can burn.

Across the auditorium, flanking the main stage where hymn numbers were once displayed, three numbers are set in metal: 200, 404 and 451. The first two are common Internet codes indicating that a page was successfully accessed or could not be found. The third appears when content has been removed for legal reasons, such as copyright infringement.

It is also, not coincidentally, a reference to Ray Bradbury’s anti-censorship novel “Fahrenheit 451.”
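For readers curious about what those three numbers mean to a computer, here is a minimal sketch that looks up each code's standard name using Python's built-in `http` module. The helper name `describe` is made up for illustration, and the sketch assumes Python 3.8 or later, the first version whose standard library lists code 451.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return the numeric code followed by its standard reason phrase."""
    status = HTTPStatus(code)  # raises ValueError for an unknown code
    return f"{code} {status.phrase}"


# The three codes displayed in the archive's main hall.
for code in (200, 404, 451):
    print(describe(code))
```

Run as written, this prints "200 OK", "404 Not Found" and "451 Unavailable For Legal Reasons", one per line.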

Book Scanner Eliza Zhang, one of more than 100 employees, works at the Internet Archive offices in the Richmond District.

Photos by Constanza Hevia H. / Special for The Chronicle

Kahle has said in the past that if a library and its books burned, copies would likely live on in another physical space. “That’s not the case on the web,” he said. “If a newspaper goes offline in Turkey, all its archives disappear. And that’s not how you can run a culture.”

The archive has been buying and digitizing books for years, lending them for free through its site with a waiting list, as other libraries do. But when the coronavirus pandemic hit last year and libraries and schools closed, the archive created what it called the National Emergency Library, an online collection of 1.4 million books accessible to users immediately.

A lawsuit by four of the nation’s largest publishing houses soon followed, one of the many challenges the archive faces in its quest for open access to information in cyberspace.

Kahle argues that copyright laws don’t prevent libraries like his from owning, digitizing and lending books with certain controls in place.

Perhaps an even bigger hurdle in Kahle’s mind is smartphones and the proprietary, protected apps that populate them.

“These things are full of apps that aren’t open,” he said, raising his phone during a recent Zoom call. That also means many of them are immune to his crawlers and cannot be saved for posterity. This is a deeply vexing issue for the archive’s mission, as are paywalls, which can and do block Kahle’s crawlers.

Brewster Kahle, who founded the Internet Archive 25 years ago, talks about the San Francisco organization’s servers, which hold more than 70 million gigabytes of data, including 65 million texts, movies, audio files, images, books and more.

Constanza Hevia H. / Special for The Chronicle

The original Internet format of hyperlinks, still used today, allows people to “weave knowledge together,” he said. But “the world of applications is inherently siloed in enterprise products. This is not how we are going to build a culture that interacts, builds on each other and can generate new ideas.”

Kahle’s career in technology dates back to the early 1980s, when he graduated from the Massachusetts Institute of Technology, where he had studied artificial intelligence. He helped found a supercomputer company called Thinking Machines before creating the first Internet-based publishing system, called Wide Area Information Server, which was eventually sold to America Online.

In the past, Kahle has also found ways to make money from software without sacrificing the archive ideal. When he sold Alexa Internet, a web search and information company he co-founded in the 1990s, to Amazon, he struck a deal with then-CEO Jeff Bezos. He would only sell the software if Bezos allowed him to continue donating a copy of the Internet to his archive every day. Bezos agreed.

The Internet Archive today is funded by many small donations averaging about $20 each, according to Katie Barrett, the archive’s senior development manager. The archive also makes money digitizing books for libraries and receives funding from the Kahle/Austin Foundation, which Kahle founded with his wife, Mary Austin.

Tax forms for 2019 show the archive’s revenue exceeded $36 million for the year, including nearly $30 million in contributions and grants.

In its quest for a more open and accessible world, the nonprofit organization is working with Wikipedia, fixing links and updating pages that link to sites that would have been lost if the Wayback Machine hadn’t backed them up in the first place. Working with the archive, Wikipedia has added more than 25 million archived web pages, mostly from Wayback Machine links, to 150 language editions of Wikipedia.

“We share a vision of the internet where nonprofit services can increase humanity’s access to knowledge,” Gwadamirai Majange, spokesperson for the Wikimedia Foundation, owner of Wikipedia, said in an email.

The Internet Archive building in the Richmond District.

Constanza Hevia H. / Special for The Chronicle

The archive has also partnered with groups such as the Digital Public Library of America, primarily providing digitized print material to its site.

Groups such as the Long Now Foundation also seek to foster this type of long-term thinking, in its case through a 10,000-year clock and a project to create a digital library of human languages for future generations, partly as a counterpoint to the short-term, profit-driven models of modern tech companies.

Kahle has also extended his nonprofit efforts outside of the digital world.

Among these was an ill-fated attempt to establish a credit union with $1 million from the archive. A more successful bid led him to start another nonprofit and buy a nearby apartment building in San Francisco where some of his employees live at below-market rates.

For his part, Kahle said he recognizes the growing challenges of the mission, but that hasn’t stopped him yet. “I wake up on different sides of the bed saying, you know, this is gonna work, and we’re gonna make it,” he said. “And then other times it’s like there’s so much against us.”

Despite this, Kahle’s servers continue to flash blue with life in this large silent room. And so long as millions of people continue to access the seemingly endless collection, the Internet’s Library of Alexandria will live on, long after its founder, as he puts it, “goes to the great archives in heaven.”
