The Internet Archive turns 25 years old: help save it

The Internet Archive has been fighting for 25 years to keep what’s on the web from disappearing – and you can help

Kayla Harris, University of Dayton; Christina Beis, University of Dayton, and Stephanie Shreffler, University of Dayton

This year the Internet Archive turns 25. It’s best known for its pioneering role in archiving the internet through the Wayback Machine, which allows users to see how websites looked in the past.

Increasingly, much of daily life is conducted online. School, work, communication with friends and family, as well as news and images, are accessed through a variety of websites. Information that once was printed, physically mailed or kept in photo albums and notebooks may now be available only online. The COVID-19 pandemic has pushed even more interactions to the web.

You may not realize that portions of the internet are constantly disappearing. As librarians and archivists, we strengthen collective memory by preserving materials that document the cultural heritage of society, including on the web. You can help us save the internet, too, as a citizen archivist.

Disappearing act

People and organizations remove content from the web for a variety of reasons. Sometimes it’s a result of changing internet culture, such as the recent shutdown of Yahoo Answers.

It can also be a result of following best practices for website design. When a website is updated, for example, the previous version is overwritten – unless it was archived.

Web archiving is the process of collecting, preserving and providing continued access to information on the internet. Often this work is done by librarians and archivists, with assistance from automated technology like web crawlers.

Web crawlers are programs that automatically visit web pages and follow their links, collecting the pages so they can be indexed by search engines or preserved long term. The Internet Archive, a nonprofit organization, uses thousands of computer servers to store multiple digital copies of these pages, requiring over 70 petabytes of storage. It is funded through donations, grants and payments for its digitization services. Over 750 million web pages are captured per day in the Internet Archive’s Wayback Machine.
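To make the idea of a crawler concrete, here is a minimal breadth-first crawler sketched in Python. It runs against a small in-memory dictionary standing in for the web (the `FAKE_WEB` pages and URLs are hypothetical), so it needs no network access. A production crawler would instead fetch pages over HTTP, pace its requests and write each capture to long-term storage; this sketch only shows the core loop of fetching a page, extracting its links and queuing the ones not yet seen.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch these over HTTP.
FAKE_WEB = {
    "https://example.org/": '<a href="/about">About</a> <a href="/news">News</a>',
    "https://example.org/about": '<a href="/">Home</a>',
    "https://example.org/news": '<a href="/news/1">Story</a> <a href="https://other.example/">Ext</a>',
    "https://example.org/news/1": "<p>No links here.</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`; returns {url: html} for every page captured."""
    captured = {}
    queue = deque([seed])
    seen = {seed}
    while queue and len(captured) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page missing or unreachable: nothing to capture
            continue
        captured[url] = html  # "archive" this capture
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links such as "/about"
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return captured


pages = crawl("https://example.org/", FAKE_WEB.get)
print(sorted(pages))
```

The queue makes the crawl visit pages in order of link distance from the seed, and the `seen` set keeps it from fetching the same URL twice when pages link back to each other.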

Why archive?

In 2018, President Donald Trump wrongly claimed via Twitter that Google had promoted on its homepage President Barack Obama’s State of the Union address, but not his own. Archived versions of the Google homepage proved that Google had, in fact, highlighted Trump’s State of the Union address in the same manner. Multiple news outlets use the Internet Archive’s Wayback Machine as the source for fact-checking these types of claims, since screenshots alone can be easily altered.
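The fact-checking described above depends on retrieving the capture of a page closest to a given date. The Python sketch below builds a query for the Wayback Machine’s public availability endpoint (`https://archive.org/wayback/available`) and parses a response of that endpoint’s JSON shape. To stay runnable offline, it parses a hard-coded sample response instead of making a request; the timestamp and snapshot URL in `SAMPLE` are illustrative, not a real capture.

```python
import json
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp):
    """Query for the capture of `url` closest to `timestamp` (YYYYMMDDhhmmss, may be truncated)."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url, "timestamp": timestamp})


def closest_snapshot(response_text):
    """Return (snapshot_url, timestamp) from an availability response, or None if nothing is archived."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


# Illustrative response only; a real one would come from fetching availability_query(...).
SAMPLE = """{"url": "google.com",
  "archived_snapshots": {"closest": {"available": true, "status": "200",
    "timestamp": "20180131000000",
    "url": "http://web.archive.org/web/20180131000000/https://www.google.com/"}}}"""

print(availability_query("google.com", "20180131"))
print(closest_snapshot(SAMPLE))
```

When no capture exists, the endpoint returns an empty `archived_snapshots` object, which is why `closest_snapshot` falls back to `None` rather than raising a `KeyError`.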

A 2019 report from the Tow Center for Digital Journalism examined the digital archiving practices and policies of newspapers, magazines and other news producers. The interviews revealed that many news media staff either do not have the resources to devote to archiving their work or misunderstand digital archiving by equating it to having a backup version.

When a news story disappeared from the Gawker website a year after the publication shut down, the Freedom of the Press Foundation became concerned with what might happen when wealthy individuals purchase websites with the intent to delete or censor the archives. It partnered with the Internet Archive to launch a web archive collection focused on preserving the web archives of vulnerable news outlets – and to dissuade billionaires from purchasing such material in order to censor it.

The web crawls for blacklivesmatter.com in the Internet Archive’s Wayback Machine, showing 9,971 results between October 8, 2014, and August 2, 2021.
Internet Archive Wayback Machine

Archiving websites that document social justice issues, such as Black Lives Matter, helps explain these movements to people of the present and the future.

Archiving government websites promotes transparency and accountability. Government websites are especially vulnerable to deletion during transitions, when the party in power changes.

In 2017, the Library of Congress announced that, because of Twitter’s growth as a communication tool, it would no longer archive every single tweet. Twitter supplies the Library of Congress with only the text of tweets, not shared images or videos. Instead of collecting comprehensively, the Library of Congress now archives only tweets of significant national importance.

An early pastel-colored home page that reads ‘Welcome to the OFFICIAL website of: ty’.

Archived websites that document the culture and history of the internet, like the Geocities Gallery, are not only fun to look at but also illustrate the ways early websites were created and used by individuals.

Citizen archivists

Archiving the internet is a monumental task, one that librarians and archivists cannot do alone. Anyone can be a citizen archivist and preserve history through the Internet Archive’s Wayback Machine. Its “Save Page Now” feature allows anyone to archive a single public web page for free. Bear in mind that some websites prevent web crawling and archiving through special coding or by requiring a login. This may be due to sensitive content or the personal preference of the web developer.
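One common form of the “special coding” that keeps crawlers out is a `robots.txt` file, which tells crawlers which paths of a site they may visit. The sketch below uses Python’s standard `urllib.robotparser` module to check a hypothetical `robots.txt`; the rules, the crawler name `ExampleArchiveBot` and the URLs are invented for illustration. Pages behind a login are a separate barrier that no such check can get past.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler entirely, and keep every other crawler out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() takes an iterable of lines

for agent, url in [
    ("SomeCrawler", "https://example.org/blog/post.html"),
    ("SomeCrawler", "https://example.org/private/notes.html"),
    ("ExampleArchiveBot", "https://example.org/blog/post.html"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```

A well-behaved crawler runs a check like this before requesting each page and simply skips the disallowed ones.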

Local cultural heritage institutions, such as libraries, archives and museums, are also actively archiving the internet. Over 800 institutions use Archive-It, a tool from the Internet Archive, to create archived web collections. At the University of Dayton we curate collections related to our Catholic and Marianist heritage, from Catholic blogs to stories of the Virgin Mary in the news.

Through its Spontaneous Event collections, Archive-It partners with organizations and individuals to create collections of “web content related to a specific event, capturing at risk content during times of crisis.”

Similarly, it created the Community Webs program, in partnership with the Institute of Museum and Library Services, to help public libraries create collections of archived web content relevant to local communities.

The websites of today are the historical evidence of tomorrow, but only if they are archived. If they are lost, we will lose crucial information about corporate and government decisions, modern communication methods such as social media, and social movements with significant online presences, such as Black Lives Matter and #MeToo.

Together with librarians and archivists, you can help ensure the survival of this evidence and save internet history.

Kayla Harris, Librarian/Archivist at the Marian Library, Associate Professor, University of Dayton; Christina Beis, Director of Collections Strategies & Services, Associate Professor, University Libraries, University of Dayton, and Stephanie Shreffler, Collections Librarian/Archivist and Associate Professor, University Libraries, University of Dayton

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Facial biometrics: how smartphones can recognize us

geralt/Pixabay

Mohamed Daoudi, IMT Lille Douai – Institut Mines-Télécom
Welcome to the new era: that of facial biometrics. The launch of the iPhone X, a smartphone featuring Face ID facial recognition, demonstrated that this technology has now reached full maturity. This became possible with the introduction of miniature 3D sensors and high computing power, combined with extremely efficient learning algorithms such as deep learning.
But what is facial recognition? It means determining that two face images belong to the same person despite changes caused by lighting conditions, pose and facial expressions. Generally speaking, this means finding distances between faces that remain reliable despite these changes, so that they can be used to decide whether two faces match.
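Many modern systems put this idea into practice by mapping each face image to a vector of numbers (an embedding) and declaring a match when two vectors are close enough. The sketch below assumes the embeddings have already been computed by some model. The 4-dimensional vectors and the 0.8 threshold are invented for illustration; real embeddings are much longer, and thresholds are tuned on labeled data.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so that distances compare directions only."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def distance(a, b):
    """Euclidean distance between two embeddings after normalization."""
    a, b = l2_normalize(a), l2_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_person(a, b, threshold=0.8):
    """Verification decision: a match if the embeddings are closer than `threshold` (hypothetical value)."""
    return distance(a, b) < threshold


# Hypothetical embeddings: two photos of person A under different lighting, one photo of person B.
a_daylight = [0.9, 0.1, 0.3, 0.2]
a_indoors = [0.8, 0.2, 0.35, 0.25]
b_photo = [0.1, 0.9, 0.2, 0.6]

print(same_person(a_daylight, a_indoors))  # True
print(same_person(a_daylight, b_photo))  # False
```

The hard part, which the following research tackles, is learning an embedding in which lighting, pose and expression barely move the vector while a change of identity moves it a lot.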

Figure showing the same face in different shooting conditions and lighting changes.

In 2014, researchers from Facebook published an article called “DeepFace: Closing the Gap to Human-Level Performance in Face Verification”. To handle changes in pose, they introduced a step that aligns the 2D face to a 3D model of the face. The next step was deep learning with an artificial neural network of 120 million connections, trained on a set of 4.4 million faces of celebrities to recognize the variations between faces. The algorithm could determine whether two photographed faces belonged to the same person with an accuracy of 97.35%.
In 2015, researchers from Google published an article entitled “FaceNet: A Unified Embedding for Face Recognition and Clustering”. They achieved a recognition rate of 99.63% on a database of 2D faces captured in an uncontrolled environment. To accomplish this, the authors proposed a neural network consisting of eleven convolutional layers and three fully connected layers. The idea was to ensure that an image of a specific person (the anchor) would be closer to all the other images of that same person (referred to as positives) than to images of other people (referred to as negatives). Training was carried out on a database of 200 million face images of 8 million people.

During training, the learned similarities pulled images of the same face closer together and pushed images of different faces farther apart, as measured by a chosen distance metric.
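This objective is usually written as a triplet loss. For an anchor a, a positive p (same person) and a negative n (different person), with embedding function f and a margin α, FaceNet penalizes max(0, ‖f(a) − f(p)‖² − ‖f(a) − f(n)‖² + α). The plain-Python sketch below computes that loss for embeddings that have already been computed. The 2-D unit vectors and the margin of 0.2 are illustrative choices; in real training the loss is averaged over many triplets and minimized by gradient descent.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is at least `margin` farther (squared) from the anchor than the positive."""
    return max(0.0, squared_distance(anchor, positive) - squared_distance(anchor, negative) + margin)


anchor = [1.0, 0.0]
positive = [0.6, 0.8]  # same person, but still far from the anchor
negative = [0.8, 0.6]  # different person, closer to the anchor than the positive is
print(triplet_loss(anchor, positive, negative))  # about 0.6: this triplet still drives learning

negative_far = [-1.0, 0.0]  # different person, already well separated
print(triplet_loss(anchor, positive, negative_far))  # 0.0: margin already satisfied
```

Because triplets that already satisfy the margin contribute nothing, training concentrates on the hard cases where a different person’s face sits too close to the anchor.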

However, the DeepFace and FaceNet experiments were both based on private databases that are not available to the scientific community. A team from the University of Oxford instead collected data from the web, building a database of 2.6 million faces of 2,622 people, and proposed a network architecture called VGG-Face, with 13 convolutional layers and 3 fully connected layers (16 weight layers in all). Today this architecture is widely used by the computer vision community.
Yet a face is not only a 2D image; it is also a three-dimensional shape, which 3D scanning technologies can capture for use in facial biometrics. The major advantage of using 3D in this context is that facial recognition algorithms become more robust to changes in lighting and pose. Work published by our team at IMT Lille Douai in 2013 in the journal IEEE TPAMI, “3D face recognition under expressions, occlusions, and pose variations”, showed the advantage of this approach. In this article, we proposed to compare two 3D faces by comparing two sets of curves that locally represent the shape of each face. We obtained a recognition rate of 97% on the Face Recognition Grand Challenge (FRGC) benchmark. The results obtained from several international evaluations reveal the advantages of 3D faces in facial biometrics systems.
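The curve-based comparison can be sketched as follows, under simplifying assumptions. Each face is represented by a set of corresponding 3D curves; here these are hypothetical radial curves leaving the nose tip, sampled as lists of (x, y, z) points. Each curve is converted to its square-root velocity representation, a standard tool in elastic shape analysis, and the distances between corresponding curves are averaged. This is a deliberately simplified stand-in rather than the published method: it skips, for example, the optimal re-alignment (reparameterization) of curves that elastic shape metrics perform, and the sample depth profiles are invented.

```python
import math


def srvf(curve):
    """Square-root velocity function of a sampled 3D curve: q(t) = f'(t) / sqrt(|f'(t)|)."""
    dt = 1.0 / (len(curve) - 1)  # the curve parameter t runs from 0 to 1
    q = []
    for p0, p1 in zip(curve, curve[1:]):
        velocity = [(b - a) / dt for a, b in zip(p0, p1)]  # finite-difference derivative
        speed = math.sqrt(sum(v * v for v in velocity))
        scale = 1.0 / math.sqrt(speed) if speed > 1e-12 else 0.0
        q.append([v * scale for v in velocity])
    return q, dt


def curve_distance(c1, c2):
    """L2 distance between the SRVFs of two curves with the same number of samples."""
    q1, dt = srvf(c1)
    q2, _ = srvf(c2)
    total = sum((a - b) ** 2 for u, v in zip(q1, q2) for a, b in zip(u, v))
    return math.sqrt(total * dt)


def face_distance(face1, face2):
    """Mean curve distance; both faces must list corresponding curves in the same order."""
    return sum(curve_distance(c1, c2) for c1, c2 in zip(face1, face2)) / len(face1)


def radial_curve(angle, depths):
    """Hypothetical curve leaving the nose tip at `angle`, with the given depth (z) at each sample."""
    n = len(depths) - 1
    return [(math.cos(angle) * i / n, math.sin(angle) * i / n, z) for i, z in enumerate(depths)]


def make_face(depths):
    """Hypothetical face made of two radial curves sharing one depth profile."""
    return [radial_curve(0.0, depths), radial_curve(math.pi / 2, depths)]


face_a = make_face([0.0, -0.2, -0.35, -0.45])
face_a_again = make_face([0.0, -0.21, -0.34, -0.46])  # same person, slight scanner noise
face_b = make_face([0.0, -0.05, -0.3, -0.7])  # differently shaped face

print(face_distance(face_a, face_a_again))  # small
print(face_distance(face_a, face_b))  # noticeably larger
```

Working curve by curve is what lets such methods cope with expressions and occlusions: curves passing through a deformed or hidden region can be down-weighted or dropped while the rest still support the comparison.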

Example of 3D faces captured by the Minolta scanner using laser technology.

Now let us get back to the iPhone X and its 3D facial recognition technology. This feat was made possible by miniature 3D sensors on the front of the device: a projector casts 30,000 invisible points onto the user’s face, which are used to build a 3D model of the face. According to Apple, Face ID cannot be fooled by a mere photograph of a face, since recognition relies on a 3D sensor that measures depth.
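Why depth defeats a photograph can be illustrated with a toy check. Points measured on a printed photo lie close to a single plane, however the photo is tilted, whereas a real face has relief. The sketch below fits a plane z = ax + by + c to depth samples by least squares and reports how far the points stray from it. This is a hypothetical illustration of the principle, not Apple’s actual algorithm; the millimeter units, the sample points and the 5 mm threshold are all invented.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares plane z = a*x + b*y + c through (x, y, z) points, via the normal equations."""
    sxx = sum(x * x for x, y, z in points)
    sxy = sum(x * y for x, y, z in points)
    syy = sum(y * y for x, y, z in points)
    sx = sum(x for x, y, z in points)
    sy = sum(y for x, y, z in points)
    sxz = sum(x * z for x, y, z in points)
    syz = sum(y * z for x, y, z in points)
    sz = sum(z for x, y, z in points)
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, len(points)]]
    rhs = [sxz, syz, sz]
    d = det3(m)
    coeffs = []
    for col in range(3):  # Cramer's rule: replace one column with the right-hand side
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(det3(mc) / d)
    return coeffs  # [a, b, c]


def relief_mm(points):
    """Root-mean-square depth (z) residual of the points around their best-fit plane."""
    a, b, c = fit_plane(points)
    residuals = [z - (a * x + b * y + c) for x, y, z in points]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def looks_three_dimensional(points, threshold_mm=5.0):
    """Hypothetical liveness rule: reject nearly flat surfaces such as a printed photo."""
    return relief_mm(points) > threshold_mm


grid = [(x, y) for x in (-40.0, 0.0, 40.0) for y in (-40.0, 0.0, 40.0)]
photo = [(x, y, 300.0 + 0.2 * x + 0.1 * y) for x, y in grid]  # flat print, tilted relative to the sensor
face = [(x, y, z - 25.0 if (x, y) == (0.0, 0.0) else z) for x, y, z in photo]  # nose tip 25 mm closer

print(looks_three_dimensional(photo))  # False
print(looks_three_dimensional(face))  # True
```

Fitting a plane rather than just measuring the depth range matters: a tilted photo has a large range of depths but almost no residual around its plane, so the range alone would wrongly accept it.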

The original French version of this article was translated to English by the Institut Mines-Télécom.
Mohamed Daoudi, Professor at IMT Lille Douai, Centre de recherche en informatique, signal et automatique de Lille, IMT Lille Douai – Institut Mines-Télécom
This article is republished from The Conversation under a Creative Commons license. Read the original article.
