Information Overload To Swamp The National Archives

As we prepare for the transition from the Bush administration to the incoming Obama administration, the National Archives, charged with preserving and indexing presidential records, has launched an emergency plan to deal with the looming deluge of electronic records. The incoming data will comprise around 100 terabytes of content, about 50 times the amount the Clinton administration left behind.

Much of the content is in the form of e-mail, which has exploded in use over the last eight years and is expected to account for 20 to 40 terabytes. This number can only be expected to grow in future administrations. There is no doubt that Obama and his team, judging by his admitted addiction to his BlackBerry, will generate even more data that will have to be archived and indexed.

The effort to incorporate the records has been complicated by foot-dragging on the part of the current administration, which has been reluctant to provide details about the size and format of the records it will be turning over. The format issue is particularly relevant: many of the records are expected to be in unfamiliar formats, and without preparation the National Archives will struggle to process them. Of particular concern are large quantities of digital photographs and the White House’s “records management system,” which provides the index for the text-based records.

In addition to the size of the records, there are concerns that the proprietary formats of the records will become obsolete. This has already happened at NASA, where millions of files stored on 8″ and 5 1/4″ floppy diskettes and various obsolete tape cartridges, along with NASA’s earliest photographs of the Earth, are mostly inaccessible with today’s technology.

Despite a new $144 million computer system, concerns abound about the ability of the National Archives to absorb the content. In the words of Paul Brachfeld, the archives’ inspector general, “Just because you ingest the data does not mean that people can locate, identify, recover and use the records they need.”

While only a fraction of the information turned over to the National Archives will be of any interest to researchers and the public, by law it must all be archived. Without a plan in place to index and store the data, it may be rendered utterly useless. As the amount of electronic content increases, the current deluge may prove to be only the tip of the iceberg.

Cody Burke is a senior analyst at Basex.
