Librarians of the Twitterverse by James Gleick | NYRblog | The New York Review of Books

by Gilbert Keith

Librarians of the Twitterverse

James Gleick

For a brief time in the 1850s the telegraph companies of England and the United States thought that they could (and should) preserve every message that passed through their wires. Millions of telegrams—in fireproof safes. Imagine the possibilities for history!

[…]

Remind you of anything?

Here in the twenty-first century, the Library of Congress is now stockpiling the entire Twitterverse, or Tweetosphere, or whatever we’ll end up calling it—anyway, the corpus of all public tweets. There are a lot. The library embarked on this project in April 2010, when Jack Dorsey’s microblogging service was four years old, and four years of tweeting had produced 21 billion messages. Since then Twitter has grown, as these things do, and 21 billion tweets represents not much more than a month’s worth. As of December, the library had received 170 billion—each one a 140-character capsule garbed in metadata with the who-when-where.

[…]

The Library of Congress dreams of being able to provide scholars instant results for all kinds of queries—“to be able to answer any question a researcher puts before the archives,” as Dizard says—but that may be a long way off. Right now, to run a single query can take days. The Gnip company, as Twitter’s collaborator, offers a form of historical search for its clients, but it, too, is slow and specialized. “I think there is broad recognition already that there is enormous value that can be derived from the data,” says Gnip’s president, Chris Moody. “That being said, we have to be realistic in terms of what’s going to be available because it is very expensive and it is very challenging.”

via Librarians of the Twitterverse by James Gleick | NYRblog | The New York Review of Books.

You’re telling me there aren’t a million young people out there who want to figure out how to do what the LoC wants to do? Why worry only about Google doing the indexing? And come to think of it, this article was published on the heels of Facebook announcing Graph Search, which effectively makes a gigantic dataset with similar attributes.

 
