The National Security Agency (NSA) brought the topics of “Metadata” and “Metadata Mining” to center stage this past month when whistleblower Edward Snowden revealed that the NSA operates a program that collects and organizes enormous volumes of metadata created by U.S. citizens. A national debate has begun over whether this program violates constitutional guarantees of privacy. Other questions, regarding the limits on the government’s power to “spy” on its own population, are also at issue. At the crux of the debate, however, is a more basic question: What is metadata, and what does it mean to “mine” it?
Metadata is often described as “the data about the data.” This description, although accurate, is not particularly helpful to the uninitiated. To provide a better understanding of metadata without slipping hopelessly into geek-speak or oversimplified ambiguity, let’s use the example of an individual mailing a letter through a private carrier like UPS or FedEx. This example is very similar to sending an e-mail. First, an individual writes a note and places it into an envelope for shipment. To ensure that the package reaches the appropriate destination, identifying information must be placed on the envelope. The sender writes the recipient’s name, address, and postal code, along with the return address and a description of the envelope’s contents. The carrier weighs the letter, charges the appropriate fee, and places an identifying sticker on the envelope detailing its description, destination, and required delivery time, along with a tracking number to monitor its movement and delivery. All of this information placed on the envelope is metadata. None of it reveals the communication written in the letter inside the envelope, but it does provide a great deal of information about the sender, the recipient, their relationship, and their geographic locations.
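To make the envelope comparison concrete, here is a minimal Python sketch that reads only the "envelope" of an e-mail and never touches the note inside. The message, the addresses, and the `envelope_metadata` helper are all invented for illustration; the parsing relies on Python's standard `email` module.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical message, invented for this example.
RAW_MESSAGE = """\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.org>
Date: Mon, 10 Jun 2013 09:30:00 -0400
Subject: Lunch on Friday?
Message-ID: <12345@example.com>

Hi Bob, are you free for lunch on Friday?
"""


def envelope_metadata(raw: str) -> dict:
    """Return the header fields of a message, ignoring its body."""
    msg = message_from_string(raw)
    return {
        "sender": msg["From"],        # like the return address
        "recipient": msg["To"],       # like the delivery address
        "sent_at": parsedate_to_datetime(msg["Date"]),
        "subject": msg["Subject"],    # like the description of contents
        "message_id": msg["Message-ID"],  # like the tracking number
    }


metadata = envelope_metadata(RAW_MESSAGE)
for field, value in metadata.items():
    print(f"{field}: {value}")
```

The body of the message is never read, yet the output still shows who wrote to whom and when, which is the point of the envelope example.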
Metadata is not limited to e-mail. It is also created when you make a call from a landline or cell phone, take a digital photograph, send a fax, print or scan a document, or create an electronic document. The various electronic devices that generate and receive these files act like the mail carrier in the example above. They create and record multiple fields of data, e.g., dates, locations, file names, and other characteristics, all to make sure that the content is clearly identified, stored, sent, managed, and easily retrieved.
Metadata Mining occurs whenever an organization or individual collects and examines this otherwise unseen information. If you have ever clicked on a Word document to check the date it was created or last modified, or looked at the top of an e-mail that was forwarded to you, which lists the previous sender, the recipient, and the sent/received dates and times, you have essentially mined metadata.
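Checking a file's dates can also be done in a few lines of code. The sketch below is only an illustration: it creates a throwaway text file and asks the operating system for its name, size, and last-modified time through Python's standard `os.stat` call. The `file_metadata` helper and the sample text are made up for this example, and the fields stored inside a Word document itself (such as its author) would need a different tool.

```python
import os
import tempfile
from datetime import datetime, timezone

SAMPLE_TEXT = "The content itself is never inspected."


def file_metadata(path: str) -> dict:
    """Collect a few metadata fields about a file without opening it."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Create a throwaway file so the example runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write(SAMPLE_TEXT)
    sample_path = handle.name

details = file_metadata(sample_path)
for field, value in details.items():
    print(f"{field}: {value}")

os.remove(sample_path)  # clean up the throwaway file
```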
The NSA, however, is not reviewing individual data fields; it is mining enormous volumes of information collected in bulk. After collection, this data is processed through a sophisticated set of algorithms developed to identify distinct patterns of behavior that match the NSA’s criteria. Again, this information does not reveal the content of the actual communications, but it does help the NSA build a profile of the senders and recipients. The assumption is that the NSA catalogs this information to create links between suspected terrorists, their associates, and their geographical locations.
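How the NSA's algorithms actually work is not public, so the following is only a toy sketch of the general idea of linking people through metadata. The call records, the names "A" through "D", and the `build_contact_graph` helper are all hypothetical. Each record holds a caller, a recipient, and a city, and nothing about what was said.

```python
from collections import Counter, defaultdict

# Hypothetical call records: (caller, recipient, caller's city).
RECORDS = [
    ("A", "B", "Newark"),
    ("A", "B", "Newark"),
    ("B", "C", "Trenton"),
    ("A", "C", "Newark"),
    ("D", "C", "Camden"),
]


def build_contact_graph(records):
    """Count contacts between pairs and note who talks to whom, and from where."""
    pair_counts = Counter()        # how often each pair was in contact
    contacts = defaultdict(set)    # everyone each person has been in contact with
    locations = defaultdict(set)   # cities each caller has called from
    for caller, recipient, city in records:
        pair_counts[tuple(sorted((caller, recipient)))] += 1
        contacts[caller].add(recipient)
        contacts[recipient].add(caller)
        locations[caller].add(city)
    return pair_counts, contacts, locations


pair_counts, contacts, locations = build_contact_graph(RECORDS)

# The person linked to the most distinct contacts.
most_connected = max(contacts, key=lambda person: len(contacts[person]))
print("Most connected:", most_connected)
print("Most frequent pair:", pair_counts.most_common(1)[0])
```

Even with five records and no content, the sketch surfaces who sits at the center of the group and which pair talks most often. Real analysis would involve far more data and far more sophisticated methods.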
Whether this seizure of the “data about the data” constitutes an unlawful violation of privacy, and whether the American people will support this type of domestic government surveillance, are questions that will be addressed in the courts and in future elections.
Bio: Derek currently blogs for a data recovery company in New Jersey.
Image Credit: Martin Gommel