NSRL Unique File Corpus


NIST's National Software Reference Library (NSRL) makes available a Unique File Corpus covering each file that it processes. The tab-delimited file lists the SHA-1 hash value and byte count of each file. It is available for download here.

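If you're conducting electronic discovery and come across unknown files, and you want to determine whether they may contain data unique to a business's network, consulting the corpus is one way to 'de-NIST' the data. Files whose SHA-1 values appear in the corpus are already known to the NSRL, so they are unlikely to be unique to the business and can be set aside.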
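As a rough illustration of that lookup, here is a minimal Python sketch. It assumes the corpus's SHA-1 values have already been loaded into a set of uppercase hex strings; the names `known_hashes`, `sha1_of_file`, and `split_known_unknown` are hypothetical and not part of any NSRL tool.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex SHA-1 digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def split_known_unknown(paths, known_hashes):
    """Split `paths` into (known, unknown) lists.

    `known_hashes` is assumed to be a set of uppercase SHA-1 hex strings
    taken from the corpus. A file is "known" if its hash is in that set.
    """
    known, unknown = [], []
    for path in paths:
        if sha1_of_file(path) in known_hashes:
            known.append(path)
        else:
            unknown.append(path)
    return known, unknown
```

The files left in the `unknown` list are the ones that may deserve a closer look; a match only says that the NSRL has seen an identical file, nothing more.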

The full NSRL Reference Data Set contains far more information for each file, including the CRC32 checksum; the MD5 hash value; manufacturer, operating system, and product codes, names, and versions; language; application type; and the file name. Whereas the full set downloads as an .iso file, the Unique File Corpus downloads as a single file named 'CorpIdMetadata.tab' inside a zip file.
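If you would rather read 'CorpIdMetadata.tab' straight out of the zip file than unpack it, something like the following Python sketch may work. It is written under assumptions I have not verified against the actual download: that the SHA-1 value is in the first column, and that any header row can be skipped by keeping only values that look like 40-character hex strings. The function names and the `sha1_column` parameter are hypothetical.

```python
import csv
import io
import string
import zipfile

HEX_DIGITS = set(string.hexdigits)


def looks_like_sha1(value):
    """True if `value` is a 40-character hexadecimal string."""
    return len(value) == 40 and all(ch in HEX_DIGITS for ch in value)


def load_sha1_set(zip_path, member="CorpIdMetadata.tab", sha1_column=0):
    """Collect uppercase SHA-1 values from a tab-delimited file in a zip.

    Assumption: the SHA-1 value sits in column `sha1_column` (0 = first).
    Rows without a valid-looking hash there, such as a header, are skipped.
    """
    hashes = set()
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(
                raw, encoding="utf-8", errors="replace", newline=""
            )
            reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                if len(row) <= sha1_column:
                    continue
                value = row[sha1_column].strip()
                if looks_like_sha1(value):
                    hashes.add(value.upper())
    return hashes
```

Rows are streamed one at a time, but the resulting set still holds every hash in memory, so for the complete corpus a database table may be the more practical home.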

The tab format can be handled by MS Access. I was able to successfully import the first 1,048,576 rows, the most a single Excel worksheet can hold, into an Excel spreadsheet tonight. Take a look:


Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.

© 2015 by Sean O'Shea.
