Study on Linear Review vs. TAR
In 2010 the American Society for Information Science and Technology published a study of the efficacy of machine categorization. See Herbert L. Roitblat, Anne Kershaw, and Patrick Oot, Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, 61(1) J. Assoc. Inf. Sci. Technol. 70–80 (2010). The study compared the results of a document review performed by 225 attorneys in response to a Second Request from the Department of Justice under the Hart-Scott-Rodino Act (concerning an investigation of Verizon's acquisition of MCI) with the results of automated categorization performed by two e-discovery vendors.
The data consisted of 2.3 million documents, totaling 1.3 TB, collected from 83 custodians; after duplicates were eliminated, 1.6 million documents remained. The original team of 225 attorneys performed the initial review in two groups, one looking for privileged documents and the other looking for relevant documents. That team took four months to identify 176,000 documents for production, at a cost of $13.5 million. Two re-review teams (Teams A and B) each reviewed the same 5,000 documents in order to create seed sets for two different e-discovery vendors, one based in California and the other in Texas. The vendors operated without knowledge of each other's results.
This table compares the level of agreement among the various reviews. (The last row apparently contains a typo and should refer to 'Original v. System D'.) Patrick Oot, a Verizon attorney and one of the study's authors, adjudicated the decisions made by the reviewers of the seed sets, deciding which was correct when they disagreed. This confirmation by a subject matter expert lends credence to the finding that, within the set of 5,000 documents, the original team missed 739 relevant documents.
The results show that the computer-assisted review systems used by the vendors (System C and System D) marked large numbers of documents as responsive that the original team of 225 attorneys had found to be non-responsive, roughly 200,000 for both systems. The study also found that about half of the documents produced after the original review were non-responsive.
The teams were more likely to agree on which documents were non-responsive than on which were responsive.
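To make that notion of agreement concrete, the sketch below shows how a positive-agreement score (overlap on responsive calls) and a negative-agreement score (overlap on non-responsive calls) can be computed from a two-by-two tally of two reviews' coding decisions. The counts and variable names are purely hypothetical illustrations, not figures from the study.

```python
# Illustrative sketch: comparing two reviews' responsiveness calls.
# All counts below are hypothetical and are NOT the study's figures.

both_responsive = 1_000      # both reviews marked the document responsive
only_first = 400             # only the first review marked it responsive
only_second = 600            # only the second review marked it responsive
both_nonresponsive = 8_000   # both reviews marked it non-responsive

# Positive agreement: of the documents either review marked responsive,
# the share that both marked responsive (a Jaccard-style overlap).
positive_agreement = both_responsive / (both_responsive + only_first + only_second)

# Negative agreement: the analogous overlap for non-responsive calls.
negative_agreement = both_nonresponsive / (both_nonresponsive + only_first + only_second)

print(f"positive agreement: {positive_agreement:.2f}")   # 0.50
print(f"negative agreement: {negative_agreement:.2f}")   # 0.89
```

With counts like these, negative agreement far exceeds positive agreement simply because non-responsive documents dominate the collection, which is the pattern the study observed.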
The levels of precision (the proportion of documents marked responsive that actually were responsive) and recall (the proportion of truly responsive documents that were identified) for the vendors' systems were relatively low, but the study concluded that the automated review systems yielded results at least as good as those of the human review team.
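For readers unfamiliar with the two measures, the sketch below shows how precision and recall are calculated once a review's calls are compared against an adjudicated reference set treated as ground truth. The counts here are hypothetical and are not drawn from the study.

```python
# Illustrative sketch of precision and recall against an adjudicated
# reference set. All counts are hypothetical, not the study's results.

true_positives = 120_000    # marked responsive and actually responsive
false_positives = 80_000    # marked responsive but actually non-responsive
false_negatives = 60_000    # actually responsive but missed by the review

# Precision: of the documents marked responsive, how many really were.
precision = true_positives / (true_positives + false_positives)

# Recall: of the truly responsive documents, how many were found.
recall = true_positives / (true_positives + false_negatives)

print(f"precision: {precision:.2f}")  # 0.60
print(f"recall:    {recall:.2f}")     # 0.67
```

The two measures pull in opposite directions: a review that produces nearly everything achieves high recall but low precision, while a very conservative review does the reverse, which is why the study reported both.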