TAR for Smart People - Outline - Chapters 6 and 7
Here's another installment in my outline of John Tredennick's 'TAR for Smart People'. On November 13, 2016, I posted an outline of Chapters 4 and 5. This part of the outline covers Chapters 6 and 7.
6. Five Myths About Technology Assisted Review - How TAR 2.0 Overcomes the Limits of Earlier Systems
1. You Only Get One Bite At the Apple
With TAR 2.0, the reviewers who create the initial seed set are then given the next most likely relevant documents for review. Their tags are fed back into the system, so it makes better decisions.
2. Subject Matter Experts are Required for TAR Training
TAR 2.0 systems take variation in responsiveness decisions into account and present outliers to an expert for correction.
3. You Must Train on Randomly Selected Documents
Modern TAR systems allow you to submit as many documents as you like. They can do diversity sampling (docs you know the least about), systematic sampling (every nth document), and random sampling.
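Two of the sampling methods named above are simple enough to sketch in a few lines. This is only an illustration: the function names and the `DOC-0001` style identifiers are made up for the example and do not come from the book or from any TAR product.

```python
import random


def systematic_sample(doc_ids, step, start=0):
    """Systematic sampling: take every `step`-th document, beginning at `start`."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return doc_ids[start::step]


def random_sample(doc_ids, k, seed=None):
    """Random sampling: draw k distinct documents, each equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(doc_ids, k)


# Hypothetical collection of 20 documents.
doc_ids = [f"DOC-{i:04d}" for i in range(1, 21)]

print(systematic_sample(doc_ids, step=5))
print(random_sample(doc_ids, k=4, seed=7))
```

Systematic sampling walks the collection in order, so it depends on how the documents happen to be sorted; random sampling does not.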
4. You Can't Start TAR Training Until You Have All of Your Documents
TAR 1.0 required that all documents be collected before training began. TAR 2.0 ranks all of the documents each time and doesn't use a control set to determine the effectiveness of the ranking, so training can start before collection is complete.
5. TAR Doesn't Work for Non-English Documents
TAR is a mathematical process that ranks documents based on word frequency, so it doesn't matter which language the documents are in. Chinese, Japanese and Korean text must first be broken into word segments - this is tokenizing. TAR 2.0 can only tokenize.
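To make the tokenizing point concrete, here is a rough sketch of turning text into countable tokens. It assumes overlapping two-character pairs (bigrams) as a stand-in for real Chinese, Japanese and Korean word segmentation, which normally relies on dictionaries or statistical models. The character ranges and function names are my own simplifications, not anything described in the book.

```python
from collections import Counter
from itertools import groupby


def is_cjk(ch):
    """Rough check for CJK ideographs, kana and hangul (illustrative ranges only)."""
    code = ord(ch)
    return (
        0x4E00 <= code <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= code <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= code <= 0xD7A3   # Hangul syllables
    )


def char_class(ch):
    """Label each character so consecutive characters of one kind can be grouped."""
    if is_cjk(ch):
        return "cjk"
    if ch.isalnum():
        return "word"
    return "sep"


def tokenize(text):
    """Split space-delimited text into lowercase words and CJK runs into bigrams."""
    tokens = []
    for kind, group in groupby(text, key=char_class):
        chunk = "".join(group)
        if kind == "word":
            tokens.append(chunk.lower())
        elif kind == "cjk":
            if len(chunk) == 1:
                tokens.append(chunk)
            else:
                # Assumption: overlapping bigrams approximate word segments.
                tokens.extend(chunk[i:i + 2] for i in range(len(chunk) - 1))
    return tokens


# Once text is tokenized, counting term frequency works the same in any language.
print(Counter(tokenize("The deal, the Deal")))
print(Counter(tokenize("合同审查")))
```

After this step the ranking math sees only tokens and counts, which is why the language of the documents stops mattering.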
7. TAR 2.0: Continuous Ranking - Is One Bite at the Apple Really Enough?
Can We Reduce the Review Count Even Further?
TAR 2.0 - continuous ranking throughout the review process.
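The continuous ranking idea can be sketched as a loop: review a batch, feed the tags back, re-rank what is left, repeat. The sketch below is a toy under stated assumptions. Simple term-weight counting stands in for a real ranking model, and the `reviewer` callback, the batch size and the sample documents are all hypothetical.

```python
from collections import Counter


def score(doc_terms, weights):
    """Sum the learned weights of a document's terms (unseen terms count as 0)."""
    return sum(weights[t] for t in doc_terms)


def continuous_review(docs, seed_ids, reviewer, batch_size=2):
    """Return the order in which documents are reviewed.

    docs: dict of doc_id -> set of terms
    reviewer: callable doc_id -> True if the document is relevant
    """
    weights = Counter()
    tags = {}
    order = []

    def record(doc_id):
        # A reviewer tag immediately updates the term weights.
        relevant = reviewer(doc_id)
        tags[doc_id] = relevant
        order.append(doc_id)
        for term in docs[doc_id]:
            weights[term] += 1 if relevant else -1

    for doc_id in seed_ids:
        record(doc_id)

    while len(tags) < len(docs):
        # Re-rank everything still unreviewed using all tags so far.
        remaining = [d for d in docs if d not in tags]
        remaining.sort(key=lambda d: score(docs[d], weights), reverse=True)
        for doc_id in remaining[:batch_size]:
            record(doc_id)
    return order


docs = {
    "D1": {"merger", "price"},
    "D2": {"lunch", "menu"},
    "D3": {"merger", "board"},
    "D4": {"price", "offer"},
    "D5": {"menu", "parking"},
    "D6": {"picnic", "parking"},
}
relevant_ids = {"D1", "D3", "D4"}  # hypothetical ground truth
order = continuous_review(docs, ["D1", "D2"], lambda d: d in relevant_ids)
print(order)
```

In this toy run all three relevant documents surface within the first four reviewed, which is the effect continuous ranking aims for: relevant documents move to the front as the tags accumulate.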
Catalyst uses contextual diversity sampling to select documents dissimilar from those you have already reviewed.
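The outline does not describe how Catalyst's contextual diversity sampling is implemented, so the following is only a generic illustration of the idea of picking documents dissimilar from those already reviewed. It assumes Jaccard similarity over sets of terms, and every name and sample document in it is hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two term sets: shared terms over all terms."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_dissimilar(unreviewed, reviewed):
    """Pick the unreviewed document least similar to anything already reviewed.

    Both arguments are dicts of doc_id -> set of terms.
    """
    def closeness(terms):
        # How similar is this document to its nearest reviewed neighbour?
        return max((jaccard(terms, r) for r in reviewed.values()), default=0.0)

    return min(unreviewed, key=lambda d: closeness(unreviewed[d]))


reviewed = {
    "R1": {"merger", "price", "board"},
    "R2": {"merger", "vote"},
}
unreviewed = {
    "U1": {"merger", "price"},
    "U2": {"patent", "license", "royalty"},
    "U3": {"board", "vote", "lunch"},
}
print(most_dissimilar(unreviewed, reviewed))
```

Here `U2` is chosen because it shares no terms with any reviewed document, so tagging it tells the system the most about a part of the collection it has not yet seen.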
Research Study One - 85K total docs; 11K responsive; 13% richness. With TAR 1.0, 60K documents must be reviewed to get 80% recall; with TAR 2.0, 27K documents. For 95% recall, TAR 1.0 requires review of 77K documents; TAR 2.0 requires review of 36K documents - a 49% savings.
Research Study Two - 57K total docs; 11K responsive. With TAR 1.0, 29K docs must be reviewed to get 80% recall; with TAR 2.0, 23K docs. For 95% recall, TAR 1.0 requires review of 46K docs; TAR 2.0 requires review of 31K docs - a 25% savings.
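The arithmetic behind Research Studies One and Two can be checked with a short script. Two caveats: the outline does not say how the savings percentages were calculated, and the counts are rounded to the nearest thousand. The script therefore reports savings two ways (against the whole collection and against the TAR 1.0 review count), and neither will exactly reproduce the 49% and 25% quoted above. The function names are mine.

```python
def richness(responsive, total):
    """Share of the collection that is responsive."""
    return responsive / total


def review_savings(tar1_reviewed, tar2_reviewed, total):
    """Return (docs saved, share of collection, share of TAR 1.0 effort)."""
    saved = tar1_reviewed - tar2_reviewed
    return saved, saved / total, saved / tar1_reviewed


# Rounded 95% recall figures from the outline.
studies = {
    "Study One": {"total": 85_000, "tar1": 77_000, "tar2": 36_000},
    "Study Two": {"total": 57_000, "tar1": 46_000, "tar2": 31_000},
}

print(f"Study One richness: {richness(11_000, 85_000):.1%}")
for name, s in studies.items():
    saved, of_total, of_tar1 = review_savings(s["tar1"], s["tar2"], s["total"])
    print(f"{name}: {saved} fewer docs, "
          f"{of_total:.1%} of the collection, {of_tar1:.1%} of TAR 1.0 review")
```

With these rounded inputs, the collection-based figure comes out closer to the quoted percentages than the TAR 1.0-based one, which suggests (but does not prove) that the savings were measured against the total collection.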
Research Study Three - Privilege Review: 85K docs; only 1K privileged. 2K documents must be reviewed with TAR 1.0 to get 80% recall. No gain from TAR 2.0 at that level because the process would already be complete. For 95% recall, 18K docs must be reviewed with TAR 1.0, but only 14K docs with TAR 2.0. Supplement privilege review with a check of names and organizations in communications.
Subject Matter Experts - better to have an expert review a portion of the documents tagged by the review team than have her or him review all of the training documents.