
Lemmatization


When evaluating document review platforms and their conceptual search capabilities, ask whether they can account for the lemmatization of words. The lemma of a word is its dictionary form. So the word ‘go’ is the lemma for ‘going’, ‘went’, and ‘gone’, which are inflected forms of the headword ‘go’. The lemma and its inflected forms are collectively known as the lexeme of the word. Lemmatization differs from stemming in that it considers the context in which a word is used, such as its part of speech, rather than simply trimming word endings. A stemming search algorithm may use a stem of the word ‘crazy’, spelled as ‘crazi’, to account for ‘craziness’. Stemming will not find ‘better’, which is part of the lexeme of the lemma ‘good’. Generally, stemming facilitates the recall of a search: the percentage of all available responsive hits in a review set that the search returns. Employing search algorithms which account for lemmatization will improve the precision of searches: the percentage of returned hits that are true hits as opposed to false positives.
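The contrast between the two approaches can be sketched in a few lines of Python. This is a toy illustration only, not how any particular review platform works: the names `LEMMA_TABLE`, `toy_lemmatize`, and `toy_stem` are hypothetical, the lemma table is a hand-built stand-in for a real dictionary, and the lookup ignores context, which a real lemmatizer would take into account.

```python
# Toy illustration only: real search engines use far larger dictionaries
# and rule sets, and real lemmatizers also look at the word's context.

# Hypothetical hand-built table mapping inflected forms to their lemma.
LEMMA_TABLE = {
    "going": "go", "went": "go", "gone": "go", "goes": "go",
    "better": "good", "best": "good",
}


def toy_lemmatize(word):
    """Look the word up in the table; fall back to the word itself."""
    w = word.lower()
    return LEMMA_TABLE.get(w, w)


def toy_stem(word):
    """Crude suffix stripping, loosely in the spirit of rule-based stemmers."""
    w = word.lower()
    # Strip at most one suffix, keeping at least two characters.
    for suffix in ("ness", "ing", "es", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 2:
            w = w[: -len(suffix)]
            break
    # Normalize a trailing 'y' to 'i' so 'crazy' and 'craziness' share a stem.
    if w.endswith("y") and len(w) > 2:
        w = w[:-1] + "i"
    return w


if __name__ == "__main__":
    for term in ("crazy", "craziness", "going", "went", "better"):
        print(term, "stem:", toy_stem(term), "lemma:", toy_lemmatize(term))
```

In this sketch the stemmer groups ‘crazy’ with ‘craziness’ and reduces ‘going’ to ‘go’, but it leaves ‘went’ and ‘better’ untouched, while the dictionary lookup maps them to ‘go’ and ‘good’.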


Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.

© 2015 by Sean O'Shea
