Predicting the number of top-level clusters


In a recent Relativity webinar, the instructor contended that when clustering a saved search, a generality setting of 0.5 creates 8 top-level clusters in the example data set. The instructor noted that this setting is not guaranteed to produce 8 clusters on every data set. A generality setting of 0.9 creates 4 top-level clusters in the same data set.

In a very general way, this seems to be borne out in my own Relativity sandbox workspace. Clustering a saved search at 0.5 generality . . .

. . . doesn't create 8 top-level clusters in my data set, but it does create six large top-level clusters, plus several smaller top-level clusters that could be grouped together to form two additional top-level clusters of similar size.

A generality setting of 0.9 . . .

. . . doesn't create exactly 4 top-level clusters in my completely different data set . . .

. . . but it does create four top-level clusters that are clearly larger than the others.

It's only a rule of thumb, but not an entirely useless one. That said, my test clusters seem to refute the general rule that higher generality settings lead to fewer top-level clusters.


Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.

© 2015 by Sean O'Shea.