Predicting the number of top level clusters

May 3, 2020

In a recent Relativity webinar, it was contended that when performing clustering on a saved search, a generality setting of 0.5 creates 8 top-level clusters in the example data set. The instructor noted that this setting is not guaranteed to generate 8 on all data sets. Generality at 0.9 creates 4 top-level clusters in the same data set.

In a very general way, this seems to be borne out in my own Relativity sandbox workspace. Clustering a saved search at 0.5 generality . . .

. . . doesn't create 8 top-level clusters in my data set, but it does create six large top-level clusters, plus several other top-level clusters which might be grouped together to form two additional top-level clusters of similar size.

A generality setting of 0.9 . . .

. . . doesn't create 4 top-level clusters in my completely different data set:

. . . but it does create four top-level clusters clearly larger than the others.

It's only a general rule of thumb, though not an entirely useless one. However, my test clusters seem to refute the general rule that high generality settings will lead to fewer top-level clusters.

