LITIGATION SUPPORT TIP OF THE NIGHT

Featured on the ACEDS blog.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer. All content provided on this blog is for informational purposes only. The owner of this blog makes no representations as to the accuracy or completeness of any information on this site or found by following any link on this site. The owner will not be liable for any errors or omissions in this information nor for the availability of this information. The owner will not be liable for any losses, injuries, or damages from the display or use of this information. This policy is subject to change at any time. The owner is not an attorney, and nothing posted on this site should be construed as legal advice. Litigation Support Tip of the Night does not provide confirmation that any e-discovery technique or conduct is compliant with legal, regulatory, contractual or ethical requirements.

See my post on Running Regex Searches With a Grep Utility on the ILTA litigation support blog.

New tips for paralegals and litigation support professionals are posted to this site each week. Click on the blog headings for more detail.

See How-To Videos on my YouTube channel.

Nov 18, 2021

Term frequency / inverse document frequency

Term frequency / inverse document frequency (tf-idf) is a way of measuring the relevance of a word in a document by weighing how often it appears in that document against how common it is across the full set the document belongs to. Words that appear very often in many documents will be ranked lower than those which appear a lot in just one document.

A calculation for tf-idf is done by multiplying (the number of times a word appears in a document divided by the document's total word count) by (the logarithm [with a base of 10] of (the total document count divided by the number of documents that contain the word)). As we recall from math class in high school, the logarithm of a number is the exponent to which the base must be raised to produce that number. So, 3 is the logarithm of 1000 with a base of 10 (10 x 10 x 10 = 1000).

So if we have a document which contains a term 20 times and has 1000 total words, and the document is part of a set of 10,000 documents, 50 of which contain this term, the tf-idf of the term in this document will be 0.046. If 500 documents in the full set contain this term, the tf-idf will be 0.026. If the 1000-word document contains the term 200 times and is from the same set of 10,000 documents, 50 of which contain this term, the tf-idf will be 0.460.
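The three figures above can be checked with a few lines of Python. This is only a minimal sketch of the calculation as described in this post; the function name `tf_idf` and its parameter names are made up for illustration and do not come from any particular e-discovery tool.

```python
import math


def tf_idf(term_count, doc_word_count, total_docs, docs_with_term):
    """tf-idf as described in this post (hypothetical helper, base-10 log)."""
    # Term frequency: share of the document's words that are the term.
    tf = term_count / doc_word_count
    # Inverse document frequency: log10 of (all documents / documents with the term).
    idf = math.log10(total_docs / docs_with_term)
    return tf * idf


# The three scenarios from the paragraph above.
print(round(tf_idf(20, 1000, 10_000, 50), 3))    # 0.046
print(round(tf_idf(20, 1000, 10_000, 500), 3))   # 0.026
print(round(tf_idf(200, 1000, 10_000, 50), 3))   # 0.46
```

Note how the score drops when the term shows up in more documents of the set, and rises when the term makes up more of the single document.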

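The formula can be written like this:

tf-idf = (times the term appears in the document / total words in the document) x log10(total documents in the set / documents containing the term)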

To get the logarithm, first divide the total document count by the number of documents that contain the key term; press equals; and then press the key labeled log10 on the scientific version of your calculator.
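If you would rather not use a calculator, the same two steps (divide, then take the base-10 logarithm) can be done in Python with the standard `math.log10` function. The variable names below are just illustrative, and the counts are the ones from the first example in this post.

```python
import math

total_docs = 10_000      # documents in the full set
docs_with_term = 50      # documents that contain the key term

# Step 1: divide the total document count by the count of documents with the term.
ratio = total_docs / docs_with_term

# Step 2: take the base-10 logarithm of the result (the calculator's log10 key).
idf = math.log10(ratio)

print(ratio)            # 200.0
print(round(idf, 3))    # 2.301
```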
