Noise Word Lists
In its Analytics Guide, Relativity recommends a noise word list posted on Ranks, a Dutch site. Ranks offers noise word lists in several languages, including Japanese, Chinese, French, and German.
There are actually several noise word lists for English:
1. A long list of 668 words. It includes less obvious words such as accordance, beginning, looking, predominantly, and usefulness, as well as every single letter of the alphabet.
2. A short list of 174 words.
3. The MySQL stop word list of 543 words, which does not include single-letter words.
Surprisingly, the MySQL list and the Ranks long list are quite different. There are 206 words in the Ranks long list that are not in the MySQL list, and 82 words in the MySQL list that are omitted from the Ranks long list. As you can see from the lists below, the MySQL list tends to include more contractions, while the Ranks long list has more common adverbs, as well as misspellings and grammatical mistakes.
Ranks Long List Not in MySQL
a abst accordance act added adj affected affecting affects ah announce anymore apparently approximately aren arent arise auth b back begin beginning beginnings begins biol briefly c ca couldnt d date due e ed effect eighty end ending et-al f ff fix found g gave give giving h hed heres hes hid home hundred i id im immediately importance important index information invention itd j k kg km l largely lets line 'll m made make makes means meantime mg million miss ml mr mrs mug n na nay necessarily ninety nonetheless nos noted o obtain obtained omitted ord owing p page pages part past poorly possibly potentially pp predominantly present previously primarily promptly proud put q quickly r ran readily recent recently ref refs related research resulted resulting results run s sec section shed she'll shes show showed shown showns shows significant significantly similar similarly slightly somethan specifically stop strongly substantially successfully sufficiently suggest t
taking that'll that've thered there'll thereof therere thereto there've theyd theyre thou thoughh thousand throug til tip ts u unlike ups usefully usefulness v 've vol vols w wasnt wed werent what'll whats wheres whim whod who'll whomever whos widely wont words world wouldnt www x y youd youre z
MySQL List Not in Ranks Long List
a's ain't allow allows apart appear appreciate appropriate aren't associated best better c'mon c's cant changes clearly concerning consequently consider considering corresponding couldn't course currently definitely described despite entirely exactly example going greetings hadn't he's hello help here's hopefully i'd i'm ignored inasmuch indicate indicated indicates inner insofar it'd it's let's novel presumably reasonably second secondly sensible serious seriously t's that's there's they'd they're third thorough thoroughly three wasn't we'd we're well weren't what's where's who's will won't wonder wouldn't you'd you're
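A comparison like the one above can be reproduced with a set difference. The following is a minimal Python sketch, not the method actually used to build these lists. The function names load_words and compare_lists are hypothetical, and the two sample strings are tiny illustrative subsets rather than the full Ranks or MySQL lists; in practice you would paste in or read the complete lists.

```python
def load_words(text):
    """Split a whitespace-separated noise word list into a set of lowercase words."""
    return {word.lower() for word in text.split()}


def compare_lists(list_a, list_b):
    """Return (words only in list_a, words only in list_b), each sorted."""
    only_a = sorted(list_a - list_b)
    only_b = sorted(list_b - list_a)
    return only_a, only_b


# Illustrative samples only (assumption): a handful of words, not the real lists.
ranks_sample = load_words("a accordance hes the and")
mysql_sample = load_words("a's he's hello the and")

only_ranks, only_mysql = compare_lists(ranks_sample, mysql_sample)

print(len(only_ranks), only_ranks)
print(len(only_mysql), only_mysql)
```

Note that this treats hes and he's as different words, which is why contractions with and without apostrophes show up on opposite sides of the comparison.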