
Removing duplicate values in Notepad++


If you're working with a sufficiently large amount of data in Excel, you may find that basic options like 'Remove Duplicates' won't function, or will at least run very slowly. If you need to de-dupe a column of data (perhaps just to look for irregularities in standardized data when the filter list won't show all of the entries), you can paste it into Notepad++ and de-dupe it there.

1. Paste the data into Notepad++. You can sort it first by pressing Ctrl+A and then going to Edit > Line Operations > Sort Lines Lexicographically Ascending, but this step is not necessary for the data to be de-duplicated.

2. Press Ctrl+H and enter this regex in the 'Find what' box:

^(.*?)$\s+?^(?=.*^\1$)

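Leave the 'Replace with' box empty, select the 'Regular expression' search mode, and check '. matches newline' so that the lookahead at the end of the pattern can scan the lines that follow.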

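3. Click Replace All and you'll quickly have a de-duped list. The pattern deletes any line that appears again further down, so the last occurrence of each value is the one that remains.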
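
If you'd like to try the pattern outside Notepad++, here is a minimal sketch using Python's `re` module. It assumes that `re.MULTILINE` plus `re.DOTALL` is a fair stand-in for Notepad++'s regex mode with '. matches newline' checked; the two engines may differ in edge cases. The function names and the sample data are made up for illustration. A second function does the same job without a regex, which tends to scale better on very large inputs.

```python
import re

# Same pattern as in step 2. re.MULTILINE lets ^ and $ match at line
# boundaries; re.DOTALL stands in for Notepad++'s ". matches newline"
# option (an assumption about equivalence, not a guarantee).
DUPLICATE_LINE = re.compile(r"^(.*?)$\s+?^(?=.*^\1$)", re.MULTILINE | re.DOTALL)


def dedupe_with_regex(text: str) -> str:
    """Delete every line that appears again further down (keeps the last copy)."""
    return DUPLICATE_LINE.sub("", text)


def dedupe_keep_last(text: str) -> str:
    """Regex-free version: walk the lines backwards and keep first sightings."""
    seen = set()
    kept = []
    for line in reversed(text.splitlines()):
        if line not in seen:
            seen.add(line)
            kept.append(line)
    # Restore the original order and the trailing newline.
    return "\n".join(reversed(kept)) + "\n"


if __name__ == "__main__":
    sample = "a\nb\na\nc\nb\n"  # hypothetical column of values
    print(dedupe_with_regex(sample))
    print(dedupe_keep_last(sample))
```

Both functions keep the last occurrence of each value, which matches what the Find/Replace step does. To keep the first occurrence instead, the second function could walk the lines in their original order.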

For an explanation of this regex, see this Stack Overflow question: http://stackoverflow.com/questions/3958350/removing-duplicate-rows-in-notepad

