
Searching for Non-ASCII Text in Text Files

Here's a supplement to the Tip of the Night for January 22, 2022, which discussed how to format a text file so that it loads without errors in Lexis TextMap. When preparing a text file to load into a deposition transcript review application, be sure to remove any text that is not ASCII, the encoding standard widely used for transcripts.


Commonly used characters such as:

  1. A dash – [which can be replaced with a hyphen - ]

  2. Curly quotes “ ” [which can be replaced with straight quotes " ]

  3. Smart apostrophes ‘ ’ [which can be replaced with a straight apostrophe ' ]

. . . are not ASCII text. When a platform like TextMap loads a file containing these characters, it will not convert them correctly, and the result is garbled text such as:

Letâ€™s

. . . instead of:

Let's
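The exact garbling depends on how the file was saved and how the application reads it. As a rough illustration only, the Python sketch below assumes the file was saved as UTF-8 and is then read as Windows-1252. That pairing is an assumption for the example; this post does not establish which encoding TextMap actually expects. The `garble` helper is a hypothetical name used only here.

```python
def garble(text: str) -> str:
    """Encode text as UTF-8, then misread those bytes as Windows-1252.

    errors="replace" covers the few byte values that Windows-1252
    leaves undefined, so the function never raises on odd input.
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")


if __name__ == "__main__":
    # Each of these punctuation marks takes three bytes in UTF-8, and each
    # byte is then displayed as its own Windows-1252 character.
    print(garble("Let\u2019s"))      # smart apostrophe
    print(garble("yes \u2014 no"))   # em dash
    print(garble("Let's"))           # plain ASCII passes through unchanged
```

Plain ASCII characters use the same single byte in both encodings, which is why replacing the smart punctuation with ASCII equivalents avoids the problem.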


So if you see an em dash in a text editor like this:



. . . in TextMap it will display like this:



You can find non-ASCII characters in Notepad++ by going to Search . . . Find characters in range.



A dialog box will open that will give you the option to search for non-ASCII characters.



. . . this will allow you to jump to each non-ASCII character in the text file one by one.


If you're not using Notepad++, you can run this regular expression search to find any non-ASCII characters:

[^\x00-\x7F]+
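The pattern matches any run of one or more characters outside the ASCII range 0x00 to 0x7F. As a minimal sketch of how it could be used in a script, the Python below reports where each non-ASCII run occurs and then swaps in the ASCII equivalents mentioned earlier in this post. The names `find_non_ascii`, `to_ascii_punctuation`, and `REPLACEMENTS` are made up for this illustration, and the replacement table covers only dashes, curly quotes, and smart apostrophes, so any other non-ASCII character would still be reported after cleaning.

```python
import re

# The pattern from this post: one or more characters outside 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")

# Illustrative table (an assumption): only the punctuation named in this post.
REPLACEMENTS = str.maketrans({
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u201c": '"',   # left curly double quote
    "\u201d": '"',   # right curly double quote
    "\u2018": "'",   # left curly single quote
    "\u2019": "'",   # right curly single quote / smart apostrophe
})


def find_non_ascii(text):
    """Yield (line_number, column, matched_text) for each non-ASCII run."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in NON_ASCII.finditer(line):
            # Columns are reported 1-based, like most text editors.
            yield line_number, match.start() + 1, match.group()


def to_ascii_punctuation(text):
    """Return text with the listed punctuation replaced by ASCII characters."""
    return text.translate(REPLACEMENTS)


if __name__ == "__main__":
    sample = "Q. Let\u2019s begin.\nA. Yes \u2013 \u201cof course.\u201d\n"

    # Report every hit with its position and Unicode code points.
    for line_number, column, run in find_non_ascii(sample):
        codes = " ".join(f"U+{ord(ch):04X}" for ch in run)
        print(f"line {line_number}, col {column}: {run!r} ({codes})")

    # After cleaning, the same search should find nothing.
    cleaned = to_ascii_punctuation(sample)
    print(list(find_non_ascii(cleaned)))  # prints: []
```

To check a real transcript, read the file into a string first (for example with `open(path, encoding="utf-8").read()`, where `path` is your own file) and pass that string to `find_non_ascii`.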






Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.


© 2015 by Sean O'Shea.
