Searching for Non-ASCII Text in Text Files
Here's a supplement to the Tip of the Night for January 22, 2022, which discussed how to format a text file so that it loads without errors in Lexis TextMap. When preparing a text file to load in a deposition transcript review application, be sure to remove any characters that are not ASCII, the basic character encoding standard widely used for transcripts.
Commonly used characters such as:
A dash – [which can be replaced with a hyphen - ]
Curly quotes “ [which can be replaced with straight quotes "]
Smart apostrophes ‘ [which can be replaced with a straight apostrophe ' ]
. . . are not ASCII text. When a platform like TextMap loads a file containing these characters, it will not convert them correctly, and the result is garbled text such as:
Letâ€™s
. . . instead of:
Let's
So if you see an em dash in a text editor like this:
![](https://static.wixstatic.com/media/af7fa4_3a1c57c30df347d7bb1669d882857d19~mv2.png/v1/fill/w_980,h_349,al_c,q_85,usm_0.66_1.00_0.01,enc_auto/af7fa4_3a1c57c30df347d7bb1669d882857d19~mv2.png)
. . . in TextMap it will display like this:
![](https://static.wixstatic.com/media/af7fa4_e01947f3d1484fcf8817c64123d78b4b~mv2.png/v1/fill/w_980,h_432,al_c,q_90,usm_0.66_1.00_0.01,enc_auto/af7fa4_e01947f3d1484fcf8817c64123d78b4b~mv2.png)
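The garbling and the fix can both be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the file was saved as UTF-8 and is being read as Windows-1252 (a common cause of this kind of garbling, though not confirmed for TextMap), and the `REPLACEMENTS` table and both function names are hypothetical, so extend the table to cover whatever characters your transcripts actually contain.

```python
# Hypothetical mapping of typographic punctuation to ASCII equivalents.
REPLACEMENTS = {
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / smart apostrophe
    "\u201c": '"',   # left curly double quote
    "\u201d": '"',   # right curly double quote
}


def to_ascii_punctuation(text: str) -> str:
    """Replace the characters listed in REPLACEMENTS with ASCII equivalents."""
    return text.translate(str.maketrans(REPLACEMENTS))


def simulate_garbling(text: str) -> str:
    """Show how UTF-8 bytes look when misread as Windows-1252 (an assumption).

    errors="replace" is used because a few byte values have no
    Windows-1252 character assigned to them.
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")


sample = "Let\u2019s"                 # "Let's" typed with a smart apostrophe
print(simulate_garbling(sample))      # the garbled form a reviewer might see
print(to_ascii_punctuation(sample))   # the cleaned, ASCII-only form
```

Running the replacement before loading the file means there is nothing left for the review platform to misread.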
You can find non-ASCII characters in Notepad++ by going to Search > Find characters in range:
![](https://static.wixstatic.com/media/af7fa4_d93ee36358ef4e539801fdc28556bc51~mv2.png/v1/fill/w_369,h_483,al_c,q_85,enc_auto/af7fa4_d93ee36358ef4e539801fdc28556bc51~mv2.png)
A dialog box will open that will give you the option to search for non-ASCII characters.
![](https://static.wixstatic.com/media/af7fa4_f686b49e19b3443bbfe2d80304e3fbc8~mv2.png/v1/fill/w_383,h_239,al_c,q_85,enc_auto/af7fa4_f686b49e19b3443bbfe2d80304e3fbc8~mv2.png)
. . . this will allow you to jump to each non-ASCII character in the text file one by one.
![](https://static.wixstatic.com/media/af7fa4_6639313007c040a8833426ae20f34398~mv2.png/v1/fill/w_739,h_532,al_c,q_90,enc_auto/af7fa4_6639313007c040a8833426ae20f34398~mv2.png)
If you're not using Notepad++, you can run this regular expression search to find any non-ASCII characters:
[^\x00-\x7F]+
![](https://static.wixstatic.com/media/af7fa4_10eb311072e742feb02ca35cad79fc64~mv2.png/v1/fill/w_519,h_370,al_c,q_85,enc_auto/af7fa4_10eb311072e742feb02ca35cad79fc64~mv2.png)
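The same regular expression can also be run from a script when no editor is handy. The following is a minimal sketch rather than a definitive tool: the function name `find_non_ascii` and the sample text are hypothetical, and it assumes the transcript has already been read into a Python string.

```python
import re

# The same pattern as above: one or more characters outside the ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def find_non_ascii(text: str):
    """Yield (line_number, column, matched_text) for each non-ASCII run.

    Line numbers and columns are 1-based, to match what a text editor shows.
    """
    for line_number, line in enumerate(text.split("\n"), start=1):
        for match in NON_ASCII.finditer(line):
            yield line_number, match.start() + 1, match.group()


# Hypothetical two-line transcript excerpt with a smart apostrophe and an en dash.
sample = "Q. Let\u2019s begin.\nA. Yes \u2013 I agree."

for line_number, column, found in find_non_ascii(sample):
    print(f"line {line_number}, column {column}: {found!r}")
```

To check a real file, read its contents with `open(..., encoding="utf-8")` and pass the resulting string to the function. An empty result means the file contains only ASCII characters.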