
Formatting Text Files to Load Correctly

If you have to reformat a text file so that it loads correctly into Lexis TextMap, several things need to be exactly right. Otherwise, TextMap may display an error message when you import the file.




The transcript may still load when this message appears, but the line numbers will be treated as part of the text itself, so they will be selected whenever the text is. Searching through the transcript may also be adversely affected.


In order to conform a text file to the Amicus format used by TextMap, be sure to follow each of these steps:


1. Make sure that each page break is identified by a four-digit number on a line by itself.



2. Each line number should be separated from the transcript text by at least three spaces. Single-digit line numbers should be preceded by one space.


3. No line should be more than 70 characters long. (I've seen documentation stating that each line should be no longer than 78 characters, but the actual limit appears to be smaller.) Note that you can run a regular expression search for lines longer than 70 characters using this form:


.{71,}


. . . which matches any character repeated 71 or more times. In the free text editor Notepad++, the position on a line is shown at the bottom of the window as the 'column' in which the character appears.



4. Also confirm that there is no trailing whitespace at the end of any line, and that the last text on each line is followed by a carriage return and line feed (CRLF).



5. If any page is longer than 25 lines, number the first 25 lines from 1 to 25 and leave the remaining lines unnumbered.


6. Confirm that the transcript doesn't have any blank lines. Make sure that it doesn't end with a blank line!


7. If all else fails, try copying the text into an entirely new file in Notepad, and saving it in the format of a 'Normal text file'.
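
Several of the checks above (line length, trailing whitespace, line endings, and blank lines) are mechanical enough to script. What follows is only a rough sketch in Python, not anything supplied with TextMap. The function name check_transcript and the MAX_LEN constant are my own inventions, and the sketch assumes the file uses a single-byte encoding, so one byte equals one character. It does not try to verify page numbers or line numbering.

```python
MAX_LEN = 70  # limit from step 3; an assumption, adjust if your files need another value


def check_transcript(data, max_len=MAX_LEN):
    """Return a list of (line_number, problem) pairs for raw transcript bytes."""
    problems = []
    # Split on LF only so that a missing carriage return shows up per line.
    raw_lines = data.split(b"\n")
    # If the data ends with LF, the split leaves one empty final element.
    last_unterminated = raw_lines[-1] != b""
    if not last_unterminated:
        raw_lines.pop()

    for number, raw in enumerate(raw_lines, start=1):
        if number == len(raw_lines) and last_unterminated:
            problems.append((number, "no line ending after the last line"))
        elif raw.endswith(b"\r"):
            raw = raw[:-1]  # drop the CR so it is not counted as text
        else:
            problems.append((number, "line ends with LF only, not CRLF"))

        # latin-1 maps every byte to exactly one character (assumed encoding).
        text = raw.decode("latin-1")
        if text.strip() == "":
            problems.append((number, "blank line"))
            continue
        if text != text.rstrip(" \t"):
            problems.append((number, "trailing whitespace"))
        if len(text) > max_len:
            problems.append((number, f"longer than {max_len} characters"))
    return problems
```

To use it, read the file in binary mode (so that Python does not silently convert the line endings) and pass the bytes in, for example check_transcript(open("transcript.txt", "rb").read()), where transcript.txt is a placeholder name. An empty list means none of these particular problems were found.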



See also the Tip of the Night for January 29, 2022, which discusses how to remove non-ASCII text.
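
If you only want to see where non-ASCII characters occur before deciding how to deal with them, a few lines of Python will list them. This is a minimal sketch of my own, not the method from the January 29, 2022 tip, and the name find_non_ascii is made up for illustration.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for each non-ASCII character."""
    hits = []
    for line_number, line in enumerate(text.split("\n"), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:  # ASCII covers code points 0 to 127
                hits.append((line_number, column, char))
    return hits
```

The column it reports is counted from 1, the same way Notepad++ shows the 'column' at the bottom of the window.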


