Regex to find where consecutive lines end with the same text

Sean O'Shea
Jan 7, 2023

Tonight's tip features a regular expression written by The fourth bird, a Stack Overflow user from the Netherlands. I had posted a question asking for a regular expression that would find text on a line that repeats the text of the prior line, where each line begins with a time code that may differ. See this example:

(11:12:21) [Tom]: Hello this is Tom. Who is it?

(11:14:08) [Tom]: Hello this is Tom. Who is it?

The goal was to find where consecutive lines were the same after the first 10 characters. The fourth bird came up with a solution that finds where the parts of two lines after the time code match. In a text editor like Notepad++, run this find and replace search with the search mode set to regular expression:

FIND: ^(\([^][]*\))(.*)(?:\r?\n\([^][]*\)\2)+

REPLACE: $1$2

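^(\([^][]*\)) captures the first part of the line, the time code in parentheses, as group 1. The caret ^ matches the beginning of the line, \( and \) match the literal parentheses, and [^][]* matches any run of characters other than square brackets between them.

(.*) captures everything from the end of the parenthetical time code to the end of the line as group 2.

(?:\r?\n opens a non-capturing group and matches the line break (Windows or Unix style) that leads to the next line.

\([^][]*\) matches the time code in parentheses at the start of that next line, which can differ from the time code on the previous line.

\2)+ is a backreference to group 2, so the rest of the next line must be identical to the rest of the first line. The closing )+ repeats the group one or more times, so a run of several repeated lines is matched at once.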
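
To see what each group captures, the pattern can be tried outside the editor. The following is a minimal Python sketch, not part of the original answer. It assumes the parentheses around the time code are escaped as \( and \), and it uses re.MULTILINE so that ^ matches at the start of every line, as it does in Notepad++. The variable names are made up for illustration.

```python
import re

# Assumed Python port of the FIND pattern; re.MULTILINE lets ^ match at each line start.
PATTERN = re.compile(r"^(\([^][]*\))(.*)(?:\r?\n\([^][]*\)\2)+", re.MULTILINE)

# The two example lines from this post, joined by a Unix line break.
sample = (
    "(11:12:21) [Tom]: Hello this is Tom. Who is it?\n"
    "(11:14:08) [Tom]: Hello this is Tom. Who is it?\n"
)

match = PATTERN.search(sample)
time_code = match.group(1)  # the first line's time code, parentheses included
rest = match.group(2)       # the text that must repeat on the following line(s)

print(repr(time_code))
print(repr(rest))
```

Group 1 holds only the first line's time code; the time code on the second line is matched but not captured, which is why it disappears when the match is replaced with groups 1 and 2.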

As you can see in this demonstration, a find and replace in the text editor can easily remove the duplicate lines. Replacing each match with $1$2 keeps only the first line of each run of repeats.
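
The same find and replace can be approximated in a script when a text editor is not convenient. This is a rough Python sketch under the same assumptions (escaped parentheses around the time code, re.MULTILINE for ^). The function name collapse_repeats and the extra transcript lines are hypothetical, added only to show a run of three repeats followed by a different line.

```python
import re

# Assumed Python port of the FIND pattern from this post.
PATTERN = re.compile(r"^(\([^][]*\))(.*)(?:\r?\n\([^][]*\)\2)+", re.MULTILINE)


def collapse_repeats(text: str) -> str:
    """Keep the first line of each run of lines that match after the time code."""
    # \1\2 is Python's spelling of the editor's $1$2 replacement.
    return PATTERN.sub(r"\1\2", text)


# Hypothetical transcript: three repeats from Tom, then a different line.
transcript = (
    "(11:12:21) [Tom]: Hello this is Tom. Who is it?\n"
    "(11:14:08) [Tom]: Hello this is Tom. Who is it?\n"
    "(11:15:30) [Tom]: Hello this is Tom. Who is it?\n"
    "(11:16:02) [Ann]: It's Ann.\n"
)

cleaned = collapse_repeats(transcript)
print(cleaned)
```

Because [^][]* stops only at a square bracket, this sketch relies on every line having a bracketed speaker name after the time code, as in the example above.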
