
Regex to find where consecutive lines end with the same text

Tonight's tip features a regular expression created by The fourth bird, a Stack Overflow user from the Netherlands. I had posted a question asking for a regular expression that would find a line whose text repeats the text of the line before it, ignoring a time code at the beginning of each line that might be different. See this example:


(11:12:21) [Tom]: Hello this is Tom. Who is it?

(11:14:08) [Tom]: Hello this is Tom. Who is it?


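The goal was to find consecutive lines that were the same after the first 10 characters, that is, after the time code in parentheses. The fourth bird came up with a solution that matches a line together with any lines directly after it whose text following the time code is the same. In a text editor like Notepad++, run this find and replace search with the Search Mode set to Regular expression and the ". matches newline" option unchecked: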


FIND: ^(\([^][]*\))(.*)(?:\r?\n\([^][]*\)\2)+

REPLACE: $1$2
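
If you want to try the same pattern outside a text editor, here is a minimal sketch in Python using the standard re module. It assumes the chat log is already in a string, and the names PATTERN and collapse_repeats are illustrative, not part of the original answer. Python spells the replacement \1\2 instead of $1$2, and re.MULTILINE is needed so that ^ matches at the start of every line rather than only at the start of the text.

```python
import re

# The find pattern from above, unchanged. re.MULTILINE makes ^ match at the
# start of every line rather than only at the start of the string.
PATTERN = re.compile(r"^(\([^][]*\))(.*)(?:\r?\n\([^][]*\)\2)+", re.MULTILINE)


def collapse_repeats(text: str) -> str:
    """Replace each run of repeated lines with its first line ($1$2 in Notepad++)."""
    return PATTERN.sub(r"\1\2", text)


log = (
    "(11:12:21) [Tom]: Hello this is Tom. Who is it?\n"
    "(11:14:08) [Tom]: Hello this is Tom. Who is it?\n"
)
print(collapse_repeats(log), end="")
# prints: (11:12:21) [Tom]: Hello this is Tom. Who is it?
```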


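^(\([^][]*\)) finds the first part of the line, the time code in parentheses, and captures it as group 1. The caret ^ anchors the match at the beginning of the line, \( and \) match the literal opening and closing parentheses, and [^][]* in between matches any run of characters that are not square brackets, which here is the time code itself.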
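
A quick way to check what group 1 captures is to run just this first part of the pattern on the sample line. This is an illustrative Python sketch with the standard re module, not part of the original answer. The greedy [^][]* first runs up to the [ of [Tom], then the engine backs off until \) can match the closing parenthesis.

```python
import re

line = "(11:12:21) [Tom]: Hello this is Tom. Who is it?"

# ^ anchors at the start; \( and \) are literal parentheses; [^][]* is any run
# of characters other than square brackets, so it cannot run into "[Tom]".
m = re.match(r"^(\([^][]*\))", line)
print(m.group(1))  # prints: (11:12:21)
```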


(.*) matches the rest of the line after the time code and captures it as group 2.
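
Running the first two groups on the sample line in Python (an illustration with the standard re module, not the original tool) shows what group 2 holds. It includes the space after the time code, which is why $1$2 rebuilds the first line exactly.

```python
import re

line = "(11:12:21) [Tom]: Hello this is Tom. Who is it?"

# Group 1 is the time code; group 2, (.*), is everything after it up to the
# end of the line (. does not match a newline by default).
m = re.match(r"^(\([^][]*\))(.*)", line)
print(repr(m.group(2)))  # prints: ' [Tom]: Hello this is Tom. Who is it?'
```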


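(?:\r?\n opens a non-capturing group that starts with a line break. The \r? makes the carriage return optional, so both Windows (\r\n) and Unix (\n) line endings match.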
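
Two details of this part can be checked in Python with the standard re module (an illustration only): \r?\n accepts both line-ending styles, and the (?: group is non-capturing, so the whole pattern still has just the two numbered groups that the replacement $1$2 refers to.

```python
import re

# \r?\n matches a Windows line break (\r\n) or a Unix one (\n).
line_break = re.compile(r"\r?\n")
print(bool(line_break.fullmatch("\r\n")), bool(line_break.fullmatch("\n")))  # prints: True True

# A (?:...) group does not count toward the numbered groups, so the
# full pattern still has exactly two capturing groups.
full = re.compile(r"^(\([^][]*\))(.*)(?:\r?\n\([^][]*\)\2)+", re.MULTILINE)
print(full.groups)  # prints: 2
```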


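\([^][]*\) matches the time code at the start of the next line. Because it is not a backreference to group 1, this time code can differ from the one on the previous line.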
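
The sketch below, in Python with the standard re module, shows that two lines with different time codes still match, and that group 1 keeps the first line's time code. For contrast it also builds a hypothetical variant, strict, that uses \1 on the second line; that version would only catch repeats with identical time codes, so it finds nothing in this sample.

```python
import re

PATTERN = re.compile(r"^(\([^][]*\))(.*)(?:\r?\n\([^][]*\)\2)+", re.MULTILINE)

text = (
    "(11:12:21) [Tom]: Hello this is Tom. Who is it?\n"
    "(11:14:08) [Tom]: Hello this is Tom. Who is it?"
)

m = PATTERN.search(text)
# Different time codes on the two lines do not prevent the match.
print(m is not None, m.group(1))  # prints: True (11:12:21)

# Hypothetical variant for contrast: reusing \1 on the second line would
# require the same time code, so it finds nothing here.
strict = re.compile(r"^(\([^][]*\))(.*)(?:\r?\n\1\2)+", re.MULTILINE)
print(strict.search(text))  # prints: None
```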


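\2)+ requires the text after that time code to repeat the text captured in group 2, the text of the first line. The ) closes the non-capturing group, and the + lets it repeat, so a run of several identical lines is matched in one pass. The replacement $1$2 then writes back only the first line's time code and text.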
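
Because + repeats the whole non-capturing group, a run of three or more identical lines collapses in a single match, and the first line with different text ends the run. A minimal Python sketch with the standard re module; the third and fourth chat lines are made up for illustration.

```python
import re

PATTERN = re.compile(r"^(\([^][]*\))(.*)(?:\r?\n\([^][]*\)\2)+", re.MULTILINE)

log = (
    "(11:12:21) [Tom]: Hello this is Tom. Who is it?\n"
    "(11:14:08) [Tom]: Hello this is Tom. Who is it?\n"
    "(11:15:30) [Tom]: Hello this is Tom. Who is it?\n"
    "(11:16:02) [Ann]: It's Ann.\n"
)

# \1\2 is Python's spelling of the $1$2 replacement used in Notepad++.
result = PATTERN.sub(r"\1\2", log)
print(result, end="")
# prints:
# (11:12:21) [Tom]: Hello this is Tom. Who is it?
# (11:16:02) [Ann]: It's Ann.
```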



As you can see in this demonstration, a find and replace in the text editor can easily remove the duplicate lines, keeping only the first line of each run.
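
One behavior worth checking on your own data: nothing after \2 requires the end of the line, so a following line that merely starts with the same text is also treated as a repeat. The Python sketch below (standard re module) shows this with made-up lines. ANCHORED is a hypothetical variant, not the original answer, that adds the lookahead (?=\r?$) so the repeated text must run to the end of the line; whether it behaves the same way in Notepad++ is an assumption to verify.

```python
import re

ORIGINAL = re.compile(r"^(\([^][]*\))(.*)(?:\r?\n\([^][]*\)\2)+", re.MULTILINE)
# Hypothetical variant: (?=\r?$) requires the repeated text to reach the end of the line.
ANCHORED = re.compile(r"^(\([^][]*\))(.*)(?:\r?\n\([^][]*\)\2(?=\r?$))+", re.MULTILINE)

log = (
    "(11:12:21) [Tom]: Hello\n"
    "(11:14:08) [Tom]: Hello again\n"
)

# The original pattern treats "Hello again" as a repeat of "Hello" and merges the lines.
print(ORIGINAL.sub(r"\1\2", log), end="")
# prints: (11:12:21) [Tom]: Hello again

# The anchored variant leaves both lines untouched.
print(ANCHORED.sub(r"\1\2", log), end="")
```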









Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.

The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.

If you have a question or comment about this blog, please make a submission using the form to the right. 


© 2015 by Sean O'Shea. Proudly created with Wix.com
