Fidel from Brisbane has posted a very useful PowerShell script here that you can use to run a regular expression search through a text file and extract only the matches.
Combined with the RegEx search for Bates numbers discussed last night, it can automatically extract a complete list of the Bates numbers in any text file.
So if you start with a text file that has Bates numbers at the end of each paragraph, you can run this PowerShell script:
select-string -Path C:\foofolder\input.txt -Pattern "(\b\w{1,10}(-|_|\s?)[0-9]{5,12}\b|\b\w{1,10}(-|_|\s?)\b\w{1,10}(-|_|\s?)[0-9]{5,12}\b)" -AllMatches | % { $_.Matches } | select-object Value -unique | sort-object Value > C:\foofolder\output2.txt
to pull out the Bates numbers. Note that you need to specify the path of your text file at:
select-string -Path C:\foofolder\input.txt
. . . put the RegEx in quotes at:
-Pattern "(\b\w{1,10}(-|_|\s?)[0-9]{5,12}\b|\b\w{1,10}(-|_|\s?)\b\w{1,10}(-|_|\s?)[0-9]{5,12}\b)" -AllMatches
. . . and then specify an output file at the end:
sort-object Value > C:\foofolder\output2.txt
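You should end up with a text file that lists just the Bates numbers, with duplicates removed and sorted as well! Because the script selects the Value property, the list appears under a Value column header.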
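If you want only the bare values, without the Value column header that select-object adds to the output, a variant along the following lines may help. This is a minimal sketch rather than Fidel's script: the sample text, the Bates numbers in it, and the temp file names are all made up for illustration, and the pattern is the same one used above.

```powershell
# Hypothetical sample text; these Bates numbers are invented for illustration.
$sample = @'
The contract was signed in March. ABC-00012345
A follow-up letter was sent in April. ABC-00012346
The letter referred back to the contract. ABC-00012345
'@

# Work in the temp folder so the sketch does not touch any real files.
# Both file names are assumptions; substitute your own paths.
$inputFile  = Join-Path ([System.IO.Path]::GetTempPath()) 'bates-input.txt'
$outputFile = Join-Path ([System.IO.Path]::GetTempPath()) 'bates-output.txt'
Set-Content -Path $inputFile -Value $sample

# Same pattern as in the post, single-quoted so PowerShell passes it through literally.
$pattern = '(\b\w{1,10}(-|_|\s?)[0-9]{5,12}\b|\b\w{1,10}(-|_|\s?)\b\w{1,10}(-|_|\s?)[0-9]{5,12}\b)'

# Pull every match from every line, keep only the matched text,
# then sort and drop duplicates in one step.
$batesNumbers = Select-String -Path $inputFile -Pattern $pattern -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value } |
    Sort-Object -Unique

# Write plain strings, one per line, with no column header.
Set-Content -Path $outputFile -Value $batesNumbers
```

Because the pipeline passes plain strings rather than objects with a Value property, the output file should contain one Bates number per line and nothing else. For the sample above that means two lines, since the repeated number is collapsed by -Unique.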