Regex query: how can I search PDFs for a phrase where words in that phrase appear on more than one l
- by Alison
I am trying to set up an index page for the weekly magazine I work on. It should show readers the names of
companies mentioned in that week's issue, plus the page numbers they appear on.
I want to search all the PDF files for the week, where one PDF = one magazine page (originally made in
Adobe InDesign CS3 and Adobe InCopy CS3).
I have set up a list of companies I want to search for and, using PowerGREP with a delimited list of regular
expressions, I am able to find most page numbers where a company is mentioned. However, where a
company name contains two or more words, the search I am running will not pick up instances where the
name is split across more than one line.
For example, when looking for "CB Richard Ellis" and "Cushman & Wakefield", I got no result when the
text appeared like this:
DTZ beat BNP PRE, CB [line break here]
Richard Ellis and Cushman & [line break here]
Wakefield to secure the contract. [line end here]
Could someone advise me on how to write a regular expression that will ignore white space between
words and ignore line endings, OR one that will look for the words including all types of white space (i.e. uneven
spaces between words, spaces at the end of lines or line endings, and tabs)? I am guessing that this information is
embedded somehow in PDF files.
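To make the request concrete, here is a minimal sketch of the behaviour being asked for. It is written in Python rather than PowerGREP, and it assumes that the text extracted from the PDF contains ordinary line-break characters at the points marked above; both of those are assumptions, not something confirmed by the tool. The idea is to replace each literal space in the phrase with `\s+`, which in most regex flavours matches one or more spaces, tabs, or line breaks.

```python
import re

# Sample text from the question, with real line breaks where the
# magazine column wraps (assumed to survive PDF text extraction).
page_text = (
    "DTZ beat BNP PRE, CB\n"
    "Richard Ellis and Cushman &\n"
    "Wakefield to secure the contract.\n"
)

# Original style of search term: a literal space between words.
literal = re.compile(r"\bCB Richard Ellis\b")

# Whitespace-tolerant version: \s+ matches one or more spaces,
# tabs, or line breaks between the words.
flexible = re.compile(r"\bCB\s+Richard\s+Ellis\b")
flexible_cw = re.compile(r"\bCushman\s+&\s+Wakefield\b")

literal_hit = literal.search(page_text) is not None
flexible_hit = flexible.search(page_text) is not None
flexible_cw_hit = flexible_cw.search(page_text) is not None

print(literal_hit, flexible_hit, flexible_cw_hit)
```

The literal-space pattern misses the name because a space character does not match a line break, while the `\s+` versions find both names.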
Here is a sample of the set of terms I have asked PowerGREP to search for:
\bCB Richard Ellis\b
\bCB Richard Ellis Hotels\b
\bCentaur Services\b
\bChapman Herbert\b
\bCharities Property Fund\b
\bChetwoods Architects\b
\bChurch Commissioners\b
\bClive Emson\b
\bClothworkers’ Company\b
\bColliers CRE\b
\bCombined English Stores Group\b
\bCommercial Estates Group\b
\bConnells\b
\bCooke & Powell\b
\bCordea Savills\b
\bCrown Estate\b
\bCushman & Wakefield\b
\bCWM Retail Property Advisors\b
[Note that there is a delimited hard return between each \b at the end of each phrase and the beginning of the next phrase.]
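A list like the one above could, in principle, be converted into whitespace-tolerant search terms automatically instead of by hand. The sketch below is in Python and only illustrates the idea; the function names `flexible_pattern` and `companies_on_page` are hypothetical, and it does not reflect how PowerGREP itself handles term lists. It assumes every company name begins and ends with a letter or digit, since `\b` would not behave as expected otherwise.

```python
import re

def flexible_pattern(name):
    """Build a regex for `name` that allows any run of whitespace
    (spaces, tabs, line breaks) between its words."""
    words = name.split()
    # re.escape protects characters such as "&" from being
    # interpreted as regex syntax.
    body = r"\s+".join(re.escape(word) for word in words)
    return re.compile(r"\b" + body + r"\b")

def companies_on_page(names, page_text):
    """Return the names from `names` that occur in `page_text`."""
    return [n for n in names if flexible_pattern(n).search(page_text)]

# Hypothetical page text with uneven spaces, a tab, and line breaks.
names = ["CB Richard Ellis", "Cushman & Wakefield", "Crown Estate"]
page_text = (
    "DTZ beat BNP PRE, CB\n"
    "Richard Ellis and Cushman  &\t\n"
    "Wakefield to secure the contract.\n"
)

found = companies_on_page(names, page_text)
print(found)
```

Note that a shorter name such as "CB Richard Ellis" will also be reported on pages that only mention "CB Richard Ellis Hotels", so overlapping names may need separate handling.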
By the way, I am a production journalist and not usually involved in finding IT-type solutions and am
finding it difficult to get to grips with the technical language on the PowerGREP site.
Thanks for any assistance.
Alison