Find string between two other strings within a series

Question

Sorry about the ambiguous title; I couldn't think of a better way to articulate the question.

I have a CSV file that has hundreds of lines, with thousands of LDAP distinguished names. A few sample lines could look like:

CN=John Doe,OU=Miami,DC=contoso,DC=com; CN=Spamela Anderson,OU=Los Angeles,DC=contoso,DC=com; CN=Cosmo Kramer,OU=Subfolder,OU=Subfolder,OU=ParentFolder,DC=FABRIKAM,DC=com; CN=Bob Barker,DC=contoso,DC=com
CN=Luke Skywalker,OU=Tattoine,DC=contoso,DC=com; CN=Brad Pitt,OU=Hollywood,DC=contoso,DC=com; CN=Mickey Mouse,OU=Users,DC=contoso,DC=com
CN=Ted Nugent,OU=Houston,DC=FABRIKAM,DC=com; CN=Carl Sagan,DC=Uranus,DC=contoso,DC=com

I'd like to remove any distinguished name that is in the FABRIKAM.COM domain (DC=fabrikam,DC=com). In the first sample line, for example, I'd like to strip out:

;CN=Cosmo Kramer,OU=Subfolder,OU=Subfolder,OU=ParentFolder,DC=FABRIKAM,DC=com

I've tried to use:

CN=(.*)?,DC=fabrikam,DC=com

But this matches from the first occurrence of "CN=" at the beginning of the line all the way to "DC=fabrikam,DC=com" (so it also includes John Doe and Spamela Anderson in my sample).
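
The over-match described above can be reproduced outside the editor. This is a minimal sketch in Python, not the editor's own regex engine; the `re.IGNORECASE` flag is an assumption that stands in for a case-insensitive editor search, since the sample data spells the domain as "FABRIKAM".

```python
import re

# First sample line from the question.
line = (
    "CN=John Doe,OU=Miami,DC=contoso,DC=com; "
    "CN=Spamela Anderson,OU=Los Angeles,DC=contoso,DC=com; "
    "CN=Cosmo Kramer,OU=Subfolder,OU=Subfolder,OU=ParentFolder,DC=FABRIKAM,DC=com; "
    "CN=Bob Barker,DC=contoso,DC=com"
)

# The pattern from the question. ".*" may cross ";" delimiters, so the match
# begins at the leftmost "CN=" on the line, not the one nearest to FABRIKAM.
greedy = re.compile(r"CN=(.*)?,DC=fabrikam,DC=com", re.IGNORECASE)

overmatch = greedy.search(line).group(0)
print(overmatch)
```

The printed match starts at "CN=John Doe" and ends at "DC=FABRIKAM,DC=com", which is why the two contoso entries are swept up with it.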

Is there a way to use the nearest "CN=" to the left of "DC=fabrikam,DC=com" as the starting boundary?

(I use either Notepad++ or Programmer's Notepad)

nhahtdh · Accepted Answer · 2013-02-03 08:52:15Z

If you can assume that ; never appears in the values and is only used for delimiting different records, then you can use this:

CN=[^;]*,DC=fabrikam,DC=com
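
To see how excluding ";" confines the match to a single record, here is a small Python sketch run against the first sample line. It is an illustration only: Python's `re` module is not the engine Notepad++ uses, and `re.IGNORECASE` is an assumption that mirrors a case-insensitive editor search.

```python
import re

# First sample line from the question.
line = (
    "CN=John Doe,OU=Miami,DC=contoso,DC=com; "
    "CN=Spamela Anderson,OU=Los Angeles,DC=contoso,DC=com; "
    "CN=Cosmo Kramer,OU=Subfolder,OU=Subfolder,OU=ParentFolder,DC=FABRIKAM,DC=com; "
    "CN=Bob Barker,DC=contoso,DC=com"
)

# "[^;]*" cannot step over a ";" delimiter, so a match that starts in one
# record can never reach a suffix that belongs to a later record.
pattern = re.compile(r"CN=[^;]*,DC=fabrikam,DC=com", re.IGNORECASE)

found = pattern.findall(line)
print(found)
```

Only the Cosmo Kramer entry is returned; the attempts that start at the contoso entries fail because they hit ";" before reaching the FABRIKAM suffix.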

Note that the regex above can match across multiple lines, because [^;] also matches line-break characters.

A quick fix, if the file uses \n to separate the lines, is to exclude \n as well:

CN=[^;\n]*,DC=fabrikam,DC=com
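
The difference between the two patterns shows up when a FABRIKAM entry is the first record on a line. The Python sketch below uses the second and third sample lines from the question; as before, Python's `re` and the `re.IGNORECASE` flag are assumptions used for illustration, not the editor's own behaviour.

```python
import re

# Second and third sample lines, separated by "\n".
text = (
    "CN=Luke Skywalker,OU=Tattoine,DC=contoso,DC=com; "
    "CN=Brad Pitt,OU=Hollywood,DC=contoso,DC=com; "
    "CN=Mickey Mouse,OU=Users,DC=contoso,DC=com\n"
    "CN=Ted Nugent,OU=Houston,DC=FABRIKAM,DC=com; "
    "CN=Carl Sagan,DC=Uranus,DC=contoso,DC=com"
)

# Without "\n" in the excluded set, the match may start on the previous line.
loose = re.compile(r"CN=[^;]*,DC=fabrikam,DC=com", re.IGNORECASE)
# With "\n" excluded, the match is confined to one line.
strict = re.compile(r"CN=[^;\n]*,DC=fabrikam,DC=com", re.IGNORECASE)

loose_match = loose.search(text).group(0)
strict_match = strict.search(text).group(0)
print(repr(loose_match))
print(repr(strict_match))
```

The first pattern starts at "CN=Mickey Mouse" on the previous line, while the second returns only the Ted Nugent entry.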

edited Feb 3, 2013 at 8:52

answered Feb 3, 2013 at 8:43

4 Comments

BHall Over a year ago

Thanks, it works! One weirdness is that if the fabrikam DN occurs at the beginning of a line, the search also grabs the last occurrence of "CN=" from the previous line. Not a big deal as I can hopefully manually edit those. But for my own knowledge, is there a way to prevent the search from crossing newlines or carriage returns?

nhahtdh Over a year ago

@BHall: That is one case where I expect this to fail. Can you edit your question to include more test data?

BHall Over a year ago

Original question updated, but I see that you've already fixed it! Thanks for your help, @nhahtdh!

nhahtdh Over a year ago

@BHall: Note that the fix may not work if your file uses something other than \n to separate the lines. Most files use \n (Linux) or \r\n (Windows, which still contains \n), but there are other Unicode characters that are also used as line separators.
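
If the file is processed in a script rather than an editor, one way to sidestep the line-separator question is to split the text into lines first and match each line on its own. The sketch below is a different technique from the editor regex above: it relies on Python's `str.splitlines()`, which also recognises separators such as \r and U+2028. The function name `find_fabrikam_dns` is hypothetical, and `re.IGNORECASE` is an assumption.

```python
import re

PATTERN = re.compile(r"CN=[^;]*,DC=fabrikam,DC=com", re.IGNORECASE)

def find_fabrikam_dns(text):
    """Match line by line, so no match can span a line break."""
    matches = []
    # splitlines() splits on \n, \r\n, \r and several Unicode line separators.
    for line in text.splitlines():
        matches.extend(PATTERN.findall(line))
    return matches

# Two lines separated by U+2028 (LINE SEPARATOR) instead of "\n".
sample = (
    "CN=Mickey Mouse,OU=Users,DC=contoso,DC=com\u2028"
    "CN=Ted Nugent,OU=Houston,DC=FABRIKAM,DC=com; "
    "CN=Carl Sagan,DC=Uranus,DC=contoso,DC=com"
)

result = find_fabrikam_dns(sample)
print(result)
```

Because each line is searched separately, the pattern does not need to list every possible line-break character in its excluded set.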
