So I have incoming data that looks something like this:

Applications                    7 days          6 days

And I'm trying to create a regex that will match this line but not a line that has another column, like this:

Applications                    7 days          6 days        5 days

The regex that I'm trying to use is:

^(.*?)(\s){4,}(.*?)(\s){4,}[^(\s){2}]+

Where [^(\s){2}]+ would mean selecting everything up to a double space. The problem with this is that

  1. it doesn't work to begin with.
  2. the second line I have would still match this.

Is there any regex I can use to only match the 3 column table and not the 4 column, 5 column, etc.?

  • I would go about it differently and just test by splitting each line on \s{2,} then checking that the length of the array is equal to 3. Commented May 16, 2014 at 16:40
  • is this space or tab-delimited string? Commented May 16, 2014 at 16:40
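
The split-based approach suggested in the first comment could be sketched in Python roughly as follows. The choice of Python, the helper name column_count and the strip() call are assumptions made for illustration; they are not part of the original comment.

```python
import re

def column_count(line: str) -> int:
    """Count columns, treating runs of 2+ whitespace characters as separators."""
    # strip() is an added assumption: it keeps leading/trailing padding
    # from producing empty fields at either end of the split result.
    return len(re.split(r"\s{2,}", line.strip()))

three = "Applications                    7 days          6 days"
four = "Applications                    7 days          6 days        5 days"

print(column_count(three) == 3)  # True
print(column_count(four) == 3)   # False
```

This avoids look-aheads entirely, at the cost of not capturing the column values in one step.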

2 Answers

2

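You should take care with character classes ([]), as most metacharacters inside them are treated literally (as if they were escaped). In [^(\s){2}]+ the characters (, ), {, 2 and } are ordinary class members, so it matches a run of characters that are not parentheses, braces, the digit 2 or whitespace. It does not mean "everything up to a double space".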
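
To see what the character class from the question really does, here is a small check, assuming Python's re module (the sample strings are made up for illustration):

```python
import re

# The trailing character class from the question, unchanged.
pattern = re.compile(r"[^(\s){2}]+")

# Inside [...] the characters ( ) { 2 } are literals, so they split the
# input exactly like whitespace does.
pieces = pattern.findall("a(b)2{c} d")
print(pieces)  # ['a', 'b', 'c', 'd']

# The digit 2 is excluded as well, so the match stops in front of it.
first = pattern.match("12 days").group()
print(first)  # 1
```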

Try this regex:

^((?:(?!\s\s).)+)(?:\s){4,}((?:(?!\s\s).)+)(?:\s){4,}((?:.(?!\s\s))+)$
  • I replaced (.*?) with ((?:(?!\s\s).)+), which matches everything up to a sequence of two whitespace characters.
  • I added a $ at the end, so it won't match lines with more than three columns.
  • I also added some ?: so those groups become non-capturing groups.
  • Finally, I removed the character class from the end of the regex and used a negative look-ahead instead.
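
As a quick check of the regex above against the two sample lines from the question, here is a sketch assuming Python's re module (the variable names are illustrative only):

```python
import re

# The three-column pattern from this answer, copied verbatim.
PATTERN = re.compile(
    r"^((?:(?!\s\s).)+)(?:\s){4,}((?:(?!\s\s).)+)(?:\s){4,}((?:.(?!\s\s))+)$"
)

three = "Applications                    7 days          6 days"
four = "Applications                    7 days          6 days        5 days"

m = PATTERN.match(three)
print(m.groups())           # ('Applications', '7 days', '6 days')
print(PATTERN.match(four))  # None
```

The four-column line is rejected because the last group cannot step over the double space in front of the extra column, so $ is never reached.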

Columns not ending with spaces

This one will not accept lines where the second column ends with spaces:

^((?:(?!\s\s).)+)(?:\s){4,}((?:(?!\s\s).)+)(?:\s){4,}((?:.(?!\s\s)(?!\s$))+)$

Notice the addition of a second negative look-ahead in the last group: (?!\s$).
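
The effect of the extra look-ahead can be checked on a three-column line with a single trailing space. This is a sketch assuming Python's re module; the names LOOSE and STRICT and the shortened sample lines are made up for illustration, and only this one trailing-space case is tested here.

```python
import re

# First pattern from this answer (no trailing-space check).
LOOSE = re.compile(
    r"^((?:(?!\s\s).)+)(?:\s){4,}((?:(?!\s\s).)+)(?:\s){4,}((?:.(?!\s\s))+)$"
)
# Second pattern, with the additional (?!\s$) look-ahead in the last group.
STRICT = re.compile(
    r"^((?:(?!\s\s).)+)(?:\s){4,}((?:(?!\s\s).)+)(?:\s){4,}((?:.(?!\s\s)(?!\s$))+)$"
)

clean = "Applications    7 days    6 days"
trailing = "Applications    7 days    6 days "  # one trailing space

print(LOOSE.match(trailing) is not None)   # True: the space lands in group 3
print(STRICT.match(trailing) is not None)  # False
print(STRICT.match(clean).groups())        # ('Applications', '7 days', '6 days')
```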


5 Comments

The only problem with that is that if there are more than 2 spaces after the end of the line it will still match it. Is there any way for that to look ahead for 2 OR MORE spaces? Would it be as simple as adding a quantifier in the negative lookahead at the end?
I added another negative-lookahead so it won't match lines ending with spaces.
This is awesome! It works very well! May I ask some questions on it? What does the ?: do in regex?
Sure. The ?: makes the group a non-capturing group. For instance, if you have a regex like (.*?)(a){2}(.*?)(b){2} and run it on a string like 123aa789bb, the captured groups (important when replacing) will be: 1-123, 2-a, 3-789 and 4-b (a repeated group keeps only its last repetition). Now, maybe you are only interested in the (.*?) parts. In that case, you use ?: in the groups you don't want, like: (.*?)(?:a){2}(.*?)(?:b){2}. This way, the captured groups for that sample string would be 1-123 and 2-789.
In other words, ?: has no effect on what the regex matches, only on which groups are captured. In your case, as you probably didn't want to reference the (\s){4,}, I used (?:\s){4,}. In the end, if the captured groups don't matter to you, you can drop the ?: and let your regex be a little more readable.
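
The difference between capturing and non-capturing groups described in these comments can be reproduced with a few lines, assuming Python's re module. Note that Python reports only the last repetition of a quantified group, so (a){2} shows up as a rather than aa.

```python
import re

text = "123aa789bb"

# Every parenthesised group is captured, including the repeated ones.
capturing = re.match(r"(.*?)(a){2}(.*?)(b){2}", text).groups()
print(capturing)      # ('123', 'a', '789', 'b')

# With ?: the repeated groups still take part in the match but are not captured.
non_capturing = re.match(r"(.*?)(?:a){2}(.*?)(?:b){2}", text).groups()
print(non_capturing)  # ('123', '789')
```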
0

Try this:

^[^\s]*(\s{2,}[^\s].*){2,}

This assumes there are at least 2 spaces before each column value.
