
Using grep on Ubuntu, I am trying to write a regex that matches a pattern repeated multiple times in a line.

Example: 0:0, 80:3, 443:0, 8883:0, 9000:0, 9001:0,

The regex I tried is -

([0-9]+:[0-9]+, )+

But it only matches up to -

0:0, 80:3, 443:0, 8883:0, 9000:0,

I want it to match the complete line. I would also like the regex to check that both 80 and 443 are present in the matched string.

Expectation -

The following lines should be matched -

0:0, 80:3, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 443:1, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:3, 443:1, 8883:0, 9000:0, 9001:0,

and the ones below should not be matched -

0:0, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 8883:0, 9000:0, 9001:0,
0:0, 8883:0, 9000:0, 9001:0,
  • Please add your desired output for that sample input to your question (no comment). Commented Feb 2, 2022 at 19:21
  • It's not matching the last term because there's no space at the end of your input string, but your regex requires one. If the spaces are optional, you could just use " *" instead of " " in the regex. If not, then you need to match "either space or end of line", which would be "( |$)" instead of the " ". If you're going to examine the match results and don't want to capture the spaces, you can use a non-capturing group, "(?: |$)", but that is Perl-style syntax, so with grep it needs -P rather than -E. Commented Feb 2, 2022 at 19:45
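
The point in the comment above can be checked quickly with grep -o, which prints only the matched part of each line. This is a minimal sketch, assuming GNU grep (-o is not part of POSIX grep); the variable names line, partial and full are made up for the illustration.

```shell
# Sample line from the question: it ends in a comma with no trailing space.
line='0:0, 80:3, 443:0, 8883:0, 9000:0, 9001:0,'

# Original regex: every term must be followed by ", " (comma plus space),
# so the final "9001:0," is left out of the match.
partial=$(printf '%s\n' "$line" | grep -oE '([0-9]+:[0-9]+, )+')

# Allow either a space or the end of the line after each comma.
full=$(printf '%s\n' "$line" | grep -oE '([0-9]+:[0-9]+,( |$))+')

# Brackets make the trailing space in the partial match visible.
printf '[%s]\n' "$partial" "$full"
```

The first bracketed line stops after "9000:0, " while the second covers the whole input line.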

2 Answers


You can use

^[0-9]+:[0-9]+, 80:[0-9]+, 443:[0-9]+(, [0-9]+:[0-9]+)+,$

See the regex demo.
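
Since the question is about grep, here is a small sketch of how the regex above could be used with grep -E (extended regular expressions). The sample data is taken from the question; the variable names sample and matched are only for illustration.

```shell
# One line that should match and two that should not (from the question).
sample='0:0, 80:3, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 8883:0, 9000:0, 9001:0,'

# -E enables extended regex syntax: +, ( ) and unescaped grouping.
matched=$(printf '%s\n' "$sample" |
  grep -E '^[0-9]+:[0-9]+, 80:[0-9]+, 443:[0-9]+(, [0-9]+:[0-9]+)+,$')

printf '%s\n' "$matched"
```

Only the first line is printed, because this regex expects 80 and 443 as the second and third terms, in that order.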

Alternatively, consider an awk solution such as

awk '/^[0-9]+:[0-9]+(, [0-9]+:[0-9]+)+,$/ && /80/ && /443/' file

See the online demo:

#!/bin/bash
s='0:0, 80:3, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 443:1, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:3, 443:1, 8883:0, 9000:0, 9001:0,
0:0, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 8883:0, 9000:0, 9001:0,
0:0, 8883:0, 9000:0, 9001:0,'
awk '/^[0-9]+:[0-9]+(, [0-9]+:[0-9]+)+,$/ && /80/ && /443/' <<< "$s"

Output:

0:0, 80:3, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 443:1, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:3, 443:1, 8883:0, 9000:0, 9001:0,

2 Comments

Thanks Wiktor - Can the regex be changed so that lines containing 80:3, 443:0, are matched irrespective of their positions? E.g. both 0:0, 80:0, 443:1, 8883:0, and 0:0, 443:1, 80:0, 8883:0, should be matched.
@Ira This is what my awk does, see awk '/^[0-9]+:[0-9]+(, [0-9]+:[0-9]+)+,$/ && /80/ && /443/' file
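
One caveat with /80/ && /443/ is that they match those digits anywhere in the line, so a term such as 8080:0 would also satisfy /80/. The sketch below is one possible stricter variant, not the answer's own code: it splits each line on ", " and checks whether a field starts with 80: or 443:, in any order. The third sample line (with 8080) and the names lines, loose, strict, has80 and has443 are made up for the illustration.

```shell
lines='0:0, 80:0, 443:1, 8883:0,
0:0, 443:1, 80:0, 8883:0,
0:0, 8080:0, 443:1, 8883:0,
0:0, 443:0, 8883:0, 9000:0,'

# The awk from the answer: "80" may appear anywhere, including inside 8080.
loose=$(printf '%s\n' "$lines" |
  awk '/^[0-9]+:[0-9]+(, [0-9]+:[0-9]+)+,$/ && /80/ && /443/')

# Stricter variant: validate the line shape, then look for whole fields
# that begin with "80:" and "443:", regardless of their position.
strict=$(printf '%s\n' "$lines" | awk -F', ' '
  /^[0-9]+:[0-9]+(, [0-9]+:[0-9]+)+,$/ {
    has80 = has443 = 0
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^80:/)  has80 = 1
      if ($i ~ /^443:/) has443 = 1
    }
    if (has80 && has443) print
  }')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

Both versions accept the first two lines in either order; only the loose one also accepts the 8080 line.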

Here is a stricter awk pattern match, written to follow the samples you have shown. It was written and tested in GNU awk and should work in any POSIX-compliant awk. A brief explanation of the awk code: awk programs are built from condition/regexp and action pairs. Here I give only the condition (a regexp) with no action, so whenever the regexp matches, awk falls back to its default action and prints the line.

awk '/^0:[0-9],[[:space:]]+80:[0-9],[[:space:]]+443:[0-9],[[:space:]]+8883:[0-9](,[[:space:]]+9[0-9]{3}:[0-9]){2},$/' Input_file

Explanation: a detailed breakdown of the regex above.

^0:[0-9],[[:space:]]+             ##From the start of the line, match 0, a colon, a single digit and a comma, followed by 1 or more whitespace characters.
80:[0-9],[[:space:]]+             ##Followed by 80, a colon, a single digit, a comma and 1 or more whitespace characters.
443:[0-9],[[:space:]]+            ##Followed by 443, a colon, a single digit, a comma and 1 or more whitespace characters.
8883:[0-9]                        ##Followed by 8883, a colon and a single digit.
(,[[:space:]]+9[0-9]{3}:[0-9]){2} ##Match a comma and whitespace, then 9 followed by 3 digits, a colon and a single digit; this whole group must occur exactly 2 times (to match the last two values, 9000 and 9001).
,$                                ##Match the final comma at the end of the line.
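
To see how strict this pattern is, it can also be tried with grep -E, which accepts the same [[:space:]] classes and {n} interval expressions. This is only a sketch: the second and third sample lines (swapped order, two-digit count) are made-up variations, and the names pattern, sample and fixed are for illustration.

```shell
# Same regex as in the awk command above, kept in a variable for readability.
pattern='^0:[0-9],[[:space:]]+80:[0-9],[[:space:]]+443:[0-9],[[:space:]]+8883:[0-9](,[[:space:]]+9[0-9]{3}:[0-9]){2},$'

sample='0:0, 80:3, 443:1, 8883:0, 9000:0, 9001:0,
0:0, 443:1, 80:3, 8883:0, 9000:0, 9001:0,
0:0, 80:12, 443:1, 8883:0, 9000:0, 9001:0,'

# Only lines with exactly this port order and single-digit counts pass.
fixed=$(printf '%s\n' "$sample" | grep -E "$pattern")

printf '%s\n' "$fixed"
```

Only the first line is printed: the second has 443 before 80, and the third has a two-digit value after 80:, which [0-9] alone does not allow. Whether that strictness is wanted depends on how fixed the input format really is.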
