python regex command to extract data excluding comment line

Question

I need to extract lines from a data file that begin with the letter "U" or "L", and exclude comment lines, which begin with the character "/".

Example:

/data file FLG.dat
UAB-AB      LRD1503     / reminder latches

I used a regex pattern in my Python program, but it only captures the comment lines. I'm getting the comment lines, but not the data lines that begin with "U" or "L".

if file_path != "":
    # pattern to search comment lines in the text file
    pattern = "[^A-Za-z0-9-]/.+"
    data = read_file(file_path)
    find_str = re.findall(pattern, data)
    for x in find_str:
        print(x)
else:
    print("no file selected")
    sys.exit()
– shanmuga, Commented Aug 31, 2019 at 19:29
Please add your code into the question and make sure it's well-formatted.
– 41686d6564, Commented Aug 31, 2019 at 19:30
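A quick check with the sample lines from the question suggests why the pattern above returns only comments: it requires a "/" and then consumes everything after it, and nothing in it looks for "U" or "L", so the only text it can match is comment text. This is a sketch, not the asker's program; an inline string stands in for the file because the read_file() helper isn't shown.

```python
import re

# Inline stand-in for the file contents; read_file() from the question is not shown.
data = """/data file FLG.dat
UAB-AB      LRD1503     / reminder latches"""

# Pattern from the question: one character that is not a letter, digit or hyphen,
# then "/", then the rest of the line.
question_pattern = r"[^A-Za-z0-9-]/.+"

comment_matches = re.findall(question_pattern, data)
print(comment_matches)  # [' / reminder latches']

# A comment line that is not the first line is matched too, via the preceding newline.
later_comment = re.findall(question_pattern, "UAB-AC      LRD1600\n/second comment")
print(later_comment)  # ['\n/second comment']
```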

Olvin Roght · Accepted Answer · 2019-08-31 22:57:25Z


You can use ^([UL].+?)(?:/.*|)$. Code:

import re

s = """/data file FLG.dat
UAB-AB      LRD1503     / reminder latches
LAB-AB      LRD1503     / reminder latches
SAB-AB      LRD1503     / reminder latches"""
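# re.MULTILINE makes ^ and $ match at the start and end of every line in s
lines = re.findall(r"^([UL].+?)(?:/.*|)$", s, re.MULTILINE)
print(lines)  # two matches (the U and L lines); each keeps the spaces before its "/"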

If you want to remove the spaces at the end of each match, you can use a list comprehension with the same regular expression:

lines = [match.group(1).strip() for match in re.finditer(r"^([UL].+?)(?:/.*|)$", s, re.MULTILINE)]

Or you can edit the regular expression so that it does not include the spaces before the slash, ^([UL].+?)(?:\s*/.*|)$:

lines = re.findall(r"^([UL].+?)(?:\s*/.*|)$", s, re.MULTILINE)
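The final pattern can be wrapped in a small function and tried on a sample that also contains a data line with no comment, which the empty branch of the alternation handles. This is a sketch around the answer's regex; extract_data_lines is a hypothetical name, not part of the answer.

```python
import re

# Final pattern from this answer: a data line starts with U or L; the lazy group
# stops before optional whitespace and the first "/" comment marker, or at end of line.
DATA_LINE = re.compile(r"^([UL].+?)(?:\s*/.*|)$", re.MULTILINE)


def extract_data_lines(text):
    """Return the data part of every line starting with U or L (hypothetical helper)."""
    return DATA_LINE.findall(text)


sample = """/data file FLG.dat
UAB-AB      LRD1503     / reminder latches
LAB-AB      LRD1503
SAB-AB      LRD1503     / reminder latches"""

result = extract_data_lines(sample)
print(result)
```

The comment line and the line starting with "S" are skipped, and neither U/L line keeps trailing spaces.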

edited Aug 31, 2019 at 22:57

answered Aug 31, 2019 at 19:37

Olvin Roght


2 Comments

Booboo Over a year ago

^([UL].+)/.*$ doesn't look right. First, try it against 'Uabc/xyz/def'. Second, it only matches lines with comments.

Olvin Roght Over a year ago

@RonaldAaronson, I agree with the second point; it's a logical mistake, and it's fixed. About the first, I prefer to follow the usual logic of comments in code: the first occurrence of the comment marker starts the comment.
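Both points in this exchange can be checked by running the two patterns side by side on single lines (re.match anchors at the start of each string, so re.MULTILINE isn't needed here). A sketch; first_group is a hypothetical helper.

```python
import re

# Pattern criticised in the comment: greedy .+ runs to the LAST "/", and the "/" is mandatory.
greedy = re.compile(r"^([UL].+)/.*$")
# Pattern at the top of the answer: lazy .+? stops at the FIRST "/", and the comment is optional.
lazy = re.compile(r"^([UL].+?)(?:/.*|)$")


def first_group(pattern, line):
    """Return group 1 of a match at the start of line, or None (hypothetical helper)."""
    m = pattern.match(line)
    return m.group(1) if m else None


for line in ["Uabc/xyz/def", "UAB-AC      LRD1600"]:
    print(repr(line), "greedy:", repr(first_group(greedy, line)),
          "lazy:", repr(first_group(lazy, line)))
```

The greedy pattern returns 'Uabc/xyz' for the first line and nothing for the second; the lazy one returns 'Uabc' and the whole second line.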

Florian Braun · Accepted Answer · 2019-09-01 09:33:18Z


In case the comments in your data lines are optional, here's a regular expression that covers both types of lines: with or without a comment.

The regular expression for that is R"^([UL][^/]*)" (edited; the original RE was R"^([UL][^/]*)(/.*)?$"). The first group is the data you want to extract; the optional second group of the original RE would catch the comment, if any.

This example code prints only the 2 valid data lines.

import re

lines=["/data file FLG.dat",
       "UAB-AB      LRD1503     / reminder latches",
       "UAB-AC      LRD1600",
       "MAB-AD      LRD1700     / does not start with U or L"
       ]

datare=re.compile(R"^([UL][^/]*)")

matches = (match.group(1).strip() for match in (datare.match(line) for line in lines) if match)

for match in matches:
    print(match)

Note how match.group(1).strip() works: group(1) extracts the first group of your RE, and strip() removes any trailing spaces from the match.

Also note that you can replace lines in this example with a file handle, and it will work the same way.
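With a real file, each line read from the handle still ends in "\n"; [^/]* matches it along with everything else, and strip() removes it, so the loop needs no other change. A sketch, using io.StringIO in place of open(...) so it runs on its own; extract_from_file is a hypothetical name.

```python
import io
import re

# Pattern from this answer: a data line starts with U or L; [^/]* stops at the first "/".
datare = re.compile(r"^([UL][^/]*)")


def extract_from_file(handle):
    """Yield the data part of each U/L line in an open text file (hypothetical helper)."""
    for line in handle:  # iterating a file object yields one line at a time, "\n" included
        match = datare.match(line)
        if match:
            # [^/]* also swallows a trailing "\n"; strip() removes it with the spaces.
            yield match.group(1).strip()


# io.StringIO behaves like a file opened in text mode, so it stands in for open(...) here.
fake_file = io.StringIO(
    "/data file FLG.dat\n"
    "UAB-AB      LRD1503     / reminder latches\n"
    "UAB-AC      LRD1600\n"
    "MAB-AD      LRD1700     / does not start with U or L\n"
)
data_lines = list(extract_from_file(fake_file))
print(data_lines)
```

With the real file, the same generator would be used as: with open(file_path) as f: data_lines = list(extract_from_file(f)).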

If the matches = line looks too complicated, it's just a compact and efficient way of writing this:

for line in lines:
    match = datare.match(line)
    if match:
        print(match.group(1).strip())

edited Sep 1, 2019 at 9:33

answered Aug 31, 2019 at 20:26

Florian Braun


3 Comments

Booboo Over a year ago

And what does the (/.*)?$ portion of your regex contribute to the final result unless you wanted to know what the comment was?

Booboo Over a year ago

I wasn't clear: (/.*)?$ isn't necessary unless you want to know if there is a comment and what it is.

Florian Braun Over a year ago

Yes, you are correct. The 2nd match group is not needed (unless you need to know whether there is a comment or what it is). I updated my answer and the example still works.
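If you do need the comment as well, the original pattern's optional second group returns it, or None when the line has no comment. A sketch of that pre-edit pattern; split_data_comment is a hypothetical helper name.

```python
import re

# Pre-edit pattern from this answer: group 1 is the data, optional group 2 the comment.
WITH_COMMENT = re.compile(r"^([UL][^/]*)(/.*)?$")


def split_data_comment(line):
    """Return (data, comment) for a U/L line, comment being None if absent (hypothetical helper)."""
    m = WITH_COMMENT.match(line)
    if m is None:
        return None  # not a data line, e.g. "/data file FLG.dat"
    return m.group(1).strip(), m.group(2)


for line in ["/data file FLG.dat",
             "UAB-AB      LRD1503     / reminder latches",
             "UAB-AC      LRD1600"]:
    print(split_data_comment(line))
```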
