42

I have the following input,

OK SYS 10 LEN 20 12 43
1233a.fdads.txt,23 /data/a11134/a.txt
3232b.ddsss.txt,32 /data/d13f11/b.txt
3452d.dsasa.txt,1234 /data/c13af4/f.txt
.

And I'd like to extract all of the input except the line containing "OK SYS 10 LEN 20" and the last line which contains a single "." (dot). That is, I want to extract the following

1233a.fdads.txt,23 /data/a11134/a.txt
3232b.ddsss.txt,32 /data/d13f11/b.txt
3452d.dsasa.txt.1234 /data/c13af4/f.txt

I tried the following,

for item in output:
    match_obj = re.search("^(?!OK) | ^(?!\\.)", item)
    if match_obj :
        print("got item " + item)

but it does not work, as it does not produce any output.

3
  • Are the lines in the first codeblock lines of a file? Commented Aug 23, 2012 at 12:06
  • what's the \\. for? Commented Jan 3, 2020 at 9:22
  • @alannaC it's used to escape the special character . which stands for any character in many regex implementations. Commented May 10, 2022 at 15:37

8 Answers 8

66

See it in action:

match_obj = re.search("^(?!OK|\\.).*", item)

Don't forget to put .* after negative look-ahead, otherwise you couldn't get any match

Sign up to request clarification or add additional context in comments.

1 Comment

The negative look-ahead doesn't match anything, it just looks ahead before matching, and applies to the next pattern that follows it (eg. .*) Took me a while to figure that out.
9

Use a negative match. (Also note that whitespace is significant, by default, inside a regex so don't space things out. Alternatively, use re.VERBOSE.)

for item in output:
    match_obj = re.search("^(OK|\\.)", item)
    if not match_obj:
        print("got item " + item)

Comments

6
if not (line.startswith("OK ") or line.strip() == "."):
    print(line)

Comments

4

Why don't you match the OK SYS row and not return it.

for item in output:
    match_obj = re.search("(OK SYS|\\.).*", item)
    if not match_obj :
        print("got item " + item)

Comments

1

If this is a file, you can simply skip the first and last lines and read the rest with csv:

>>> s = """OK SYS 10 LEN 20 12 43
... 1233a.fdads.txt,23 /data/a11134/a.txt
... 3232b.ddsss.txt,32 /data/d13f11/b.txt
... 3452d.dsasa.txt,1234 /data/c13af4/f.txt
... ."""
>>> stream = StringIO.StringIO(s)
>>> rows = [row for row in csv.reader(stream,delimiter=',') if len(row) == 2]
>>> rows
[['1233a.fdads.txt', '23 /data/a11134/a.txt'], ['3232b.ddsss.txt', '32 /data/d13f11/b.txt'], ['3452d.dsasa.txt', '1234 /data/c13af4/f.txt']]

If its a file, then you can do this:

with open('myfile.txt','r') as f:
   rows = [row for row in csv.reader(f,delimiter=',') if len(row) == 2]

Comments

1
and(re.search("bla_bla_pattern", str_item, re.IGNORECASE) == None)

is working.

Comments

0

You can also do it without negative look ahead. You just need to add parentheses to that part of expression which you want to extract. This construction with parentheses is named group.

Let's write python code:

string = """OK SYS 10 LEN 20 12 43
1233a.fdads.txt,23 /data/a11134/a.txt
3232b.ddsss.txt,32 /data/d13f11/b.txt
3452d.dsasa.txt,1234 /data/c13af4/f.txt
.
"""

search_result = re.search(r"^OK.*\n((.|\s)*).", string)

if search_result:
    print(search_result.group(1))

Output is:

1233a.fdads.txt,23 /data/a11134/a.txt
3232b.ddsss.txt,32 /data/d13f11/b.txt
3452d.dsasa.txt,1234 /data/c13af4/f.txt

^OK.*\n will find first line with OK statement, but we don't want to extract it so leave it without parentheses. Next is part which we want to capture: ((.|\s)*), so put it inside parentheses. And in the end of regexp we look for a dot ., but we also don't want to capture it.

P.S: I find this answer is super helpful to understand power of groups. https://stackoverflow.com/a/3513858/4333811

Comments

0

If the OK line is the first line and the last line is the dot you could consider slice them off like this:

TestString = '''OK SYS 10 LEN 20 12 43
1233a.fdads.txt,23 /data/a11134/a.txt
3232b.ddsss.txt,32 /data/d13f11/b.txt
3452d.dsasa.txt,1234 /data/c13af4/f.txt
.
'''
print('\n'.join(TestString.split()[1:-1]))

However if this is a very large string you may run into memory problems.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.