python - Match Everything except the string regex

Question

Data Set

Cider
631

Spruce
871

Honda
18813

Nissan
3292

Pine
10621

Walnut
10301

Code

#!/usr/bin/python
import re

text = "Cider\n631\n\nSpruce\n871\n\nHonda\n18813\n\nNissan\n3292\n\nPine\n10621\n\nWalnut\n10301\n\n"

f1 = re.findall(r"(Cider|Pine)\n(.*)",text)

print(f1)

Current Result

[('Cider', '631'), ('Pine', '10621')]

Question:

How do I change the regex so that it matches everything except several specified strings, e.g. (Honda|Nissan)?

Desired Result

[('Cider', '631'), ('Spruce', '871'), ('Pine', '10621'), ('Walnut', '10301')]

Exclude them: ^(?!Honda|Nissan)[a-zA-Z]+\n\d+ Demo

– dawg, Commented Oct 14, 2021 at 15:32
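
As a rough sketch of how the pattern from this comment could be run in Python: the two capture groups and the `re.MULTILINE` flag are additions assumed here (the comment's pattern has no groups, and `^` needs the flag to match at the start of every line), and the variable names are illustrative only.

```python
import re

text = (
    "Cider\n631\n\n"
    "Spruce\n871\n\n"
    "Honda\n18813\n\n"
    "Nissan\n3292\n\n"
    "Pine\n10621\n\n"
    "Walnut\n10301\n\n"
)

# Capture groups added for illustration so findall returns (name, number) tuples.
pattern = re.compile(r"^(?!Honda|Nissan)([a-zA-Z]+)\n(\d+)", re.MULTILINE)

pairs = pattern.findall(text)
print(pairs)
# [('Cider', '631'), ('Spruce', '871'), ('Pine', '10621'), ('Walnut', '10301')]
```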

The fourth bird · Accepted Answer · 2021-10-14 15:41:18Z

1

You can exclude lines that consist solely of one of the names or solely of digits, and then match two consecutive lines where the first one starts with a non-whitespace char.

^(?!(?:Honda|Nissan|\d+)$)(\S.*)\n(.*)

The pattern matches:

^ Start of the line (with re.MULTILINE)
(?! Negative lookahead, assert that what is directly to the right is not
- (?:Honda|Nissan|\d+)$ Any of the alternatives followed by the end of the line
) Close lookahead
(\S.*) Capture group 1, match a non-whitespace char followed by the rest of the line
\n Match a newline
(.*) Capture group 2, match the rest of the next line (any characters except a newline)
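
To see why `\d+` sits inside the lookahead, here is a small sketch comparing the pattern with and without that alternative. The variable names are illustrative, and the data is the question's data set.

```python
import re

text = ("Cider\n631\n\nSpruce\n871\n\nHonda\n18813\n\n"
        "Nissan\n3292\n\nPine\n10621\n\nWalnut\n10301")

with_digits = r"^(?!(?:Honda|Nissan|\d+)$)(\S.*)\n(.*)"
without_digits = r"^(?!(?:Honda|Nissan)$)(\S.*)\n(.*)"

kept = re.findall(with_digits, text, re.MULTILINE)

# Without \d+, the number lines that follow an excluded name can start a
# match themselves, paired with the empty line after them.
leaky = re.findall(without_digits, text, re.MULTILINE)

print(kept)
# [('Cider', '631'), ('Spruce', '871'), ('Pine', '10621'), ('Walnut', '10301')]
print(leaky)
# [('Cider', '631'), ('Spruce', '871'), ('18813', ''), ('3292', ''),
#  ('Pine', '10621'), ('Walnut', '10301')]
```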

import re

text = ("Cider\n"
        "631\n\n"
        "Spruce\n"
        "871\n\n"
        "Honda\n"
        "18813\n\n"
        "Nissan\n"
        "3292\n\n"
        "Pine\n"
        "10621\n\n"
        "Walnut\n"
        "10301")
f1 = re.findall(r"^(?!(?:Honda|Nissan|\d+)$)(\S.*)\n(.*)", text, re.MULTILINE)

print(f1)

Output

[('Cider', '631'), ('Spruce', '871'), ('Pine', '10621'), ('Walnut', '10301')]
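
The call above passes `re.MULTILINE`. A minimal sketch of what that flag changes, using the same pattern and data (the names `single` and `multi` are illustrative):

```python
import re

text = ("Cider\n631\n\nSpruce\n871\n\nHonda\n18813\n\n"
        "Nissan\n3292\n\nPine\n10621\n\nWalnut\n10301")
pattern = r"^(?!(?:Honda|Nissan|\d+)$)(\S.*)\n(.*)"

# Default mode: ^ can match only at the very start of the string,
# so at most one pair can be found.
single = re.findall(pattern, text)

# re.MULTILINE: ^ and $ also match at the start and end of each line.
multi = re.findall(pattern, text, re.MULTILINE)

print(single)  # [('Cider', '631')]
print(multi)
# [('Cider', '631'), ('Spruce', '871'), ('Pine', '10621'), ('Walnut', '10301')]
```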

If the line should start with an uppercase char A-Z and the next line should consist of only digits:

^(?!Honda|Nissan)([A-Z].*)\n(\d+)$

This pattern matches:

^ Start of the line (with re.MULTILINE)
(?!Honda|Nissan) Negative lookahead, assert that Honda or Nissan is not directly to the right
([A-Z].*) Capture group 1, match an uppercase char A-Z followed by the rest of the line
\n Match a newline
(\d+) Capture group 2, match 1+ digits
$ End of the line
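
One difference between the two patterns is worth a quick sketch: the first one excludes a name only when it is the whole line (because of the `$` inside the lookahead), while the second one excludes any line that merely starts with Honda or Nissan. "Hondara" below is a made-up entry, added purely to show this; it is not part of the question's data.

```python
import re

# "Hondara" is a hypothetical name used only to show the prefix behaviour.
text = "Cider\n631\n\nHonda\n18813\n\nHondara\n42\n\nPine\n10621"

whole_line = r"^(?!(?:Honda|Nissan|\d+)$)(\S.*)\n(.*)"
prefix = r"^(?!Honda|Nissan)([A-Z].*)\n(\d+)$"

kept_whole = re.findall(whole_line, text, re.MULTILINE)
kept_prefix = re.findall(prefix, text, re.MULTILINE)

print(kept_whole)   # [('Cider', '631'), ('Hondara', '42'), ('Pine', '10621')]
print(kept_prefix)  # [('Cider', '631'), ('Pine', '10621')]
```

Which behaviour is wanted depends on the data; for the names in the question both patterns give the same result.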

edited Oct 14, 2021 at 15:41

answered Oct 14, 2021 at 15:23

The fourth bird

9 Comments

The fourth bird Over a year ago

@Lacer You have to add re.MULTILINE

Lacer Over a year ago

that worked! thanks.

The fourth bird Over a year ago

@Lacer You are welcome. If all strings start with an uppercase char A-Z and the second line should have only digits ^(?!Honda|Nissan)([A-Z].*)\n(\d+)$ regex101.com/r/CNkdLD/1

The fourth bird Over a year ago

@Lacer I have added a breakdown of the patterns in the answer.

Lacer Over a year ago

thank you for the help and the explanation. Greatly appreciate it.

John Greene · Accepted Answer · 2021-10-14 15:28:38Z

1

Invert it with the caret ‘^’ symbol.

f1 = re.findall(r"(\s?^(Cider|Pine))\n(.*)",text)

Keep in mind that the caret symbol (in regex) has a special meaning when it is the first character of a match: it then means “does it start at the beginning of a line”.

That's why one would insert a “non-usable character” at the beginning. I chose an optional single space to use up that first character, so that the caret (^) no longer means “the beginning of the line” but acts as the desired inverse operator.
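
A hedged check of the pattern above, not part of the answer itself: in Python's `re` module the caret negates only inside a character class such as `[^abc]`; anywhere else it is an anchor, whatever precedes it. Under that reading, the pattern does not invert the alternation. A minimal sketch with the question's data (variable names are illustrative):

```python
import re

text = ("Cider\n631\n\nSpruce\n871\n\nHonda\n18813\n\n"
        "Nissan\n3292\n\nPine\n10621\n\nWalnut\n10301\n\n")

# Outside a character class, ^ is still an anchor. Without re.MULTILINE it
# matches only at the start of the string, so only the first entry is found.
anchored = re.findall(r"(\s?^(Cider|Pine))\n(.*)", text)

# Inside a character class, ^ negates a set of single characters.
non_digits = re.findall(r"[^\d\s]+", "Pine\n10621")

print(anchored)    # [('Cider', 'Cider', '631')]
print(non_digits)  # ['Pine']
```

Excluding whole words, as the question asks, takes a negative lookahead such as `(?!Honda|Nissan)`.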

answered Oct 14, 2021 at 15:28

John Greene
