Negation of a regex in Python

Question

I apologize, as there are some quite similar questions. I went through them but could not work it out. It would be nice if someone could help me with this.

I want to find any character (including blanks) except:

(1) 8-digit substrings (e.g. 20110101)
(2) substrings such as 0.68G or 10.76B (1 or 2 digits, dot, 2 digits, 1 letter)

from the text:

b'STN--- WBAN YEARMODA TEMP DEWP SLP STP VISIB WDSP MXSPD GUST MAX MIN
PRCP SNDP FRSHTT\n486200 99999 20110101 79.3 24 74.5 24 1007.2 8
1006.2 8 6.6 24 2.2 24 7.0 999.9 87.8 74.1 0.00G 999.9 010000\n486200 99999 20110102 79.7 24 74.9 24 1007.8 8 1006.9 8 6.1 24 2.8 24 8.0
15.0 91.9 74.8 0.00G 999.9 010010\n486200 99999 20110103 77.5 24 73.6 24 1008.5 8 1007.6 8 6.0 24 2.8 24 6.0 999.9 83.7 73.4* 0.68G 999.9
010000\n486200 99999 20110104 81.2 24 75.0 24 1007.7 8 1006.8 8 6.3 24
3.0 24 5.1 999.9 89.6* 73.0 0.14G 999.9 010010\n486200 99999 20110105 79.7 24 74.8 24 1007.8 8 1006.8 8 7.0 24 2.4 24 6.0 999.9 87.8 73.0 0.57G 999.9 010000\n486200 99999 20110106 77.4 24 74.6 24 1008.8 8 1007.9 8 6.0 24 1.5 24 4.1 999.9 81.0 73.2 0.16G 999.9 010000\n486200 99999 20110107 77.7 24 75.0 24 1008.9

I came up with the regex: (\d{8}|\d{1,2}\.\d{1,2}[ABCDEFG]) which finds all of (1) and (2).

I now need to 'negate' this. I tried several possibilities such as (?! ... ), but that does not seem to work.

My expected output is: 20110101 0.00G 20110102 0.00G 20110103 0.68G 20110104 89.6* 20110105 0.57G 20110106 0.16G 20110107

Do you have suggestions, please?

These look like space-delimited values (with a header row). What are you really trying to do? I guess you're trying to extract some columns (and leave others).
– Peter Sutton, commented Feb 16, 2015 at 10:09
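If the data really is whitespace-delimited, as the comment above guesses, the two wanted values could also be picked by position instead of by regex. The following is only a sketch under assumptions: the rows are separated by real newlines, every complete row has the same token layout as the rows shown in the question, and the index constants are names introduced here for illustration.

```python
# Two rows copied from the question's data, plus its truncated last row.
text = (
    "486200 99999 20110101 79.3 24 74.5 24 1007.2 8 1006.2 8 6.6 24 2.2 24 "
    "7.0 999.9 87.8 74.1 0.00G 999.9 010000\n"
    "486200 99999 20110103 77.5 24 73.6 24 1008.5 8 1007.6 8 6.0 24 2.8 24 "
    "6.0 999.9 83.7 73.4* 0.68G 999.9 010000\n"
    "486200 99999 20110107 77.7 24 75.0 24 1008.9"
)

# Assumed token positions after splitting a data row on whitespace.
# They do not equal the header positions, because the rows carry extra
# count tokens (e.g. the "24" after each mean value).
DATE_INDEX = 2    # YEARMODA
PRCP_INDEX = 19   # PRCP, e.g. 0.68G

picked = []
for line in text.splitlines():
    fields = line.split()
    if len(fields) > DATE_INDEX:
        picked.append(fields[DATE_INDEX])
    if len(fields) > PRCP_INDEX:      # the truncated last row has no PRCP token
        picked.append(fields[PRCP_INDEX])

result = " ".join(picked)
print(result)
```

This relies on a fixed layout, so a row with a missing field would shift the positions; the regex approach in the answers does not have that dependency.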

Avinash Raj · Accepted Answer · 2015-02-16 10:09:33Z


You don't actually need to negate the pattern. Use the same regex with the re.findall function and join the resulting list items with a space character.

>>> s = '''STN--- WBAN YEARMODA TEMP DEWP SLP STP VISIB WDSP MXSPD GUST MAX MIN
PRCP SNDP FRSHTT\n486200 99999 20110101 79.3 24 74.5 24 1007.2 8
1006.2 8 6.6 24 2.2 24 7.0 999.9 87.8 74.1 0.00G 999.9 010000\n486200 99999 20110102 79.7 24 74.9 24 1007.8 8 1006.9 8 6.1 24 2.8 24 8.0
15.0 91.9 74.8 0.00G 999.9 010010\n486200 99999 20110103 77.5 24 73.6 24 1008.5 8 1007.6 8 6.0 24 2.8 24 6.0 999.9 83.7 73.4* 0.68G 999.9
010000\n486200 99999 20110104 81.2 24 75.0 24 1007.7 8 1006.8 8 6.3 24
3.0 24 5.1 999.9 89.6* 73.0 0.14G 999.9 010010\n486200 99999 20110105 79.7 24 74.8 24 1007.8 8 1006.8 8 7.0 24 2.4 24 6.0 999.9 87.8 73.0 0.57G 999.9 010000\n486200 99999 20110106 77.4 24 74.6 24 1008.8 8 1007.9 8 6.0 24 1.5 24 4.1 999.9 81.0 73.2 0.16G 999.9 010000\n486200 99999 20110107 77.7 24 75.0 24 1008.9'''
>>> import re
>>> ' '.join(re.findall(r'(\b\d{8}\b|\b\d{1,2}\.\d{1,2}[ABCDEFG])', s))
'20110101 0.00G 20110102 0.00G 20110103 0.68G 20110104 0.14G 20110105 0.57G 20110106 0.16G 20110107'
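To make the relationship between matching and "negating" concrete, here is a small sketch on a shortened sample (assumed to be representative of the question's rows). It uses a non-capturing variant of the pattern, so that re.split returns only the text between matches, which is the literal complement of what re.findall returns.

```python
import re

# Shortened sample in the same style as the question's data (an assumption).
sample = "486200 99999 20110103 83.7 73.4* 0.68G 999.9 010000"

# Same alternatives as in the answer, written without a capturing group
# so that split() does not also return the matched text.
pattern = re.compile(r"\b\d{8}\b|\b\d{1,2}\.\d{1,2}[A-G]")

matches = pattern.findall(sample)  # the substrings the pattern matches
rest = pattern.split(sample)       # everything else, i.e. the "negation"

print(matches)
print(rest)
```

Since the expected output in the question consists of the matched substrings, findall is the call that fits; split is shown only for contrast.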


Yes, that is what I just found out. Sorry for the disturbance, and thanks for your reply.
– tags, commented over a year ago

vks · Accepted Answer · 2015-02-16 10:06:24Z

(?<!\d)\d{8}(?!\d)|\d{1,2}\.\d{2}[a-zA-Z]

Just use this with re.findall. See the demo:

https://www.regex101.com/r/rK5lU1/27

import re
p = re.compile(r'(?<!\d)\d{8}(?!\d)|\d{1,2}\.\d{2}[a-zA-Z]', re.MULTILINE | re.IGNORECASE)
test_str = "b'STN--- WBAN YEARMODA TEMP DEWP SLP STP VISIB WDSP MXSPD GUST MAX MIN PRCP SNDP FRSHTT\n486200 99999 20110101 79.3 24 74.5 24 1007.2 8 1006.2 8 6.6 24 2.2 24 7.0 999.9 87.8 74.1 0.00G 999.9 010000\n486200 99999 20110102 79.7 24 74.9 24 1007.8 8 1006.9 8 6.1 24 2.8 24 8.0 15.0 91.9 74.8 0.00G 999.9 010010\n486200 99999 20110103 77.5 24 73.6 24 1008.5 8 1007.6 8 6.0 24 2.8 24 6.0 999.9 83.7 73.4* 0.68G 999.9 010000\n486200 99999 20110104 81.2 24 75.0 24 1007.7 8 1006.8 8 6.3 24 3.0 24 5.1 999.9 89.6* 73.0 0.14G 999.9 010010\n486200 99999 20110105 79.7 24 74.8 24 1007.8 8 1006.8 8 7.0 24 2.4 24 6.0 999.9 87.8 73.0 0.57G 999.9 010000\n486200 99999 20110106 77.4 24 74.6 24 1008.8 8 1007.9 8 6.0 24 1.5 24 4.1 999.9 81.0 73.2 0.16G 999.9 010000\n486200 99999 20110107 77.7 24 75.0 24 1008.9\n"

# Call findall on the compiled pattern and print the list of matches.
print(p.findall(test_str))
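The two answers delimit the 8-digit run differently: one uses \b word boundaries, the other uses digit lookarounds. On the question's data both should give the same dates, but they can differ when letters touch the digits. A minimal sketch, using made-up inputs that are not taken from the question:

```python
import re

word_boundary = re.compile(r"\b\d{8}\b")
digit_lookaround = re.compile(r"(?<!\d)\d{8}(?!\d)")

# Hypothetical inputs for illustration only.
samples = ["99999 20110101 79.3", "id20110101x", "1234567890"]

results = {}
for text in samples:
    results[text] = (word_boundary.findall(text), digit_lookaround.findall(text))
    print(text, results[text])
```

Letters count as word characters, so \b finds no boundary between "d" and "2" and rejects the second input, while the lookarounds only check for neighbouring digits and accept it. Both variants reject a longer digit run such as the third input.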
