0

I apologize, there are some quite similar questions. I went through them, but couldn't cope, though. It would be nice, if someone could help me on this.

I am willing to find any character (and blanks) except:

  1. 8-digit long substrings (eg 20110101)

  2. substrings such as 0.68G or 10.76B(1 or 2 digits, dot, 2 digits, 1 letter)

from the text:

b'STN--- WBAN YEARMODA TEMP DEWP SLP STP VISIB WDSP MXSPD GUST MAX MIN
PRCP SNDP FRSHTT\n486200 99999 20110101 79.3 24 74.5 24 1007.2 8
1006.2 8 6.6 24 2.2 24 7.0 999.9 87.8 74.1 0.00G 999.9 010000\n486200 99999 20110102 79.7 24 74.9 24 1007.8 8 1006.9 8 6.1 24 2.8 24 8.0
15.0 91.9 74.8 0.00G 999.9 010010\n486200 99999 20110103 77.5 24 73.6 24 1008.5 8 1007.6 8 6.0 24 2.8 24 6.0 999.9 83.7 73.4* 0.68G 999.9
010000\n486200 99999 20110104 81.2 24 75.0 24 1007.7 8 1006.8 8 6.3 24
3.0 24 5.1 999.9 89.6* 73.0 0.14G 999.9 010010\n486200 99999 20110105 79.7 24 74.8 24 1007.8 8 1006.8 8 7.0 24 2.4 24 6.0 999.9 87.8 73.0 0.57G 999.9 010000\n486200 99999 20110106 77.4 24 74.6 24 1008.8 8 1007.9 8 6.0 24 1.5 24 4.1 999.9 81.0 73.2 0.16G 999.9 010000\n486200 99999 20110107 77.7 24 75.0 24 1008.9

I came out with the regex: (\d{8}|\d{1,2}\.\d{1,2}[ABCDEFG]) which finds all (1) and (2).

It now need to 'negate' this. I tried out several possibilities such as (?! ... ), but that doesn't seem to work.

My expected output is: 20110101 0.00G 20110102 0.00G 20110103 0.68G 20110104 89.6* 20110105 0.57G 20110106 0.16G20110107

Do you have suggestions, please?

3
  • What's your expected output? Commented Feb 16, 2015 at 10:03
  • is \n present in your text is an actual newline character? Commented Feb 16, 2015 at 10:06
  • These looks like space delimited values (with a header row). What are you really trying to do? I guess you're trying extract some columns (and leave others). Commented Feb 16, 2015 at 10:09

2 Answers 2

2

You don't actually need to negate the pattern. Use the sme regex in re.findall function and join the resultant list items with a space character.

>>> s = '''STN--- WBAN YEARMODA TEMP DEWP SLP STP VISIB WDSP MXSPD GUST MAX MIN
PRCP SNDP FRSHTT\n486200 99999 20110101 79.3 24 74.5 24 1007.2 8
1006.2 8 6.6 24 2.2 24 7.0 999.9 87.8 74.1 0.00G 999.9 010000\n486200 99999 20110102 79.7 24 74.9 24 1007.8 8 1006.9 8 6.1 24 2.8 24 8.0
15.0 91.9 74.8 0.00G 999.9 010010\n486200 99999 20110103 77.5 24 73.6 24 1008.5 8 1007.6 8 6.0 24 2.8 24 6.0 999.9 83.7 73.4* 0.68G 999.9
010000\n486200 99999 20110104 81.2 24 75.0 24 1007.7 8 1006.8 8 6.3 24
3.0 24 5.1 999.9 89.6* 73.0 0.14G 999.9 010010\n486200 99999 20110105 79.7 24 74.8 24 1007.8 8 1006.8 8 7.0 24 2.4 24 6.0 999.9 87.8 73.0 0.57G 999.9 010000\n486200 99999 20110106 77.4 24 74.6 24 1008.8 8 1007.9 8 6.0 24 1.5 24 4.1 999.9 81.0 73.2 0.16G 999.9 010000\n486200 99999 20110107 77.7 24 75.0 24 1008.9'''
>>> ' '.join(re.findall(r'(\b\d{8}\b|\b\d{1,2}\.\d{1,2}[ABCDEFG])', s))
'20110101 0.00G 20110102 0.00G 20110103 0.68G 20110104 0.14G 20110105 0.57G 20110106 0.16G 20110107'
Sign up to request clarification or add additional context in comments.

1 Comment

yes! it is what i just found out. sorry for disturbing. and thx for your reply
1
(?<!\d)\d{8}(?!\d)|\d{1,2}\.\d{2}[a-zA-Z]

Just use this with re.findall..See demo.

https://www.regex101.com/r/rK5lU1/27

import re
p = re.compile(r'(?<!\d)\d{8}(?!\d)|\d{1,2}\.\d{2}[a-zA-Z]', re.MULTILINE | re.IGNORECASE)
test_str = "b'STN--- WBAN YEARMODA TEMP DEWP SLP STP VISIB WDSP MXSPD GUST MAX MIN PRCP SNDP FRSHTT\n486200 99999 20110101 79.3 24 74.5 24 1007.2 8 1006.2 8 6.6 24 2.2 24 7.0 999.9 87.8 74.1 0.00G 999.9 010000\n486200 99999 20110102 79.7 24 74.9 24 1007.8 8 1006.9 8 6.1 24 2.8 24 8.0 15.0 91.9 74.8 0.00G 999.9 010010\n486200 99999 20110103 77.5 24 73.6 24 1008.5 8 1007.6 8 6.0 24 2.8 24 6.0 999.9 83.7 73.4* 0.68G 999.9 010000\n486200 99999 20110104 81.2 24 75.0 24 1007.7 8 1006.8 8 6.3 24 3.0 24 5.1 999.9 89.6* 73.0 0.14G 999.9 010010\n486200 99999 20110105 79.7 24 74.8 24 1007.8 8 1006.8 8 7.0 24 2.4 24 6.0 999.9 87.8 73.0 0.57G 999.9 010000\n486200 99999 20110106 77.4 24 74.6 24 1008.8 8 1007.9 8 6.0 24 1.5 24 4.1 999.9 81.0 73.2 0.16G 999.9 010000\n486200 99999 20110107 77.7 24 75.0 24 1008.9\n"

re.findall(p, test_str)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.