I have a text file that looks like this:

garbage
moregarbaged89849843
MDeduri09ri44830
Some short sentence
Whatever ... key: d11001bfa937eee2f84f55a11b207356 (KID=01002d737832455680cffbadf1092baf)
Whatever2 ... key: a0ee2d0f8272355f750c5434db85291a (KID=0101bfa0ab9641a0b863ef76519a48d3)
Whatever3 ... key: fe216ba17e5af807ce5af8e43cf3c031 (KID=0102900a2bc54111833631ea7bb855ed)
77EB0A2C7C42EDC27A3D26E72A02BB29:01002d737832455680cffbadf1092baf status 'garbage'
blah blah:0101bfa0ab9641a0b863ef76519a48d3 has status 'usable'
77EB0A2C7C42EDC27A3D26E72A02BB29:blah blah

I only care about the key and KID parts and want to extract them into separate lists.

My regexes for that are key: (\w|\d){30,} and KID=(\w|\d){30,}, respectively.

The code I'm using is:

matchkid = re.compile(r'KID=(\w|\d){30,}')
matchkey = re.compile(r'key: (\w|\d){30,}')

filteredkids = [a for a in lis if matchkid.search(a)]
filteredkeys = [b for b in lis if matchkey.search(b)]

print(filteredkids)
print('\n')
print(filteredkeys)

where lis is a list of the lines of the text file.

The output is:

['Whatever ... key: d11001bfa937eee2f84f55a11b207356 (KID=01002d737832455680cffbadf1092baf)', 'Whatever2 ... key: a0ee2d0f8272355f750c5434db85291a (KID=0101bfa0ab9641a0b863ef76519a48d3)', 'Whatever3 ... key: fe216ba17e5af807ce5af8e43cf3c031 (KID=0102900a2bc54111833631ea7bb855ed)']


['Whatever ... key: d11001bfa937eee2f84f55a11b207356 (KID=01002d737832455680cffbadf1092baf)', 'Whatever2 ... key: a0ee2d0f8272355f750c5434db85291a (KID=0101bfa0ab9641a0b863ef76519a48d3)', 'Whatever3 ... key: fe216ba17e5af807ce5af8e43cf3c031 (KID=0102900a2bc54111833631ea7bb855ed)']

This is wrong; the desired output is:

['KID=01002d737832455680cffbadf1092baf', 'KID=0101bfa0ab9641a0b863ef76519a48d3', 'KID=0102900a2bc54111833631ea7bb855ed']

['key: d11001bfa937eee2f84f55a11b207356', 'key: a0ee2d0f8272355f750c5434db85291a', 'key: fe216ba17e5af807ce5af8e43cf3c031']

I have tried tweaking my regex and looking at other similar questions, but nothing seems to work and most of the time I just get empty lists.

Hoping to find some guidance here, thanks in advance

  • Do not use (\w|\d), use \w. \w matches digits. And then just use re.findall: re.findall(r"KID=\w{30,}", text) and re.findall(r"key: \w{30,}", text). Commented Apr 6, 2021 at 14:09
  • @WiktorStribiżew that works, thanks a lot! Commented Apr 6, 2021 at 14:19

1 Answer

The pattern (\w|\d){30,} is not a good choice: it creates a repeated capturing group, which keeps only the last repetition, and the alternation is redundant because \w already matches digits, so \w{30,} is enough.
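
To illustrate why the repeated capturing group is a problem, here is a minimal sketch using one line from the question. The variable names grouped and plain are only illustrative. When a pattern contains a capturing group, re.findall returns the group contents rather than the whole match, and a repeated group holds only its last repetition.

```python
import re

line = ("Whatever ... key: d11001bfa937eee2f84f55a11b207356 "
        "(KID=01002d737832455680cffbadf1092baf)")

# Repeated capturing group: findall returns only the last captured character.
grouped = re.findall(r'KID=(\w|\d){30,}', line)

# No capturing group: findall returns the whole matched text.
plain = re.findall(r'KID=\w{30,}', line)

print(grouped)  # ['f']
print(plain)    # ['KID=01002d737832455680cffbadf1092baf']
```

This may also explain the odd results seen while tweaking the original regex with re.findall.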

Next, you are using re.search only as a condition in the list comprehension. It returns a Match object (or None), so the comprehension keeps the whole line whenever a match is found, while you need to grab the matched text itself from your strings.
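
The difference between filtering lines and extracting matches can be seen in a small sketch. The names lines, pattern, kept and extracted are hypothetical and chosen for this example only.

```python
import re

line = ("Whatever ... key: d11001bfa937eee2f84f55a11b207356 "
        "(KID=01002d737832455680cffbadf1092baf)")
lines = ["garbage", line]

pattern = re.compile(r'key: \w{30,}')

# Using search() as a filter keeps the entire line, not the matched part.
kept = [a for a in lines if pattern.search(a)]

# Calling .group() on the Match object returns just the matched text.
m = pattern.search(line)
extracted = m.group()

print(kept == [line])  # True
print(extracted)       # key: d11001bfa937eee2f84f55a11b207356
```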

You can fix the code by running re.findall on the whole file contents (text is the file read into a single string):

filteredkids = re.findall(r'KID=\w{30,}', text)
filteredkeys = re.findall(r'key: \w{30,}', text)

See the Python demo:

import re
text = """garbage
moregarbaged89849843
MDeduri09ri44830
Some short sentence
Whatever ... key: d11001bfa937eee2f84f55a11b207356 (KID=01002d737832455680cffbadf1092baf)
Whatever2 ... key: a0ee2d0f8272355f750c5434db85291a (KID=0101bfa0ab9641a0b863ef76519a48d3)
Whatever3 ... key: fe216ba17e5af807ce5af8e43cf3c031 (KID=0102900a2bc54111833631ea7bb855ed)
77EB0A2C7C42EDC27A3D26E72A02BB29:01002d737832455680cffbadf1092baf status 'garbage'
blah blah:0101bfa0ab9641a0b863ef76519a48d3 has status 'usable'
77EB0A2C7C42EDC27A3D26E72A02BB29:blah blah"""
filteredkids = re.findall(r'KID=\w{30,}', text)
filteredkeys = re.findall(r'key: \w{30,}', text)
print(filteredkids)
print(filteredkeys)

Output:

['KID=01002d737832455680cffbadf1092baf', 'KID=0101bfa0ab9641a0b863ef76519a48d3', 'KID=0102900a2bc54111833631ea7bb855ed']
['key: d11001bfa937eee2f84f55a11b207356', 'key: a0ee2d0f8272355f750c5434db85291a', 'key: fe216ba17e5af807ce5af8e43cf3c031']
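
If the data is already held as a list of lines, as with lis in the question, the same patterns can be applied line by line. The following is a sketch under that assumption; the shortened sample text and the names kid_pattern and key_pattern are illustrative, not part of the original code.

```python
import re

# Shortened sample; in the question, lis comes from the lines of the text file.
text = """garbage
Whatever ... key: d11001bfa937eee2f84f55a11b207356 (KID=01002d737832455680cffbadf1092baf)
Whatever2 ... key: a0ee2d0f8272355f750c5434db85291a (KID=0101bfa0ab9641a0b863ef76519a48d3)
blah blah:0101bfa0ab9641a0b863ef76519a48d3 has status 'usable'"""
lis = text.splitlines()

kid_pattern = re.compile(r'KID=\w{30,}')
key_pattern = re.compile(r'key: \w{30,}')

# Collect every match from every line into one flat list.
filteredkids = [hit for line in lis for hit in kid_pattern.findall(line)]
filteredkeys = [hit for line in lis for hit in key_pattern.findall(line)]

print(filteredkids)
print(filteredkeys)
```

Calling findall once on the whole text is simpler; the per-line version is mainly useful when the lines are needed separately anyway.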