I asked a related question not long ago, but I forgot how delicate regex is and showed the string in the wrong format.

The problem is that I receive a huge, disorganized text that is all on one line.

In this line there are 2 different "blocks" I need: "Most frequent senders" and "Most frequent recipients".

As I said, it's all on one line, something like this:

 string = """ 
Huge text etc etc etc Most frequent senders: NAME OF THE PERSON - 01.234.567/0001-89 (SOME RANDOM UPPERCASE TEXT) - 14 time(s) in total of: R$10.000,00 NAME OF THE PERSON - 012.345.678-90 (SOME RANDOM UPPERCASE TEXT) - 30 times in total of: R$10.000,00 NAME OF THE PERSON - 01.234.567/0001-89 (SOME RANDOM UPPERCASE TEXT) - 10 times in total of: R$10.000,00 Most frequent recipients:     NAME OF THE PERSON - 01.234.567/0001-89 (SOME RANDOM UPPERCASE TEXT) - 14 time(s) in total of: R$10.000,00 NAME OF THE PERSON - 012.345.678-90 (SOME RANDOM UPPERCASE TEXT) - 30 time(s) in total of: R$10.000,00 NAME OF THE PERSON - 01.234.567/0001-89 (SOME RANDOM UPPERCASE TEXT) - 10 time(s) in total of: R$10.000,00 More text after this.  """

As you can see, it's terribly disorganized, but that's how I receive it.

Basically, what I'm trying to do is get the name of the person, the ID (which can follow one of 2 patterns: xx.xxx.xxx/0001-xx or xxx.xxx.xxx-xx), the number of times, and the amount (in BRL, so R$).

I found a way to get the IDs, but that's it, nothing more.

    import re

    r = re.compile(r' [0-9]{3}\.?[0-9]{3}\.?[0-9]{3}\-?[0-9]{2} | [0-9]{2}\.?[0-9]{3}\.?[0-9]{3}\/?[0-9]{4}\-?[0-9]{2} ')

    print(r.findall(string))

Any help would be very much appreciated.

1 Answer

Assuming the name of the person is always uppercase and is preceded by a digit (or by : for the first occurrence) followed by whitespace:

    r = re.compile(r'(?<=[\d:])\s+([A-Z ]*) - ([0-9]{3}\.?[0-9]{3}\.?[0-9]{3}\-?[0-9]{2}|[0-9]{2}\.?[0-9]{3}\.?[0-9]{3}\/?[0-9]{4}\-?[0-9]{2}).*?- (\d*)\s.*?: R\$([\d\.,]+)')

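Note: You had unnecessary white spaces in your original regex before and after the IDs. Those spaces are literal, so they become part of every match, and an ID is only found when it has a space on both sides. Without them the pattern should match more reliably.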
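To see what those literal spaces do, here is a small sketch. The `sample` text and the variable names are made up for illustration, and only the xxx.xxx.xxx-xx alternative is used to keep it short:

```python
import re

# Hypothetical sample: one ID surrounded by spaces, one directly followed by a comma.
sample = "A - 012.345.678-90 (X) and B - 987.654.321-00, end"

# Pattern with a literal space before and after the ID, as in the question.
with_spaces = re.compile(r' [0-9]{3}\.?[0-9]{3}\.?[0-9]{3}-?[0-9]{2} ')
# Same pattern without the surrounding spaces.
without_spaces = re.compile(r'[0-9]{3}\.?[0-9]{3}\.?[0-9]{3}-?[0-9]{2}')

spaced_matches = with_spaces.findall(sample)
plain_matches = without_spaces.findall(sample)

print(spaced_matches)  # the spaces are part of the match; the second ID is missed
print(plain_matches)   # both IDs, without extra whitespace
```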

Also, you'll get more readable output with the following command:

    print(*r.findall(string), sep='\n')
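Since the question asks for the senders and the recipients as two separate blocks, one possible way to keep them apart is to cut the text at the recipients header and run the same regex on each half. This is only a sketch: the shortened `text`, the extra names and amounts, and the assumption that the header "Most frequent recipients" appears exactly once after the senders are all mine, not from the question.

```python
import re

# Shortened, hypothetical version of the one-line text from the question.
text = (
    "Huge text etc Most frequent senders: "
    "NAME OF THE PERSON - 01.234.567/0001-89 (SOME TEXT) - 14 time(s) in total of: R$10.000,00 "
    "OTHER PERSON - 012.345.678-90 (SOME TEXT) - 30 times in total of: R$2.500,50 "
    "Most frequent recipients:     "
    "THIRD PERSON - 01.234.567/0001-89 (SOME TEXT) - 10 time(s) in total of: R$300,00 "
    "More text after this."
)

# The regex from this answer, unchanged, only split across lines.
r = re.compile(
    r'(?<=[\d:])\s+([A-Z ]*) - '
    r'([0-9]{3}\.?[0-9]{3}\.?[0-9]{3}\-?[0-9]{2}|[0-9]{2}\.?[0-9]{3}\.?[0-9]{3}\/?[0-9]{4}\-?[0-9]{2})'
    r'.*?- (\d*)\s.*?: R\$([\d\.,]+)'
)

# Assumption: the recipients header occurs once, after the senders block.
# The colon stays at the start of the second half, so the lookbehind still works.
senders_part, _, recipients_part = text.partition("Most frequent recipients")

senders = r.findall(senders_part)
recipients = r.findall(recipients_part)

for label, rows in (("senders", senders), ("recipients", recipients)):
    for name, person_id, times, amount in rows:
        print(label, name, person_id, times, amount, sep=" | ")
```

Each row is a tuple of strings (name, ID, number of times, amount), so the count and the amount still need converting if you want to do arithmetic on them.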

2 Comments

Thanks for the answer. That helps with getting the name and ID, but I still need the number of times and the amount of money.
I added the number of times and the amount. Note that you have to escape $.
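A quick illustration of why the `$` needs escaping (the `amount_text` sample and the variable names are made up for this sketch): unescaped, `$` is an end-of-string anchor rather than a literal dollar sign.

```python
import re

amount_text = "in total of: R$10.000,00"

# Unescaped: `$` anchors at the end of the string, so nothing can follow it.
unescaped = re.search(r'R$([\d.,]+)', amount_text)
# Escaped: `\$` matches a literal dollar sign.
escaped = re.search(r'R\$([\d.,]+)', amount_text)

print(unescaped)         # None
print(escaped.group(1))  # 10.000,00
```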
