I have a text:
text = 'dear customer your account xx9052 has been debited with inr 25697.50 on 23-nov-18 info bil 001582495861 icici bank the available balance is inr 363.25'
I am trying to extract the account number, amount, date and available balance from this text.
I tried the following regex:
import re

pattern = r'your account (.*) has been debited with (.*) on (.*) info (.*) available balance is (.*\d)$'
match = re.search(pattern, text, re.IGNORECASE)  # search once and reuse the match
if match:
    print(match.group(1))  # account number
    print(match.group(2))  # amount
    print(match.group(3))  # date
    print(match.group(5))  # available balance
I got the desired results:
xx9052
inr 25697.50
23-nov-18
inr 363.25
However, this pattern fails when the text is slightly modified so that extra words follow the balance:
text = 'dear customer your account xx9052 has been debited with inr 25697.50 on 23-nov-18 info bil 001582495861 icici bank the available balance is inr 363.25 for dispute call 04033667777'
Using the same regex gives me this result:
xx9052
inr 25697.50
23-nov-18
inr 363.25 for dispute call 04033667777
The balance is extracted along with the trailing text, while it should be only inr 363.25. How can I resolve this so that the information is extracted correctly in both cases using a single pattern?
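One direction that might work is to drop the `$` anchor and replace the greedy `(.*)` groups with narrower ones, so the balance group stops at the end of the number. The following is only a sketch under assumptions that are not confirmed by the messages above: amounts always look like `inr` followed by digits with an optional decimal part, and the account number and date contain no spaces. The names `AMOUNT` and `extract` are made up for illustration.

```python
import re

# Assumption: an amount is "inr", optional whitespace, digits, optional decimals.
AMOUNT = r'inr\s*\d+(?:\.\d+)?'

# No trailing "$": the balance group ends where the number ends,
# so any text after it is simply ignored.
pattern = re.compile(
    r'your account (\S+) has been debited with (' + AMOUNT + r') on (\S+) info '
    r'.*?available balance is (' + AMOUNT + r')',
    re.IGNORECASE,
)

def extract(text):
    """Return (account, amount, date, balance), or None if nothing matches."""
    match = pattern.search(text)
    if match is None:
        return None
    return match.groups()

short_text = ('dear customer your account xx9052 has been debited with inr 25697.50 '
              'on 23-nov-18 info bil 001582495861 icici bank the available balance is inr 363.25')
long_text = short_text + ' for dispute call 04033667777'

print(extract(short_text))
print(extract(long_text))
```

With these assumptions, both messages should yield the same four fields, because the balance group can only match a number and never the words that follow it.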