I have a dynamic text which looks something like this:
my_text = "address ae fae daq ad, 1231 asdas landline 213121233 -123 mobile 513121233 cell (132) -142-3127
email [email protected] , sdasd [email protected] - [email protected]"
The text starts with 'address'. As soon as we see 'address', we need to scrape everything from there until one of 'landline'/'mobile'/'cell' appears. From there on, we want to scrape all the phone text (without altering the spaces in between): we start from the first occurrence of 'landline'/'mobile'/'cell' and stop as soon as 'email' appears. Finally, we scrape the email part (again without altering the spaces in between).
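To make the three-way split concrete, here is a rough sketch of what I imagine it could look like with a single regular expression. The function name `split_sections` and the `a@example.com`-style addresses in the usage note are placeholders I made up for illustration, and the sketch assumes that at least one phone keyword and the word 'email' are always present (it returns `None` otherwise).

```python
import re

# Keywords that open the phone section (taken from the description above).
PHONE_KINDS = ("landline", "mobile", "cell")

def split_sections(text):
    """Split raw text into (address_text, phone_text, email_text), or None."""
    kinds = "|".join(PHONE_KINDS)
    pattern = re.compile(
        r"\baddress\b(?P<address>.*?)"            # lazily, up to the first phone keyword
        r"(?P<phones>\b(?:" + kinds + r")\b.*?)"  # from that keyword, lazily, up to 'email'
        r"\bemail\b(?P<emails>.*)",               # everything after that 'email'
        re.DOTALL,                                # let '.' cross line breaks
    )
    match = pattern.search(text)
    if match is None:
        return None  # assumption: a missing section means "no result"
    return match.group("address"), match.group("phones"), match.group("emails")
```

Calling `split_sections("address ae fae daq ad, 1231 asdas landline 213121233 -123 email a@example.com")` would give the three raw pieces with their inner spacing untouched; only the outer whitespace would still need a `.strip()`.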
'landline'/'mobile'/'cell' can appear in any order, and some of them may not appear at all. For example, the text could also have looked like this:
my_text = "address ae fae daq ad, 1231 asdas
cell (132) -142-3127 landline 213121233 -123
email [email protected] , sdasd [email protected] - [email protected]"
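Since the phone keywords can come in any order, I suspect the phone part has to be cut at each keyword rather than matched in a fixed sequence. A minimal sketch of that idea, assuming the phone text has already been isolated and that a number runs until the next keyword or the end of the text (the name `split_phones` is just a placeholder):

```python
import re

PHONE_KINDS = ("landline", "mobile", "cell")

def split_phones(phone_text):
    """Return one {'number', 'kind'} dict per keyword, in order of appearance."""
    kinds = "|".join(PHONE_KINDS)
    pattern = re.compile(
        r"\b(" + kinds + r")\b"              # the keyword itself, e.g. 'cell'
        r"(.*?)"                             # the number, taken lazily
        r"(?=\b(?:" + kinds + r")\b|\Z)",    # stop before the next keyword or at the end
        re.DOTALL,
    )
    # Only the outer whitespace is stripped; spaces inside a number are kept.
    return [
        {"number": number.strip(), "kind": kind}
        for kind, number in pattern.findall(phone_text)
    ]
```

For the second sample, `split_phones("cell (132) -142-3127 landline 213121233 -123")` should list the cell number first and the landline second, and a missing keyword simply produces no entry.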
There's a little more engineering needed to form arrays of the subtexts contained in the address, phone and email text. Subtexts of addresses are always separated by commas (,). Subtexts of emails can be separated by commas (,) or hyphens (-).
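The two separator rules could be sketched roughly as follows. The names `split_addresses` and `split_emails` are hypothetical, and I am assuming that a hyphen only counts as a separator when it has whitespace on both sides, so that a hyphen inside an email address is left alone. The connector stored with each email is the separator that came before it.

```python
import re

def split_addresses(address_text):
    """Split the address text on commas, dropping empty pieces."""
    return [
        {"address": part.strip()}
        for part in address_text.split(",")
        if part.strip()
    ]

def split_emails(email_text):
    """Split on ',' or ' - ' and remember which separator preceded each email."""
    email_text = email_text.strip()
    if not email_text:
        return []
    # The capturing group makes re.split keep the separators:
    # [item, separator, item, separator, item, ...]
    parts = re.split(r"(\s*,\s*|\s+-\s+)", email_text)
    emails = []
    connector = ""  # the first email has no preceding separator
    for i in range(0, len(parts), 2):
        emails.append({"email": parts[i], "connector": connector})
        if i + 1 < len(parts):
            connector = parts[i + 1].strip()
    return emails
```

With placeholder addresses, `split_emails("a@example.com , sdasd b@example.com - c@example.com")` would yield three entries whose connectors are `""`, `","` and `"-"`.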
My output should be a JSON dictionary which looks something like this:
resultant_dict = {
addresses: [
{ address: "ae fae daq ad" }
, { address: "1231 asdas" }
]
, phones: [
{ number: "213121233 -123", kind: "landline" }
, { number: "513121233", kind: "mobile" }
, { number: "(132) -142-3127", kind: "cell" }
]
, emails: [
{ email: "[email protected]", connector: "" }
, { email: "sdasd [email protected]", connector: "," }
, { email: "[email protected]", connector: "-" }
]
}
I am trying to achieve this using regular expressions, or any other approach, in Python. I can't figure out how to write it, as I am a novice programmer.