I have a long string that I got from web scraping with Python. I want to get output in a form like {'XXXXXXXX':'AAAAAAAA','YYYYYYYY':'BBBBBBBB'} and, ideally, put everything in a DataFrame.
This is a sample of the very long string:
\\n display:block\\u0022\\u003E\\n div class= span_6\\u0022\\u003E\\n li class=\\u0022borderbottom padleft pad20 nomargin\\u0022\\u003E\\n span\\u003E1. XXXXXXXX\\/span\\u003E\\n strong class=\\u0022floatright\\u0022\\u003EAAAAAAAA\\/strong\\u003E\\n \\/li\\u003E\\n li class=\\u0022borderbottom padleft pad20 nomargin\\u0022\\u003E\\n span\\u003E2. YYYYYYYY\\/span\\u003E\\n strong class=\\u0022floatright\\u0022\\u003EBBBBBBBB\\/strong\\u003E\\n
Split across lines for clarity:
\n display:block\u0022\u003E\n
div class= span_6\u0022\u003E\n
li class=\u0022borderbottom padleft pad20 nomargin\u0022\u003E\n
span\u003E1. XXXXXXXX\/span\u003E\n
strong class=\u0022floatright\u0022\u003EAAAAAAAA\/strong\u003E\n
\/li\u003E\n
li class=\u0022borderbottom padleft pad20 nomargin\u0022\u003E\n
span\u003E2. YYYYYYYY\/span\u003E\n
strong class=\u0022floatright\u0022\u003EBBBBBBBB\/strong\u003E\n
I'm trying to do this:
import re

# s = the scraped string shown above
pattern = "u003E\|(.*?)\|\\/strong"
substring = re.search(pattern, s).group(1)
print(substring)
but it's failing. What's the best way to do this?
Edit: Expected output is two lists:
list1 = ['XXXXXXXX','YYYYYYYY']
list2 = ['AAAAAAAA','BBBBBBBB']
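For reference, here is a minimal sketch of one way to get both lists. It assumes the string really contains literal backslash sequences (\u003E, \n, \/) as in the line-by-line sample, and that every value sits in a strong tag with class floatright. Note that \| in the attempted pattern matches a literal pipe character, which the sample does not contain, so re.search returns None and .group(1) raises an error. The names pattern, pairs, and result are only illustrative.

```python
import re

# Hypothetical sample mirroring the question's string. Raw string literals
# keep the backslash sequences (\u003E, \n, \/) as literal characters.
s = (
    r'li class=\u0022borderbottom padleft pad20 nomargin\u0022\u003E\n '
    r'span\u003E1. XXXXXXXX\/span\u003E\n '
    r'strong class=\u0022floatright\u0022\u003EAAAAAAAA\/strong\u003E\n '
    r'\/li\u003E\n '
    r'li class=\u0022borderbottom padleft pad20 nomargin\u0022\u003E\n '
    r'span\u003E2. YYYYYYYY\/span\u003E\n '
    r'strong class=\u0022floatright\u0022\u003EBBBBBBBB\/strong\u003E\n '
)

# In the regex, "\\" matches one literal backslash in the scraped text.
pattern = re.compile(
    r'span\\u003E\d+\.\s*(.*?)\\/span'          # label after the "1. " prefix
    r'.*?'                                       # skip ahead to the strong tag
    r'floatright\\u0022\\u003E(.*?)\\/strong',   # value inside the strong tag
    re.DOTALL,
)

pairs = pattern.findall(s)  # list of (label, value) tuples
list1 = [label for label, _ in pairs]
list2 = [value for _, value in pairs]
result = dict(pairs)

print(pairs)   # [('XXXXXXXX', 'AAAAAAAA'), ('YYYYYYYY', 'BBBBBBBB')]
print(result)  # {'XXXXXXXX': 'AAAAAAAA', 'YYYYYYYY': 'BBBBBBBB'}
```

If pandas is available, the pairs could then be passed to pd.DataFrame(pairs, columns=['label', 'value']), where the column names are placeholders. If the real string is well-formed escaped HTML, decoding it and using an HTML parser would likely be more robust than a regex.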