I have this JSON response (returned via Python) with nested HTML in label:

{
"point": {
"lat": "27.938829",
"long": "-82.322109"
},
"label": "<div class=\"lifestyle-results_item lifestyle-results_item-b\"><div class=\"locationinfo_area\"><h4 style=\"width:230px;\">At Petco</h4><h3 class=\"dealer_cat\"></h3><address style=\"left:240px\">2434 West Brandon Boulevard<br />Brandon, FL 33511<br /><br />813-571-0120<div class=\"contentinfo_area_operated\"></div></address></div><div class=\"contentinfo_area\"><div class=\"contentinfo_area_zip\">4.2 mi from Zip Code 33584</div></div><a href=\"https://vetcoclinics.petco.com?store_number=PET2722&source=vetcoclinics\" target=\"_blank\"><div class=\"contentinfo_area\"><div class=\"contentinfo_area_reserve\">BOOK NOW</div></div></a><div class=\"timeinfo_area\"><b>Sun, May 9</b><br/> at 10:00 AM - 1:00 PM<br /><b>Sun, May 16</b><br/> at 10:00 AM - 1:00 PM<br /><b>Sun, May 23</b><br/> at 10:00 AM - 1:00 PM<br /><b>Sun, May 30</b><br/> at 10:00 AM - 1:00 PM<br /><b>Sun, June 6</b><br/> at 10:00 AM - 1:00 PM<br /></div></div>",
"title": "At Petco ",
"html": "<div class=\"googlemap_bubble\"><b>At Petco<br><span></span></b><br />2434 West Brandon Boulevard<br />Brandon, FL 33511<br />813-571-0120<br /></div>"
}

How can I use regex to extract the below from label:

  • you can just make a new element, make that its innerHTML, then read from there ;-; Commented May 5, 2021 at 21:34
  • What environment/language do you use to parse this JSON? Commented May 5, 2021 at 21:39
  • @geauser python. Commented May 5, 2021 at 21:43
  • @TheBombSquad Interesting, okay I will attempt. Commented May 5, 2021 at 21:44
  • Take a look at docs.python.org/3/library/html.parser.html @a1234 I won't be able to help you much more as it's been a while since I've used Python, but it will certainly simplify your code compared with the regex solution. Commented May 5, 2021 at 21:50

1 Answer


I suggest you avoid regex and stick with HTML parsing using Beautiful Soup.

Assuming you have your parsed JSON data in a variable called data, you can do the following:

from bs4 import BeautifulSoup

htmlData = data["label"]
soup = BeautifulSoup(htmlData, 'html.parser')

# <address> has several children, so .string would return None;
# get_text joins its text nodes instead (empty ones are dropped by strip=True)
address = soup.address.get_text(separator="\n", strip=True)
link = soup.a.get('href')

Then you can use a simple split to get the individual lines from the address variable:

addressParts = address.split("\n")

And use a URL parser to get the store_number parameter from your link variable:

from urllib import parse

# parse_qs maps each parameter name to a list of values
storeName = parse.parse_qs(parse.urlparse(link).query)['store_number'][0]

You will end up with

  • The addressParts list containing the elements ["2434 West Brandon Boulevard", "Brandon, FL 33511", "813-571-0120"]
  • The link variable containing https://vetcoclinics.petco.com?store_number=PET2722&source=vetcoclinics
  • The storeName variable containing PET2722
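
If installing Beautiful Soup is not an option, roughly the same extraction can be sketched with the standard library's html.parser module mentioned in the comments. This is only a minimal sketch: the LabelParser class, the raw variable, and the trimmed-down label string are made up for illustration, and it assumes the label always contains a single address element and that the first link is the one you want.

```python
import json
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class LabelParser(HTMLParser):
    """Hypothetical helper: collects text inside <address> and the first <a href>."""

    def __init__(self):
        super().__init__()
        self.in_address = False
        self.address_parts = []
        self.link = None

    def handle_starttag(self, tag, attrs):
        if tag == "address":
            self.in_address = True
        elif tag == "a" and self.link is None:
            self.link = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "address":
            self.in_address = False

    def handle_data(self, data):
        # Keep only non-empty text that appears inside <address>
        text = data.strip()
        if self.in_address and text:
            self.address_parts.append(text)


# Trimmed-down stand-in for the response shown in the question
raw = json.dumps({
    "label": (
        '<div><address style="left:240px">2434 West Brandon Boulevard<br />'
        'Brandon, FL 33511<br /><br />813-571-0120<div></div></address>'
        '<a href="https://vetcoclinics.petco.com?store_number=PET2722'
        '&source=vetcoclinics" target="_blank">BOOK NOW</a></div>'
    )
})

data = json.loads(raw)
parser = LabelParser()
parser.feed(data["label"])
parser.close()

# parse_qs maps each parameter to a list of values; .get avoids a KeyError
# if the parameter is missing from the link
query = parse_qs(urlparse(parser.link).query)
store_number = query.get("store_number", [None])[0]

print(parser.address_parts)
print(store_number)
```

This is more code than the Beautiful Soup version, so it is mainly worth considering when you cannot add a third-party dependency.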