Extract all URLs from the json object in Python

Question

I have a large JSON object that contains URLs among its values. The links can appear at any depth and under any key; neither the depth nor the key is known in advance. For example:

data = {
  "name": "John Doe",
  "a": "https:/example.com",
  "b": {
    "c": "https://example.com/path",
    "d": {
      "e": "https://example.com/abc/?q=u",
    }
  }
}

I want to extract all the links into a list like

links = ["https://example.com", "https://example.com/path", "https://example.com/abc/?q=u"]

How can I extract all the links from the object using Python?

How do you identify URLs? Is it okay to assume they all start with "HTTP"?
– Roy2012, Commented Jun 19, 2020 at 5:23
Yes, they will all start with http or https. Any string without one of these protocols should not be treated as a valid URL.
– Anuj TBE, Commented Jun 19, 2020 at 5:26

Erik Cederstrand · Accepted Answer · 2020-06-19 05:30:16Z

2

Here's a recursive solution:

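def extract_urls(d):
    """Recursively collect string values that start with "http" from a nested dict."""
    urls = []
    for k, v in d.items():
        if isinstance(v, str) and v.lower().startswith("http"):
            urls.append(v)
        elif isinstance(v, dict):
            # Recurse into nested dicts and merge their results
            urls.extend(extract_urls(v))
    return urls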

extract_urls(data)

Output:

['https:/example.com',
 'https://example.com/path',
 'https://example.com/abc/?q=u']
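
The function above only descends into dicts, while parsed JSON can also contain lists. As a rough sketch of one way to cover that case, the hypothetical helper `iter_urls` below (not part of the original answer) walks both dicts and lists. It also assumes a stricter prefix check of `http://` or `https://`, so a malformed value such as `https:/example.com` would be skipped; the `sample` data is made up for illustration.

```python
def iter_urls(obj):
    """Yield every string value that starts with http:// or https://."""
    if isinstance(obj, dict):
        # Keys are ignored; only values are inspected
        for value in obj.values():
            yield from iter_urls(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_urls(item)
    elif isinstance(obj, str) and obj.lower().startswith(("http://", "https://")):
        yield obj
    # Numbers, booleans and None are skipped silently


# Illustrative data only: a dict containing a list with nested containers
sample = {
    "name": "John Doe",
    "a": "https://example.com",
    "b": [
        {"c": "https://example.com/path"},
        ["http://example.com/abc/?q=u", 42, None],
    ],
}

links = list(iter_urls(sample))
print(links)
```

A generator is used here so that the caller can decide whether to build a list, a set (to drop duplicates), or just iterate over the results.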

edited Jun 19, 2020 at 5:30

Erik Cederstrand


answered Jun 19, 2020 at 5:27

Roy2012



1 Comment

Durandal Over a year ago

There's a typo in this answer; someone else edited the function to be called extract_urls (it was originally called etract_urls) to fix a typo, but didn't update the recursive function. The recursive function is still (incorrectly) calling etract_urls.
