Splitting a String URL into words Using Python

Question

How can I extract the individual words from a string (a URL) in Python? From a URL like:

http://www.sample.com/level1/level2/index.html?id=1234

I want to get words like:

http, www, sample, com, level1, level2, index, html, id, 1234

Are there any solutions using Python?

Thanks.

I also want to store the result in a list.

– Shakoor Ab, Commented Jan 30, 2017 at 12:17

Sarath Sadasivan Pillai · Accepted Answer · 2017-01-30 12:49:33Z

5

Here is one way to do it for any URL:

import re

def getWordsFromURL(url):
    # Split on runs of ':', '/', '?', '=', '-' and '&' (note: '.' is not a separator here)
    return re.compile(r'[\:/?=\-&]+', re.UNICODE).split(url)

You can then use it like this:

url = "http://www.sample.com/level1/level2/index.html?id=1234"
words = getWordsFromURL(url)
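
As a quick check, the sketch below shows what this separator set returns for the sample URL, next to a variant that also splits on dots. The names SEPARATORS and SEPARATORS_WITH_DOT are hypothetical and not part of the answer; adding '.' to the character class is an assumption about what the question is asking for.

```python
import re

# Same separator set as the function above: ':', '/', '?', '=', '-' and '&'.
SEPARATORS = re.compile(r'[:/?=\-&]+')

# Hypothetical variant (not from the answer) that also treats '.' as a separator.
SEPARATORS_WITH_DOT = re.compile(r'[:/?=\-&.]+')

url = "http://www.sample.com/level1/level2/index.html?id=1234"

words = SEPARATORS.split(url)
words_with_dot = SEPARATORS_WITH_DOT.split(url)

print(words)
# ['http', 'www.sample.com', 'level1', 'level2', 'index.html', 'id', '1234']
print(words_with_dot)
# ['http', 'www', 'sample', 'com', 'level1', 'level2', 'index', 'html', 'id', '1234']
```

Without '.' in the character class, the host name and the file name each stay in one piece.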

edited Jan 30, 2017 at 12:49

answered Jan 30, 2017 at 12:21

Sarath Sadasivan Pillai


2 Comments

Jean-François Fabre Over a year ago

I thought you wanted ['http', 'www', 'sample', 'com', 'level1', 'level2', 'index', 'html', 'id', '1234'], not ['http', 'www.sample.com', 'level1', 'level2', 'index.html?id', '1234']

Sarath Sadasivan Pillai Over a year ago

@Jean-FrançoisFabre I have compiled it with re.UNICODE; now it works for stackoverflow.com/questions/41935748/splitting-a-string-url-into-words-using-python

Jean-François Fabre · Accepted Answer · 2017-01-30 12:28:31Z

2

Just regex-split on the longest runs of non-alphanumeric characters:

import re
l = re.split(r"\W+","http://www.sample.com/level1/level2/index.html?id=1234")
print(l)

yields:

['http', 'www', 'sample', 'com', 'level1', 'level2', 'index', 'html', 'id', '1234']
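
One caveat with re.split is that a URL that starts or ends with a separator (a trailing slash, for example) yields an empty string in the result. A minimal sketch of an alternative, assuming empty entries are unwanted, is to collect the runs of word characters with re.findall instead. The helper name url_words and the trailing-slash URL are hypothetical.

```python
import re

def url_words(url):
    """Return the runs of word characters (letters, digits, '_') found in url."""
    return re.findall(r"\w+", url)

# Hypothetical URL with a trailing slash, used only to show the difference.
trailing = "http://www.sample.com/level1/"

split_result = re.split(r"\W+", trailing)
findall_result = url_words(trailing)

print(split_result)    # ['http', 'www', 'sample', 'com', 'level1', '']
print(findall_result)  # ['http', 'www', 'sample', 'com', 'level1']
```

Both calls return a plain list, so the result can be stored or extended like any other list.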

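This is simple but, as someone noted, it does not work when URL names contain characters such as - that should stay inside a word, because \W+ splits on them too (\W does not match _, so underscores are kept). So the less fun solution would be to list all the tokens that can separate path parts: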

l = re.split(r"[/:\.?=&]+", "http://stackoverflow.com/questions/41935748/splitting-a-string-url-into-words-using-python")

(I admit that I may have forgotten some separation symbols)
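
The explicit-separator idea can be wrapped in a small helper, sketched here under stated assumptions: the name split_url and the constant URL_SEPARATORS are hypothetical, '#' is added to the set as a guess at one more separator, and empty strings are filtered out of the result.

```python
import re

# Assumed separator set: '/', ':', '.', '?', '=', '&' and '#'.
# '-' and '_' are deliberately left out so hyphenated slugs stay in one piece.
URL_SEPARATORS = re.compile(r"[/:.?=&#]+")

def split_url(url):
    """Split url on the assumed separators and drop empty strings."""
    return [part for part in URL_SEPARATORS.split(url) if part]

sample = split_url("http://www.sample.com/level1/level2/index.html?id=1234")
slug = split_url(
    "http://stackoverflow.com/questions/41935748/"
    "splitting-a-string-url-into-words-using-python"
)

print(sample)
# ['http', 'www', 'sample', 'com', 'level1', 'level2', 'index', 'html', 'id', '1234']
print(slug)
# ['http', 'stackoverflow', 'com', 'questions', '41935748',
#  'splitting-a-string-url-into-words-using-python']
```

Whether the slug should stay whole or be broken into its hyphen-separated words depends on what counts as a word for the task; adding '-' to the character class gives the second behaviour.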

edited Jan 30, 2017 at 12:28

answered Jan 30, 2017 at 12:19

Jean-François Fabre♦


4 Comments

Himal Over a year ago

Doesn't work with URLs like http://stackoverflow.com/questions/41935748/splitting-a-string-url-into-words-using-python

Sarath Sadasivan Pillai Over a year ago

@Himal Check my answer, it covers that.

Jean-François Fabre Over a year ago

no it doesn't: ['http', 'www.sample.com', 'level1', 'level2', 'index.html?id', '1234']

Shakoor Ab Over a year ago

It worked for me... Thanks :-)
