
How can I split a URL string into its component words in Python? For example, from a URL like:

http://www.sample.com/level1/level2/index.html?id=1234

I want to get words like:

http, www, sample, com, level1, level2, index, html, id, 1234

Any solution in Python is welcome.

Thanks.

  • I also want to store the result in a list. (Commented Jan 30, 2017 at 12:17)

2 Answers

This is how you can do it for any URL:

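import re

def getWordsFromURL(url):
    # Split on runs of ':', '/', '?', '=', '-' and '&'.
    # '.' is not in the class, so "www.sample.com" stays in one piece.
    return re.compile(r'[\:/?=\-&]+', re.UNICODE).split(url)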

You can then use it like this:

url = "http://www.sample.com/level1/level2/index.html?id=1234"
words = getWordsFromURL(url)
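A quick check of what that function returns for the question's URL. Because '.' is not in its separator class, host and file names stay whole. The second function is a hypothetical variant (not part of the original answer) that also treats '.' as a separator, which gives the list the question asks for. re.UNICODE is omitted here because Unicode matching is already the default for str patterns in Python 3.

```python
import re

def getWordsFromURL(url):
    # The answer's pattern: split on runs of ':', '/', '?', '=', '-', '&'.
    return re.split(r'[:/?=\-&]+', url)

def get_words_with_dots(url):
    # Hypothetical variant: also treat '.' as a separator.
    return re.split(r'[:/?=\-&.]+', url)

url = "http://www.sample.com/level1/level2/index.html?id=1234"
print(getWordsFromURL(url))
# ['http', 'www.sample.com', 'level1', 'level2', 'index.html', 'id', '1234']
print(get_words_with_dots(url))
# ['http', 'www', 'sample', 'com', 'level1', 'level2', 'index', 'html', 'id', '1234']
```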

2 Comments

I thought you wanted ['http', 'www', 'sample', 'com', 'level1', 'level2', 'index', 'html', 'id', '1234'], not ['http', 'www.sample.com', 'level1', 'level2', 'index.html?id', '1234']
@Jean-FrançoisFabre I have compiled it with re.UNICODE; now it works for stackoverflow.com/questions/41935748/splitting-a-string-url-into-words-using-python

Just regex-split on the longest runs of non-word characters (\W matches anything that is not a letter, digit or underscore):

import re
l = re.split(r"\W+","http://www.sample.com/level1/level2/index.html?id=1234")
print(l)

yields:

['http', 'www', 'sample', 'com', 'level1', 'level2', 'index', 'html', 'id', '1234']
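To make the \W+ behaviour concrete, here is the same split on a hypothetical URL containing both an underscore and a hyphen. Since \W excludes letters, digits and '_', the hyphen acts as a separator while the underscore does not.

```python
import re

# Hypothetical URL with an underscore in one path part and a hyphen in another.
url = "http://www.sample.com/my_page/some-slug?id=1234"

# \W+ splits on runs of characters outside [A-Za-z0-9_] (plus Unicode word chars),
# so 'my_page' survives intact but 'some-slug' is broken in two.
words = re.split(r"\W+", url)
print(words)
# ['http', 'www', 'sample', 'com', 'my_page', 'some', 'slug', 'id', '1234']
```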

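This is simple, but as someone noted, it doesn't work well when URL names contain characters such as _ or -: \W+ splits on - (breaking a slug like splitting-a-string-url-into-words-using-python into pieces) but not on _. So the less fun solution is to list explicitly every token that can separate URL parts: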

l = re.split(r"[/:\.?=&]+","http://stackoverflow.com/questions/41935748/splitting-a-string-url-into-words-using-python")

(I admit I may have missed some separator characters.)
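One practical wrinkle with the explicit-separator approach: re.split returns an empty string when the URL starts or ends with a separator, for example a trailing '/'. A minimal sketch, assuming you want those empties dropped before storing the result in a list; split_url is a hypothetical helper name, and the separator class is the one used in the answer.

```python
import re

# Same separator class as the answer: '/', ':', '.', '?', '=', '&'.
# '-' and '_' are not separators, so URL slugs stay in one piece.
SEPARATORS = re.compile(r"[/:.?=&]+")

def split_url(url):
    # Hypothetical helper: split, then drop the empty strings that
    # re.split produces when the URL starts or ends with a separator.
    return [part for part in SEPARATORS.split(url) if part]

so_url = "http://stackoverflow.com/questions/41935748/splitting-a-string-url-into-words-using-python"
print(split_url(so_url))
# ['http', 'stackoverflow', 'com', 'questions', '41935748', 'splitting-a-string-url-into-words-using-python']
print(split_url("http://www.sample.com/level1/level2/"))
# ['http', 'www', 'sample', 'com', 'level1', 'level2']
```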

4 Comments

Doesn't work with URLs like http://stackoverflow.com/questions/41935748/splitting-a-string-url-into-words-using-python
@Himal Check my answer, it covers that.
no it doesn't: ['http', 'www.sample.com', 'level1', 'level2', 'index.html?id', '1234']
It worked for me... Thanks :-)
