Just use a regex split on the longest runs of non-word characters (`\W` matches anything other than letters, digits and underscore):
import re
l = re.split(r"\W+","http://www.sample.com/level1/level2/index.html?id=1234")
print(l)
yields:
['http', 'www', 'sample', 'com', 'level1', 'level2', 'index', 'html', 'id', '1234']
This is simple but, as someone noted, it doesn't work when URL names contain characters such as -, because \W+ splits on those too (underscores survive, since _ counts as a word character). So the less fun solution is to list every character that can separate path parts:
l = re.split(r"[/:\.?=&]+","http://stackoverflow.com/questions/41935748/splitting-a-string-url-into-words-using-python")
(I admit that I may have forgotten some separator characters.)
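For what it's worth, here is a minimal sketch of that second approach wrapped in a helper. The name `split_url`, the `SEPARATORS` constant and the sample URL are made up for illustration, and the separator set is the same guess as above, so extend it if your URLs use other delimiters. It also filters out the empty strings that `re.split` returns when the string starts or ends with a separator (a trailing `/`, for instance):

```python
import re

# Assumed separator set (same characters as above); add more, e.g. '#' or ';',
# if your URLs need them. Inside [...] the dot needs no escaping.
SEPARATORS = re.compile(r"[/:.?=&]+")

def split_url(url):
    """Split a URL on runs of separator characters, dropping empty pieces."""
    return [part for part in SEPARATORS.split(url) if part]

# '-' and '_' are not in the separator set, so 'my-page_1' stays in one piece.
print(split_url("http://www.sample.com/my-page_1/index.html?id=1234&x=2/"))
# ['http', 'www', 'sample', 'com', 'my-page_1', 'index', 'html', 'id', '1234', 'x', '2']
```

Without the `if part` filter, the trailing `/` in that sample URL would leave an empty string at the end of the list.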