I'm new to Python and would like to know how to tokenize strings based on a specified delimiter. For example, if I have the string "brother's" and would like to turn it into ["brother", "\s"], or the string "red/blue" into ["red", "blue"], what would be the most appropriate way to do this? Thanks.
3 Answers
What you're looking for is called split, and it's called on the str object. For instance:
>>> brotherstring = "brother's"
>>> brotherstring.split("'")
['brother', 's']
>>> redbluestring = "red/blue"
>>> redbluestring.split("/")
['red', 'blue']
There are a few variants on split, such as rsplit and partition, that each behave differently. Read the documentation to find the one that works best for your purpose.
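As a rough sketch of how those variants differ (the sample string "red/green/blue" and the variable names are made up for illustration, not taken from the question):

```python
path = "red/green/blue"

# split breaks on every occurrence of the delimiter
all_parts = path.split("/")          # ['red', 'green', 'blue']

# the optional maxsplit argument limits the number of splits, counting from the left
first_split = path.split("/", 1)     # ['red', 'green/blue']

# rsplit takes the same arguments but counts from the right
last_split = path.rsplit("/", 1)     # ['red/green', 'blue']

# partition splits once and always returns a 3-tuple: (head, separator, tail)
head, sep, tail = "brother's".partition("'")

# when the separator is absent, partition returns the whole string plus two empty strings
missing = "brother".partition("'")   # ('brother', '', '')
```

partition is handy when you only care about the first delimiter and want to know whether it was found at all, since sep comes back empty when it was not.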
Try this.
>>> strr = "brother's"
>>> strr.replace("'","\\'").split("\\")
['brother', "'s"]
>>> strrr = "red/blue"
>>> strrr.split('/')
['red', 'blue']
2 Comments
VISQL
This is a great answer. It shows how to preserve punctuation, in the case that your punctuation is not your delimiter. Can reconstruct the original later, or clean further if the apostrophe is really unwanted.
Tanveer Alam
@VISQL Thanks for the appreciation.
pydoc str and work from there.