Splitting a string into multiple strings in Python

Question

Hey everyone, I was trying to figure out a way to change a string like this (in Python 3):

"<word>word</word>"

into three strings

"<word>" "word" "</word>"

that I'm going to put in a list.

At first I tried the strip() method, but it only strips the beginning and the end of the string. Then I tried a more complicated approach: reading through the text one character at a time, building up each word, and adding a " " after any ">" with an if statement. But I couldn't figure out how to add a space before the next "<".

Is there a simple way to split these words up?

Edit: This isn't all my data. I am reading in an XML file and using a stack class to make sure that the file is balanced.

<word1></word1> <word2>worda</word2> <word3>wordb</word3> <word4></word4>...

Edit2: Thanks for all the answers, everyone! I would vote up all your answers if I could. For practical use the XML parser worked fine, but for what I needed the regex approach worked perfectly. Thank you!

The split() function is closer to what you need, but still not exactly. If you're trying to parse HTML/XML, you should use a parsing library. It's a less than trivial task.
– Vyassa Baratham, Commented Jul 29, 2013 at 21:59
"I am reading in an xml file" - then you should probably use an XML parser. Python has a few different ones available in the xml package.
– l4mpi, Commented Jul 29, 2013 at 22:14
Using an XML parser will automatically raise errors if your XML isn't balanced or well-formed. You don't want to go down the string-splitting route, especially if you have attributes on elements, which will make it trickier to process.
– Jon Clements, Commented Jul 29, 2013 at 22:15

v2b · Accepted Answer · 2013-07-29 22:37:43Z

2

You should use an XML parser for this. Here is an example of parsing:

>>> import xml.etree.ElementTree as ET
>>> xml = '<root><word1>my_word_1</word1><word2>my_word_2</word2><word3>my_word_3</word3></root>'
>>> tree = ET.fromstring(xml)
>>> for child in tree:
...     print(child.tag, child.text)
...
word1 my_word_1
word2 my_word_2
word3 my_word_3
>>>

Once you have read the values, pushing them onto a stack is easy.
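
As the comments on the question point out, the parser can also do the balance check itself: ET.fromstring raises xml.etree.ElementTree.ParseError on mismatched or unclosed tags. Below is a minimal sketch, assuming the fragments can be wrapped in a single root element. The helper name is_well_formed and the <root> wrapper are illustrative and not part of the original answer.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        # The wrapper is an assumption: the sample data has several top-level
        # elements, and an XML document must have exactly one root.
        ET.fromstring("<root>" + fragment + "</root>")
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<word1></word1> <word2>worda</word2>"))  # True
print(is_well_formed("<word1><word2>worda</word1></word2>"))   # False
```

This only reports whether the input is balanced. If you need the individual tag and text strings, you still have to walk the parsed tree or tokenize the text yourself.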

answered Jul 29, 2013 at 22:37

v2b


score 1 · Accepted Answer · 2013-07-29 23:40:30Z

Regex with the replace method of a string works:

>>> import re
>>> s = "<word1></word1> <word2>worda</word2> <word3>wordb</word3> <word4></word4>"
>>> re.findall(r"\S+", s.replace(">", "> ").replace("<", " <"))
['<word1>', '</word1>', '<word2>', 'worda', '</word2>', '<word3>', 'wordb', '</word3>', '<word4>', '</word4>']
>>>

Or, an alternative solution that doesn't use regex:

>>> s = "<word1></word1> <word2>worda</word2> <word3>wordb</word3> <word4></word4>"
>>> s.replace(">", "> ").replace("<", " <").split()
['<word1>', '</word1>', '<word2>', 'worda', '</word2>', '<word3>', 'wordb', '</word3>', '<word4>', '</word4>']
>>>

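The regex solution, though, allows for more control over the matching (you can add more to the expression to really customize it).

Note, however, that these will only work if the data looks like the examples given: both versions split on whitespace, so text that contains spaces is broken into separate words, and so are tags that carry attributes.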
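
To show what "adding more to the expression" could look like, here is a rough sketch that matches tags and text runs directly instead of inserting spaces, then runs the balance check the question describes. It assumes simple tags with no attributes, no self-closing tags, and no comments or CDATA. The names TOKEN, tokenize, and is_balanced are made up for this example, and a plain list stands in for the asker's stack class.

```python
import re

# One alternative per token kind: a tag such as <word> or </word>, or a run of text.
TOKEN = re.compile(r"<[^>]+>|[^<]+")

def tokenize(s):
    """Split s into tags and text, dropping whitespace-only text between tags."""
    return [t for t in TOKEN.findall(s) if t.strip()]

def is_balanced(tokens):
    """Check that every opening tag is closed by a matching closing tag, in order."""
    stack = []
    for tok in tokens:
        if tok.startswith("</"):
            # A closing tag must match the most recently opened tag.
            if not stack or stack.pop() != tok[2:-1]:
                return False
        elif tok.startswith("<"):
            stack.append(tok[1:-1])
    # Anything left on the stack was opened but never closed.
    return not stack

tokens = tokenize("<word1></word1> <word2>worda</word2>")
print(tokens)               # ['<word1>', '</word1>', '<word2>', 'worda', '</word2>']
print(is_balanced(tokens))  # True
```

Unlike the whitespace-based versions, this keeps text such as "two words" together as one token, because text runs end only at the next "<".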

woodlumhoodlum · Accepted Answer · 2013-07-29 22:14:41Z

1

I believe you are looking for the split method.

input.split(">")

You may need to add the angle brackets back in after splitting. It depends on whether your input always follows that pattern.
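
One way to avoid putting the brackets back by hand is to split on whole tags and keep them. The sketch below swaps str.split for re.split with a capturing group, which is a different technique from the one-liner above. It assumes tags never contain a ">" inside them.

```python
import re

s = "<word1></word1> <word2>worda</word2>"

# The capturing group makes re.split keep the tags it splits on.
parts = re.split(r"(<[^>]+>)", s)

# Drop the empty and whitespace-only strings left between adjacent tags.
tokens = [p for p in parts if p.strip()]
print(tokens)  # ['<word1>', '</word1>', '<word2>', 'worda', '</word2>']
```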

It might be better to use a library if your input follows a variable pattern.

http://docs.python.org/2/library/htmlparser.html
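
The link above is for Python 2. In Python 3 the same parser lives in the html.parser module. A minimal sketch of using it to collect the strings the question asks for follows. The class name TokenCollector is made up for this example, and note that HTMLParser lowercases tag names and this sketch ignores attributes.

```python
from html.parser import HTMLParser

class TokenCollector(HTMLParser):
    """Collect start tags, end tags, and non-blank text as a flat list of strings."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored here; the tag is rebuilt from its name only.
        self.tokens.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.tokens.append("</" + tag + ">")

    def handle_data(self, data):
        # Skip the whitespace that sits between adjacent elements.
        if data.strip():
            self.tokens.append(data)

parser = TokenCollector()
parser.feed("<word1></word1> <word2>worda</word2>")
parser.close()
print(parser.tokens)  # ['<word1>', '</word1>', '<word2>', 'worda', '</word2>']
```

HTMLParser is lenient and does not complain about unbalanced tags, so the balance check would still have to be done on the collected tokens.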

edited Jul 29, 2013 at 22:14

answered Jul 29, 2013 at 22:06

woodlumhoodlum


3 Comments

Pawel Miech Over a year ago

I'm not sure this will work in this case. With the input given by the OP, it produces: ['<word1', '</word1', ' <word2', 'worda</word2', ' <word3', 'wordb</word3', ' <word4', '</word4', ''].

woodlumhoodlum Over a year ago

Right, that's why I mentioned he'd have to add the angle brackets back in after splitting. He would need a rule that says: if a piece begins with a left angle bracket, add a right angle bracket to the end. And yes, he would have to re-split the pieces that don't begin with one (such as 'worda</word2'), which turns into a nightmarish parsing algorithm.

Pawel Miech Over a year ago

Ah! Ok, I know where you're coming from.
