1

Hey everyone, I was trying to figure out a way to change a string like this (in Python 3):

"<word>word</word>" 

into three strings

"<word>" "word" "</word>" 

that I'm going to put in a list.

At first I tried the strip() method, but it only strips characters from the beginning and the end of the string. Then I tried a more complicated approach: reading through the text one character at a time, building up the word, and adding a " " after any ">" using an if statement. However, I couldn't figure out how to add a space before each "<".

Is there a simple way to split these words up?

Edit: This isn't all my data. I am reading in an XML file and using a stack class to make sure that the file is balanced.

<word1></word1> <word2>worda</word2> <word3>wordb</word3> <word4></word4>...

Edit2: Thanks for all the answers, everyone! I would vote up all your answers if I could. For practical use the XML parser worked fine, but for what I needed the regex solution worked perfectly. Thank you!

4
  • 1
    The split() function is closer to what you need, but still not exactly. If you're trying to parse html/xml, you should use a parsing library. It's a less than trivial task. Commented Jul 29, 2013 at 21:59
  • Is that the extent of your input data? Commented Jul 29, 2013 at 22:05
  • 2
    "I am reading in an xml file" - then you should probably use an xml parser. Python has a few different ones available in the xml module. Commented Jul 29, 2013 at 22:14
  • Using an XML parser will automatically raise errors if your XML isn't balanced or well-formed. You don't want to go down the string-splitting route, especially if you have attributes on elements, which make the input trickier to process. Commented Jul 29, 2013 at 22:15

3 Answers

2

You should use an XML parser for this. The following is an example of parsing:

>>> import xml.etree.ElementTree as ET
>>> xml = '<root><word1>my_word_1</word1><word2>my_word_2</word2><word3>my_word_3</word3></root>'
>>> tree = ET.fromstring(xml)
>>> for child in tree:
...     print(child.tag, child.text)  # print() is a function in Python 3
...
word1 my_word_1
word2 my_word_2
word3 my_word_3
>>>

Once you have read the values, pushing them onto a stack is easy.
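
Since the goal in the question is to check that the file is balanced, the parser can do that check by itself. The following is only a minimal sketch: the helper name is_well_formed is hypothetical (it is not part of ElementTree), and it assumes the whole document fits in one string with a single root element. Input with several top-level elements, like the sample in the question, would first need to be wrapped in a root tag as in the example above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as one well-formed XML document.

    Hypothetical helper for illustration; it relies on ET.fromstring
    raising ET.ParseError for mismatched or unclosed tags.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<root><word1>a</word1></root>"))  # True
print(is_well_formed("<root><word1>a</word2></root>"))  # False (mismatched tag)
```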

1

Regex with the replace method of a string works:

>>> import re
>>> s = "<word1></word1> <word2>worda</word2> <word3>wordb</word3> <word4></word4>"
>>> re.findall(r"\S+", s.replace(">", "> ").replace("<", " <"))
['<word1>', '</word1>', '<word2>', 'worda', '</word2>', '<word3>', 'wordb', '</word3>', '<word4>', '</word4>']
>>>

Or, an alternate solution that doesn't use Regex:

>>> s = "<word1></word1> <word2>worda</word2> <word3>wordb</word3> <word4></word4>"
>>> s.replace(">", "> ").replace("<", " <").split()
['<word1>', '</word1>', '<word2>', 'worda', '</word2>', '<word3>', 'wordb', '</word3>', '<word4>', '</word4>']
>>>

The regex solution, though, allows for more control over the matching (you can add more to the expression to really customize it).

Note, however, that these will only work if the data is like the examples given.
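
As one possible way of putting more into the expression, a single pattern can pick out the tags and the words directly, and the resulting tokens can then be run through a stack to check balance. This is a rough sketch under the same assumption as above (simple tags with no attributes, comments, or self-closing elements); the names TOKEN_RE, tokenize, and is_balanced are made up for illustration.

```python
import re

# Either a complete tag ("<...>") or a run of characters that is
# neither whitespace nor the start of a tag.
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s]+")


def tokenize(s):
    """Split s into opening tags, closing tags, and plain words."""
    return TOKEN_RE.findall(s)


def is_balanced(tokens):
    """Check that every closing tag matches the most recent open tag."""
    stack = []
    for tok in tokens:
        if tok.startswith("</"):
            # Closing tag: it must match the tag on top of the stack.
            if not stack or stack.pop() != tok[2:-1]:
                return False
        elif tok.startswith("<"):
            # Opening tag: remember its name.
            stack.append(tok[1:-1])
    # Balanced only if no tag is left open.
    return not stack


s = "<word1></word1> <word2>worda</word2> <word3>wordb</word3> <word4></word4>"
tokens = tokenize(s)
print(tokens)
print(is_balanced(tokens))
```

Like the replace-based versions, this splits text containing spaces into separate words, so it is only suitable for data shaped like the examples.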

1

I believe you are looking for the split method.

input.split(">")

You may need to add the angle brackets back in after splitting; it depends on whether your input will always follow that pattern.

It might be better to use a library if your input follows a variable pattern.

http://docs.python.org/2/library/htmlparser.html
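
In Python 3 the module linked above is available as html.parser. The following is a hedged sketch of how it could produce the list asked for in the question; the class name TagTokenizer and the function tokenize_with_parser are hypothetical. It assumes attributes can be dropped, and note that HTMLParser lowercases tag names.

```python
from html.parser import HTMLParser


class TagTokenizer(HTMLParser):
    """Collect opening tags, closing tags, and text as a flat list."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored in this sketch.
        self.tokens.append("<%s>" % tag)

    def handle_endtag(self, tag):
        self.tokens.append("</%s>" % tag)

    def handle_data(self, data):
        # Skip the whitespace that sits between elements.
        text = data.strip()
        if text:
            self.tokens.append(text)


def tokenize_with_parser(s):
    parser = TagTokenizer()
    parser.feed(s)
    parser.close()  # flush any buffered text
    return parser.tokens


print(tokenize_with_parser("<word>word</word>"))
```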

3 Comments

I'm not sure if this will work in this case, with input given by OP, this will produce the output: ['<word1', '</word1', ' <word2', 'worda</word2', ' <word3', 'wordb</word3', ' <word4', '</word4', ''].
Right, that's why I mentioned he'd have to add the angle brackets back in after splitting. He would need a statement that says: if a word begins with a left angle bracket, add a right angle bracket to the end. And yes, he would have to re-split the second portions that don't begin with one, creating a nightmarish parsing algorithm.
Ah! Ok, I know where you're coming from.
