
I have a file with a bunch of information. All of the lines follow the same pattern as this one:

     <school>Nebraska</school>

I am trying to use the split method to retrieve only 'Nebraska'. This is what I have so far, but I'm not sure what to add so that it cuts off both tags instead of just the first one.

    with open('Pro.txt') as fo:
        for rec in fo:
            print(rec.split('>')[1])

With this I get:

    Nebraska</school
  • If it's XML or HTML, and it looks as though it is, you should use a proper parser such as BeautifulSoup (HTML) or lxml (XML). Python also comes with parsers that will do the trick, though the ones I suggested are better. Commented Dec 7, 2016 at 1:55
  • Use an XML parser, either built-in or a PyPI module like lxml or BeautifulSoup. Don't try to roll your own XML parsing code. Commented Dec 7, 2016 at 1:55
  • Do you have an example of the file? It's possible that you're dealing with a subset of *ML that may be easier to work with. Commented Dec 7, 2016 at 1:57
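As the comments point out, Python's standard library can already parse lines like these. Below is a minimal sketch with xml.etree.ElementTree, assuming each non-blank line of Pro.txt is a complete, well-formed element such as <school>Nebraska</school> (if the whole file is one XML document, parse it once with ET.parse instead). The helper name school_names and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def school_names(lines):
    """Yield the text of each <school> element, assuming one element per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        elem = ET.fromstring(line)  # parse the single element on this line
        if elem.tag == 'school':
            yield elem.text

# In-memory stand-in for the file; with the real file use: with open('Pro.txt') as fo: ...
sample = io.StringIO('     <school>Nebraska</school>\n<school>Iowa</school>\n')
print(list(school_names(sample)))  # ['Nebraska', 'Iowa']
```

ET.fromstring raises ParseError on a malformed line, which surfaces bad input instead of silently returning a wrong slice.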

3 Answers


You've cut off only the part before 'Nebraska'. Keep going in the same fashion and split the remainder on '<':

>>> s = '<school>Nebraska</school>'
>>> s.split('>')[1]
'Nebraska</school'
>>> s.split('>')[1].split('<')[0]
'Nebraska'

That said, you should parse HTML with an HTML parser like BeautifulSoup.
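BeautifulSoup is a third-party package. If you would rather stay within the standard library, a minimal sketch with html.parser.HTMLParser (a swapped-in stdlib parser, not BeautifulSoup itself) can collect the text inside every <school> tag. The class name SchoolParser and the sample input are hypothetical.

```python
from html.parser import HTMLParser

class SchoolParser(HTMLParser):
    """Collect the text found inside <school>...</school> tags."""

    def __init__(self):
        super().__init__()
        self.in_school = False
        self.schools = []

    def handle_starttag(self, tag, attrs):
        if tag == 'school':
            self.in_school = True
            self.schools.append('')  # start a new entry for this element

    def handle_endtag(self, tag):
        if tag == 'school':
            self.in_school = False

    def handle_data(self, data):
        if self.in_school:
            self.schools[-1] += data  # text may arrive in several chunks

parser = SchoolParser()
parser.feed('<school>Nebraska</school>\n<school>Iowa</school>\n')
parser.close()
print(parser.schools)  # ['Nebraska', 'Iowa']
```

For the real file, feed it the whole contents, for example with open('Pro.txt') as fo: parser.feed(fo.read()).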

>>> s = '<school>Nebraska</school>'
>>> s.split('>')
['<school', 'Nebraska</school', '']
>>> s.split('>')[1].split('<')
['Nebraska', '/school']
>>> s.split('>')[1].split('<')[0]
'Nebraska'
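The same chained split can be applied to every line of the file. A minimal sketch, assuming one <school> element per line; lines with no '>' are skipped instead of raising IndexError. The function name extract_schools and the sample data are hypothetical.

```python
import io

def extract_schools(lines):
    """Return the text between the first '>' and the next '<' on each line."""
    results = []
    for rec in lines:
        parts = rec.split('>')
        if len(parts) < 2:
            continue  # no tag on this line; parts[1] would raise IndexError
        results.append(parts[1].split('<')[0])
    return results

# Stand-in for open('Pro.txt'); a real file object is iterated the same way
fo = io.StringIO('     <school>Nebraska</school>\n\n<school>Iowa</school>\n')
print(extract_schools(fo))  # ['Nebraska', 'Iowa']
```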


You could use a regular expression:

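import re

# The non-greedy group (.*?) captures the text between the tags
regexp = re.compile(r'<school>(.*?)</school>')

with open('Pro.txt') as fo:
    for rec in fo:
        # search() scans the whole line, so leading whitespace is fine;
        # match() only succeeds when the tag is at the very start
        match = regexp.search(rec)
        if match:
            print(match.group(1))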
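If a line may contain more than one <school> element, re.findall returns every captured group at once. A small sketch assuming the same tag name; find_schools is a hypothetical helper.

```python
import re

SCHOOL_RE = re.compile(r'<school>(.*?)</school>')

def find_schools(text):
    """Return the text of every <school> element; non-greedy, so adjacent tags stay separate."""
    return SCHOOL_RE.findall(text)

print(find_schools('  <school>Nebraska</school><school>Iowa</school>'))  # ['Nebraska', 'Iowa']
```

To scan the whole file at once, pass it fo.read(). Note that '.' does not match a newline by default, so an element split across two lines would be missed.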
