How to use split function for file in python?

Question

I have a file with a bunch of information. For example, all of the lines follow the same pattern as this:

     <school>Nebraska</school>

I am trying to use the split function to retrieve only 'Nebraska'. This is what I have so far, but I'm not sure how to make it cut off both tags instead of just the first one.

    with open('Pro.txt') as fo:
        for rec in fo:
            print(rec.split('>')[1])

With this I get:

    Nebraska</school

If it's XML or HTML, and it looks as though it is, you should use a proper parser such as BeautifulSoup (HTML) or lxml (XML). Python also comes with built-in parsers that will do the trick, though the ones I suggested are better. (kindall, commented Dec 7, 2016 at 1:55)
Use an XML parser, either a built-in one or a PyPI module like lxml or BeautifulSoup. Don't try to roll your own XML parsing code. (ShadowRanger, commented Dec 7, 2016 at 1:55)
Do you have an example of the file? It's possible that you're dealing with a subset of *ML that may be easier to work with. (Iluvatar, commented Dec 7, 2016 at 1:57)
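As a minimal sketch of the built-in route the comments mention, the standard library's xml.etree.ElementTree can parse each line on its own. This assumes, as in the question, that every line is one complete element such as <school>Nebraska</school>. The io.StringIO sample stands in for Pro.txt so the snippet runs by itself, and its second line is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for open('Pro.txt'); assumes one complete <school> element per line.
sample = io.StringIO("<school>Nebraska</school>\n<school>Iowa</school>\n")

schools = []
for rec in sample:
    rec = rec.strip()
    if not rec:
        continue  # skip blank lines
    elem = ET.fromstring(rec)  # parse the line as a tiny XML document
    if elem.tag == "school":
        schools.append(elem.text)

print(schools)  # ['Nebraska', 'Iowa']
```

If Pro.txt is instead a single well-formed XML document, ET.parse('Pro.txt').iter('school') would avoid the line-by-line loop.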

TigerhawkT3 · Accepted Answer · 2016-12-07 01:57:48Z

1

You've cut off part of the string. Keep going in the same fashion:

>>> s = '<school>Nebraska</school>'
>>> s.split('>')[1]
'Nebraska</school'
>>> s.split('>')[1].split('<')[0]
'Nebraska'
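Applied to the loop from the question, the same chain might look like the sketch below. extract_school is a hypothetical helper name, the io.StringIO sample stands in for Pro.txt, and the sketch assumes every line follows the <school>...</school> pattern.

```python
import io

def extract_school(line):
    """Hypothetical helper: text after the first '>' and before the next '<'."""
    return line.split('>')[1].split('<')[0]

# Stand-in for open('Pro.txt'); the trailing newline ends up in the last
# piece of split('>'), so it does not affect the result.
sample = io.StringIO("<school>Nebraska</school>\n<school>Iowa</school>\n")
for rec in sample:
    print(extract_school(rec))
```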

That said, you should parse HTML with an HTML parser like BeautifulSoup.
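BeautifulSoup is a third-party package. As a stdlib-only alternative (swapped in here, not what the answer used), html.parser.HTMLParser can collect the text inside a given tag. TagTextCollector is a hypothetical class name, and the sketch assumes <school> elements are not nested.

```python
from html.parser import HTMLParser

class TagTextCollector(HTMLParser):
    """Hypothetical helper: collects the text found inside one tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.inside = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        # Only keep text that appears between the opening and closing tag.
        if self.inside:
            self.values.append(data)

parser = TagTextCollector("school")
parser.feed("<school>Nebraska</school>\n<school>Iowa</school>\n")
parser.close()
print(parser.values)  # ['Nebraska', 'Iowa']
```

With the real file, parser.feed(fo.read()) inside the question's with block would do the same.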

answered Dec 7, 2016 at 1:57

TigerhawkT3



宏杰李 · Answer · 2016-12-07 01:59:50Z

0

>>> s = '<school>Nebraska</school>'
>>> s.split('>')
['<school', 'Nebraska</school', '']
>>> s.split('>')[1].split('<')
['Nebraska', '/school']
>>> s.split('>')[1].split('<')[0]
'Nebraska'
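One caveat with indexing into split(): a line without a '>' produces a one-element list, so [1] raises IndexError. Below is a minimal sketch of a guarded variant using str.partition (swapped in here, not part of the answer); between_tags is a hypothetical helper name.

```python
def between_tags(line):
    """Hypothetical helper: text between the first '>' and the next '<', or None."""
    # partition() never raises; if the separator is missing, sep is ''.
    _, sep, rest = line.partition('>')
    if not sep:
        return None  # no '>' on this line
    value, sep, _ = rest.partition('<')
    return value if sep else None  # no '<' after the text

print(between_tags('<school>Nebraska</school>'))  # Nebraska
print(between_tags('plain text line'))            # None
```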

answered Dec 7, 2016 at 1:59

宏杰李


Maurice Meyer · Answer · 2016-12-07 02:05:48Z

0

You could use a regular expression:

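import re

# Raw string, so the '/' in the closing tag needs no escaping.
regexp = re.compile(r'<school>(.*?)</school>')

with open('Pro.txt') as fo:
    for rec in fo:
        match = regexp.search(rec)  # search() also finds tags that don't start the line
        if match:
            text = match.group(1)
            print(text)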
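If the file is small enough to read at once, re.findall can collect every match in one call. A sketch assuming the same pattern; schools_in is a hypothetical helper, and the sample string stands in for the contents of Pro.txt.

```python
import re

SCHOOL_RE = re.compile(r'<school>(.*?)</school>')

def schools_in(text):
    """Hypothetical helper: every value between <school> and </school>."""
    # With one capture group, findall returns that group for each match.
    return SCHOOL_RE.findall(text)

sample = "<school>Nebraska</school>\n  <school>Iowa</school>\n"
print(schools_in(sample))  # ['Nebraska', 'Iowa']
```

With the real file, that would be schools_in(fo.read()) inside the with block.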

answered Dec 7, 2016 at 2:05

Maurice Meyer

