
I read my txt file into a list of strings using with open:

with open('./doc', 'r') as f:
    dat = f.readlines()

Then I want to clean the data with a for loop, keeping only the lines that don't start with '<':

docs = []
for i in dat:
    if i.strip()[0] != '<':
        docs.append(i)

It raises this error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-131-92a67082e677> in <module>()
      1 docs = []
      2 for i in dat:
----> 3     if i.strip()[0] != '<':
      4         docs.append(i)

IndexError: string index out of range
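A minimal reproduction, assuming the cause is a blank line in the file: readlines() keeps each line's trailing newline, so a blank line comes back as '\n', strip() turns that into the empty string, and indexing [0] on an empty string raises IndexError. The sample text below is made up for illustration.

```python
import io

# Hypothetical file contents: a tag line, a text line, a blank line, a text line.
sample = io.StringIO("<doc>\nhello\n\nworld\n")
dat = sample.readlines()
print(dat)  # ['<doc>\n', 'hello\n', '\n', 'world\n']

blank = dat[2]
print(repr(blank.strip()))  # ''

try:
    blank.strip()[0]  # indexing an empty string
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: string index out of range
```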

But if I change the code like this, selecting only the first 3000 lines, it works:

docs = []
for i in dat[:3000]:
    if i.strip()[0] != '<':
        docs.append(i)

My txt file contains 93408 lines. Why can't I process all of them? Thanks!
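Since i.strip()[0] can only raise IndexError when the stripped line is empty, the slice working suggests that the first 3000 lines contain no blank (or whitespace-only) line and the first one appears later. A quick sketch to list where the blank lines are; the helper name blank_line_numbers is hypothetical, and the sample data is made up.

```python
def blank_line_numbers(lines):
    """Return the 1-based line numbers of lines that are empty or whitespace-only."""
    return [n for n, line in enumerate(lines, start=1) if not line.strip()]


# Made-up data; in the question this would be blank_line_numbers(dat).
sample = ["<doc>\n", "text\n", "   \n", "more\n", "\n"]
print(blank_line_numbers(sample))  # [3, 5]
```

If the first number printed for the real file is greater than 3000, that would match the behaviour described.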

  • Indentation: if i.strip()[0] != '<': Commented Oct 30, 2017 at 1:18
  • Because one of your lines is empty. Commented Oct 30, 2017 at 1:18
  • Empty lines, possibly? Try if i.strip() and i.strip()[0] != '<': Commented Oct 30, 2017 at 1:18

1 Answer


One or more lines could be empty, so you need to check for that before taking the first character:

if i.strip() != "" and i.strip()[0] != '<':
    docs.append(i)
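A variant that avoids indexing altogether, offered as a sketch rather than the answerer's code: str.startswith simply returns False on an empty string instead of raising, so the emptiness check is only needed to drop blank lines, as the answer's condition does. The function name drop_tag_lines and the sample data are hypothetical.

```python
def drop_tag_lines(lines):
    """Keep lines that are non-blank and do not start with '<' after stripping whitespace."""
    docs = []
    for line in lines:
        stripped = line.strip()
        # An empty string is falsy, so blank lines are skipped here;
        # startswith() would not raise on '' in any case.
        if stripped and not stripped.startswith('<'):
            docs.append(line)
    return docs


sample = ["<doc>\n", "hello\n", "\n", "  <p>\n", "world\n"]
print(drop_tag_lines(sample))  # ['hello\n', 'world\n']
```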

5 Comments

Small correction: if i.strip() != '' because i is a string. Furthermore, you can shorten that to if i.strip() because of truthiness testing. See my comment in the question too.
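Following the truthiness suggestion, the read-and-filter step can be written as a single list comprehension. This sketch writes a throwaway temporary file with made-up contents so it runs on its own; in the question the path would be './doc'. Iterating over the file object yields the same lines (newlines included) as readlines().

```python
import os
import tempfile

# Create a small throwaway file standing in for './doc' (made-up contents).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("<doc>\nfirst line\n\n<tag>\nsecond line\n")
    path = tmp.name

with open(path, 'r') as f:
    # i.strip() is '' (falsy) for blank lines, so the short-circuiting `and`
    # never evaluates i.strip()[0] on an empty string.
    docs = [i for i in f if i.strip() and i.strip()[0] != '<']

os.remove(path)
print(docs)  # ['first line\n', 'second line\n']
```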
@cᴏʟᴅsᴘᴇᴇᴅ my old pal! Long time no see, always so kind, always so nice! Always right. Thanks so much!
@cᴏʟᴅsᴘᴇᴇᴅ The code works! Thank you for your help; as a beginner, I really appreciate your kindness.
@HerryWang don't forget to accept the answer with the check box! ;)
@HerryWang Well asked questions will be rewarded with help. :-)
