
My goal is to write a function that returns a list of strings, where successive strings (fruit names) correspond to the consecutive headers #NO.1 ... #NO.5. Each fruit name is split over multiple lines in the file, and I want each name to appear in the list as a single string with no whitespace. I expect my code to return:

['Pear', 'Apple', 'Cherry', 'Banana', 'Peach']

but I got:

['', 'Pear', 'Apple', 'Cherry', 'Banana', 'Peach']

These are the contents of my file fruit.txt:

#NO.1
P
ear
#NO.2
A
pp
l
e
#NO.3
Cherry
#NO.4
Banan
a
#NO.5
Pea
c
h

This is my code:

def read(filename): 

    myfile = open('fruit', 'r')
    seq = ''
    list1 = []
    for line in myfile:

        if line[0] != '#':
            seq +=line.rstrip('\n')
        else:

            list1.append(seq)
            seq = ''

    list1.append(seq)    
    return list1

How can I avoid appending the empty string, which I don't want? I suppose I just need to move a certain line of code. Any suggestion is appreciated.

  • Please note that your function never closes the file, so it may leave file handles open if called repeatedly (when they get closed depends on the interpreter's garbage collection). You should never open a file without making sure it is subsequently closed. The easiest way to do this is with the with statement. For further reading, see this link: effbot.org/zone/python-with-statement.htm Commented Jan 2, 2017 at 16:28
  • @sobek Got it, thank you!! Commented Jan 2, 2017 at 18:03

3 Answers


You could change the

    else:

to

    elif seq:

This checks whether seq is empty and only appends it if it's not.
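Put together, the function from the question with this change might look like the sketch below. The name read_fruits, the use of the filename argument, the with block, and the final `if seq:` guard (for an empty file) are my additions for illustration and are not part of the original code.

```python
def read_fruits(filename):
    """Return one joined string per '#'-delimited record in the file."""
    fruits = []
    seq = ''
    # 'with' closes the file when the block ends (an addition to the original code).
    with open(filename, 'r') as myfile:
        for line in myfile:
            if not line.startswith('#'):
                # Accumulate a fragment of the current name, minus the newline.
                seq += line.rstrip('\n')
            elif seq:
                # A header line ends the previous record, but only if one exists.
                fruits.append(seq)
                seq = ''
    # Append the last record; the guard avoids [''] for an empty file.
    if seq:
        fruits.append(seq)
    return fruits
```

Because the very first header line is seen while seq is still empty, the `elif seq` branch is skipped and no empty string reaches the list.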

Alternative if you'd like a single line solution:

with open('fruit.txt') as f:
    content = f.read()

output = [''.join(x.split('\n')[1:]) for x in content.split('#') if '\n' in x]
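The same idea spelled out step by step may be easier to follow. This is only a sketch: parse_fruits is a hypothetical helper name, sample is made-up data standing in for the file contents, and it filters on an empty name rather than on the number of lines.

```python
def parse_fruits(content):
    """Split on '#', drop each header line, and join the remaining fragments."""
    fruits = []
    for chunk in content.split('#'):
        lines = chunk.split('\n')
        # lines[0] is the header text (e.g. 'NO.1'); the rest are name fragments.
        name = ''.join(part.strip() for part in lines[1:])
        if name:
            fruits.append(name)
    return fruits


sample = "#NO.1\nP\near\n#NO.2\nA\npp\nl\ne\n#NO.3\nCherry\n"
print(parse_fruits(sample))  # ['Pear', 'Apple', 'Cherry']
```

The text before the first '#' is an empty chunk, which yields an empty name and is skipped.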

1 Comment

Good solution, Thank you!

Quick fix for removing empty strings from a list:

list1 = list(filter(None, list1))
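One thing to keep in mind: in Python 3, filter returns a lazy iterator rather than a list, so it needs to be wrapped in list() when a list is expected. A small sketch with made-up data (parts is not from the question), alongside an equivalent list comprehension:

```python
parts = ['', 'Pear', 'Apple', '', 'Cherry']

# filter(None, ...) keeps only truthy items, so every empty string is dropped.
cleaned = list(filter(None, parts))

# An equivalent list comprehension.
same = [s for s in parts if s]

print(cleaned)  # ['Pear', 'Apple', 'Cherry']
```

This removes every empty string in the list, not just the leading one, which is fine here because no fruit name is empty.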

How about a regex solution? It is a two-step process: first, all whitespace (newlines, spaces, etc.) is removed; then every word following a header of the form #NO.<digit> is extracted:

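import re

whitespace = re.compile(r'\s+')
fruitdef = re.compile(r'#NO\.\d(\w*)')

# Read the whole file; the with block closes it afterwards.
with open('fruit', 'r') as f:
    inputfile = f.read()

# Step 1: strip all whitespace so each name becomes contiguous.
inputstring = whitespace.sub('', inputfile)
# Step 2: capture the word after each '#NO.<digit>' header.
fruits = fruitdef.findall(inputstring)

print(fruits)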

['Pear', 'Apple', 'Cherry', 'Banana', 'Peach']
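To see what each step produces, here is a sketch that applies the same two steps to an in-memory string instead of the file. The sample string is made-up test data mirroring the first three entries of fruit.txt.

```python
import re

sample = "#NO.1\nP\near\n#NO.2\nA\npp\nl\ne\n#NO.3\nCherry\n"

# Step 1: remove all whitespace, leaving headers and names on one line.
collapsed = re.sub(r'\s+', '', sample)
print(collapsed)  # #NO.1Pear#NO.2Apple#NO.3Cherry

# Step 2: capture the run of word characters after each header.
fruits = re.findall(r'#NO\.\d(\w*)', collapsed)
print(fruits)  # ['Pear', 'Apple', 'Cherry']
```

Note that \d matches exactly one digit, so a header such as #NO.10 would leak its second digit into the captured name; \d+ would be needed if the file had ten or more entries.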


Condensed to a one-liner:

import re

print(re.findall(r'#NO\.\d(\w*)', re.sub(r'\s+', '', open('fruit', 'r').read())))
