How to get rid of empty strings in Python when splitting a list?

Question

I have an input file which consists of these lines:

['Some Name__________2.0 2.0 1.3\n', 'Some Name__________1.0 9.0 1.0\n', # and so on....]

I read it with readlines() and split each line on "_", which gives me this:

['Some Name', '', '', '', '2.0 2.0 1.3\n']
['Another Name', '', '', '', '1.0 9.0 1.0\n']
['Another Name', '', '', '', '1.0 9.0 1.0\n']
# and so on

What I want to do is print the names one beneath the other, without the _ characters.

This is my code:

def openFile():
    fileFolder = open('TEXTFILE', 'r')
    readFile = fileFolder.readlines()
    fileFolder.close()

    for line in readFile:
        line = line.split("_")

        personNames = line[0]

        print(personNames)

openFile()

What I get now is:

Some Name
Another Name
Another Name

That works, but I want to go further, and that is where I get stuck. Next I want to get rid of the empty strings ("") and print the numbers you can see right beside the names I've already extracted.

I thought that I could just do this:

for line in readFile:
    line = line.split("_")
    get_rid_of_spaces = line.split() #getting rid of spaces too

    personNames = line[0]

But this gives me this error:

AttributeError: 'list' object has no attribute 'split'

How can I do this? I want to understand how it works.

I also tried incrementing the index number, but that failed, and I read that it's not the best approach, so now I am trying it this way.

Besides that, I expected line[1] to give me one of the empty strings, but it doesn't.

What am I missing here?

Juan Diego Godoy Robles · Accepted Answer · 2016-11-20 14:26:18Z

4

Just use re.split to take advantage of a multi-character delimiter:

>>> import re
>>> 
>>> line = 'Some Name__________2.0 2.0 1.3\n'
>>> re.split(r'_+', line)
['Some Name', '2.0 2.0 1.3\n']
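Because _+ matches one or more underscores, the same pattern works no matter how many underscores separate the name from the numbers. A small check, using made-up sample lines with 4, 10 and 11 underscores:

```python
import re

# Hypothetical sample lines with 4, 10 and 11 underscores
samples = ['Some Name' + '_' * n + '2.0 2.0 1.3\n' for n in (4, 10, 11)]

# r'_+' treats a whole run of underscores as a single delimiter
results = [re.split(r'_+', s) for s in samples]
for parts in results:
    print(parts)
```

Every line splits into the same two pieces, so the code does not need to know the underscore count in advance.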

Example in a for loop:

>>> lines = ['Some Name__________2.0 2.0 1.3\n', 'Some Name__________1.0 9.0 1.0\n']
>>> for dat in [re.split(r'_+|\n', line) for line in lines]:
...     person = dat[0]
...     numbers = dat[1]
...     print(person, numbers)
... 
Some Name 2.0 2.0 1.3
Some Name 1.0 9.0 1.0
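One detail of the loop above: the \n alternative also matches the trailing newline, so each split result ends with an extra empty string (for the first line it is ['Some Name', '2.0 2.0 1.3', '']). A variant sketch that strips the newline first and unpacks the two fields directly, using the same sample lines and Python 3's print():

```python
import re

lines = ['Some Name__________2.0 2.0 1.3\n', 'Some Name__________1.0 9.0 1.0\n']

pairs = []
for line in lines:
    # Drop the trailing newline, then split only once on the run of underscores
    person, numbers = re.split(r'_+', line.rstrip('\n'), maxsplit=1)
    pairs.append((person, numbers))
    print(person, numbers)
```

Using maxsplit=1 guarantees exactly two pieces when the line contains at least one underscore, which makes the tuple unpacking safe for well-formed lines.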

edited Nov 20, 2016 at 14:26

answered Nov 16, 2016 at 19:58

Juan Diego Godoy Robles


3 Comments

Siyah Over a year ago

Interesting one. Not the one I am looking for, because I want to use a for loop instead, but interesting. Thanks.

Juan Diego Godoy Robles Over a year ago

It's easy to adapt to a for loop; check my edited answer.

Benjamin James Over a year ago

I think OP is really looking for a dictionary, which this answer can easily be adapted to.
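Following that suggestion, a minimal sketch that collects the results into a dictionary keyed by name, with the numbers converted to floats. The sample data and the name scores are made up for illustration; note that duplicate names (like the repeated 'Another Name' in the question) would overwrite each other in a plain dict.

```python
import re

# Hypothetical sample input, one record per line
lines = ['Some Name__________2.0 2.0 1.3\n', 'Another Name__________1.0 9.0 1.0\n']

scores = {}
for line in lines:
    name, numbers = re.split(r'_+', line.rstrip('\n'), maxsplit=1)
    # Convert the space-separated numbers to floats
    scores[name] = [float(n) for n in numbers.split()]

print(scores)
```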

Batman · Answer · 2017-03-22 23:39:18Z

2

Use a list comprehension to remove the empty strings.

for line in read_file:
    # Keep only the non-empty pieces between the underscores
    tokens = [x for x in line.split("_") if x != ""]
    person_name = tokens[0]
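A self-contained run of that comprehension on one sample line, showing that only the name and the numbers survive. The newline stays attached to the last piece; calling line.rstrip('\n') before splitting would remove it.

```python
line = 'Some Name__________2.0 2.0 1.3\n'

# Keep only the non-empty pieces produced by splitting on "_"
tokens = [x for x in line.split("_") if x != ""]
person_name = tokens[0]

print(tokens)
print(person_name)
```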

edited Mar 22, 2017 at 23:39

answered Nov 16, 2016 at 20:06

Batman


Comments

Francisco · Answer · 2016-11-16 19:58:15Z

1

You could do something like this:

for line in readFile:
    line = line.split("_")
    # In Python 3, filter() returns an iterator; list() makes it indexable
    line = list(filter(bool, line))

This removes all the empty strings from the line list.
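One caveat worth checking: in Python 3, filter returns a lazy iterator rather than a list, so indexing the result (for example line[0]) raises TypeError unless it is wrapped in list(). Passing None as the function keeps only truthy items, which is equivalent to bool here. A quick check on one sample line:

```python
line = 'Some Name__________2.0 2.0 1.3\n'

parts = line.split("_")
# filter(None, ...) drops falsy items, i.e. the empty strings
non_empty = list(filter(None, parts))

print(non_empty)
print(non_empty[0], non_empty[1].split())
```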

answered Nov 16, 2016 at 19:58

Francisco


1 Comment

Siyah Over a year ago

Hmm, it seems like it isn't doing what I need. Thanks anyway!

Ahasanul Haque · Answer · 2016-11-16 20:01:05Z

1

>>> a =['Some Name__________2.0 2.0 1.3\n', 'Some Name__________1.0 9.0 1.0\n']
>>> import re
>>> [re.search(r'_+(.+)$', i.rstrip()).group(1) for i in a]
['2.0 2.0 1.3', '1.0 9.0 1.0']
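If the name is wanted next to the numbers, the same approach can be extended with a second capture group. A sketch, assuming (as in the sample data) that names never contain underscores; the pattern below is an illustration, not part of the original answer:

```python
import re

a = ['Some Name__________2.0 2.0 1.3\n', 'Some Name__________1.0 9.0 1.0\n']

# Group 1: everything before the underscores; group 2: everything after them
pairs = [re.search(r'^([^_]+)_+(.+)$', i.rstrip()).groups() for i in a]
print(pairs)
```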

answered Nov 16, 2016 at 20:01

Ahasanul Haque


1 Comment

Siyah Over a year ago

Thanks. I won't use this, but it's good to know there is an alternative.

Alvaro · Answer · 2016-11-17 01:57:10Z

1

The output of str.split is a list, and a list doesn't have a split method; that's why you get that error.
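A short illustration of this, and of what split("_") actually returns for a line with ten underscores: nine empty strings sit between the name and the numbers, so line[1] is '' (which prints as a blank line). The fix is to call split() on an element of the list, not on the list itself:

```python
line = 'Some Name' + '_' * 10 + '2.0 2.0 1.3\n'

parts = line.split("_")
print(parts)            # the name, nine '' entries, then the numbers
print(repr(parts[1]))   # an empty string

try:
    parts.split()       # lists have no split method
except AttributeError as err:
    print(err)          # 'list' object has no attribute 'split'

# Split an element of the list instead, e.g. the last one
numbers = parts[-1].split()
print(numbers)
```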

You can instead do:

with open('yourfile') as f:
    for line in f:
        # Strip the newline so it doesn't end up in the number field
        split = line.rstrip('\n').split('_')
        name, number = split[0], split[-1]
        print('{}-{}'.format(number, name))

Several things to note:

1) Don't use camel case for variable and function names; PEP 8 recommends snake_case.

2) Use a context manager for files (the with statement); it closes the file properly even if something fails.

3) Pay attention to the line for line in f:. Iterating over the file object reads one line at a time, so the whole file is never held in memory at once, unlike readlines().
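A self-contained sketch of those three points together: snake_case names, a with block, and iterating over the file object line by line. It writes a small temporary file first so it can run anywhere; the file contents are made up, and the rstrip('\n') call keeps the newline out of the number field.

```python
import os
import tempfile

# Create a throwaway input file (hypothetical data) so the example is runnable
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Some Name__________2.0 2.0 1.3\n')
    tmp.write('Another Name__________1.0 9.0 1.0\n')
    path = tmp.name

results = []
try:
    # The with statement closes the file even if an exception is raised
    with open(path) as f:
        for line in f:  # reads one line at a time
            parts = line.rstrip('\n').split('_')
            name, numbers = parts[0], parts[-1]
            results.append((name, numbers))
            print('{}-{}'.format(numbers, name))
finally:
    os.remove(path)
```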

edited Nov 17, 2016 at 1:57

answered Nov 16, 2016 at 19:58

Alvaro


6 Comments

Alvaro Over a year ago

@Siyah It creates a string of 10 _ characters. I tried to be literal, since using regex might be out of scope for this question.

Siyah Over a year ago

But why create a string of 10 _ characters? What if my file had 11 _ characters? It wouldn't be a generic fix, would it?

Alvaro Over a year ago

Oh, sorry, I assumed it was a fixed 10 characters. I'll update it with a generic version.

Alvaro Over a year ago

@Siyah I know you have accepted an answer but please pay attention to these points in order to improve your coding

Alvaro Over a year ago

@Siyah It's just the Python standard for variable naming. Check PEP 8 if you're curious; it has lots of recommendations.


abacles · Answer · 2016-11-16 20:07:22Z

0

readfile = ['Some name____2.0 2.1 1.3', 'Some other name_____2.2 3.4 1.1']

data = []
for line in readfile:
    first_split = list(part for part in line.split('_') if part != '')
    data.append([first_split[0], first_split[1].split(' ')])

print(data)

If I understood you correctly, this does what you wanted. It prints:

[['Some name', ['2.0', '2.1', '1.3']], ['Some other name', ['2.2', '3.4', '1.1']]]

answered Nov 16, 2016 at 20:07

abacles


4 Comments

Siyah Over a year ago

Yes, this is what I needed, thanks! Last question: is there another way to do this? What I really want is the numbers; how do I get only the numbers?

abacles Over a year ago

Then leave out the first_split[0] part when appending to data and use only first_split[1].split(' ').

abacles Over a year ago

That line would be something like this: data.append(first_split[1].split(' '))

Alvaro Over a year ago

Why build a list from a generator expression when you can just use a list comprehension?
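Putting those comments together: a sketch that uses a list comprehension instead of list(...) around a generator, keeps only the numbers, and (as an extra step not in the answer) converts them to floats. Calling .split() with no argument also discards a trailing newline if the lines come straight from a file.

```python
readfile = ['Some name____2.0 2.1 1.3', 'Some other name_____2.2 3.4 1.1']

data = []
for line in readfile:
    # List comprehension instead of list(generator) to drop the empty strings
    first_split = [part for part in line.split('_') if part != '']
    # Keep only the numbers, converted from strings to floats
    data.append([float(n) for n in first_split[1].split()])

print(data)
```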
