2

I have an input file which consists of these lines:

['Some Name__________2.0 2.0 1.3\n', 'Some Name__________1.0 9.0 1.0\n', # and so on....]

I have formatted it with readlines, to this:

['Some Name', '', '', '', '2.0 2.0 1.3\n']
['Another Name', '', '', '', '1.0 9.0 1.0\n']
['Another Name', '', '', '', '1.0 9.0 1.0\n']
# and so on

What I wanted to do is print the names one below the other, while getting rid of the _ characters.

This is my code:

def openFile():
    fileFolder = open('TEXTFILE', 'r')
    readFile = fileFolder.readlines()

    for line in readFile:
        line = line.split("_")

        personNames = line[0]

        print personNames

print openFile()

So what I get now, is:

Some Name
Another Name
Another Name

That is cool, but I want to go further and that is where I am getting stuck. What I want to do now, is to get rid of the empty strings ("") and print the numbers you can see, just beside the names I've already formatted.

I thought that I could just do this:

for line in readFile:
    line = line.split("_")
    get_rid_of_spaces = line.split() #getting rid of spaces too

    personNames = line[0]

But this gives me this error:

AttributeError: 'list' object has no attribute 'split'

How can I do this? I want to learn this.

I also tried incrementing the index number, but this failed and I read it's not the best way to do this, so now I am going this way.

Besides that, I'd expect line[1] to give me the empty strings, but it doesn't.

What am I missing here?

6 Answers

4

Just use re.split to take advantage of a multi-character delimiter:

>>> import re
>>> 
>>> line = 'Some Name__________2.0 2.0 1.3\n'
>>> re.split(r'_+', line)
['Some Name', '2.0 2.0 1.3\n']

Example in a for loop:

>>> lines = ['Some Name__________2.0 2.0 1.3\n', 'Some Name__________1.0 9.0 1.0\n']
>>> for dat in [re.split(r'_+|\n', line) for line in lines]:
...    person = dat[0]
...    id = dat[1]
...    print person, id
... 
Some Name 2.0 2.0 1.3
Some Name 1.0 9.0 1.0

3 Comments

Interesting one. Not the one I am looking for, because I want to use a for loop instead, but interesting. Thanks.
It's easy to adapt to a for loop, check my edited answer
I think OP is really looking for a dictionary, which this answer can easily be adapted to.
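
As a rough illustration of the dictionary idea from the last comment, here is a minimal sketch written for Python 3 (the answer above uses Python 2 print statements). The helper name build_scores and the choice to store each row as a list of floats are assumptions made for illustration, not part of the original answer. It also assumes every line contains at least one underscore; a line without one would fail at the unpacking step.

```python
import re
from collections import defaultdict

lines = [
    'Some Name__________2.0 2.0 1.3\n',
    'Another Name__________1.0 9.0 1.0\n',
    'Another Name__________1.0 9.0 1.0\n',
]

def build_scores(lines):
    """Map each name to a list of its numeric rows (hypothetical helper)."""
    scores = defaultdict(list)
    for line in lines:
        # Strip the newline, then split once on the first run of underscores.
        name, numbers = re.split(r'_+', line.strip(), maxsplit=1)
        # Names can repeat, so each name collects a list of rows.
        scores[name].append([float(n) for n in numbers.split()])
    return dict(scores)

scores = build_scores(lines)
print(scores)
```

Collecting rows in a list per name avoids silently overwriting earlier rows when the same name appears more than once, as it does in the sample input.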
2

Use a list comprehension to remove the empty strings.

for line in read_file:
    tokens = [x for x in line.split("_") if x != ""]
    person_name = tokens[0]

1

You could do something like this:

for line in readFile:
    line = line.split("_")
    line = filter(bool, line)

This will remove all the empty strings from the line list.
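
One caveat if you run this on Python 3: there filter returns a lazy iterator rather than a list, so indexing the result directly would fail. A minimal sketch of the same idea for Python 3, using one sample line from the question (the variable names tokens, name, and numbers are illustrative only):

```python
line = 'Some Name__________2.0 2.0 1.3\n'

parts = line.strip().split('_')
# filter() is lazy in Python 3, so wrap it in list() to get a real list.
# Passing None as the function drops every falsy item, i.e. the empty strings.
tokens = list(filter(None, parts))

name = tokens[0]
numbers = tokens[1].split()  # split the numbers string on whitespace
print(name, numbers)
```

After filtering, only the name and the numbers string remain, so the numbers can be split further on whitespace.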

1 Comment

Hmm, it seems like it isn't doing what I need. Thanks anyway!
1
>>> a =['Some Name__________2.0 2.0 1.3\n', 'Some Name__________1.0 9.0 1.0\n']
>>> import re
>>> [re.search(r'_+(.+)$', i.rstrip()).group(1) for i in a]
['2.0 2.0 1.3', '1.0 9.0 1.0']

1 Comment

Thanks. I won't use this, but it's good to know there is an alternative.
1

The output of str.split is a list.

A list doesn't have a split method, which is why you get that error.

You can instead do:

with open('yourfile') as f:
    for line in f:
        # strip the trailing newline before splitting on underscores
        parts = line.rstrip('\n').split('_')
        name, number = parts[0], parts[-1]
        print '{}-{}'.format(number, name)

Several things to note:

1) Don't use camelCase for names; PEP 8 recommends lowercase_with_underscores for variables and functions

2) Use a context manager for files (the with statement); it closes the file even if something fails

3) Pay attention to this line: for line in f:. It iterates over the file one line at a time, so the whole file is never held in memory
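
The points above, together with the fix for the AttributeError, can be sketched as follows in Python 3. The generator name parse_lines and the sample list are assumptions made for illustration; the sketch takes any iterable of lines so it can be tried without a file on disk.

```python
def parse_lines(lines):
    """Yield (name, numbers) pairs; `lines` is any iterable of strings."""
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        parts = line.split('_')      # str.split returns a list ...
        name = parts[0]
        numbers = parts[-1].split()  # ... so call split() on one of its strings
        yield name, numbers

sample = [
    'Some Name__________2.0 2.0 1.3\n',
    'Another Name__________1.0 9.0 1.0\n',
]
records = list(parse_lines(sample))
for name, numbers in records:
    print(name, ' '.join(numbers))
```

A file object is itself an iterable of lines, so inside a with open('TEXTFILE') as f: block you could pass f straight to parse_lines and process the file line by line without loading it all at once.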

6 Comments

@Siyah It creates a string of 10 _ characters. I tried to be literal since using regex might be out of the scope for this question
But why are you creating a string of 10_ characters? What would that mean if I would have a file where I'd have 11_ characters? It wouldn't be a generic fix, am I right?
Oh sorry, I'll update with a generic one, I assumed it was 10 fixed chars
@Siyah I know you have accepted an answer but please pay attention to these points in order to improve your coding
@Siyah it's just the python standard for variable naming. You should check PEP8 if you are curious about it, it has lots of recommendations (just google it)
0
readfile=['Some name____2.0 2.1 1.3','Some other name_____2.2 3.4 1.1']

data=[]
for line in readfile:
    first_split=list(part for part in line.split('_') if part!='')
    data.append(list([first_split [0],first_split [1].split(' ')]))

print(data)

I think this does what you wanted if I understood you correctly. It prints out:

[['Some name', ['2.0', '2.1', '1.3']], ['Some other name', ['2.2', '3.4', '1.1']]]

4 Comments

Yes, this was what I needed... Thanks man. Last question: is there any other way I could do this... I mean, what I want is to get the numbers... how do I solely get the numbers?
Then you can leave out the first_split [0] part in the appending to data and only use first_split [1].split(' ')
That line would be something like this: data.append(list([first_split [1].split(' ')]))?
Why instantiate a list from the generator if you can just use a list comprehension?
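
For the follow-up question about getting only the numbers, here is one possible sketch in Python 3. The function name numbers_only is hypothetical, and converting the values to float is an assumption about what "the numbers" should be. It also assumes every line contains at least one underscore; without one, the name itself would reach float() and raise a ValueError.

```python
readfile = ['Some name____2.0 2.1 1.3', 'Some other name_____2.2 3.4 1.1']

def numbers_only(lines):
    """Return one list of floats per line, dropping the name (hypothetical helper)."""
    result = []
    for line in lines:
        # rpartition splits at the last underscore: (before, '_', after).
        _, _, tail = line.rpartition('_')
        result.append([float(n) for n in tail.split()])
    return result

numbers = numbers_only(readfile)
print(numbers)
```

Because rpartition cuts at the last underscore, the length of the underscore run does not matter.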
