I have a text file called test.txt which contains data in this format

....
a|b|c|d|e
a1|b2|c3|d4|e5
a3|b5|c2|d1|e3
....

I want to get the values of each column into lists: something like this

list1=[a,a1,a3]
list2=[b,b2,b5]

I managed to get this done by doing this:

list1,list2,list3,list4,list5 = ([] for i in range(5))

for line in open('test.txt','r'):
    temp=line.split('|')
    list1.append(temp[0])
    list2.append(temp[1])
    list3.append(temp[2])
    list4.append(temp[3])
    list5.append(temp[4].strip())

Is there a shorter way to append the values to each list? The only approach I can think of is one line per list, as above.

2 Answers

2

zip() is your friend here:

list1, list2, list3, list4, list5 = zip(
    *(line.strip().split('|') for line in open('test.txt')))

As an added bonus, you could also use this even if you didn't know how many columns there were: just assign the result to a single variable instead, and you'd get a list in which each item is a tuple holding the values of one column:

column_values = list(zip(*(line.strip().split('|') for line in open('test.txt'))))
# column_values[0] is ('a', 'a1', 'a3') ...
# (the list() call is needed on Python 3, where zip() returns an iterator)

Let's step through this a little bit. First, we'll take a look at what happens with just the zip() bit:

list1, list2, list3, list4, list5 = zip(
    [0,1,2,3,4], [0,1,2,3,4], [0,1,2,3,4])

results in list1 = (0, 0, 0) and so on, because zip() takes the first element from each list and puts them together in a tuple as the first element of the result, then does the same for the second elements, and so on.
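
A quick way to check this in an interpreter is sketched below. The names rows and columns are illustrative only. Since zip() returns a lazy iterator on Python 3, it is wrapped in list() here so the result can be indexed:

```python
rows = [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]

# zip() groups the i-th element of every argument into one tuple.
columns = list(zip(rows[0], rows[1], rows[2]))

print(columns[0])    # (0, 0, 0)
print(len(columns))  # 5
```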


Now, how do we get to zip(a,b,c) from a sequence [a,b,c]? Simple: we use the * positional argument expansion operator. zip(*L) is the same as zip(L[0], L[1], ...).
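
To make the equivalence concrete, here is a small sketch using the first three columns of the sample data from the question (the names explicit and unpacked are made up for illustration):

```python
L = [["a", "b", "c"], ["a1", "b2", "c3"], ["a3", "b5", "c2"]]

# Passing each row by hand ...
explicit = list(zip(L[0], L[1], L[2]))
# ... is the same as letting * unpack the sequence into arguments.
unpacked = list(zip(*L))

print(explicit == unpacked)  # True
print(unpacked[0])           # ('a', 'a1', 'a3')
```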


Finally, how do we get the list of lists we need to pass in? We use a generator expression:

(line.strip().split('|') for line in open('test.txt'))

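creates a generator that yields a list of the items in each line, one line at a time. The strip() call removes the trailing newline (and any other leading or trailing whitespace) from the whole line before it is split, so the last column needs no special handling. This is exactly what we need to feed to zip() to get the result we want.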
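
Putting the pieces together, one possible end-to-end sketch looks like this. It assumes the three sample rows from the question, uses io.StringIO as a stand-in for open('test.txt') so it runs without a file, closes the handle with a with block, and converts each column to a list in case tuples are not wanted:

```python
import io

# Stand-in for open('test.txt'); the rows are the sample data from the question.
sample = io.StringIO("a|b|c|d|e\na1|b2|c3|d4|e5\na3|b5|c2|d1|e3\n")

with sample as handle:
    rows = (line.strip().split('|') for line in handle)
    # The generator is lazy, so it must be consumed while the handle is open.
    columns = [list(column) for column in zip(*rows)]

list1, list2, list3, list4, list5 = columns
print(list1)  # ['a', 'a1', 'a3']
print(list5)  # ['e', 'e5', 'e3']
```

Note that zip() stops at the shortest row, so a line with fewer fields than the others would silently shorten the number of columns returned; itertools.zip_longest is an alternative if the rows may be ragged.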

2 Comments

Very informative! Not trying to be pesky, but I have a few more questions. (1) I need the output as lists, not tuples. Is there a short way to convert them without affecting the order? (The only way I know is list(tuple).) (2) Does this keep the '\n' character in the last column? Do I need to use strip() to remove it?
@ChrisAung I already included strip() in the generator (stripping before splitting means you don't have to special case any of the columns). As far as converting them back - list() is the way to go; if you're using the column_values version you could just add this line: column_values = [list(i) for i in column_values].
0

You can use a list of lists:

table = [[] for i in range(5)]

with open('test.txt', 'r') as handle:
    for line in handle:
        for index, value in enumerate(line.strip().split('|')):
            table[index].append(value)

So instead of having list1, list2, etc., you just access the cells by table[0][0], table[2][1], etc.
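
If the number of columns is not known in advance, the same loop can grow the table as it goes. This is only a sketch of one way to do it: io.StringIO stands in for the file, and the sample rows are those from the question.

```python
import io

# Stand-in for open('test.txt', 'r'), using the question's sample rows.
handle = io.StringIO("a|b|c|d|e\na1|b2|c3|d4|e5\na3|b5|c2|d1|e3\n")

table = []
for line in handle:
    for index, value in enumerate(line.strip().split('|')):
        # Add a new column list the first time this index is seen.
        if index == len(table):
            table.append([])
        table[index].append(value)

print(table[0])     # ['a', 'a1', 'a3']
print(table[1][2])  # b5
```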

1 Comment

Looks like he wants the transpose of that.
