Splitting Text File Into Columns and Rows in Python

Question

I have a newbie question. I need help on separating a text file into columns and rows. Let's say I have a file like this:

1 2 3 4

2 3 4 5

and I want to put it into a 2D list called values = [[]]

I can get it to give me the rows OK, and this code works:

values = map(int, line.split(','))

I just don't know how I can say the same thing but for the rows and the documentation doesn't make any sense

cheers

@user654174 There is no ',' in your example, yet you split on ','. That is inconsistent.
– eyquem, Commented Mar 9, 2011 at 12:09

Alexander Gessler · Accepted Answer · 2011-03-09 13:19:16Z

6

with open(filename, 'rt') as f:
    # [::2] keeps every other line, skipping the blank line after each data row
    a = [[int(token) for token in line.split()] for line in f.readlines()[::2]]

In your sample file above, you have an empty line between each data row - I took this into account, but you can drop the ::2 subscript if you didn't mean to have this extra line in your data.

Edit: added conversion to int - you can use map as well, but mixing list comprehensions and map seems ugly to me.
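
As an alternative to the ::2 slice, here is a minimal sketch that skips blank lines wherever they occur, rather than assuming they sit on every second line. The io.StringIO object only stands in for the real file, and the names sample and rows are illustrative, not part of the original answer:

```python
import io

# io.StringIO stands in for an open file; the content mirrors the question's sample.
sample = io.StringIO("1 2 3 4\n\n2 3 4 5\n")

# Keep only lines that contain something other than whitespace,
# then convert every whitespace-separated token to int.
rows = [[int(token) for token in line.split()] for line in sample if line.strip()]

print(rows)
```

With a real file, the same comprehension can iterate over the handle returned by open() inside a with block.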

edited Mar 9, 2011 at 13:19

answered Mar 9, 2011 at 12:08

Alexander Gessler


17 Comments

Björn Pollex Over a year ago

If there are no empty lines, he can also drop the readlines().

user651474 Over a year ago

Hi, the blank line is there because, for some reason, the numbers were being put on the same line, so I had to add a blank line to separate them.

Alexander Gessler Over a year ago

Thus: a = [[int(token) for token in line.split()] for line in file] (file being a valid file handle i.e. obtained from open)

user651474 Over a year ago

sorry for being dense, but what does token mean?

Alexander Gessler Over a year ago

It's just a name - [int(token) for token in line.split()] takes each element of the list returned by line.split(), names it token and executes int() on it, thus forming another sequence, which contains all numbers in a line as integers. I could have chosen any other name.
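
To illustrate the explanation above, the inner list comprehension can be unrolled into an ordinary loop. This is only a sketch; the sample line and the name numbers are made up for the illustration:

```python
line = "1 2 3 4\n"

# Equivalent to: numbers = [int(token) for token in line.split()]
numbers = []
for token in line.split():      # token is '1', then '2', and so on
    numbers.append(int(token))  # convert each piece of text to an integer

print(numbers)
```

Renaming token to anything else, in both places, leaves the result unchanged.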


Mahmoud Abdelkader · Accepted Answer · 2011-03-09 13:34:54Z

1

import csv

values = []

with open('text.file') as file_object:
    # Each row comes back as a list of strings split on single spaces
    for line in csv.reader(file_object, delimiter=' '):
        values.append(list(map(int, line)))

print("rows:", values)
print("columns:")
# zip(*values) pairs up the n-th item of every row, i.e. yields the columns
for column in zip(*values):
    print(column)

Output is:

rows: [[1, 2, 3, 4], [2, 3, 4, 5]]
columns:
(1, 2)
(2, 3)
(3, 4)
(4, 5)

edited Mar 9, 2011 at 13:34

answered Mar 9, 2011 at 12:52

Mahmoud Abdelkader


1 Comment

eyquem Over a year ago

"I just don't know how I can say the same thing but for the rows"

BarryPye · Accepted Answer · 2011-11-14 22:07:09Z

1

Get the data into your program by some method. Here's one:

f = open(textfile, 'r')
buffer = f.read()
f.close()

Parse the buffer into a table (note: strip() is used to clear any trailing whitespace):

table = [list(map(int, row.split())) for row in buffer.strip().split("\n")]

>>> print(table)
[[1, 2, 3, 4], [2, 3, 4, 5]]

If you want ordered pairs instead, transpose the table:

transpose = list(zip(*table))
>>> print(transpose)
[(1, 2), (2, 3), (3, 4), (4, 5)]
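
As a rough sketch of why the transpose works: the star unpacks the table so that each row becomes a separate argument to zip, which then groups the items by position. The names columns and rows_again are illustrative only:

```python
table = [[1, 2, 3, 4], [2, 3, 4, 5]]

# zip(*table) is the same call as zip(table[0], table[1]):
# it pairs up the first items of each row, then the second items, and so on.
columns = [list(column) for column in zip(*table)]

# Transposing a second time gives back the original rows.
rows_again = [list(row) for row in zip(*columns)]

print(columns)
print(rows_again)
```

Converting each tuple with list() is optional; it only matters if the columns need to be modified later.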

answered Nov 14, 2011 at 22:07

BarryPye


Björn Pollex · Accepted Answer · 2011-03-09 12:08:16Z

0

You could try the csv module. It lets you specify a custom delimiter, so it might work here.
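
A minimal sketch of that idea, assuming a single space as the delimiter. io.StringIO stands in for the real file, and the filtering of empty fields and rows is an addition for this illustration, not something the csv module does by itself:

```python
import csv
import io

# io.StringIO stands in for an open file; note the double space and the blank line.
sample = io.StringIO("1 2  3 4\n\n2 3 4 5\n")

values = []
for row in csv.reader(sample, delimiter=' '):
    # Two adjacent spaces produce an empty field, and a blank line an empty row.
    numbers = [int(field) for field in row if field]
    if numbers:
        values.append(numbers)

print(values)
```

For plain whitespace-separated numbers, str.split() is usually simpler; the csv module becomes more useful once fields can be quoted.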

answered Mar 9, 2011 at 12:08

Björn Pollex


eyquem · Accepted Answer · 2011-03-09 14:07:48Z

0

If the columns are separated by blanks:

import re

A,B,C,D = [],[],[],[]
pat = re.compile('([^ ]+)\s+([^ ]+)\s+([^ ]+)\s+([^ ]+)')

with open('try.txt') as f:
    for line in f:
        a,b,c,d = pat.match(line.strip()).groups()
        A.append(int(a));B.append(int(b));C.append(int(c));D.append(int(d))

or with csv module

EDIT

A,B,C,D = [],[],[],[]    
with open('try.txt') as f:
    for line in f:
        a,b,c,d = line.split()
        A.append(int(a));B.append(int(b));C.append(int(c));D.append(int(d))

Note that line.split() with no argument copes with several blanks between data elements; it is splitting on a single space, as in line.split(' '), that fails in that case.
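
For reference, a small sketch of how the two forms of split behave on a line with several blanks between values. The sample line and the variable names are made up for the illustration:

```python
line = "1  2 3   4\n"   # several blanks between some of the values

# With no argument, split() treats any run of whitespace as one separator
# and ignores leading and trailing whitespace, including the newline.
no_argument = line.split()

# With an explicit ' ', every single space is a separator, so runs of
# spaces produce empty strings and the newline stays attached to the last item.
single_space = line.split(' ')

print(no_argument)
print(single_space)
```

Unpacking into a, b, c, d works with the first form on this line, but raises ValueError with the second because it returns more than four items.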

EDIT 2

Because the regex solution has been described as extremely hard to understand, it can be simplified as follows:

import re

A,B,C,D = [],[],[],[]
pat = re.compile('\s+')

with open('try.txt') as f:
    for line in f:
        a,b,c,d = pat.split(line.strip())
        A.append(int(a));B.append(int(b));C.append(int(c));D.append(int(d))

edited Mar 9, 2011 at 14:07

answered Mar 9, 2011 at 12:21

eyquem


16 Comments

Alexander Gessler Over a year ago

That is way too complicated for the purpose. Using regular expressions for everything makes code extremely hard to read.

Alexander Gessler Over a year ago

Also, not using raw strings makes regexes fail. Usually :-)

eyquem Over a year ago

@Alexander Gessler I don't write regexes for everything. "Extremely hard": you exaggerate. But you are completely right: there is no need for a regex here, so I have edited my answer.

eyquem Over a year ago

@Alexander Gessler "not using raw strings makes regexes fail" Not for me. I always write regexes without raw strings, and I have mastered writing them that way. In fact, I have never managed to understand how a raw string works as a regex.

Alexander Gessler Over a year ago

re.compile('\s+') works only because \s is not a recognized escape sequence in a Python string literal. Therefore, the official recommendation is to always use raw strings when specifying regexes.
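
A short sketch of a case where the missing raw string does change the result. Unlike \s, the sequence \b is a recognized string escape (backspace), so the non-raw pattern never reaches the re module as a word boundary. The sample text and names are illustrative:

```python
import re

text = "a cat sat"

# In a normal string literal '\b' is the backspace character (one character),
# while r'\b' is a backslash followed by 'b', which re reads as a word boundary.
plain = re.search('\bcat\b', text)   # looks for backspace + 'cat' + backspace
raw = re.search(r'\bcat\b', text)    # looks for the whole word 'cat'

print(len('\b'), len(r'\b'))
print(plain, raw)
```

The non-raw pattern finds nothing in this text, while the raw one matches the word cat.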
