4

I have a newbie question. I need help separating a text file into columns and rows. Let's say I have a file like this:

1 2 3 4

2 3 4 5

and I want to put it into a 2D list called values = [[]].

I can get it to give me the rows, and this code works:

values = map(int, line.split(','))

I just don't know how I can say the same thing but for the rows, and the documentation doesn't make any sense.

cheers

@user654174 There is no ',' in your example, yet you split by ','. That is inconsistent. Commented Mar 9, 2011 at 12:09

5 Answers 5

6
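# Open the file in a with block so it is closed automatically.
with open(filename, 'rt') as f:
    # [::2] keeps every second line, skipping the blank lines between rows.
    a = [[int(token) for token in line.split()] for line in f.readlines()[::2]]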

In your sample file above, you have an empty line between each data row - I took this into account, but you can drop the ::2 subscript if you didn't mean to have this extra line in your data.

Edit: added conversion to int - you can use map as well, but mixing list comprehensions and map seems ugly to me.
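
If the blank lines are not guaranteed to fall on exactly every second line, filtering on content may be more robust than the ::2 slice. Here is a minimal sketch, written for Python 3 (unlike the Python 2 snippets in this thread). It uses io.StringIO as a stand-in for the real file, and the name sample is made up for illustration:

```python
import io

# Stand-in for the real file; in practice use open(filename) instead.
sample = io.StringIO("1 2 3 4\n\n2 3 4 5\n")

# Keep only lines that contain something other than whitespace,
# then convert every whitespace-separated token to int.
a = [[int(token) for token in line.split()] for line in sample if line.strip()]

print(a)  # [[1, 2, 3, 4], [2, 3, 4, 5]]
```

Iterating over the file object directly also avoids loading all lines into memory with readlines().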


17 Comments

If there are no empty lines, he can also drop the readlines().
Hi, the blank line is there because for some reason it put the numbers on the same line, so I had to put the blank line in to separate them.
Thus: a = [[int(token) for token in line.split()] for line in file] (file being a valid file handle i.e. obtained from open)
Sorry for being dense, but what does token mean?
It's just a name - [int(token) for token in line.split()] takes each element of the list returned by line.split(), names it token and executes int() on it, thus forming another sequence, which contains all numbers in a line as integers. I could have chosen any other name.
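
To make the role of token more concrete, the nested comprehension can be unrolled into ordinary loops. This is only an illustrative sketch in Python 3; the list lines is assumed sample data standing in for the lines read from the file:

```python
lines = ["1 2 3 4", "2 3 4 5"]  # assumed sample rows, already read from the file

values = []
for line in lines:
    row = []
    for token in line.split():  # token is just the loop variable's name
        row.append(int(token))
    values.append(row)

# The explicit loops build the same list as the one-line nested comprehension.
assert values == [[int(token) for token in line.split()] for line in lines]
print(values)  # [[1, 2, 3, 4], [2, 3, 4, 5]]
```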
1
import csv
import itertools

values = []

with open('text.file') as file_object:
    for line in csv.reader(file_object, delimiter=' '):
        values.append(map(int, line))

print "rows:", values
print "columns:"
for column in itertools.izip(*values):
    print column

Output is:

rows: [[1, 2, 3, 4], [2, 3, 4, 5]]
columns:
(1, 2)
(2, 3)
(3, 4)
(4, 5)

1 Comment

"I just don't know how I can say the same thing but for the rows"
1

Get the data into your program by some method. Here's one:

f = open(textfile, 'r')
buffer = f.read()
f.close()

Parse the buffer into a table (note: strip() is used to clear any trailing whitespace):

table = [map(int, row.split()) for row in buffer.strip().split("\n")]

>>> print table
[[1, 2, 3, 4], [2, 3, 4, 5]]

If you want the columns as ordered pairs instead, transpose the table:

transpose = zip(*table)
>>> print transpose
[(1, 2), (2, 3), (3, 4), (4, 5)]
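
Note that in Python 3, zip returns a lazy iterator rather than a list, so the transpose has to be materialised before it can be printed or indexed. A small sketch under that assumption, with the table hard-coded for illustration:

```python
table = [[1, 2, 3, 4], [2, 3, 4, 5]]

# zip(*table) groups the i-th element of every row together.
# In Python 3 it is lazy, so list() is needed to see the result.
columns = list(zip(*table))
print(columns)  # [(1, 2), (2, 3), (3, 4), (4, 5)]

# Individual columns can then be picked by index.
first_column = columns[0]
print(first_column)  # (1, 2)
```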

Comments

0

You could try the csv module. It lets you specify a custom delimiter, so it should be able to handle space-separated data.
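
As a rough sketch of that idea in Python 3, assuming the values are separated by exactly one space: io.StringIO stands in for the real file here, and sample is a made-up name.

```python
import csv
import io

# Stand-in for the real file; in practice use open(path, newline="").
sample = io.StringIO("1 2 3 4\n\n2 3 4 5\n")

values = []
for row in csv.reader(sample, delimiter=" "):
    if row:  # csv.reader yields an empty list for a blank line
        values.append([int(field) for field in row])

print(values)  # [[1, 2, 3, 4], [2, 3, 4, 5]]
```

For data with irregular spacing, plain line.split() is probably the simpler tool, since csv treats every single space as a field boundary.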

Comments

0

If the columns are separated by blanks:

import re

A,B,C,D = [],[],[],[]
pat = re.compile('([^ ]+)\s+([^ ]+)\s+([^ ]+)\s+([^ ]+)')

with open('try.txt') as f:
    for line in f:
        a,b,c,d = pat.match(line.strip()).groups()
        A.append(int(a));B.append(int(b));C.append(int(c));D.append(int(d))

or with csv module

EDIT

A,B,C,D = [],[],[],[]    
with open('try.txt') as f:
    for line in f:
        a,b,c,d = line.split()
        A.append(int(a));B.append(int(b));C.append(int(c));D.append(int(d))

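Note that line.split() with no argument splits on any run of whitespace, so several blanks between values are handled. This code fails only when a line does not contain exactly four values (a blank line, for example).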

EDIT 2

Because the regex solution has been described as extremely hard to read, it can be simplified as follows:

import re

A,B,C,D = [],[],[],[]
pat = re.compile('\s+')

with open('try.txt') as f:
    for line in f:
        a,b,c,d = pat.split(line.strip())
        A.append(int(a));B.append(int(b));C.append(int(c));D.append(int(d))
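
The four hard-coded lists can also be avoided entirely. Below is a hedged sketch in Python 3 that builds the rows first and then derives the columns with zip, for any number of columns. The sample data (with deliberately irregular spacing and a trailing blank line) is invented for illustration:

```python
import io

# Stand-in for open('try.txt'); spacing is deliberately irregular.
sample = io.StringIO("1   2 3\t4\n2 3  4 5\n\n")

# split() without arguments splits on any run of whitespace;
# the "if line.strip()" filter skips blank lines.
rows = [[int(x) for x in line.split()] for line in sample if line.strip()]

# Transpose the rows to get one list per column.
columns = [list(col) for col in zip(*rows)]

# Unpacking like this only works when there are exactly four columns.
A, B, C, D = columns
print(columns)  # [[1, 2], [2, 3], [3, 4], [4, 5]]
```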

16 Comments

That is way too complicated for the purpose. Using regular expressions for everything makes code extremely hard to read.
Also, not using raw strings makes regexes fail. Usually :-)
@Alexander Gessler I don't write regexes for everything. "Extremely hard": you exaggerate. But you are completely right: there is no need for a regex here, so I have edited my answer.
@Alexander Gessler "not using raw strings makes regexes fail" Not for me. I always write regexes without raw strings, and I have mastered writing them that way. In fact, I don't understand how a raw string works as a regex...
re.compile('\s+') works only because \s is not a recognized escape sequence. Therefore, the official recommendation is to always use raw strings when specifying regexes.
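
A short Python 3 sketch of the difference this comment describes. The \b escape is chosen here as an example of a sequence that Python does recognise in ordinary strings; the sample text is made up:

```python
import re

# In a raw string the backslash is kept literally, so the regex engine
# sees backslash + s. Without the r prefix it must be doubled by hand.
assert r"\s+" == "\\s+"
fields = re.split(r"\s+", "1  2\t3 4")
print(fields)  # ['1', '2', '3', '4']

# "\b" in an ordinary string is a single backspace character,
# while r"\b" is backslash + b, which the regex engine reads as a word boundary.
assert len("\b") == 1 and len(r"\b") == 2

words_raw = re.findall(r"\bcat\b", "cat concat cat")
words_plain = re.findall("\bcat\b", "cat concat cat")
print(words_raw)    # ['cat', 'cat']
print(words_plain)  # []
```

The non-raw pattern silently looks for backspace characters and matches nothing, which is the kind of failure raw strings avoid.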
