
I am trying to load a .txt file as an array in Python so that I can operate on its elements. The file (abc.txt) looks something like this:

AL192012,               TONY,     20,
20121021, 1800,  , LO, 20.1N,  50.8W,  25, 1011,
20121022, 0000,  , LO, 20.4N,  51.2W,  25, 1011,
20121022, 0600,  , LO, 20.8N,  51.5W,  25, 1010,
20121022, 1200,  , LO, 21.3N,  51.7W,  30, 1009,
AL182012,              SANDY,     45,
20121021, 1800,  , LO, 14.3N,  77.4W,  25, 1006,
20121022, 0000,  , LO, 13.9N,  77.8W,  25, 1005,
20121022, 0600,  , LO, 13.5N,  78.2W,  25, 1003,
20121022, 1200,  , TD, 13.1N,  78.6W,  30, 1002,

I have tried pd.read_csv('abc.txt'), loadtxt("abc.txt") and genfromtxt("abc.txt"), but each of them only produced an array with three columns, probably because the first row has only three columns. I want the result to have the same eight columns as the data rows in the .txt file. Is this possible? Thanks!

  • Well, where do you expect the two lines that have fewer columns to appear in the result? Commented Jan 22, 2014 at 10:56
  • Thanks. If this array is named b, I want to get SANDY by b[5,4] and get TD by b[9,3]. Commented Jan 22, 2014 at 11:16

3 Answers

Try something like this:

data = []
with open("filename") as f:
  for line in f:
    data.append(line.split(","))

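That gives you a 2D list (a list of lists) of the data that you can operate on. Note that the cells keep their surrounding whitespace, and the last cell of each row keeps the newline.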

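If you want to transpose it, regular zip won't work because it stops at the shortest row; you need itertools.izip_longest (renamed itertools.zip_longest in Python 3), which pads the short rows with None.

So you then transpose it like this (Python 3):

import itertools
data = list(itertools.zip_longest(*data))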
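
The padding behaviour can be illustrated on two rows from the question. This is only a minimal sketch for Python 3; the names rows, columns and padded are made up for the example, and fillvalue="" is an assumption chosen so that missing cells become empty strings rather than None.

```python
from itertools import zip_longest

# Two ragged rows, as produced by splitting a header line and a data line.
rows = [
    ["AL192012", "TONY", "20"],
    ["20121021", "1800", "", "LO", "20.1N", "50.8W", "25", "1011"],
]

# zip_longest pads the short row with fillvalue instead of truncating.
columns = list(zip_longest(*rows, fillvalue=""))

# Transposing a second time gives back rows, now all of equal length.
padded = [list(r) for r in zip_longest(*columns, fillvalue="")]
```

Note that the padding is appended at the end of the short row, so TONY stays at index 1 rather than moving to index 4.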

3 Comments

Thanks. But I may need a bit more help here if possible. I only got a list called data. Is there a way to get the 10-by-8 2D array I want, where for example the element at [0,0] gives me AL192012, [0,4] gives TONY, and [9,3] gives TD?
@user3223064 it is a 2D array, you access the elements like array[0][4] in python. If you want to access it like that, then you'll want to use numpy, and if you're going to do that, you might as well go the full distance and just use numpy.loadtxt()
Thanks. Yours works. But on the other hand numpy.loadtxt() still only gives me three columns instead of eight. anyway..
1
>>> with open(filename) as f:
        # strip() removes the newline first, so rstrip(',') can drop the trailing comma
        data = [[cell.strip() for cell in row.strip().rstrip(',').split(',')] for row in f]

>>> for row in data:
        print(row)

['AL192012', 'TONY', '20']
['20121021', '1800', '', 'LO', '20.1N', '50.8W', '25', '1011']
['20121022', '0000', '', 'LO', '20.4N', '51.2W', '25', '1011']
['20121022', '0600', '', 'LO', '20.8N', '51.5W', '25', '1010']
['20121022', '1200', '', 'LO', '21.3N', '51.7W', '30', '1009']
['AL182012', 'SANDY', '45']
['20121021', '1800', '', 'LO', '14.3N', '77.4W', '25', '1006']
['20121022', '0000', '', 'LO', '13.9N', '77.8W', '25', '1005']
['20121022', '0600', '', 'LO', '13.5N', '78.2W', '25', '1003']
['20121022', '1200', '', 'TD', '13.1N', '78.6W', '30', '1002']

If you want to fix the indexes for the short lines, you could explicitly do that afterwards:

>>> data = [row if len(row) == 8 else row[0:1] + [''] * 3 + row[1:3] + [''] * 2 for row in data]
>>> for row in data:
        print(row)

['AL192012', '', '', '', 'TONY', '20', '', '']
['20121021', '1800', '', 'LO', '20.1N', '50.8W', '25', '1011']
['20121022', '0000', '', 'LO', '20.4N', '51.2W', '25', '1011']
['20121022', '0600', '', 'LO', '20.8N', '51.5W', '25', '1010']
['20121022', '1200', '', 'LO', '21.3N', '51.7W', '30', '1009']
['AL182012', '', '', '', 'SANDY', '45', '', '']
['20121021', '1800', '', 'LO', '14.3N', '77.4W', '25', '1006']
['20121022', '0000', '', 'LO', '13.9N', '77.8W', '25', '1005']
['20121022', '0600', '', 'LO', '13.5N', '78.2W', '25', '1003']
['20121022', '1200', '', 'TD', '13.1N', '78.6W', '30', '1002']

2 Comments

Thanks. But may I ask if this gives an array? Seems like data[0] gives first row and data[1] gives second row and so on, with type(data) being a list each. Is there an array where element [5,4] gives SANDY? Or am I not getting your idea..
data will be a list of lists, so data[5][4] will give SANDY, etc. There are no arrays in Python directly, and the [5,4] syntax suggests that you are trying to use arrays from NumPy or something similar. I think you can convert lists to arrays somehow, but I don’t know how that works. You don’t necessarily need to do that anyway; using lists is just fine.
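
For the [5,4] style of indexing asked about above, one option is to hand the padded list of lists to numpy.array. The following is a rough sketch, assuming NumPy is installed and every row already has eight cells; only two rows from the question are used, so the indices differ from the full 10-row file.

```python
import numpy as np

# Two already-padded rows (each has exactly eight cells).
data = [
    ["AL182012", "", "", "", "SANDY", "45", "", ""],
    ["20121022", "1200", "", "TD", "13.1N", "78.6W", "30", "1002"],
]

# Rows of equal length convert to a regular 2D array of strings.
b = np.array(data)

print(b.shape)   # (2, 8)
print(b[0, 4])   # SANDY
print(b[1, 3])   # TD
```

All elements are strings here; numeric columns would still need to be converted separately before doing arithmetic on them.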

Here is a snippet:

#!/usr/bin/python

import sys

with open(sys.argv[1], 'r') as f:
    content = f.readlines()

for w in content:
    print(w)

    # split and loop again -> w.split(',')

f.readlines() returns a list of strings, one per line.
Each w is a single string; w.split(',') turns it into a list of fields.
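
The "split and loop again" step might look roughly like the sketch below. It assumes Python 3, uses io.StringIO as a stand-in for the opened file so that it runs on its own, and introduces the hypothetical names table and cells.

```python
import io

# Stand-in for the opened file: two lines from the question's sample.
f = io.StringIO(
    "AL182012,              SANDY,     45,\n"
    "20121022, 1200,  , TD, 13.1N,  78.6W,  30, 1002,\n"
)
content = f.readlines()  # list of strings, one per line

table = []
for w in content:
    # Drop the newline and the trailing comma, then split into stripped cells.
    cells = [cell.strip() for cell in w.strip().rstrip(",").split(",")]
    table.append(cells)
```

After the loop, table is a list of lists, so table[1][3] is the string 'TD'.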

1 Comment

Thanks. But what should I do with your last line? Because when I only included your five lines from import till print w, type(content) is a list, and w is only content[66] which is a string. May I ask what do you mean by split and loop again...
