1

I have a text file in the following format:

a,b,c,d,
1,1,2,3,
4,5,6,7,
1,2,5,7,
6,9,8,5,

How can I read it into a list efficiently so as to get the following output?

list=[[1,4,1,6],[1,5,2,9],[2,6,5,8],[3,7,7,5]]

3 Answers

3

Let's assume that the file is named spam.txt:

$ cat spam.txt
a,b,c,d,
1,1,2,3,
4,5,6,7,
1,2,5,7,
6,9,8,5,    

Using list comprehensions and the zip() built-in function, you can write a program such as:

>>> with open('spam.txt', 'r') as file:
...     file.readline() # skip the first line
...     rows = [[int(x) for x in line.split(',')[:-1]] for line in file]
...     cols = [list(col) for col in zip(*rows)]
... 
'a,b,c,d,\n'
>>> rows
[[1, 1, 2, 3], [4, 5, 6, 7], [1, 2, 5, 7], [6, 9, 8, 5]]
>>> cols
[[1, 4, 1, 6], [1, 5, 2, 9], [2, 6, 5, 8], [3, 7, 7, 5]]

Note that zip(*rows) relies on argument unpacking: the * operator unpacks a list or tuple so that its elements are passed to the function as separate positional arguments. In other words, zip(*rows) is equivalent to zip([1, 1, 2, 3], [4, 5, 6, 7], [1, 2, 5, 7], [6, 9, 8, 5]).
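To make the unpacking step concrete, here is a small self-contained sketch. The variable names unpacked and explicit are only illustrative; the data is the rows list from the session above.

```python
rows = [[1, 1, 2, 3], [4, 5, 6, 7], [1, 2, 5, 7], [6, 9, 8, 5]]

# zip(*rows) passes each inner list as its own positional argument,
# so it behaves the same as writing the arguments out by hand.
unpacked = list(zip(*rows))
explicit = list(zip(rows[0], rows[1], rows[2], rows[3]))
assert unpacked == explicit

# zip yields tuples; convert each one to a list to get the columns.
cols = [list(col) for col in unpacked]
print(cols)  # [[1, 4, 1, 6], [1, 5, 2, 9], [2, 6, 5, 8], [3, 7, 7, 5]]
```

Keep in mind that zip stops at the shortest argument, so a short or empty row would silently truncate every column.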

EDIT:

This is a version based on NumPy for reference:

>>> import numpy as np
>>> with open('spam.txt', 'r') as file:
...     ncols = len(file.readline().split(',')) - 1
...     data = np.fromiter((int(v) for line in file for v in line.split(',')[:-1]), int, count=-1)
...     cols = data.reshape(data.size // ncols, ncols).transpose()  # integer division: reshape needs int dimensions
...
>>> cols
array([[1, 4, 1, 6],
       [1, 5, 2, 9],
       [2, 6, 5, 8],
       [3, 7, 7, 5]])

3 Comments

Yes, it's clear, nice explanation. Since I am dealing with large text files, the "rows" and "cols" lists will be large: the code above consumes around 1.4 GB of RAM for a 500 MB input file. Is there a more memory-efficient way to do this?
@JagannathKs It depends on your goal. What are you going to do with the columns finally?
I will get two such columns from two different files and process them based on certain criteria. Anyway, I'll try to optimize it. Thanks for your reply.
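Regarding the memory question in the comments, one possible direction (a sketch only, not something measured against the 500 MB file mentioned above) is to avoid building the intermediate rows list and to store each column in a compact array.array instead of a list of Python int objects. The function name read_columns_compact is hypothetical, and the code assumes every value fits in a signed 64-bit integer.

```python
from array import array


def read_columns_compact(path):
    """Read comma-separated integer columns into array('q') objects.

    Assumes a header line, a trailing comma on every line, and values
    that fit in a signed 64-bit integer (typecode 'q').
    """
    with open(path) as f:
        header = f.readline().strip().split(',')
        # The trailing comma produces an empty last field; do not count it.
        ncols = len([name for name in header if name])
        cols = [array('q') for _ in range(ncols)]
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # zip stops after ncols fields, so the empty trailing field is ignored.
            for col, value in zip(cols, line.split(',')):
                col.append(int(value))
    return cols
```

The file is processed one line at a time, so only the columns themselves stay in memory. Whether this is enough depends on what is done with the columns afterwards.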
0

You can try the following code:

from numpy import*

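x0 = []
with open('yourfile.txt') as f:
    for line in f:
        fields = line.strip().split(',')  # the file is comma-separated
        x0.append(fields[0])              # index 0 is the first column

# Values are kept as strings; the header entry ('a') is included.
for value in x0:
    print(value)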

Here the first column is appended onto x0[]. You can append the other columns in a similar fashion.
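Rather than repeating the loop once per column, all columns can be collected in a single pass. The following is a minimal sketch using the standard csv module; the function name read_columns is hypothetical, and it assumes a header line followed by integer rows with a trailing comma, as in the question.

```python
import csv


def read_columns(path):
    """Return the file's integer columns as a list of lists."""
    rows = []
    with open(path, newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header line (a,b,c,d,)
        for row in reader:
            # Drop the empty field created by the trailing comma.
            values = [int(v) for v in row if v.strip()]
            if values:  # ignore blank lines
                rows.append(values)
    # Transpose rows into columns.
    return [list(col) for col in zip(*rows)]
```

For the sample file in the question, read_columns('yourfile.txt') would return [[1, 4, 1, 6], [1, 5, 2, 9], [2, 6, 5, 8], [3, 7, 7, 5]].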

3 Comments

Why is numpy required here?
numpy contains a powerful N-dimensional array object and can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows numpy to seamlessly and speedily integrate with a wide variety of databases.
Where is it used in your example?
0

You can use the data_py package to read column-wise data from a file. Install the package with:

pip install data-py==0.0.1

Example

from data_py import datafile
df1=datafile("C:/Folder/SubFolder/data-file-name.txt")
df1.separator=","
[Col1,Col2,Col3,Col4,Col5]=["","","","",""]
[Col1,Col2,Col3,Col4,Col5]=df1.read([Col1,Col2,Col3,Col4,Col5],lineNumber)
print(Col1,Col2,Col3,Col4,Col5)

For details please follow the link https://www.respt.in/p/python-package-datapy.html
