How can I load a csv file and store its contents into an (numpy) array in python?

Question

Given the following two CSV files, which contain only strings, how can I load them into a (NumPy) array?

**1.txt**
A,B,D
E,G,A

**2.txt**
A,B,D
E,G,A

**data**
1,A,B,D
1,E,G,A
2,A,B,D
2,E,G,A

jabaldonedo · Accepted Answer · 2013-09-24 10:36:29Z

4

You can load them using numpy.loadtxt:

>>> import numpy as np
>>> data1 = np.loadtxt("1.txt", dtype=object, delimiter=",")
>>> data2 = np.loadtxt("2.txt", dtype=object, delimiter=",")
>>> print(data1)
 [['A' 'B' 'D']
  ['E' 'G' 'A']]
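
As a side note on the dtype argument, here is a minimal sketch of the difference between loading the fields as fixed-width strings and holding them as Python objects. The in-memory `sample` text is an assumption standing in for 1.txt, so the snippet runs without any file on disk.

```python
import io

import numpy as np

# Assumed stand-in for the contents of 1.txt.
sample = "A,B,D\nE,G,A\n"

# dtype=str yields a fixed-width unicode array.
as_str = np.loadtxt(io.StringIO(sample), dtype=str, delimiter=",")

# Converting to object lets the array later hold ints next to strings.
as_obj = as_str.astype(object)

print(as_str.dtype, as_obj.dtype)
```

An object array is what keeps the numeric source label as an integer when it is stacked next to the letters; a string array would store the label as text.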

If you want to stack both arrays use numpy.vstack:

>>> print(np.vstack((data1, data2)))
 [['A' 'B' 'D']
  ['E' 'G' 'A']
  ['A' 'B' 'D']
  ['E' 'G' 'A']]

And if you want to add a first column identifying the source file:

>>> first_col = np.vstack( (np.array([[1] * data1.shape[0]]).T, np.array([[2] * data2.shape[0]]).T) )
>>> stack = np.vstack( (data1, data2) )
>>> data = np.hstack( (first_col, stack) )
>>> print(data)
 [[1 'A' 'B' 'D']
  [1 'E' 'G' 'A']
  [2 'A' 'B' 'D']
  [2 'E' 'G' 'A']]
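
The label column can also be built with np.full, which may read more easily than the nested-list transpose. This is only a sketch under assumptions: the `sources` dictionary and its in-memory contents are hypothetical stand-ins for 1.txt and 2.txt.

```python
import io

import numpy as np

# Hypothetical stand-ins for 1.txt and 2.txt, keyed by source label.
sources = {
    1: io.StringIO("A,B,D\nE,G,A\n"),
    2: io.StringIO("A,B,D\nE,G,A\n"),
}

blocks = []
for label, handle in sources.items():
    # ndmin=2 keeps a 2-D shape even if a file has a single row.
    rows = np.loadtxt(handle, dtype=str, delimiter=",", ndmin=2)
    # One-column object array repeating the label once per row.
    label_col = np.full((rows.shape[0], 1), label, dtype=object)
    blocks.append(np.hstack((label_col, rows.astype(object))))

labelled = np.vstack(blocks)
print(labelled)
```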

If you want to save it in the same format:

>>> np.savetxt('data.txt', data, fmt='%s', delimiter=",")

This will generate data.txt:

1,A,B,D
1,E,G,A
2,A,B,D
2,E,G,A
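
To check the saved file, it can be read back and the source column split off again. The sketch below assumes a temporary directory instead of the working directory, and the names `source_ids` and `letters` are made up for illustration.

```python
import os
import tempfile

import numpy as np

# The combined array from above, rebuilt here so the sketch is self-contained.
data = np.array(
    [[1, "A", "B", "D"], [1, "E", "G", "A"], [2, "A", "B", "D"], [2, "E", "G", "A"]],
    dtype=object,
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    np.savetxt(path, data, fmt="%s", delimiter=",")
    # Everything comes back as text, including the source column.
    reloaded = np.loadtxt(path, dtype=str, delimiter=",")

source_ids = reloaded[:, 0].astype(int)  # back to integers
letters = reloaded[:, 1:]                # the original string fields
print(source_ids, letters.shape)
```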

Update: a function for handling any number of files. I am assuming the files are named with a number and a .txt extension, as in your question: 1.txt, 2.txt, 3.txt, ..., n.txt.

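import os
import numpy as np

def get_from_csv(fname):
    # Load every field as a Python string; ndmin=2 keeps one-row files 2-D.
    data = np.loadtxt(fname, dtype=object, delimiter=",", ndmin=2)
    # Take the source id from the file name, e.g. "1.txt" -> 1.
    # (str.rstrip(".txt") strips characters, not a suffix, so splitext is safer.)
    source = int(os.path.splitext(os.path.basename(fname))[0])
    col = np.array([[source] * data.shape[0]]).T
    return np.hstack( (col, data) )

files = ["1.txt", "2.txt", "3.txt"]

# Stack all per-file arrays in one call instead of relying on a bare except.
data = np.vstack([get_from_csv(f) for f in files])
print(data)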

Which will output:

[[1 'A' 'B' 'D']
 [1 'E' 'G' 'A']
 [2 'A' 'B' 'D']
 [2 'E' 'G' 'A']
 [3 'A' 'B' 'D']
 [3 'E' 'G' 'A']]
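
If the list of files should not be typed by hand, the names can be collected with glob. This is a sketch under assumptions: the temporary folder and the sample files written into it are invented so the example runs on its own, and `load_with_source` is a hypothetical helper mirroring the function above.

```python
import glob
import os
import tempfile

import numpy as np

def file_number(path):
    # "/some/dir/10.txt" -> 10
    return int(os.path.splitext(os.path.basename(path))[0])

def load_with_source(path):
    rows = np.loadtxt(path, dtype=str, delimiter=",", ndmin=2)
    col = np.full((rows.shape[0], 1), file_number(path), dtype=object)
    return np.hstack((col, rows.astype(object)))

with tempfile.TemporaryDirectory() as folder:
    # Create sample files 1.txt, 2.txt and 10.txt for the demonstration.
    for n in (1, 2, 10):
        with open(os.path.join(folder, f"{n}.txt"), "w") as fh:
            fh.write("A,B,D\nE,G,A\n")

    # Sort numerically so that 10.txt comes after 2.txt, not before it.
    paths = sorted(glob.glob(os.path.join(folder, "*.txt")), key=file_number)
    combined = np.vstack([load_with_source(p) for p in paths])

print(combined)
```

Sorting with a numeric key matters once there are ten or more files, because a plain alphabetical sort would place 10.txt between 1.txt and 2.txt.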

edited Sep 24, 2013 at 10:36

answered Sep 24, 2013 at 9:48

jabaldonedo

4 Comments

user2295350 Over a year ago

ok but what about the first column that indicates the source of the row (1 or 2 respectively)?

user2295350 Over a year ago

ohh i forgot one important thing; since i have a lot of files, this data1 = np.loadtxt("1.txt", dtype=np.object, delimiter=",") this is not practical at all!!! what could I do instead?

jabaldonedo Over a year ago

You should formulate your question clearly, it will make answers fit your requirements! Anyway, I have updated again, now you specify the filenames inside files list.

user2295350 Over a year ago

apologise for the inconvenience and many thanks for your help!

Lee · Accepted Answer · 2013-09-24 12:34:25Z

1

You could use np.genfromtxt:

>>> import numpy as np
>>> a = np.genfromtxt('1.txt', dtype=str, delimiter=',')
>>> b = np.genfromtxt('2.txt', dtype=str, delimiter=',')
>>> data = np.vstack((a, b))
>>> data
array([['A', 'B', 'D'],
       ['E', 'G', 'A'],
       ['A', 'B', 'D'],
       ['E', 'G', 'A']], dtype='<U1')

If you need to add the '1' and '2' labels (stored as strings here, because the array holds strings), you could do this:

>>> c = np.full((a.shape[0], 1), '1')
>>> d = np.full((b.shape[0], 1), '2')
>>> a = np.hstack((c, a))
>>> b = np.hstack((d, b))
>>> data = np.vstack((a, b))
>>> data
array([['1', 'A', 'B', 'D'],
       ['1', 'E', 'G', 'A'],
       ['2', 'A', 'B', 'D'],
       ['2', 'E', 'G', 'A']], dtype='<U1')

edited Sep 24, 2013 at 12:34

answered Sep 24, 2013 at 11:21

Lee
