Given the following two CSV files that contain only strings, how can I load them into a (NumPy) array?

**1.txt**
A,B,D
E,G,A

**2.txt**
A,B,D
E,G,A

**data**
1,A,B,D
1,E,G,A
2,A,B,D
2,E,G,A

2 Answers

You can load them using numpy.loadtxt:

>>> import numpy as np
>>> data1 = np.loadtxt("1.txt", dtype=object, delimiter=",")
>>> data2 = np.loadtxt("2.txt", dtype=object, delimiter=",")
>>> print(data1)
 [['A' 'B' 'D']
  ['E' 'G' 'A']]

If you want to stack both arrays use numpy.vstack:

>>> np.vstack( (data1, data2) )
 [['A' 'B' 'D']
  ['E' 'G' 'A']
  ['A' 'B' 'D']
  ['E' 'G' 'A']]

And if you want to add the source:

>>> first_col = np.vstack( (np.array([[1] * data1.shape[0]]).T, np.array([[2] * data2.shape[0]]).T) )
>>> stack = np.vstack( (data1, data2) )
>>> data = np.hstack( (first_col, stack) )
>>> print(data)
 [[1 'A' 'B' 'D']
  [1 'E' 'G' 'A']
  [2 'A' 'B' 'D']
  [2 'E' 'G' 'A']]
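
On Python 3 with a recent NumPy, the source column can also be built with np.full. This is a minimal sketch rather than the code above: it reads the sample rows from in-memory io.StringIO buffers (an assumption, made only so the snippet runs without files on disk) and uses dtype=str, so the source number ends up stored as a string.

```python
import io
import numpy as np

# In-memory stand-ins for 1.txt and 2.txt (assumption: same content as in the question)
sources = {1: "A,B,D\nE,G,A\n", 2: "A,B,D\nE,G,A\n"}

blocks = []
for number, text in sources.items():
    # ndmin=2 keeps the result two-dimensional even for a single row
    rows = np.loadtxt(io.StringIO(text), dtype=str, delimiter=",", ndmin=2)
    # One column holding the source number, one entry per row
    label = np.full((rows.shape[0], 1), str(number))
    blocks.append(np.hstack((label, rows)))

data = np.vstack(blocks)
print(data)
```

To read real files, pass the file name to np.loadtxt in place of the StringIO buffer.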

If you want to save it in the same format as in your question:

>>> np.savetxt('data.txt', data, fmt='%s', delimiter=",")

This will generate data.txt:

1,A,B,D
1,E,G,A
2,A,B,D
2,E,G,A
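
The fmt='%s' argument matters here because savetxt's default format is numeric and does not fit string data. A small sketch, assuming a recent NumPy on Python 3: it writes the stacked rows to an in-memory buffer (used instead of data.txt only to keep the example self-contained) so the resulting text can be inspected.

```python
import io
import numpy as np

# The stacked array from above, written out literally for this example
data = np.array([["1", "A", "B", "D"],
                 ["1", "E", "G", "A"],
                 ["2", "A", "B", "D"],
                 ["2", "E", "G", "A"]])

buffer = io.StringIO()
# '%s' formats every cell as plain text; ',' separates the columns
np.savetxt(buffer, data, fmt="%s", delimiter=",")
text = buffer.getvalue()
print(text)
```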

Update: a function for handling any number of files. I am assuming the files are named as numbers with a .txt extension, as in your question (1.txt, 2.txt, 3.txt, ..., n.txt):

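import os
import numpy as np

def get_from_csv(fname):
    # ndmin=2 keeps a single-row file two-dimensional
    data = np.loadtxt(fname, dtype=object, delimiter=",", ndmin=2)
    # Source number taken from the file name, e.g. "1.txt" -> 1
    source = int(os.path.splitext(os.path.basename(fname))[0])
    col = np.full((data.shape[0], 1), source, dtype=object)
    return np.hstack( (col, data) )

files = ["1.txt", "2.txt", "3.txt"]

# Load every file first, then stack once. This avoids the bare
# try/except, which would silently reuse a leftover `data` variable.
data = np.vstack( [get_from_csv(f) for f in files] )
print(data)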

Which will output:

[[1 'A' 'B' 'D']
 [1 'E' 'G' 'A']
 [2 'A' 'B' 'D']
 [2 'E' 'G' 'A']
 [3 'A' 'B' 'D']
 [3 'E' 'G' 'A']]
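
When there are too many files to list by hand, the files list can be collected with glob. A minimal sketch, assuming all files sit in one directory and are named <number>.txt; the helper name load_numbered_csvs is hypothetical, and the temporary directory exists only to make the example self-contained.

```python
import glob
import os
import tempfile

import numpy as np


def load_numbered_csvs(directory):
    """Stack all <number>.txt files in directory, prefixing each row with the number."""
    paths = glob.glob(os.path.join(directory, "*.txt"))
    # Sort numerically so that 10.txt comes after 2.txt
    paths.sort(key=lambda p: int(os.path.splitext(os.path.basename(p))[0]))
    blocks = []
    for path in paths:
        number = os.path.splitext(os.path.basename(path))[0]
        rows = np.loadtxt(path, dtype=str, delimiter=",", ndmin=2)
        # One column with the source number, one entry per row
        label = np.full((rows.shape[0], 1), number)
        blocks.append(np.hstack((label, rows)))
    return np.vstack(blocks)


# Demo with throwaway files (assumption: same content as in the question)
with tempfile.TemporaryDirectory() as tmp:
    for name in ("1.txt", "2.txt", "10.txt"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("A,B,D\nE,G,A\n")
    data = load_numbered_csvs(tmp)

print(data)
```

With dtype=str the source number is stored as text. A .txt file whose name is not a number makes int() raise a ValueError, so such files would need to be filtered out first.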

4 Comments

ok but what about the first column that indicates the source of the row (1 or 2 respectively)?
ohh i forgot one important thing; since i have a lot of files, this data1 = np.loadtxt("1.txt", dtype=np.object, delimiter=",") this is not practical at all!!! what could I do instead?
You should formulate your question clearly, it will make answers fit your requirements! Anyway, I have updated again, now you specify the filenames inside files list.
apologise for the inconvenience and many thanks for your help!

You could use numpy.genfromtxt:

>>> import numpy as np
>>> a = np.genfromtxt('1.txt', dtype=None, delimiter=',')
>>> b = np.genfromtxt('2.txt', dtype=None, delimiter=',')
>>> data = np.vstack((a, b))
>>> data
array([['A', 'B', 'D'],
       ['E', 'G', 'A'],
       ['A', 'B', 'D'],
       ['E', 'G', 'A']], 
       dtype='|S1')

If you need to add the '1' and '2' you could do this:

>>> c= np.ones((2,1),dtype=int)
>>> d = c*2
>>> a = np.hstack((c,a))
>>> b = np.hstack((d,b))
>>> data = np.vstack((a,b))
>>> data
array([['1', 'A', 'B', 'D'],
       ['1', 'E', 'G', 'A'],
       ['2', 'A', 'B', 'D'],
       ['2', 'E', 'G', 'A']], 
      dtype='|S1')
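
The '|S1' dtype in the output above is a byte-string type. A sketch of the same approach for Python 3, assuming a recent NumPy: passing dtype=str asks genfromtxt for text strings, and np.column_stack prepends the source column. The io.StringIO buffers are stand-ins for 1.txt and 2.txt, used only so the snippet runs on its own.

```python
import io
import numpy as np

# In-memory stand-ins for 1.txt and 2.txt (assumption: same content as in the question)
file1 = io.StringIO("A,B,D\nE,G,A\n")
file2 = io.StringIO("A,B,D\nE,G,A\n")

# dtype=str requests text strings rather than bytes
a = np.genfromtxt(file1, dtype=str, delimiter=",")
b = np.genfromtxt(file2, dtype=str, delimiter=",")

# column_stack accepts a 1-D array as an extra leading column
a = np.column_stack((np.full(len(a), "1"), a))
b = np.column_stack((np.full(len(b), "2"), b))

data = np.vstack((a, b))
print(data)
```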
