I want to do the following:

1) Load 14 files into NumPy arrays.
2) Concatenate the 14 NumPy arrays.
3) Find the row indexes where each file's data starts and ends in the concatenated array, and use them to create a new NumPy array that assigns a class number from 1 to 14 to each row of data according to the file it belongs to.

I have created the following code to solve this:

import numpy as np
from numpy import genfromtxt


name1 = 'backandfforwwalk4smallstepsML'
name2 = 'backandfforwwalk4stepsML'
name3 = 'backandforsteps1lineML'
name4 = 'leftdnrightfarML'
name5 = 'sidestyletwoML'
name6 = 'walkingsideML'
name7 = 'fastwalkML'

class1 = genfromtxt('allSumRowSignals_'+ name1 + '_even.csv', delimiter=',')
class2 = genfromtxt('allSumRowSignals_'+ name1 + '_odd.csv', delimiter=',')
class3 = genfromtxt('allSumRowSignals_'+ name2 + '_even.csv', delimiter=',')
class4 = genfromtxt('allSumRowSignals_'+ name2 + '_odd.csv', delimiter=',')
class5 = genfromtxt('allSumRowSignals_'+ name3 + '_even.csv', delimiter=',')
class6 = genfromtxt('allSumRowSignals_'+ name3 + '_odd.csv', delimiter=',')
class7 = genfromtxt('allSumRowSignals_'+ name4 + '_even.csv', delimiter=',')
class8 = genfromtxt('allSumRowSignals_'+ name4 + '_odd.csv', delimiter=',')
class9 = genfromtxt('allSumRowSignals_'+ name5 + '_even.csv', delimiter=',')
class10 = genfromtxt('allSumRowSignals_'+ name5 + '_odd.csv', delimiter=',')
class11 = genfromtxt('allSumRowSignals_'+ name6 + '_even.csv', delimiter=',')
class12 = genfromtxt('allSumRowSignals_'+ name6 + '_odd.csv', delimiter=',')
class13 = genfromtxt('allSumRowSignals_'+ name7 + '_even.csv', delimiter=',')
class14 = genfromtxt('allSumRowSignals_'+ name7 + '_odd.csv', delimiter=',')


#Load files that have similar name


a = np.concatenate((class1,class2),axis=0)
b = np.concatenate((a,class3),axis=0)
c = np.concatenate((b,class4),axis=0)
d = np.concatenate((c,class5),axis=0)
e = np.concatenate((d,class6),axis=0)
f = np.concatenate((e,class7),axis=0)
g = np.concatenate((f,class8),axis=0)
h = np.concatenate((g,class9),axis=0)
i = np.concatenate((h,class10),axis=0)
j = np.concatenate((i,class11),axis=0)
k = np.concatenate((j,class12),axis=0)
l = np.concatenate((k,class13),axis=0)
m = np.concatenate((l,class14),axis=0)

#concatenate all of them, m is the concatenated file

#calculating the indexes for each class

class1ends = len(class1[:,1])
class2ends = len(a[:,1])
class3ends = len(b[:,1])
class4ends = len(c[:,1])
class5ends = len(d[:,1])
class6ends = len(e[:,1])
class7ends = len(f[:,1])
class8ends = len(g[:,1])
class9ends = len(h[:,1])
class10ends = len(i[:,1])
class11ends = len(j[:,1])
class12ends = len(k[:,1])
class13ends = len(l[:,1])
class14ends = len(m[:,1])

#the row at which each file ends is needed to assign a class number from 1 to 14 to every row; the class numbers are saved in a separate file


Y = np.zeros((len(m)))

Y[0:class1ends]= 1
Y[class1ends:class2ends]= 2
Y[class2ends:class3ends]= 3
Y[class3ends:class4ends]= 4
Y[class4ends:class5ends]= 5
Y[class5ends:class6ends]= 6
Y[class6ends:class7ends]= 7
Y[class7ends:class8ends]= 8
Y[class8ends:class9ends]= 9
Y[class9ends:class10ends]= 10
Y[class10ends:class11ends]= 11
Y[class11ends:class12ends]= 12
Y[class12ends:class13ends]= 13
Y[class13ends:class14ends]= 14

#using the previously saved indexes, create a new variable with the same length as m and assign a class number to the rows of each file


print(class14ends)

np.savetxt('y.csv', Y, delimiter=',', fmt="%s")
np.savetxt('X.csv', m, delimiter=',', fmt="%s")

#save classes as Y
#save data as X

I am looking for a faster, more compact, and more general way to do this, one that scales to many files. Any recommendations?

  • Have you profiled this - is concatenation the slowest part of your code for sure? Commented Apr 11, 2015 at 13:41
  • on the issue of "compacted," use a for loop, dude. Commented Apr 11, 2015 at 16:31

1 Answer

You can streamline this by noting that np.concatenate accepts a list of arrays, so all of them can be joined in a single call.

Here's a simpler example:

In [76]: class0=np.zeros((3,4))    
In [77]: class1=np.ones((2,4))
In [78]: class2=np.ones((5,4))*2
In [79]: class3=np.ones((2,4))*3
In [80]: class_list=[class0,class1,class2,class3]

In [81]: lenlist=[x.shape[0] for x in class_list]
In [82]: m = np.concatenate(class_list, axis=0)

In [84]: m
Out[84]: 
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.],
       ...
       [ 2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.]])

In [85]: lenlist
Out[85]: [3, 2, 5, 2]
In [87]: class_ends=np.cumsum(lenlist)
In [88]: class_ends
Out[88]: array([ 3,  5, 10, 12], dtype=int32)

In [91]: Y=np.repeat(range(len(lenlist)),lenlist)
In [92]: Y
Out[92]: array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 3, 3])
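
Applied to the question, the same idea could look roughly like the sketch below. It is only an illustration: the helper names build_filenames and stack_with_labels are made up here, the file-name pattern is copied from the question, and the labels are shifted to start at 1 (the session above produces 0-based labels). The demo uses small synthetic arrays so it runs without the CSV files.

```python
import numpy as np


def build_filenames(names, prefix='allSumRowSignals_',
                    suffixes=('_even.csv', '_odd.csv')):
    """Hypothetical helper: file names in class order
    (name1 even, name1 odd, name2 even, and so on)."""
    return [prefix + name + suffix for name in names for suffix in suffixes]


def stack_with_labels(arrays, first_label=1):
    """Hypothetical helper: concatenate 2-D arrays row-wise and
    label every row with the number of the array it came from."""
    lengths = np.array([a.shape[0] for a in arrays])
    X = np.concatenate(arrays, axis=0)
    labels = np.arange(first_label, first_label + len(arrays))
    Y = np.repeat(labels, lengths)      # one label per row
    ends = np.cumsum(lengths)           # exclusive end row of each class
    starts = ends - lengths             # first row of each class
    return X, Y, starts, ends


# Demo with synthetic arrays standing in for the loaded CSV data.
demo = [np.zeros((3, 4)), np.ones((2, 4)), np.full((5, 4), 2.0)]
X, Y, starts, ends = stack_with_labels(demo)
print(Y)        # [1 1 1 2 2 3 3 3 3 3]
print(starts)   # [0 3 5]
print(ends)     # [ 3  5 10]

# With the real files (not run here), assuming `names` is the list of
# the seven base names from the question:
# arrays = [np.atleast_2d(np.genfromtxt(f, delimiter=','))
#           for f in build_filenames(names)]
# X, Y, starts, ends = stack_with_labels(arrays)
# np.savetxt('X.csv', X, delimiter=',', fmt='%s')
# np.savetxt('y.csv', Y, delimiter=',', fmt='%s')
```

The np.atleast_2d call is a precaution: genfromtxt returns a 1-D array for a file with a single row, in which case shape[0] would count columns instead of rows. Because every file is loaded in one list comprehension, adding more files only means extending the list of names.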