Okay, I'm stumped on this. I've looked at the Pandas documentation but I can't figure out the right way to do it and I think I'm just making a mess. Basically, I have data which are NumPy arrays.

For example:

import numpy
data = numpy.loadtxt('foo.txt', dtype=str, delimiter=',')
gps_data = numpy.concatenate((data[0:len(data),0:2],data[0:len(data),3:5]),axis=1)
gps_time = data[0:len(data),2:3].astype(float)/1000

The gps_data basically looks like this:

array([['50.3482627', '-71.662499', '30', 'network'],
       ['50.3482588', '-71.6624934', '30', 'network'],
       ['50.34829', '-71.6625077', '30', 'network'],
       ...,
       ['20.3482488', '-78.66245463999999', '9', 'gps'],
       ['20.3482598', '-78.6625174', '30', 'network'],
       ['20.34824943', '-78.6624565', '10', 'gps']],
      dtype='|S18')

and the gps_time:

array([[  1.16242035e+09],
       [  1.26242036e+09],
       [  1.36242038e+09],
       ...,
       [  1.32330411e+09],
       [  1.16330413e+09],
       [  1.26330413e+09]])

What I'm trying to do is use DataFrame to bring in another similar-looking array called acc_data, combine it with gps_data, and then go back through and fill in the data that is missing at different times.

This is what I've been trying:

df1 = DataFrame(gps_data,index=gps_time,columns=['GPS'])

But it gives the following error:

ValueError: Shape of passed values is (4, 35047), indices imply (1, 35047)

I don't know how to handle this. If I can find a way around it, I assume the next step, building df2 the same way from acc_data, will work fine, and then I can do:

p = Panel({'GPS': df1, 'ACC': df2})

Any help would be greatly appreciated, as I have been stumped on this for the last few hours.

2 Answers

You need to make sure you pass in as many column names (using the columns keyword) as there are columns in your NumPy array:

df1 = DataFrame(gps_data, index=gps_time, columns=['col1', 'col2', 'col3', 'col4'])

Pandas raises the error because you've passed an array with four columns but only one column name, 'GPS'.
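As a minimal sketch of the mismatch described above, the snippet below rebuilds the situation with a three-row stand-in for gps_data. The sample values and the column names lat, lon, accuracy and provider are assumptions chosen for illustration; they do not come from the question.

```python
import numpy as np
import pandas as pd

# Small stand-in for gps_data: 3 rows, 4 columns of strings.
gps_data = np.array([['50.3482627', '-71.662499', '30', 'network'],
                     ['50.3482588', '-71.6624934', '30', 'network'],
                     ['20.3482488', '-78.6624546', '9', 'gps']])
gps_time = np.array([1.16242035e+09, 1.26242036e+09, 1.36242038e+09])

# One name for four columns: pandas rejects the mismatch with a ValueError.
try:
    pd.DataFrame(gps_data, index=gps_time, columns=['GPS'])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# One name per column works. These names are illustrative assumptions.
names = ['lat', 'lon', 'accuracy', 'provider']
df1 = pd.DataFrame(gps_data, index=gps_time, columns=names)
print(mismatch_raised, df1.shape)
```

Note that the index here is deliberately 1-dimensional; recent pandas versions expect that for an index.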

5 Comments

Sweet, thanks, although now when I do p = Panel({'GPS':df1,'ACC':df2}) it complains "buffer has wrong number of dimensions expected 1 found 2"?
No problem. What is your df2? What shape is it?
df2 is [7111 rows x 3 columns]. It basically looks like:

                     x         y         z
1.362420e+09 -0.249893  4.125504  9.105667
1.362420e+09 -2.738571  5.260941  8.285629
@eWizardII Hmmm... I can't seem to replicate the error and I'm afraid I haven't played around with Panel a great deal. It might be a bug if you're using an older version of Pandas. If not, perhaps asking a new question is the way to go...
Alright will do thanks! I have version 0.14.1 on Windows which should be the latest version or close to it I believe.

ajcr is right; the error can be avoided by specifying the right number of columns. Since gps_data has shape (35047, 4), the DataFrame has four columns. So you need columns=['col1', 'col2', 'col3', 'col4'] if you are going to specify column names.

To get gps_data in the right shape, it would also be easier to use

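import numpy as np
import pandas as pd
# dtype=None infers a type per column. Because the columns are mixed
# (floats, ints, strings), the result is a 1-D structured array whose
# fields are auto-named f0, f1, ..., f4, so select by field name.
data = np.genfromtxt('foo.txt', dtype=None, delimiter=',',
                     usecols=[0,1,2,3,4], encoding='utf-8')
gps_data = data[['f0', 'f1', 'f3', 'f4']]
gps_time = data['f2']/1000.0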

and then you can build the DataFrame with

df1 = pd.DataFrame(gps_data, index=gps_time)
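Once df1 and a similar frame for acc_data exist, one possible way to line them up on a shared time index is pd.concat with keys, which avoids Panel altogether (Panel was removed from later pandas releases). This is only a sketch under assumptions: the frames, values and column names below are made up, the time stamps are assumed to be unique within each frame, and forward-filling is just one of several ways to fill the gaps.

```python
import pandas as pd

# Made-up frames indexed by time in seconds; values and names are illustrative.
gps = pd.DataFrame({'lat': [50.1, 50.2, 50.3], 'lon': [-71.1, -71.2, -71.3]},
                   index=[0.0, 2.0, 4.0])
acc = pd.DataFrame({'x': [0.1, 0.2, 0.3, 0.4], 'y': [1.0, 1.1, 1.2, 1.3]},
                   index=[0.0, 1.0, 2.0, 3.0])

# Side-by-side join on the union of both time indexes; the dict keys
# become the top level of the column labels.
combined = pd.concat({'GPS': gps, 'ACC': acc}, axis=1).sort_index()

# Fill the gaps by carrying the last known reading forward.
filled = combined.ffill()
print(filled)
```

Each source stays addressable as a block, for example filled['GPS'] or filled['ACC'].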

Caveats:

gps_time = data[0:len(data),2:3]

makes gps_time 2-dimensional with shape (35047, 1). If you use

gps_time = data[0:len(data),2]

then gps_time will be 1-dimensional, with shape (35047,). This is more likely what you want, since the index (time) appears to be 1-dimensional.
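The difference between the two slices can be checked on a small stand-in array. The values below are invented for illustration; only the slicing expressions mirror the ones discussed above.

```python
import numpy as np

# Stand-in for the array returned by loadtxt: 3 rows, 5 string columns.
data = np.array([['50.34', '-71.66', '1162420350000', '30', 'network'],
                 ['50.35', '-71.67', '1262420360000', '30', 'network'],
                 ['20.34', '-78.66', '1362420380000', '9', 'gps']])

col_2d = data[0:len(data), 2:3]  # a slice keeps the column axis
col_1d = data[0:len(data), 2]    # an integer index drops it

print(col_2d.shape)  # (3, 1)
print(col_1d.shape)  # (3,)

# The 1-dimensional version converts cleanly to times in seconds.
gps_time = col_1d.astype(float) / 1000
```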


data = numpy.loadtxt('foo.txt', dtype=str,delimiter=',')

makes all your numbers strings. If you use

np.genfromtxt('foo.txt', dtype=None, delimiter=',')

the dtype=None tells genfromtxt to make an intelligent guess about the type of each column -- so your float-like numbers will automatically have dtype float.
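To see what dtype=None produces, here is a small sketch that reads three made-up lines from an in-memory buffer instead of foo.txt. The column layout (latitude, longitude, time in milliseconds, accuracy, provider) is an assumption inferred from the question. Because the columns have different types, genfromtxt returns a 1-D structured array with auto-generated field names rather than a 2-D array, so columns are selected by name.

```python
import io
import numpy as np

# Three invented lines in the assumed layout:
# lat, lon, time in ms, accuracy, provider.
sample = io.StringIO(
    "50.3482627,-71.662499,1162420350000,30,network\n"
    "50.3482588,-71.6624934,1262420360000,30,network\n"
    "20.3482488,-78.6624546,1362420380000,9,gps\n"
)

data = np.genfromtxt(sample, dtype=None, delimiter=',', encoding='utf-8')

print(data.shape)        # one entry per row
print(data.dtype.names)  # auto-generated field names
print(data.dtype)        # inferred type of each field

# Fields are accessed by name; the time column is converted to seconds.
gps_time = data['f2'] / 1000.0
```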

1 Comment

Alright, I'll try this also. It might be the cause of the problem; I just followed up on the other answer too, saying that I get an error when using Panel.
