I have a set of data associated with Npts points. Some of the data are scalar values, such as color, and some are multi-dimensional, such as 3d position. I am trying to bundle this data into a pandas data structure, and I get a variety of error messages depending on how I try to do it.

Here's some mock data:

import numpy as np
import pandas

Npts = 100
pos = np.random.uniform(0, 250, Npts*3).reshape(Npts, 3)  # shape (Npts, 3)
colors = np.random.uniform(-1, 1, Npts)  # shape (Npts,)

Using a dictionary as input, the color data alone bundles up into a DataFrame just fine:

df_colors = pandas.DataFrame({'colors':colors})

But the position information does not:

df_pos = pandas.DataFrame({'pos':pos})

This returns the following unhelpful error message:

ValueError: If using all scalar values, you must must pass an index

And what I really want to do is bundle both position and color information together:

df_agg = pandas.DataFrame({'pos':pos, 'colors':colors})

But this does not work, and returns the following equally cryptic error:

Exception: Data must be 1-dimensional

Surely it is possible to bundle multi-dimensional data with pandas, as well as data with mixed dimension. Does anyone know the API for this behavior?

1 Answer

The problem is that pos has shape (100, 3). A DataFrame column must be one-dimensional, so each value in the dict needs to have shape (100,).

One option is to create an individual column for each of the dimensions:

df_agg = pandas.DataFrame({'posX':pos[:,0], 'posY':pos[:,1], 'posZ':pos[:,2], 'colors':colors})
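
The same three columns can also be built without slicing each coordinate by hand. This is only a minimal sketch, assuming the mock pos and colors arrays from the question and reusing the column names above; a 2-D array passed directly (rather than as a dict value) is split into one column per array column:

```python
import numpy as np
import pandas

# Mock data, as in the question.
Npts = 100
pos = np.random.uniform(0, 250, Npts * 3).reshape(Npts, 3)
colors = np.random.uniform(-1, 1, Npts)

# A 2-D array passed directly becomes one DataFrame column per array column.
df_agg = pandas.DataFrame(pos, columns=['posX', 'posY', 'posZ'])
df_agg['colors'] = colors

# Recover the (Npts, 3) array from the three position columns.
pos_back = df_agg[['posX', 'posY', 'posZ']].to_numpy()
```

Keeping the coordinates in separate numeric columns means they stay as float columns, so column-wise operations such as df_agg['posX'].mean() work as usual.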

Another option is to cast each coordinate into a 3-tuple:

posTuple = tuple(map(tuple,pos))
df_aggV2 = pandas.DataFrame({'pos':posTuple, 'colors':colors})
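
If the goal is to keep each position as a NumPy array, list(pos) appears to work the same way: it is a list of 100 one-dimensional arrays, which pandas stores in a single object-dtype column. A sketch under the same assumptions (the mock data from the question; df_aggV3 and pos_back are names made up for this example):

```python
import numpy as np
import pandas

# Mock data, as in the question.
Npts = 100
pos = np.random.uniform(0, 250, Npts * 3).reshape(Npts, 3)
colors = np.random.uniform(-1, 1, Npts)

# list(pos) is a plain list of Npts arrays, each of shape (3,).
df_aggV3 = pandas.DataFrame({'pos': list(pos), 'colors': colors})

# The 'pos' column holds Python objects (the arrays), not floats.
print(df_aggV3['pos'].dtype)

# Stack the per-row arrays back into a single (Npts, 3) array.
pos_back = np.stack(df_aggV3['pos'].tolist())
```

The trade-off is that an object column gives up vectorized numeric operations on the positions, so the one-column-per-coordinate layout is usually easier to work with.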

3 Comments

Thanks, Andrew. This works, but I don't understand why. I mean, you are right, it works. But I don't think it can be due to shape change. np.shape(pos) = (100, 3), and np.shape(posTuple) = (100, 3). I'd like to keep the data type as a numpy array, if possible. Why does simply making the data tuple (or also a list: list(map(list,pos)) works just as well) solve the pandas error?
Ok, Andrew, I just confirmed. The reason your solution works is due to the type change. If I simply use list(pos) instead of pos, this solves the problem. Though I still don't understand why.
When you use a dict as your data input for DataFrame, only certain types of values are acceptable. My solutions cast the data into acceptable types. "DataFrame accepts many different kinds of input: ... Dict of 1D ndarrays, lists, dicts, or Series". pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe
