Reading a dataframe from CSV and problems with array columns

Question

The application I use generates data in a dataframe which I need to use upon request.

It looks similar to this.

<class 'pandas.core.frame.DataFrame'>
             E         Gg        gnx2    J chs lwave J_ID
0    27.572025  82.308581    7.078391  3.0   1   [0]    1
1    46.387728  77.029548   58.112338  3.0   1   [0]    1
2    75.007554  82.087407    0.535442  3.0   1   [0]    1

Everything worked fine until I started using dataframes that had been saved to separate files. When I try to use the data after loading, I get data type errors for the columns that contain arrays. For example, lwave holds an array, and when the dataframe is saved to CSV the information about its data type is lost.

#saving the data to csv
csv_filename = "ladder.csv"
ladder.to_csv(csv_filename)

So when I load the dataframe later, I can't access the array elements as I should.

As I understand it, the data in this column is loaded as a string. After loading the data with read_csv, I get these data types:

Unnamed: 0      int64
E             float64
Gg            float64
gnx2          float64
J             float64
chs             int64
lwave          object
J_ID            int64
dtype: object
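
The behaviour described above can be reproduced with a small round trip. This is only a sketch: the `ladder` frame below is a hypothetical two-row stand-in for the real data, and an in-memory buffer replaces the file on disk.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real `ladder` dataframe
ladder = pd.DataFrame({"E": [27.572025, 46.387728], "lwave": [[0], [0]]})

# Round trip through CSV, using an in-memory buffer instead of a file
buffer = io.StringIO()
ladder.to_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(type(ladder["lwave"].iloc[0]))    # a Python list before saving
print(type(reloaded["lwave"].iloc[0]))  # a plain string after loading
print(reloaded.columns.tolist())        # the saved index comes back as 'Unnamed: 0'
```

CSV is a text format, so each list is written as its string representation and read back as text. That is why the column shows up with dtype object and why indexing into it returns characters rather than array elements.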

How can I resolve this issue? How can I correctly load the data with the correct data type or maybe explicitly assign a data type to a column after loading?

StonedTensor · Accepted Answer · 2022-11-17 13:27:56Z

In the read_csv function, you can manually assign data types to your columns. Pass a dictionary mapping column name --> preferred data type through the dtype parameter.

import numpy as np
import pandas as pd
data_type_mapping = {'a': np.float64, 'b': np.int32, 'c': 'Int64'}
my_df = pd.read_csv('myfile.csv', dtype=data_type_mapping)

From pandas documentation:

Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'} Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.
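
The converters option mentioned in that passage is one way to handle a column that holds lists, since dtype only accepts scalar types. The following is a minimal sketch, not the asker's actual code: the inline CSV text is made up for illustration, and it assumes the cells were written as Python list literals such as [1.0, 1.0].

```python
import ast
import io

import pandas as pd

# Hypothetical CSV content; the list with a comma is quoted, as to_csv would write it
csv_text = 'E,lwave,J_ID\n27.572025,"[1.0, 1.0]",1\n46.387728,[0],1\n'

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"E": "float64", "J_ID": "int64"},  # scalar columns use dtype
    converters={"lwave": ast.literal_eval},   # list column is parsed cell by cell
)

print(df["lwave"].iloc[0])        # a real list again
print(type(df["lwave"].iloc[0]))
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for text read from a file. The column dtype is still object, but each cell now holds a list instead of a string.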



11 Comments

twistfire Over a year ago

Hi, thanks for your reply. It seems correct, but I don't know how to apply it in my case. The code I have tried is listed below.

twistfire Over a year ago

I tried this, @StonedTensor:

data_type_mapping = {... 'lwave': np.ndarray, 'J_ID': np.int64 }
filename = isotope_name + '_Emin_10_Emax_1000_2022.10.30_ladder.csv'
resonance_ladder = pd.read_csv(filename, dtype = data_type_mapping)

But I get this error:

TypeError: dtype '<class 'numpy.ndarray'>' not understood

I don't understand which dtype I need to use.

StonedTensor Over a year ago

What is an example of what lwave is supposed to look like? What is its type before you save the dataframe originally?

twistfire Over a year ago

Sorry for my poor understanding of Stack Overflow :) I don't know how to add code or markup in the comments. Here is an example of the data in the lwave column: [1.0, 1.0]

twistfire Over a year ago

The data types before saving look like this:

print(resonance_ladder.dtypes)
E        object
...
chs      object
lwave    object
J_ID     object
dtype: object


twistfire · Accepted Answer · 2022-11-17 20:10:08Z

The problem was resolved by using json.loads to parse the lwave strings back into lists after loading.

import json

# Parse each lwave string (e.g. "[1.0, 1.0]") back into a Python list
ladder_df['lwave'] = ladder_df['lwave'].apply(json.loads)
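
For reference, here is a self-contained sketch of the whole round trip with json.loads. The small `ladder` frame and the in-memory buffer are assumptions made for illustration. The index_col=0 argument and the conversion to NumPy arrays are optional extras that are not part of the original fix.

```python
import io
import json

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataframe
ladder = pd.DataFrame({"E": [27.572025, 46.387728], "lwave": [[1.0, 1.0], [0]]})

# Save and reload through an in-memory buffer instead of a file
buffer = io.StringIO()
ladder.to_csv(buffer)
buffer.seek(0)

# index_col=0 restores the saved index instead of creating 'Unnamed: 0'
ladder_df = pd.read_csv(buffer, index_col=0)

# Parse each string such as "[1.0, 1.0]" back into a list
ladder_df["lwave"] = ladder_df["lwave"].apply(json.loads)

# Optional: build NumPy arrays from the parsed lists
arrays = [np.asarray(values) for values in ladder_df["lwave"]]

print(ladder_df["lwave"].iloc[0][1])  # element access works again
print(arrays[0].shape)
```

This works when the cells were written from Python lists of numbers, because their text form happens to be valid JSON. Cells written from NumPy arrays look like [1. 1.], which json.loads would reject, so in that case converting with .tolist() before saving may be needed.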

