Pandas import a multiindex csv with index levels on the same column

Question

I have a multiindex csv with the following format:

 ; ;2000;2001;2002;2003;2004;2005;2006;2007;2008;2009;2010;2011;2012;2013;2014;2015;2016;2017
CO2;;;;;;;;;;;;;;;;;;;
010000 Agriculture and horticulture;AZZ;2312;2249;2165;2102;2034;2095;2106;2067;2060;1935;1985;1983;1893;1865;1750;1728;1777;1736
020000 Forestry;AZZ;40;42;39;43;46;50;49;49;46;52;62;62;67;60;63;66;67;66
030000 Fishing;AZZ;785;767;746;722;645;655;629;580;501;485;472;441;351;384;352;382;387;377
 ; ;2000;2001;2002;2003;2004;2005;2006;2007;2008;2009;2010;2011;2012;2013;2014;2015;2016;2017
More CO2;;;;;;;;;;;;;;;;;;;
010000 Agriculture and horticulture;AZZ;2312;2249;2165;2102;2034;2095;2106;2067;2060;1935;1985;1983;1893;1865;1750;1728;1777;1736
020000 Forestry;AZZ;40;42;39;43;46;50;49;49;46;52;62;62;67;60;63;66;67;66
030000 Fishing;AZZ;785;767;746;722;645;655;629;580;501;485;472;441;351;384;352;382;387;377

So both levels of the MultiIndex are actually on the same column.

I am trying to import it as follows:

df=pd.read_csv('my.csv',sep=";",header=[0],index_col=[0])

But this returns the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 24: invalid start byte

I am not sure what position 24 refers to or how to proceed with importing the file.

Here is a link to the file: https://wetransfer.com/downloads/338c3aa2ef68052b45d29c509d5bf82120191009073413/88bc558e72adc48e8683d8af2792d51d20191009073413/81d59b

Desired Output

                                                        2000    2001    2002    2003    ...

CO2         010000 Agriculture and horticulture   AZZ  2312.0  2249.0  2165.0  2102.0   ...
            020000 Forestry                       AZZ    40.0    42.0    39.0    43.0   ...
            030000 Fishing                        AZZ   785.0   767.0   746.0   722.0   ... 
            060000 Extraction of oil and gas      BZ1  2174.0  2190.0  2184.0  2188.0   ... 
            080090 Extraction of gravel and stone BZ2   295.0   332.0   304.0   277.0   ...

                                                       2000    2001    2002    2003     ...

More CO2    010000 Agriculture and horticulture   AZZ  2312.0  2249.0  2165.0  2102.0   ...
            020000 Forestry                       AZZ    40.0    42.0    39.0    43.0   ...
            030000 Fishing                        AZZ   785.0   767.0   746.0   722.0   ... 
            060000 Extraction of oil and gas      BZ1  2174.0  2190.0  2184.0  2188.0   ... 
            080090 Extraction of gravel and stone BZ2   295.0   332.0   304.0   277.0   ...

It is not easy to debug a file from text alone. Could you upload your file (a few rows are enough, as long as they reproduce the error) to gdocs, dropbox, wetransfer or similar and share the link?
– jezrael, Commented Oct 9, 2019 at 7:27

Jary · Accepted Answer · 2019-10-09 07:50:04Z

2

You can read the file by specifying the gbk encoding:

df=pd.read_csv('./AirEmissions117.csv',sep=';',encoding='gbk')
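
To illustrate why the encoding argument matters, here is a minimal sketch. The two-byte sample is made up for illustration and is not taken from the file. The byte 0xb5 cannot start a UTF-8 sequence, which is what the error message reports, and the "position" in that message is a byte offset into the data being decoded. ISO-8859-1 maps 0xb5 to the micro sign. gbk instead treats 0xb5 as the first byte of a two-byte character, so it can decode without raising yet produce different text. Which encoding is actually correct for this file is an assumption to check against the decoded result.

```python
# Hypothetical sample: the offending byte 0xb5 followed by an ASCII letter.
raw = b"\xb5g"

# UTF-8 rejects 0xb5 as a start byte, as in the question's error message.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# ISO-8859-1 maps every byte to one character; 0xb5 becomes the micro sign.
as_latin1 = raw.decode("ISO-8859-1")  # 'µg'

# gbk reads 0xb5 as the lead byte of a two-byte character.
# errors="replace" is used here only so that this sketch never raises.
as_gbk = raw.decode("gbk", errors="replace")

print(utf8_error)
print(repr(as_latin1))
print(repr(as_gbk))
```

If the decoded labels look wrong after reading with one encoding, trying the other is a reasonable next step.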

jezrael · Accepted Answer · 2019-10-09 08:36:25Z

1

For me it works once the encoding is set; after that, some processing is necessary:

import pandas as pd

df = pd.read_csv('AirEmissions117.csv',
                 sep=";",
                 encoding="ISO-8859-1")

# flag rows whose last 5 columns contain only NaN (the title rows such as "CO2")
m = df.iloc[:, -5:].isna().all(axis=1)
# insert a new first column holding each title, forward-filled over the rows below it
df.insert(0, 'type', df.iloc[:, 0].where(m).ffill())
# drop the title rows and build the MultiIndex from the first three columns
df = df[~m].set_index(df.columns[:3].tolist())
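
The same idea can be tried without the original file. The following is only a sketch on an abbreviated inline sample in the layout shown in the question (three year columns, two sectors). The column names `sector` and `code` are hypothetical labels chosen for illustration, and the extra step that drops the repeated header rows assumes those rows always have a blank first cell.

```python
import io

import pandas as pd

# Abbreviated sample in the question's layout (assumption: the real file looks the same).
raw = """ ; ;2000;2001;2002
CO2;;;;
010000 Agriculture and horticulture;AZZ;2312;2249;2165
020000 Forestry;AZZ;40;42;39
 ; ;2000;2001;2002
More CO2;;;;
010000 Agriculture and horticulture;AZZ;2312;2249;2165
020000 Forestry;AZZ;40;42;39
"""

# Read everything as text first so that repeated header rows cannot distort the dtypes.
df = pd.read_csv(io.StringIO(raw), sep=";", header=0, dtype=str)

# Hypothetical names for the two unnamed leading columns.
df.columns = ["sector", "code"] + list(df.columns[2:])

# Drop the repeated header rows, whose first cell is blank.
df = df[df["sector"].str.strip() != ""].copy()

# Title rows ("CO2", "More CO2") have no value in any year column.
is_title = df.iloc[:, 2:].isna().all(axis=1)

# Carry each title down over the rows that follow it.
df.insert(0, "type", df["sector"].where(is_title).ffill())

# Remove the title rows, build the three-level index, convert the values to numbers.
result = df[~is_title].set_index(["type", "sector", "code"]).astype(float)

print(result)
```

Here the first index level comes from the title rows and the other two from the sector label and its code, so both levels that shared column 0 end up as separate index levels.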

edited Oct 9, 2019 at 8:36

answered Oct 9, 2019 at 7:45

1 Comment

CAPSLOCK Over a year ago

Hi, it imports it but the multiindex problem is unsolved. The row you skip is actually the first level of the multiindex. The issue is that both the first level and the second level are on the same column [0]
