0

My csv file has weather information by city. One row has so many columns (over 1200) by time. For example it looks like,

id  city_name  dt_0        temp_0  hum_0  dt_1        temp_1  hum_1  dt_2        temp_2  hum_2
1   Boston     2017110306  23.5    54.0   2017110310  21.4    40.0   2017110314  22.2    52.1
2   Seattle    2017110306  20.4    60.0   2017110310  18.4    42.0   2017110314  18.3    50.5

The schema is not useful for me. So I want to convert it by python Pandas DataFrame. What I want is for it to look like,

id  city_name  dt          temp  hum
1   Boston     2017110306  23.5  54.0
1   Boston     2017110310  21.4  40.0
1   Boston     2017110314  22.2  52.1
2   Seattle    2017110306  20.4  60.0
2   Seattle    2017110310  18.4  42.0
2   Seattle    2017110314  18.3  50.5

How to do it ?

1 Answer 1

3

First set_index, then create MultiIndex with split and last reshape by stack:

df = df.set_index(['id','city_name'])
df.columns = df.columns.str.split('_', expand=True)
df = df.stack().reset_index(level=2, drop=True).reset_index()
print (df)
   id city_name          dt   hum  temp
0   1    Boston  2017110306  54.0  23.5
1   1    Boston  2017110310  40.0  21.4
2   1    Boston  2017110314  52.1  22.2
3   2   Seattle  2017110306  60.0  20.4
4   2   Seattle  2017110310  42.0  18.4
5   2   Seattle  2017110314  50.5  18.3
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.