How to join Pandas data frames in python

Question

I'm having some issues with joining dataframes in python. The code is supposed to create a dataframe for each US state and put it into a main dataframe.

I think the problem is coming from the column titles. When I print them I get this:

Index([u'0AL', u'0', u'0AZ', u'0', u'0CO', u'0', u'0DE', u'0', u'0GA', u'0',
       u'0IA', u'0', u'0IL', u'0', u'0KS', u'0', u'0LA', u'0', u'0MD', u'0',
       u'0MI', u'0', u'0MO', u'0', u'0MT', u'0', u'0ND', u'0', u'0NH', u'0',
       u'0NM', u'0', u'0NY', u'0', u'0OK', u'0', u'0PA', u'0', u'0SC', u'0',
       u'0TN', u'0', u'0UT', u'0', u'0VT', u'0', u'0WI', u'0', u'0WY', u'0'],
      dtype='object')

When I print the dataframe I get this:

            0AL    0  0AZ  0  0CO    0  0DE    0  0GA    0 ...   0TN    0  \
2017-01-01  6.5  6.4  3.8  5  5.2  2.9  4.5  4.4    5  5.5 ...   2.9  5.4

            0UT    0  0VT    0  0WI    0  0WY    0
2017-01-01  4.8  3.1    4  3.1  5.1  3.9  5.6  4.8

[1 rows x 50 columns]

It seems to be adding "0" everywhere as well as replacing some of the column headers (States). Initially, I was grabbing the States abbreviations from Wikipedia but then thought I would try and input them manually but that didn't help. Thanks in advance.

The purpose of this is to be able to rank the different states based on certain criteria.

Here is the code:

from fredapi import Fred
fred = Fred(api_key="XXXXXXXXXXXXXXXXX") 
import matplotlib.pyplot as plt
from IPython.core.pylabtools import figsize
import datetime
import pandas as pd
import time
import html5lib
import numpy as np

#html= pd.read_html('https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States')
#us_states= html[0][1][2:]

states =['AK','AL','AR','AZ','CA','CO','CT','DE','FL','GA','HI','IA','ID','IL','IN','KS','KY','LA','MA','MD','ME','MI','MN','MO','MS','MT','NC','ND','NE','NH', 'NJ','NM', 'NV','NY','OH','OK','OR','PA','RI','SC', 'SD','TN','TX','UT','VA','VT','WA','WI','WV',"WY"]

start = datetime.datetime(2017,1,1)
end = datetime.datetime(2017,1,1)

figsize(40, 5)

main_df = pd.DataFrame()

for abbv in states:

    df= pd.DataFrame(fred.get_series(series_id=str(abbv)+"UR",observation_start=start, observation_end=end))

    if main_df.empty:
        main_df = df

    else:
        main_df = main_df.join(df, lsuffix=abbv)


print main_df

The first dataframe looks like this:

             0
2017-01-01  6.5

What does the first 'df' look like? I would suspect that something in the series is affecting your column names. — TLOwater
– TLOwater, Commented Jul 9, 2017 at 20:34
I know you've accepted my answer but a better solution is to define the columns parameter when you form your first df dataframe. See my answer for implementation. — TLOwater
– TLOwater, Commented Jul 9, 2017 at 21:15

TLOwater · Accepted Answer · 2017-07-09 21:15:52Z

2

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.join.html

The documentation says that you are using a suffix (i.e. something that comes after a word) after column headers where they're overlapping. This doesn't mean the column header is set to abbv - instead you get this behaviour where you get 0NY for example.

If you want to set the column headers to your states set it with df.rename as so:

df = df.rename(columns={'0': abbv})

Then do the join.

EDIT: Alternatively set the header to save a rename as so:

df= pd.DataFrame(fred.get_series(series_id=str(abbv)+"UR",observation_start=start, observation_end=end), columns=[abbv])

edited Jul 9, 2017 at 21:15

answered Jul 9, 2017 at 20:45

TLOwater

6383 silver badges13 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

How to join Pandas data frames in python

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related