2

I'm having some issues with joining dataframes in python. The code is supposed to create a dataframe for each US state and put it into a main dataframe.

I think the problem is coming from the column titles. When I print them I get this:

Index([u'0AL', u'0', u'0AZ', u'0', u'0CO', u'0', u'0DE', u'0', u'0GA', u'0',
       u'0IA', u'0', u'0IL', u'0', u'0KS', u'0', u'0LA', u'0', u'0MD', u'0',
       u'0MI', u'0', u'0MO', u'0', u'0MT', u'0', u'0ND', u'0', u'0NH', u'0',
       u'0NM', u'0', u'0NY', u'0', u'0OK', u'0', u'0PA', u'0', u'0SC', u'0',
       u'0TN', u'0', u'0UT', u'0', u'0VT', u'0', u'0WI', u'0', u'0WY', u'0'],
      dtype='object')

When I print the dataframe I get this:

            0AL    0  0AZ  0  0CO    0  0DE    0  0GA    0 ...   0TN    0  \
2017-01-01  6.5  6.4  3.8  5  5.2  2.9  4.5  4.4    5  5.5 ...   2.9  5.4

            0UT    0  0VT    0  0WI    0  0WY    0
2017-01-01  4.8  3.1    4  3.1  5.1  3.9  5.6  4.8

[1 rows x 50 columns]

It seems to be adding "0" everywhere as well as replacing some of the column headers (States). Initially, I was grabbing the States abbreviations from Wikipedia but then thought I would try and input them manually but that didn't help. Thanks in advance.

The purpose of this is to be able to rank the different states based on certain criteria.

Here is the code:

from fredapi import Fred
fred = Fred(api_key="XXXXXXXXXXXXXXXXX") 
import matplotlib.pyplot as plt
from IPython.core.pylabtools import figsize
import datetime
import pandas as pd
import time
import html5lib
import numpy as np

#html= pd.read_html('https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States')
#us_states= html[0][1][2:]

states =['AK','AL','AR','AZ','CA','CO','CT','DE','FL','GA','HI','IA','ID','IL','IN','KS','KY','LA','MA','MD','ME','MI','MN','MO','MS','MT','NC','ND','NE','NH', 'NJ','NM', 'NV','NY','OH','OK','OR','PA','RI','SC', 'SD','TN','TX','UT','VA','VT','WA','WI','WV',"WY"]

start = datetime.datetime(2017,1,1)
end = datetime.datetime(2017,1,1)

figsize(40, 5)

main_df = pd.DataFrame()

for abbv in states:

    df= pd.DataFrame(fred.get_series(series_id=str(abbv)+"UR",observation_start=start, observation_end=end))

    if main_df.empty:
        main_df = df

    else:
        main_df = main_df.join(df, lsuffix=abbv)


print main_df

The first dataframe looks like this:

             0
2017-01-01  6.5
3
  • What does the first 'df' look like? I would suspect that something in the series is affecting your column names. Commented Jul 9, 2017 at 20:34
  • @TLOwater See update Commented Jul 9, 2017 at 20:36
  • I know you've accepted my answer but a better solution is to define the columns parameter when you form your first df dataframe. See my answer for implementation. Commented Jul 9, 2017 at 21:15

1 Answer 1

2

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.join.html

The documentation says that you are using a suffix (i.e. something that comes after a word) after column headers where they're overlapping. This doesn't mean the column header is set to abbv - instead you get this behaviour where you get 0NY for example.

If you want to set the column headers to your states set it with df.rename as so:

df = df.rename(columns={'0': abbv})

Then do the join.

EDIT: Alternatively set the header to save a rename as so:

df= pd.DataFrame(fred.get_series(series_id=str(abbv)+"UR",observation_start=start, observation_end=end), columns=[abbv])

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.