I have data containing fertility rates for different countries and I'd like to: 1. rename the columns, and 2. print only specific countries (selected by name, not by index).

Here I import the data from a website:

df = pd.read_html('https://www.cia.gov/library/publications/the-world-factbook/fields/2127.html')

Then I try to rename columns (from '0' to 'Country' and from '1' to 'TFR'):

df= df.rename(index=str, columns ={'0':'Country', '1':'TFR'})

But I get an error message:

df = df.rename(index=str, columns ={'0':'Country', '1':'TFR'})
AttributeError: 'list' object has no attribute 'rename'

This is how I try to look up a specific country:

print(df[df['0'].str.contains("Tanzan")])

And I get the following error:

TypeError: list indices must be integers or slices, not str

What am I doing wrong, and how can I fix it (if that is possible)? Thank you for your help!

  • I can't read the HTML - I get an encoding error. But for the columns just do fertilityRateByCountry.columns = ['Country', 'TFR'] Commented Oct 10, 2018 at 10:26

1 Answer

First, add the parameter header=0 so that the first row of the table becomes the header of the DataFrame. Then add [0] to select the first DataFrame, because read_html returns a list of DataFrames:

url = 'https://www.cia.gov/library/publications/the-world-factbook/fields/2127.html'
d = {'TOTAL FERTILITY RATE(CHILDREN BORN/WOMAN)':'TFR'}
df = pd.read_html(url, header=0)[0].rename(columns=d)
print (df.head())
          Country                                   TFR
0     Afghanistan  5.12 children born/woman (2017 est.)
1         Albania  1.51 children born/woman (2017 est.)
2         Algeria   2.7 children born/woman (2017 est.)
3  American Samoa  2.68 children born/woman (2017 est.)
4         Andorra   1.4 children born/woman (2017 est.)

Finally, filter by the new column name:

print(df[df['Country'].str.contains("Tanzan")])
      Country                                   TFR
204  Tanzania  4.77 children born/woman (2017 est.)
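
Both errors in the question come from the same cause, so a small offline sketch may help. It replaces the call to read_html with a hand-built list holding one DataFrame (the name `tables` and the three rows are stand-ins, and the integer column labels 0 and 1 are an assumption about what the page yields when no header row is used):

```python
import pandas as pd

# Stand-in for the list that pd.read_html returns (one DataFrame per table).
# Rows are copied from the output above; the integer column labels are assumed.
tables = [
    pd.DataFrame(
        [
            ["Afghanistan", "5.12 children born/woman (2017 est.)"],
            ["Albania", "1.51 children born/woman (2017 est.)"],
            ["Tanzania", "4.77 children born/woman (2017 est.)"],
        ]
    )
]

# A list has no .rename method, so select the DataFrame from the list first.
raw = tables[0]

# The default column labels are the integers 0 and 1, not the strings '0' and '1'.
renamed = raw.rename(columns={0: "Country", 1: "TFR"})

# Filter by country name rather than by index.
tanzania = renamed[renamed["Country"].str.contains("Tanzan")]
print(tanzania)
```

Note that rename silently ignores keys that do not match any label, so passing columns={'0': 'Country', '1': 'TFR'} to a frame with integer labels would leave the columns unchanged rather than raise an error.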

5 Comments

This is awesome, df = pd.read_html(input('Please Enter the urlname: '), header=0, flavor='bs4')[0]
@pygo - then rather use df = pd.read_html(input('Please Enter the urlname: '), header=0, flavor='bs4')[0], because [0] selects the first DataFrame from the list ;)
@jezrael: Thank you for your help. I have come across another issue and thought you might know what to do: how can I access the TFR column values so that I can remove 'children born/woman (2017 est.)'?
@ugabuga77 - You can use df['TFR'] = pd.to_numeric(df['TFR'].str[:-31]), because there is also one value with 'children born/woman (2015 est.)'
@jezrael: I managed to do it in a different way but yours is more elegant and efficient. Thank you very much.
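
As an alternative to the fixed-width slice suggested in the comments, the leading number can be pulled out with a regular expression, which does not depend on the length or the year of the suffix. This is only a sketch on a hand-built frame: two rows are copied from the output above, and the row labelled "Hypothetical" is invented purely to illustrate a 2015 suffix.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Tanzania", "Hypothetical"],
        "TFR": [
            "5.12 children born/woman (2017 est.)",
            "4.77 children born/woman (2017 est.)",
            "2.7 children born/woman (2015 est.)",  # invented row, not real data
        ],
    }
)

# Capture the first integer or decimal number in each string.
number = df["TFR"].str.extract(r"(\d+(?:\.\d+)?)", expand=False)

# Convert the captured text to a numeric column.
df["TFR"] = pd.to_numeric(number)
print(df)
```

With expand=False and a single capture group, str.extract returns a Series, so the result can be passed straight to pd.to_numeric.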