33

Im trying to read CSV file thats on github with Python using pandas> i have looked all over the web, and I tried some solution that I found on this website, but they do not work. What am I doing wrong?

I have tried this:

import pandas as pd

url = 'https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv'
df = pd.read_csv(url,index_col=0)
#df = pd.read_csv(url)

print(df.head(5))
1

5 Answers 5

49

You should provide URL to raw content. Try using this:

import pandas as pd

url = 'https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv'
df = pd.read_csv(url, index_col=0)
print(df.head(5))

Output:

               alpha-2           ...            intermediate-region-code
name                             ...                                    
Afghanistan         AF           ...                                 NaN
Åland Islands       AX           ...                                 NaN
Albania             AL           ...                                 NaN
Algeria             DZ           ...                                 NaN
American Samoa      AS           ...                                 NaN
Sign up to request clarification or add additional context in comments.

2 Comments

Or, you can simply add "?raw=true" at the end of the GitHub URL. You can see my answer below to view how the code looks.
Doing this using a gitlab public repo, I got an HTTPerror (HTTP Error 403: Forbidden). Is there a way to do the same with raw links on gitlab ?
33

Add ?raw=true at the end of the GitHub URL to get the raw file link.

In your case,

import pandas as pd

url = 'https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv?raw=true'
df = pd.read_csv(url,index_col=0)
print(df.head(5))

Output:

               alpha-2 alpha-3  country-code     iso_3166-2   region  \
name                                                                   
Afghanistan         AF     AFG             4  ISO 3166-2:AF     Asia   
Åland Islands       AX     ALA           248  ISO 3166-2:AX   Europe   
Albania             AL     ALB             8  ISO 3166-2:AL   Europe   
Algeria             DZ     DZA            12  ISO 3166-2:DZ   Africa   
American Samoa      AS     ASM            16  ISO 3166-2:AS  Oceania   

                     sub-region intermediate-region  region-code  \
name                                                               
Afghanistan       Southern Asia                 NaN        142.0   
Åland Islands   Northern Europe                 NaN        150.0   
Albania         Southern Europe                 NaN        150.0   
Algeria         Northern Africa                 NaN          2.0   
American Samoa        Polynesia                 NaN          9.0   

                sub-region-code  intermediate-region-code  
name                                                       
Afghanistan                34.0                       NaN  
Åland Islands             154.0                       NaN  
Albania                    39.0                       NaN  
Algeria                    15.0                       NaN  
American Samoa             61.0                       NaN 

Note: This works only with GitHub links and not with GitLab or Bitbucket links.

2 Comments

May I ask why when I read the file and print it, it only shows lines 0-20 then skips to 3000 and goes till the end. Note: there are 5000 lines.
@AirStalk3r You need to provide more information. May be post a new question with details.
9

You can copy/paste the url and change 2 things:

  1. Remove "blob"
  2. Replace github.com by raw.githubusercontent.com

For instance this link:

https://github.com/mwaskom/seaborn-data/blob/master/iris.csv

Works this way:

import pandas as pd

pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

Comments

0

I recommend to either use pandas as you tried to and others here have explained, or depending on the application, the python csv-handler CommaSeperatedPython, which is a minimalistic wrapper for the native csv-library.

The library returns the contents of a file as a 2-Dimensional String-Array. It's is in its very early stage though, so if you want to do large scale data-analysis, I would suggest Pandas.

Comments

0

First convert the github csv file to raw in order to access the data, follow the link below in comment on how to convert csv file to raw .

import pandas as pd

url_data = (r'https://raw.githubusercontent.com/oderofrancis/rona/main/Countries-Continents.csv')

data_csv = pd.read_csv(url_data)

data_csv.head()

1 Comment

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.