I'd like to build a single DataFrame with Pandas from multiple CSV files fetched from URLs, keeping a single header row.

With a single URL everything works as expected:

df = pd.read_csv('http://www.URL1.csv')

I have attempted the following with multiple URLs:

df = pd.read_csv('http://www.URL1.csv', 'http://www.URL2.csv', ...)

However, when attempting to print for testing, the result is spaced out over thousands of lines and is far from the standard layout. Since I am new to Pandas, it is clear I am doing something wrong.


I'd expect the layout to be as follows:

Header1 Header2 Header3 ...
DATA    DATA    DATA    ...

1 Answer

I think you need a list comprehension over a list of URLs, which produces a list of DataFrames. Then use concat to join them together:

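import pandas as pd

urls = ['http://www.URL1.csv', 'http://www.URL2.csv']
# Read each CSV into its own DataFrame
dfs = [pd.read_csv(url) for url in urls]

# Stack the DataFrames row-wise and renumber the rows from 0
df = pd.concat(dfs, ignore_index=True)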

5 Comments

That's amazing! Works like a charm! Is there any chance you could explain the need for list comprehension?
You need a list comprehension when you want to apply a function such as read_csv to each element of another list.
... the output of the list comprehension is a list of whatever the function returns (here, DataFrames). And the fastest way to create one DataFrame from a list of DataFrames is concat.
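To make the comparison concrete, here is a minimal sketch showing that the list comprehension is just a compact form of an ordinary for loop. The CSV contents are made-up in-memory strings (an assumption for illustration, read through io.StringIO) standing in for the real URLs, so the snippet runs without network access.

```python
import io
import pandas as pd

# Hypothetical CSV texts standing in for the files behind the URLs
csv_texts = [
    "Header1,Header2\n1,2\n3,4\n",
    "Header1,Header2\n5,6\n",
]

# Explicit loop: read each source and collect the resulting DataFrames
dfs_loop = []
for text in csv_texts:
    dfs_loop.append(pd.read_csv(io.StringIO(text)))

# Equivalent list comprehension: same result in one expression
dfs_comp = [pd.read_csv(io.StringIO(text)) for text in csv_texts]

# Each file's header row becomes column names, so it appears only once
df = pd.concat(dfs_comp, ignore_index=True)
print(df)
```

Both approaches build the same list of DataFrames; the comprehension is simply shorter to write.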
Thanks, is there a purpose behind ignore_index=True? I can't seem to spot the difference
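It avoids duplicate values in the index. Without it, each DataFrame keeps its own row labels (0, 1, 2, ...), so the same labels repeat in the result. Please check the docs.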
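The difference is easiest to see on the index itself. A small sketch, using two made-up DataFrames (the names a, b, kept and fresh are only for illustration):

```python
import pandas as pd

# Two small frames, each with the default index 0, 1
a = pd.DataFrame({"Header1": [1, 2]})
b = pd.DataFrame({"Header1": [3, 4]})

# Default: the original row labels are kept, so they repeat
kept = pd.concat([a, b])

# ignore_index=True: the rows are renumbered from 0
fresh = pd.concat([a, b], ignore_index=True)

print(kept.index.tolist())   # [0, 1, 0, 1]
print(fresh.index.tolist())  # [0, 1, 2, 3]
```

The printed data looks the same either way, which is probably why the difference is hard to spot. It matters once you select by label: with the repeated index, kept.loc[0] matches two rows instead of one.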
