4

I'm wondering how to get the tables parsed by pandas into a single CSV file. I have managed to write each table to its own CSV file, but I would like them all in one. This is my current code, which produces multiple CSV files:

import pandas as pd

url = "https://fasttrack.grv.org.au/RaceField/ViewRaces/228697009?raceId=318809897"

# read_html returns a list of DataFrames, one per matching <table>
data = pd.read_html(url, attrs={'class': 'ReportRaceDogFormDetails'})

# write each table to its own file: new0.csv, new1.csv, ...
for i, datas in enumerate(data):
    datas.to_csv("new{}.csv".format(i), header=False, index=False)
2
  • Is the schema the same for all tables? Commented May 9, 2018 at 3:48
  • Yes, the schema is the same. Commented May 9, 2018 at 4:33

3 Answers

4

I think you only need concat, because data is a list of DataFrames:

df = pd.concat(data, ignore_index=True)
df.to_csv("new.csv", header=False, index=False)

1 Comment

You can use axis=1 in concat to put the dataframes side-by-side instead of one after the other (not sure which one you want).
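To make the difference between the two directions concrete, here is a minimal sketch. The two small tables are made-up stand-ins for the list that read_html returns, not data from the page in the question.

```python
import pandas as pd

# Two stand-in tables with the same schema (hypothetical data).
tables = [
    pd.DataFrame({"dog": ["A", "B"], "time": [30.1, 30.4]}),
    pd.DataFrame({"dog": ["C", "D"], "time": [29.9, 30.8]}),
]

# axis=0 (the default): rows of the second table follow the first.
stacked = pd.concat(tables, ignore_index=True)

# axis=1: tables are placed next to each other, aligned on the row index.
side_by_side = pd.concat(tables, axis=1)

print(stacked.shape)       # 4 rows, 2 columns
print(side_by_side.shape)  # 2 rows, 4 columns
```

With axis=1 the column names repeat, so stacking rows is probably the better fit when all tables share one schema.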
3

You have 2 options:

  1. You can tell pandas to append data while writing to the CSV file.

    data = pd.read_html(url, attrs = {'class': 'ReportRaceDogFormDetails'} )
    for datas in data:
        datas.to_csv("new.csv", header=False, index=False, mode='a')
    
  2. Merge all the tables into one DataFrame and then write that into the CSV file.

    data = pd.read_html(url, attrs = {'class': 'ReportRaceDogFormDetails'} )
    df = pd.concat(data, ignore_index=True)
    df.to_csv("new.csv", header=False, index=False)
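One practical difference between the two options: mode='a' keeps whatever is already in the file, so running the script twice duplicates the rows, while the concat version overwrites the file each time. A small sketch of that, using made-up tables and a temporary directory (the helper names are only for illustration):

```python
import os
import tempfile
import pandas as pd

# Stand-in tables (hypothetical data) and a scratch file path.
tables = [pd.DataFrame([[1, 2]]), pd.DataFrame([[3, 4]])]
path = os.path.join(tempfile.mkdtemp(), "new.csv")

def write_appending(frames, target):
    # Option 1: each table is appended to whatever the file already holds.
    for frame in frames:
        frame.to_csv(target, header=False, index=False, mode="a")

def write_concatenated(frames, target):
    # Option 2: one DataFrame, written with the default mode "w" (overwrite).
    pd.concat(frames, ignore_index=True).to_csv(target, header=False, index=False)

write_appending(tables, path)
write_appending(tables, path)  # simulate running the script a second time
with open(path) as f:
    appended_lines = f.read().splitlines()

write_concatenated(tables, path)
write_concatenated(tables, path)
with open(path) as f:
    concatenated_lines = f.read().splitlines()

print(len(appended_lines), len(concatenated_lines))
```

If you go with option 1, deleting the old file first (or writing the first table with mode='w') avoids the duplication.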
    

Edit

To keep the tables separated in the CSV file, stick with option 1 and write a blank line after each table:

data = pd.read_html(url, attrs = {'class': 'ReportRaceDogFormDetails'} )
with open('new.csv', 'a') as csv_stream:
    for datas in data:
        datas.to_csv(csv_stream, header=False, index=False)
        csv_stream.write('\n')
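As a quick check of what this layout looks like, the sketch below writes made-up tables to an in-memory buffer instead of a file and reads the result back. It assumes pandas' default read_csv behaviour, where blank lines are skipped (skip_blank_lines=True), so the separators should not get in the way if the file is loaded again later.

```python
import io
import pandas as pd

# Stand-in tables (hypothetical data) in place of the read_html result.
tables = [pd.DataFrame([[1, 2], [3, 4]]), pd.DataFrame([[5, 6]])]

buffer = io.StringIO()
for frame in tables:
    frame.to_csv(buffer, header=False, index=False)
    buffer.write("\n")  # blank separator line after each table

csv_text = buffer.getvalue()
lines = csv_text.splitlines()

# Reading back: blank lines are skipped by default, giving one table again.
round_trip = pd.read_csv(io.StringIO(csv_text), header=None)

print(lines)
print(round_trip.shape)
```

Note that this also leaves one blank line after the last table; skip the final write if that matters.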

1 Comment

Thank you! Would you know how to still separate the tables during the concat, so they aren't straight after one another? For example, with one blank row between them.
0
all_dfs = []

for i, datas in enumerate(data):
    all_dfs.append(datas.to_csv("new{}.csv".format(i), header = False, index = False))

result = pd.concat(all_dfs)

2 Comments

This can be a one-liner with list comprehension, but I chose the form above for clarity.
Thanks for your reply. I'm getting an error with that code: ValueError: All objects passed were None
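That error is consistent with how to_csv behaves: when it is given a path it writes the file and returns None, so the list ends up full of None values and concat has nothing to combine. A minimal sketch of a version that collects the DataFrames themselves, using made-up tables in place of the read_html result:

```python
import os
import tempfile
import pandas as pd

# Stand-in for the list returned by pd.read_html (hypothetical data).
data = [pd.DataFrame([[1, 2]]), pd.DataFrame([[3, 4]])]
out_dir = tempfile.mkdtemp()

# to_csv with a path writes the file and returns None.
returned = data[0].to_csv(os.path.join(out_dir, "new0.csv"), header=False, index=False)
print(returned)

# Collect the DataFrames, not the return value of to_csv.
all_dfs = []
for datas in data:
    all_dfs.append(datas)

result = pd.concat(all_dfs, ignore_index=True)
result.to_csv(os.path.join(out_dir, "new.csv"), header=False, index=False)
```

Since data is already a list of DataFrames, the loop is only there to mirror the answer; pd.concat(data, ignore_index=True) does the same thing.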
