
I'm using the code below to scrape the latest daily prices for a number of funds:

import requests
import pandas as pd

urls = ['https://markets.ft.com/data/funds/tearsheet/historical?s=LU0526609390:EUR',
        'https://markets.ft.com/data/funds/tearsheet/historical?s=IE00BHBX0Z19:EUR',
        'https://markets.ft.com/data/funds/tearsheet/historical?s=LU1076093779:EUR']

def format_date(date):
    # Keep only the abbreviated 'Mon DD YYYY' part of the scraped date cell
    return date.split(',')[-2][1:] + date.split(',')[-1]

for url in urls:
    # Build an id such as 'LU0526609390.OTHER' from the URL's ISIN parameter
    ISIN = url.split('=')[-1].replace(':', '_')
    ISIN = ISIN[:-4]  # drop the '_EUR' suffix
    ISIN = ISIN + ".OTHER"
    html = requests.get(url).content
    df_list = pd.read_html(html)
    df = df_list[-1]  # the historical prices table is the last one on the page
    df['Date'] = df['Date'].apply(format_date)
    # Keep only the date and the closing price
    del df['Open']
    del df['High']
    del df['Low']
    del df['Volume']
    df = df.rename(columns={'Close': 'last_traded_price',
                            'Date': 'last_traded_on'})
    df.insert(2, "id", ISIN)
    df = df.head(1)  # keep only the latest day's row
    print(df)
df.to_csv(r'/Users/.../Testdata.csv', index=False)
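
For reference, here is what format_date does to one scraped date cell. The sample string is an assumption on my part, inferred from the desired output below, since FT's table appears to combine a long and a short form of each date:

sample = 'Thursday, October 07, 2021 Thu, Oct 07, 2021'  # assumed cell format
print(format_date(sample))
# Oct 07 2021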

At the moment, the Testdata.csv file is being overwritten every time a new loop iteration starts, and I would like to find a way to save all of the data into the .csv file in this format:

Col 1            Col 2                Col 3
last_traded_on   last_traded_price    id
Oct 07 2021      78.83                LU0526609390.OTHER
Oct 07 2021      11.1                 IE00BHBX0Z19.OTHER
Oct 07 2021      155.56               LU1076093779.OTHER

I need a way to save the data to the .csv file outside of the loop, but I'm really struggling to work out how.

Thank you

1 Answer


Use a file handler:

with open(r'/Users/.../Testdata.csv', 'w') as csvfile:
    # Here, you need to write headers:
    # csvfile.write("header1,header2,header3\n")
    for url in urls:
        ISIN = url.split('=')[-1].replace(':', '_')
        ...  # The rest of your code
        df.to_csv(csvfile, index=False, header=False)
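
Filled in with the column names from the question, that could look like this (a sketch; the header line is an assumption based on the desired output, and the truncated path stays as a placeholder):

with open(r'/Users/.../Testdata.csv', 'w') as csvfile:
    csvfile.write("last_traded_on,last_traded_price,id\n")  # header row, written once
    for url in urls:
        ISIN = url.split('=')[-1].replace(':', '_')
        ...  # The rest of your code
        # header=False so each frame appends only its data rows
        df.to_csv(csvfile, index=False, header=False)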

Alternatively, the better practice is to collect each DataFrame in a list, use pd.concat to merge them all, and write the result to a file in one go:

dfs = [] 
for url in urls:
    ISIN = url.split('=')[-1].replace(':', '_')
    ...  # The rest of your code
    dfs.append(df)

pd.concat(dfs).to_csv(r'/Users/.../Testdata.csv', index=False)
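
Since each df is reduced to a single row with head(1), every frame carries the same index label 0; passing ignore_index=True gives the combined frame a clean 0..n-1 index, though it makes no difference to the CSV itself when index=False:

pd.concat(dfs, ignore_index=True).to_csv(r'/Users/.../Testdata.csv', index=False)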

Note: your desired output looks like the output of df.to_string() rather than df.to_csv.
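
To illustrate the difference with one of the rows from the question (exact column spacing of to_string may vary):

import pandas as pd

df = pd.DataFrame({'last_traded_on': ['Oct 07 2021'],
                   'last_traded_price': [78.83],
                   'id': ['LU0526609390.OTHER']})

print(df.to_string(index=False))
# last_traded_on  last_traded_price                  id
#    Oct 07 2021              78.83  LU0526609390.OTHER

print(df.to_csv(index=False))
# last_traded_on,last_traded_price,id
# Oct 07 2021,78.83,LU0526609390.OTHER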
