
I've written code that concatenates multiple Excel sheets from a single file into one DataFrame. A function then splits that DataFrame into mini DataFrames of 1 million rows each, and each mini DataFrame is written out as a CSV. How can the function automatically work out how many 1-million-row DataFrames to create? Here is my code so far:

import pandas as pd

# Read every sheet in the workbook and concatenate them into a single DataFrame
data_df = pd.concat(pd.read_excel('file location', sheet_name=None), ignore_index=True)

def automate_excel():
    # Manually slice the DataFrame into 1-million-row chunks
    df1 = data_df.iloc[0:1000000]
    df2 = data_df.iloc[1000000:2000000]
    df1.to_csv('data_df1.csv', index=False)
    df2.to_csv('data_df2.csv', index=False)

automate_excel()

This is a minimized example: the current file is 5 million rows, but some files can reach 10 million rows, i.e. as many as 10 CSV files.

1 Answer

small_df_rows = 1000000  # rows per output CSV


def split_df(n, idx):
    # Take one chunk of up to small_df_rows rows and write it to its own CSV.
    # .loc slicing works here because ignore_index=True gives a 0..N-1 RangeIndex.
    small_df = data_df.loc[n:n + small_df_rows - 1]
    small_df.to_csv(f'data_df{idx + 1}.csv', index=False)


# range() steps through the start row of each chunk, so the number of CSVs
# follows automatically from the length of the DataFrame
for idx, n in enumerate(range(0, len(data_df), small_df_rows)):
    split_df(n, idx)
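
For completeness, here is a minimal alternative sketch that computes the chunk count up front with math.ceil instead of relying on range(); it reuses data_df and the placeholder 'file location' from the question and is just one way of doing the same thing:

import math

import pandas as pd

# 'file location' is a placeholder path, as in the question
data_df = pd.concat(pd.read_excel('file location', sheet_name=None), ignore_index=True)

chunk_rows = 1000000
n_chunks = math.ceil(len(data_df) / chunk_rows)  # how many CSVs are needed

for i in range(n_chunks):
    # iloc slicing past the end is safe, so the last chunk simply holds the remainder
    chunk = data_df.iloc[i * chunk_rows:(i + 1) * chunk_rows]
    chunk.to_csv(f'data_df{i + 1}.csv', index=False)

Either way, the number of output files is derived from len(data_df): 5 million rows yields 5 CSVs and 10 million rows yields 10, with the final file holding any remainder.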