How do I save the rows returned from a dataframe filter into an Excel sheet?

Story: I am working with a large text file (1.7M rows) containing postal codes for Canada. I created a dataframe and extracted the values I need into it. One column of the dataframe is the province ID (df['PID']). I built a list of the unique values found in that PID column, and I am successfully creating the 13 sheets, each named after a unique PID, in a new Excel spreadsheet.

Problem: Each sheet only contains the headers, and not the values of the row.

I am having trouble writing the matching row to the sheet. Here is my code:

import pandas as pd

# parse text file into dataframe
path = 'the_file.txt'
df = pd.read_csv(path, sep='\t', header=None, names=['ORIG', 'PID','PCODE'], encoding='iso-8859-1')

# extract characters to fill values
df['ORIG'] = df['ORIG']
df['PID'] = df['ORIG'].str[11:13].astype(int)
df['PCODE'] = df['ORIG'].str[:6]

# create list of unique province ID's
prov_ids = df['PID'].unique().tolist()
prov_ids_string = map(str, prov_ids)

# create new excel file
writer = pd.ExcelWriter('CanData.xlsx', engine='xlsxwriter')

for id in prov_ids_string:
    mydf = df.loc[df.PID==id]
    # NEED TO WRITE VALUES FROM ROW INTO SHEET HERE*
    mydf.to_excel(writer, sheet_name=id)

writer.save()

I know where the writing should happen, but I haven't gotten the correct result. How can I write only the rows which have matching PID's to their respective sheets?

Thank you

  • What are you getting here: mydf = df.loc[df.PID==id]? It sounds like you are just writing an empty dataframe. Commented Jan 12, 2021 at 3:52
  • I am trying to use that line to "loop" through the column PID and, based on that value, save the whole row to the corresponding sheet (i.e., PID==10 saves to sheet '10'). And you are correct, the dataframe is empty, but the sheets are created. Commented Jan 12, 2021 at 17:38

1 Answer

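The filter returns no rows because df['PID'] holds integers (from astype(int)), while the loop compares it against the strings produced by map(str, prov_ids), so the comparison is never true. Loop over the integer IDs and convert to a string only for the sheet name. The following should work: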

import pandas as pd
import xlsxwriter
# parse text file into dataframe (same pd.read_csv call as in the question)

# extract characters to fill values
df['ORIG'] = df['ORIG']
df['PID'] = df['ORIG'].str[11:13].astype(int)
df['PCODE'] = df['ORIG'].str[:6]

# create list of unique province ID's
prov_ids = df['PID'].unique().tolist()
# no map(str, ...) here: keep the IDs as integers so they match df['PID']

# create new excel file
writer = pd.ExcelWriter('./CanData.xlsx', engine='xlsxwriter')

for idx in prov_ids:
    mydf = df.loc[df.PID==idx]
    # write the matching rows to a sheet named after the ID
    mydf.to_excel(writer, sheet_name=str(idx))

# close() writes the file; ExcelWriter.save() was removed in pandas 2.0
writer.close()
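
As an alternative to building the list of unique IDs by hand, the same split can be sketched with groupby and a with-block, which closes (and so saves) the writer on exit. This is only a sketch: the helper names frames_by_sheet and write_sheets and the sample postal codes are made up for illustration, and index=False is a choice made here to leave the row index out of the sheets.

```python
import pandas as pd


def frames_by_sheet(df, key='PID'):
    """Return {sheet name: rows for that key value} (hypothetical helper)."""
    # groupby yields each distinct key value together with its matching rows,
    # so no separate list of unique IDs is needed.
    return {str(pid): group for pid, group in df.groupby(key)}


def write_sheets(frames, path):
    """Write each frame to its own sheet (needs the xlsxwriter package)."""
    # Leaving the with-block closes the writer, which saves the file.
    with pd.ExcelWriter(path, engine='xlsxwriter') as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name, index=False)


# Made-up sample rows, not taken from the real postal code file.
df = pd.DataFrame({'PID': [10, 11, 10],
                   'PCODE': ['A1A1A1', 'B2B2B2', 'C3C3C3']})
frames = frames_by_sheet(df)
# write_sheets(frames, 'CanData.xlsx')  # uncomment to create the workbook
```

The key stays an integer while grouping and is turned into a string only when it becomes a sheet name, which is the same idea as sheet_name=str(idx) above.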

For example data:

df = pd.DataFrame()
df['ORIG'] = ['aaaaaa111111111111111111111',
             'bbbbbb2222222222222222222222']
df['ORIG'] = df['ORIG']
df['PID'] = df['ORIG'].str[11:13].astype(int)
df['PCODE'] = df['ORIG'].str[:6]
print(df)

In my sheet 11, I get the row with PID 11. (Screenshot of the sheet omitted.)
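
To see why the original loop produced sheets with headers only, the comparison can be checked on the example data. This is a small sketch; the names rows_str and rows_int are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ORIG': ['aaaaaa111111111111111111111',
                            'bbbbbb2222222222222222222222']})
df['PID'] = df['ORIG'].str[11:13].astype(int)

# Integer column compared with a string: no row matches.
rows_str = df.loc[df['PID'] == '11']
# Integer column compared with an integer: the matching row is returned.
rows_int = df.loc[df['PID'] == 11]

print(len(rows_str), len(rows_int))
```

An empty selection still has its column names, which is why to_excel wrote the headers and nothing else.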

Kr.

1 Comment

Thanks antoine! Not sure if my mapping step caused an issue, or I just forgot the xlxs import statement .. d'oh Either way the script now sorts as expected, thnx & have a nice day! :)
