
I have a dataframe that returns data for each OfficeLocation.


How can I split the dataframe by OfficeLocation and write each piece of data to a separate Excel spreadsheet?

import pandas
import pyodbc

server = 'MyServer'
db = 'MyDB'

myparams = ['2019-01-01','2019-02-28', None]  # None substitutes NULL in sql
connection = pyodbc.connect('DRIVER={SQL Server};server='+server+';DATABASE='+db+';Trusted_Connection=yes;')
df = pandas.read_sql_query('EXEC PythonTest_Align_RSrptAccountCurrentMunich @EffectiveDateFrom=?,@EffectiveDateTo=?,@ProducerLocationID=?', connection, params=myparams)

# sort the dataframe by office
df.sort_values(by=['OfficeLocation'], axis=0,inplace=True)

# set OfficeLocation as the index but keep the column (drop=False)
df.set_index(keys=['OfficeLocation'],drop=False,inplace=True)

# get a list of unique offices
office = df['OfficeLocation'].unique().tolist()

# now we can perform a lookup on a 'view' of the dataframe
SanDiego = df.loc['San Diego']
print(SanDiego)

# how can I iterate through each office and create an Excel file for each one?
df.loc['San Diego'].to_excel(r'\\user\name\Python\SanDiego_Office.xlsx')

So I need three Excel spreadsheets with the data: SanDiego.xlsx, Vista.xlsx and SanBernardino.xlsx.

2 Answers


You can use groupby:

for location, d in df.groupby('OfficeLocation'):
    # rf-string: raw so the backslashes stay literal, f so {location} is substituted
    d.to_excel(rf'\\user\name\Python\{location}.xlsx')
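
If you want to guard against the output folder not existing, or an office name containing a character Windows won't accept in a filename, here is a slightly more defensive sketch of the same loop using pathlib (the folder is the one from the question; creating it and stripping illegal characters are assumptions on my part, not requirements):

from pathlib import Path

out_dir = Path(r'\\user\name\Python')   # output folder from the question; assumed reachable/creatable
out_dir.mkdir(parents=True, exist_ok=True)

for location, d in df.groupby('OfficeLocation'):
    # drop characters Windows does not allow in filenames, just in case
    safe_name = ''.join(ch for ch in str(location) if ch not in '\\/:*?"<>|')
    d.to_excel(out_dir / f'{safe_name}.xlsx', index=False)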

Comments

Thank you. Now I get this: 'OfficeLocation' is both an index level and a column label, which is ambiguous.
remove df.set_index(keys=['OfficeLocation'],drop=False,inplace=True) line, you don't need it.
Thanks. But it returns just one file, named {location}.xlsx, and the only office in it is Vista. It should be Vista and San Diego.
Did you forget the f before the string name inside to_excel?
Oh, my bad. Thanks
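
For anyone reading later, a minimal sketch of the whole flow with the fixes from this thread applied (the stored procedure, parameters, and output path are the asker's; the folder is assumed to exist):

import pandas
import pyodbc

connection = pyodbc.connect('DRIVER={SQL Server};server=MyServer;DATABASE=MyDB;Trusted_Connection=yes;')
df = pandas.read_sql_query(
    'EXEC PythonTest_Align_RSrptAccountCurrentMunich @EffectiveDateFrom=?,@EffectiveDateTo=?,@ProducerLocationID=?',
    connection,
    params=['2019-01-01', '2019-02-28', None],
)

# no sort_values/set_index needed; groupby splits on the column directly
for location, d in df.groupby('OfficeLocation'):
    d.to_excel(rf'\\user\name\Python\{location}.xlsx')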

How about something as simple as this?

for loc in df["OfficeLocation"].unique():
    save_df = df[df["OfficeLocation"] == loc]
    save_df.to_excel(loc + ".xlsx")

EDIT

I've generated 50,000 rows of data similar to yours.

+---------------+--------------------+----------------+---------------+----------------+-----------------+------------+--------------+
| Policy Number | ProducerLocationId | OfficeLocation | EffectiveDate | ExpirationDate | TransactionType | BondAmount | GrossPremium |
+---------------+--------------------+----------------+---------------+----------------+-----------------+------------+--------------+
| 7563299       | 8160               | Aldora         | 31/10/2018    | 28/01/2019     | Cancelled       | -61081     | -2372.303665 |
| 6754151       | 3122               | Aucilla        | 04/05/2019    | 15/06/2019     | New Business    | -80151     | -4135.443318 |
| 3121128       | 3230               | Aulander       | 11/10/2018    | 29/12/2018     | New Business    | -67563     | -28394.83428 |
| 911463        | 4041               | Aullville      | 30/11/2018    | 20/02/2019     | New Business    | -47918     | -17840.05749 |
| 5068380       | 3794               | Ava            | 10/01/2019    | 28/03/2019     | Cancelled       | -41094     | -30523.0655  |
| 2174424       | 1263               | Alcan Border   | 18/04/2019    | 10/07/2019     | Cancelled       | -73661     | -5979.278874 |
| 475464        | 9250               | Audubon        | 15/01/2019    | 17/02/2019     | New Business    | -85217     | -64988.83987 |
| 2076075       | 7405               | Alderton       | 20/08/2019    | 26/09/2019     | New Business    | -32335     | -11144.63342 |
| 3645387       | 9357               | Austwell       | 22/10/2018    | 19/12/2018     | Cancelled       | -5065      | -5013.982643 |
| 3316361       | 1335               | Aurora         | 29/09/2018    | 24/12/2018     | New Business    | -13939     | -6333.580641 |
| 1404387       | 2656               | Auburn Hills   | 04/07/2019    | 19/09/2019     | Cancelled       | -12049     | -385.3522259 |
| 6908433       | 1288               | Alcester       | 30/10/2018    | 18/01/2019     | Cancelled       | -56902     | -27341.06181 |
| 9908879       | 6012               | Alexandria     | 20/06/2019    | 21/08/2019     | Cancelled       | -76226     | -12671.06376 |
| 7850879       | 4606               | Avery          | 10/11/2018    | 21/01/2019     | Cancelled       | -54297     | -40619.42718 |
| 8437707       | 4149               | Auxvasse       | 22/09/2019    | 28/10/2019     | Cancelled       | -59584     | -19800.71077 |
| 4260681       | 1889               | Auburndale     | 06/07/2019    | 22/08/2019     | New Business    | -55035     | -18271.5442  |
| 7234116       | 2636               | Alexander      | 14/07/2019    | 31/08/2019     | New Business    | -59319     | -15711.2827  |
| 3721467       | 3765               | Alexander City | 16/10/2018    | 23/12/2018     | Cancelled       | -98431     | -26743.07459 |
| 6859964       | 7035               | Alburtis       | 04/11/2018    | 26/12/2018     | New Business    | -36917     | -11339.9049  |
| 2994719       | 6997               | Aleneva        | 09/02/2019    | 13/04/2019     | New Business    | -55739     | -46323.01608 |
| 7542794       | 8968               | Aullville      | 25/09/2018    | 09/11/2018     | Cancelled       | -44488     | -4554.278674 |
| 1340649       | 7003               | Augusta        | 30/11/2018    | 17/02/2019     | New Business    | -78405     | -71910.93325 |
| 8078558       | 7185               | Alderpoint     | 10/06/2019    | 22/07/2019     | New Business    | -37928     | -29289.29545 |
| 8198811       | 8963               | Alden          | 05/07/2019    | 15/08/2019     | Cancelled       | -97648     | -79946.41222 |
| 2510522       | 5714               | Avella         | 03/09/2019    | 02/11/2019     | New Business    | -16452     | -11230.93829 |
+---------------+--------------------+----------------+---------------+----------------+-----------------+------------+--------------+
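
In case anyone wants to reproduce the timing comparison, here is a rough sketch of how comparable dummy data could be generated (column names follow the table above; the office names and value ranges are made up):

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 50_000
offices = ['Aldora', 'Aucilla', 'Aulander', 'Aurora', 'Austwell']  # any list of names will do

df = pd.DataFrame({
    'Policy Number': rng.integers(100_000, 10_000_000, n),
    'ProducerLocationId': rng.integers(1_000, 10_000, n),
    'OfficeLocation': rng.choice(offices, n),
    'EffectiveDate': pd.Timestamp('2018-09-01') + pd.to_timedelta(rng.integers(0, 365, n), unit='D'),
    'TransactionType': rng.choice(['New Business', 'Cancelled'], n),
    'BondAmount': -rng.integers(5_000, 100_000, n),
    'GrossPremium': -rng.uniform(300, 80_000, n).round(2),
})
df['ExpirationDate'] = df['EffectiveDate'] + pd.to_timedelta(rng.integers(30, 120, n), unit='D')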

And I created two functions, one using my version and the other using the groupby method.

In case anyone was wondering, they both perform similarly, but the groupby method comes out on top with less variance and a roughly one-second-quicker run time.

def loop_save_unique(df):
    # filter on each unique office and write it out
    for loc in df["OfficeLocation"].unique():
        save_df = df[df["OfficeLocation"] == loc]
        save_df.to_excel("output\\test1\\" + loc + ".xlsx")

def loop_save_groupby(df):
    # let groupby do the splitting
    for location, d in df.groupby('OfficeLocation'):
        d.to_excel(f'output\\test2\\{location}.xlsx')



%timeit loop_save_unique(df)
12.1 s ± 556 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit loop_save_groupby(df)
11.1 s ± 183 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
