
I have a dataframe that returns data for each OfficeLocation.


How can I split the dataframe by OfficeLocation and write each piece of data to a separate Excel spreadsheet?

import pandas
import pyodbc

server = 'MyServer'
db = 'MyDB'

myparams = ['2019-01-01','2019-02-28', None]  # None substitutes NULL in sql
connection = pyodbc.connect('DRIVER={SQL Server};server='+server+';DATABASE='+db+';Trusted_Connection=yes;')
df = pandas.read_sql_query('EXEC PythonTest_Align_RSrptAccountCurrentMunich @EffectiveDateFrom=?,@EffectiveDateTo=?,@ProducerLocationID=?', connection, params=myparams)

# sort the dataframe by office
df.sort_values(by=['OfficeLocation'], axis=0,inplace=True)

# set OfficeLocation as the index but keep the column (drop=False)
df.set_index(keys=['OfficeLocation'],drop=False,inplace=True)

# get a list of unique offices
office = df['OfficeLocation'].unique().tolist()

# now we can perform a lookup on a 'view' of the dataframe
SanDiego = df.loc['San Diego']
print(SanDiego)

# how can I iterate through each office and create an Excel file for each one?
df.loc['San Diego'].to_excel(r'\\user\name\Python\SanDiego_Office.xlsx')

So I need three Excel spreadsheets with the data: SanDiego.xlsx, Vista.xlsx and SanBernardino.xlsx.

2 Answers


You can use groupby:

for location, d in df.groupby('OfficeLocation'):
    # rf-string: raw so the backslashes stay literal, f so {location} is substituted
    d.to_excel(rf'\\user\name\Python\{location}.xlsx')
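
If you want to guard against the output folder not existing, or an office name containing a character Windows won't accept in a filename, here is a slightly more defensive sketch of the same loop using pathlib (the folder is the one from the question; creating it and stripping illegal characters are assumptions on my part, not requirements):

from pathlib import Path

out_dir = Path(r'\\user\name\Python')   # output folder from the question; assumed reachable/creatable
out_dir.mkdir(parents=True, exist_ok=True)

for location, d in df.groupby('OfficeLocation'):
    # drop characters Windows does not allow in filenames, just in case
    safe_name = ''.join(ch for ch in str(location) if ch not in '\\/:*?"<>|')
    d.to_excel(out_dir / f'{safe_name}.xlsx', index=False)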

Comments

Thank you. Now I get this: 'OfficeLocation' is both an index level and a column label, which is ambiguous.
remove df.set_index(keys=['OfficeLocation'],drop=False,inplace=True) line, you don't need it.
Thanks. But it returns just one file, named {location}.xlsx, and the only office in it is Vista. It should be Vista and San Diego.
Did you forget the f before the string name inside to_excel?
Oh, my bad. Thanks
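
For anyone reading later, a minimal sketch of the whole flow with the fixes from this thread applied (the stored procedure, parameters, and output path are the asker's; the folder is assumed to exist):

import pandas
import pyodbc

connection = pyodbc.connect('DRIVER={SQL Server};server=MyServer;DATABASE=MyDB;Trusted_Connection=yes;')
df = pandas.read_sql_query(
    'EXEC PythonTest_Align_RSrptAccountCurrentMunich @EffectiveDateFrom=?,@EffectiveDateTo=?,@ProducerLocationID=?',
    connection,
    params=['2019-01-01', '2019-02-28', None],
)

# no sort_values/set_index needed; groupby splits on the column directly
for location, d in df.groupby('OfficeLocation'):
    d.to_excel(rf'\\user\name\Python\{location}.xlsx')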

How about something as simple as this?

for loc in df["OfficeLocation"].unique():
    save_df = df[df["OfficeLocation"] == loc]
    save_df.to_excel(loc + ".xlsx")

EDIT

I've generated 50,000 rows of data similar to yours.

+---------------+--------------------+----------------+---------------+----------------+-----------------+------------+--------------+
| Policy Number | ProducerLocationId | OfficeLocation | EffectiveDate | ExpirationDate | TransactionType | BondAmount | GrossPremium |
+---------------+--------------------+----------------+---------------+----------------+-----------------+------------+--------------+
| 7563299       | 8160               | Aldora         | 31/10/2018    | 28/01/2019     | Cancelled       | -61081     | -2372.303665 |
| 6754151       | 3122               | Aucilla        | 04/05/2019    | 15/06/2019     | New Business    | -80151     | -4135.443318 |
| 3121128       | 3230               | Aulander       | 11/10/2018    | 29/12/2018     | New Business    | -67563     | -28394.83428 |
| 911463        | 4041               | Aullville      | 30/11/2018    | 20/02/2019     | New Business    | -47918     | -17840.05749 |
| 5068380       | 3794               | Ava            | 10/01/2019    | 28/03/2019     | Cancelled       | -41094     | -30523.0655  |
| 2174424       | 1263               | Alcan Border   | 18/04/2019    | 10/07/2019     | Cancelled       | -73661     | -5979.278874 |
| 475464        | 9250               | Audubon        | 15/01/2019    | 17/02/2019     | New Business    | -85217     | -64988.83987 |
| 2076075       | 7405               | Alderton       | 20/08/2019    | 26/09/2019     | New Business    | -32335     | -11144.63342 |
| 3645387       | 9357               | Austwell       | 22/10/2018    | 19/12/2018     | Cancelled       | -5065      | -5013.982643 |
| 3316361       | 1335               | Aurora         | 29/09/2018    | 24/12/2018     | New Business    | -13939     | -6333.580641 |
| 1404387       | 2656               | Auburn Hills   | 04/07/2019    | 19/09/2019     | Cancelled       | -12049     | -385.3522259 |
| 6908433       | 1288               | Alcester       | 30/10/2018    | 18/01/2019     | Cancelled       | -56902     | -27341.06181 |
| 9908879       | 6012               | Alexandria     | 20/06/2019    | 21/08/2019     | Cancelled       | -76226     | -12671.06376 |
| 7850879       | 4606               | Avery          | 10/11/2018    | 21/01/2019     | Cancelled       | -54297     | -40619.42718 |
| 8437707       | 4149               | Auxvasse       | 22/09/2019    | 28/10/2019     | Cancelled       | -59584     | -19800.71077 |
| 4260681       | 1889               | Auburndale     | 06/07/2019    | 22/08/2019     | New Business    | -55035     | -18271.5442  |
| 7234116       | 2636               | Alexander      | 14/07/2019    | 31/08/2019     | New Business    | -59319     | -15711.2827  |
| 3721467       | 3765               | Alexander City | 16/10/2018    | 23/12/2018     | Cancelled       | -98431     | -26743.07459 |
| 6859964       | 7035               | Alburtis       | 04/11/2018    | 26/12/2018     | New Business    | -36917     | -11339.9049  |
| 2994719       | 6997               | Aleneva        | 09/02/2019    | 13/04/2019     | New Business    | -55739     | -46323.01608 |
| 7542794       | 8968               | Aullville      | 25/09/2018    | 09/11/2018     | Cancelled       | -44488     | -4554.278674 |
| 1340649       | 7003               | Augusta        | 30/11/2018    | 17/02/2019     | New Business    | -78405     | -71910.93325 |
| 8078558       | 7185               | Alderpoint     | 10/06/2019    | 22/07/2019     | New Business    | -37928     | -29289.29545 |
| 8198811       | 8963               | Alden          | 05/07/2019    | 15/08/2019     | Cancelled       | -97648     | -79946.41222 |
| 2510522       | 5714               | Avella         | 03/09/2019    | 02/11/2019     | New Business    | -16452     | -11230.93829 |
+---------------+--------------------+----------------+---------------+----------------+-----------------+------------+--------------+
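
In case anyone wants to reproduce the timing comparison, here is a rough sketch of how comparable dummy data could be generated (column names follow the table above; the office names and value ranges are made up):

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 50_000
offices = ['Aldora', 'Aucilla', 'Aulander', 'Aurora', 'Austwell']  # any list of names will do

df = pd.DataFrame({
    'Policy Number': rng.integers(100_000, 10_000_000, n),
    'ProducerLocationId': rng.integers(1_000, 10_000, n),
    'OfficeLocation': rng.choice(offices, n),
    'EffectiveDate': pd.Timestamp('2018-09-01') + pd.to_timedelta(rng.integers(0, 365, n), unit='D'),
    'TransactionType': rng.choice(['New Business', 'Cancelled'], n),
    'BondAmount': -rng.integers(5_000, 100_000, n),
    'GrossPremium': -rng.uniform(300, 80_000, n).round(2),
})
df['ExpirationDate'] = df['EffectiveDate'] + pd.to_timedelta(rng.integers(30, 120, n), unit='D')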

And I created two functions, one using my version and the other using the groupby method.

In case anyone was wondering, they both perform similarly, but the groupby method comes out on top with less variance and a roughly one-second-quicker run time.

def loop_save_unique(df):
    # filter on each unique office and write it out
    for loc in df["OfficeLocation"].unique():
        save_df = df[df["OfficeLocation"] == loc]
        save_df.to_excel("output\\test1\\" + loc + ".xlsx")

def loop_save_groupby(df):
    # let groupby do the splitting
    for location, d in df.groupby('OfficeLocation'):
        d.to_excel(f'output\\test2\\{location}.xlsx')



%timeit loop_save_unique(df)
12.1 s ± 556 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit loop_save_groupby(df)
11.1 s ± 183 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
