8

I have a large dataframe of transactions which I want to break into two smaller dataframes based on a certain column ("Type"). If "Type" is "S" then add that entire row to the "cust_sell" dataframe, and if "Type" is "P" to the "cust_buy" dataframe. I am using a for loop, but this is only adding the index value to the dataframe. Any help is appreciated!

from win32com.shell import shell, shellcon
import pandas as pd

filename = (shell.SHGetFolderPath(0, shellcon.CSIDL_PERSONAL, None, 0)) + '\MSRB T-1_test.xlsx'
wb = pd.read_excel(filename, sheet_name='T1-20062017', index_col=0, header=0)
cust_buy = []
cust_sell = []

# Create a list of customer buys and sells separately
for i in wb.index:
    if wb['Type'][i] == 'S':
        cust_sell.append([i])
    elif wb['Type'][i] == 'P':
        cust_buy.append([i])
1
  • Could you give us a sample of your data that you're working with? Commented Jun 23, 2018 at 18:36

3 Answers 3

30

You do not need to write loops. You can do it easily with pandas.

Assuming your dataframe looks like this:

import pandas as pd  

mainDf = pd.DataFrame()
mainDf['Type'] = ['S', 'S', 'S', 'P', 'P', 'S', 'P', 'S']
mainDf['Dummy'] = [1, 2, 3, 4, 5, 6, 7, 8]

To create dataframe for S and P types, you can just do this:

cust_sell = mainDf[mainDf.Type == 'S']
cust_buy = mainDf[mainDf.Type == 'P']

cust_sell output:

  Type  Dummy
0    S      1
1    S      2
2    S      3
5    S      6
7    S      8

cust_buy output:

  Type  Dummy
3    P      4
4    P      5
6    P      7
Sign up to request clarification or add additional context in comments.

Comments

3

Like @trollster said, it is indeed better to create dataframes for cust_sell and cust_buy. But let's understand what is not working with your code. When you do:

for i in wb.index

it means i will take the values of wb.index. And when you print wb.index, you get:

Int64Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64')

Meaning i will take the values 1,2,3 ... So when you do:

cust_sell.append([i])

what you are doing is adding to the list_cust_sell a list with inside a single element, i (an integer). If you want to add the entire row, you should use:

cust_sell.append(list(wb.loc[i,:]))

You will end up with a list of lists, each one containing a row

Comments

2

Using dict + groupby you can create a dictionary of dataframes. This solution does not require you to manually specify all unique types and is more easily extendable than a manual loop.

Data from @trollster.

res = dict(tuple(mainDf.groupby('Type')))

{'P':   Type  Dummy
      3    P      4
      4    P      5
      6    P      7,
 'S':   Type  Dummy
      0    S      1
      1    S      2
      2    S      3
      5    S      6
      7    S      8}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.