158

I have a DataFrame with a column Sales.

How can I split it into 2 based on Sales value?

The first DataFrame should contain the rows with 'Sales' < s and the second the rows with 'Sales' >= s.

5 Answers

167

You can use boolean indexing:

import pandas as pd

df = pd.DataFrame({'A':[3,4,7,6,1], 'Sales':[10,20,30,40,50]})
print (df)
   A  Sales
0  3     10
1  4     20
2  7     30
3  6     40
4  1     50

s = 30

df1 = df[df['Sales'] >= s]
print (df1)
   A  Sales
2  7     30
3  6     40
4  1     50

df2 = df[df['Sales'] < s]
print (df2)
   A  Sales
0  3     10
1  4     20

It's also possible to invert the mask with ~:

mask = df['Sales'] >= s
df1 = df[mask]
df2 = df[~mask]
print (df1)
   A  Sales
2  7     30
3  6     40
4  1     50

print (df2)
   A  Sales
0  3     10
1  4     20

print (mask)
0    False
1    False
2     True
3     True
4     True
Name: Sales, dtype: bool

print (~mask)
0     True
1     True
2    False
3    False
4    False
Name: Sales, dtype: bool
9 Comments

Is there a way to do it without having to slice the dataframe twice? This way we have to go over the index once to create df1, and a second time with the exact same condition for df2. But I can't figure out how to get both dataframes in a single line.
Unfortunately I think there is only this solution - see cookbook.
What's the performance difference between using a mask and traditional slicing? My tests show the mask is a bit faster, but the difference is not huge.
Not exactly but I figured it out by doing a for loop; iterating through each unique column value, then splitting the df by the value by slicing it. Not too hard actually, I don't even know why I asked. Thanks though.
The reset_index method might be useful here since otherwise the indices of the two new dataframes are different, and you can't e.g. do df1.Sales - df2.Sales.
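To illustrate the reset_index comment, here is a minimal sketch assuming the same example data as in the answer. Without resetting, the two halves share no index labels, so subtracting them aligns on nothing and yields only NaN. The name `diff` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Sales': [10, 20, 30, 40, 50], 'A': [3, 4, 7, 6, 1]})
s = 30

mask = df['Sales'] >= s

# drop=True discards the old labels, so both halves start again at 0
df1 = df[mask].reset_index(drop=True)
df2 = df[~mask].reset_index(drop=True)

# Subtraction now aligns by position; rows without a partner become NaN
diff = df1['Sales'] - df2['Sales']
print(diff)
```

Because the halves have different lengths here (3 and 2 rows), the last row of `diff` still has no partner and stays NaN.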
83

Using groupby, you could split it into two DataFrames like this:

In [1047]: df1, df2 = [x for _, x in df.groupby(df['Sales'] < 30)]

In [1048]: df1
Out[1048]:
   A  Sales
2  7     30
3  6     40
4  1     50

In [1049]: df2
Out[1049]:
   A  Sales
0  3     10
1  4     20

2 Comments

This operation appears to be substantially more expensive than jezrael's two options, though syntactically more elegant imo
This is not fully equivalent to jezrael's options. If one of the data sets is empty after the split, groupby will return a list with just one element, and unpacking it into df1 and df2 will fail.
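One way around the unpacking problem, sketched under the assumption of the same example df: collect the groups into a dict keyed by the boolean condition and fall back to an empty frame when a side is missing. The names `groups`, `empty`, `below` and `at_or_above` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Sales': [10, 20, 30, 40, 50], 'A': [3, 4, 7, 6, 1]})
s = 5  # chosen so that no row satisfies Sales < s

# Keys are the values of the condition: True and/or False
groups = {key: part for key, part in df.groupby(df['Sales'] < s)}

# Empty frame with the same columns, used when a group is absent
empty = df.iloc[0:0]

below = groups.get(True, empty)
at_or_above = groups.get(False, empty)

print(len(below), len(at_or_above))
```

Plain boolean indexing as in the accepted answer handles the empty case on its own, so this is only worth it if you want to stay with groupby.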
71

Using groupby and list comprehension:

Store all the split DataFrames in a list and access each separated DataFrame by its index.

DF = pd.DataFrame({'chr':["chr3","chr3","chr7","chr6","chr1"],'pos':[10,20,30,40,50],})
ans = [y for x, y in DF.groupby('chr')]

Access the separated DataFrames like this:

ans[0]
ans[1]
ans[len(ans)-1] # this is the last separated DF

Access a column of a separated DataFrame like this:

i = 0  # position of the group in the list
ans_i_chr = ans[i].chr
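If you would rather look a piece up by its group value than by list position, a small sketch using get_group on the same example data follows. The names `grouped` and `chr3` are illustrative only. Note that groupby sorts the keys by default, so the list `ans` above is ordered chr1, chr3, chr6, chr7 rather than by first appearance.

```python
import pandas as pd

DF = pd.DataFrame({'chr': ["chr3", "chr3", "chr7", "chr6", "chr1"],
                   'pos': [10, 20, 30, 40, 50]})

grouped = DF.groupby('chr')

# Fetch one group directly by its key instead of by position
chr3 = grouped.get_group('chr3')
print(chr3)

# The list from the answer, for comparison: ordered by sorted key
ans = [y for x, y in grouped]
print(ans[0]['chr'].iloc[0])
```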

4 Comments

This is a great answer!
I think you can simplify to ans = [y for x, y in DF.groupby('chr', as_index=False)] since y is already a DataFrame
This answer doesn't depend on the number of splits. It should be voted #1. It just needs to be updated according to @C8H10N4O2's comment.
I believe as_index=False only affects aggregation, so you can simplify further to ans = [y for x, y in DF.groupby('chr')]
18

One-liner using the walrus operator (Python 3.8+):

df1, df2 = df[(mask:=df['Sales'] >= 30)], df[~mask]

Consider using copy to avoid SettingWithCopyWarning:

df1, df2 = df[(mask:=df['Sales'] >= 30)].copy(), df[~mask].copy()

Alternatively, you can use the query method:

df1, df2 = df.query('Sales >= 30').copy(), df.query('Sales < 30').copy()
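If the threshold lives in a variable, as in the question, query can reference it with the @ prefix. A minimal sketch, assuming the example data from the accepted answer and a local variable named s:

```python
import pandas as pd

df = pd.DataFrame({'Sales': [10, 20, 30, 40, 50], 'A': [3, 4, 7, 6, 1]})
s = 30

# @s refers to the Python variable s in the surrounding scope
df1 = df.query('Sales < @s').copy()
df2 = df.query('Sales >= @s').copy()

print(df1)
print(df2)
```

Here df1 holds the rows below the threshold and df2 the rest, matching the order asked for in the question.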

1 Comment

Honestly, I find this more readable, haha
15

I like to use this to speed up searches and rolling-average lookups done with .apply(lambda x...)-type functions, so I split big files into dictionaries of DataFrames:

df_dict = {sale_v: df[df['Sales'] == sale_v] for sale_v in df.Sales.unique()}

This should do it if you want to split by categorical groups.
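A possible variant, not from the answer itself: the same dictionary can be built with groupby, which avoids filtering the whole frame once per unique value. This is a sketch on made-up data with repeated Sales values. One difference to be aware of is that groupby drops NaN keys by default.

```python
import pandas as pd

# Made-up data with repeated Sales values so the groups have several rows
df = pd.DataFrame({'Sales': [10, 20, 10, 40, 20], 'A': [3, 4, 7, 6, 1]})

# One entry per distinct Sales value; each value is the matching sub-frame
df_dict = {sale_v: part for sale_v, part in df.groupby('Sales')}

print(sorted(df_dict))
print(df_dict[10])
```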
