I have a DataFrame df that I need to split based on whether the value in a specific column, ColB, falls within a given range: 1-3, 3-5, 5-7, etc.

Input:

Time ColA ColB ColC
1    100  1.1  500
2    105  3.2  600
3    107  7.7  550
4    106  2.4  750
5    104  5.2  950
6    103  6.9  450

Desired Output:

Time ColA ColB ColC
1    100  1.1  500
4    106  2.4  750


Time ColA ColB ColC
2    105  3.2  600



Time ColA ColB ColC
3    107  7.7  550
5    104  5.2  950
6    103  6.9  450

Is there a nice way to do this without writing a loop in Python? Also, would it be more efficient to store the output as a list of DataFrames or a dictionary of DataFrames? I ask because it's a fairly large dataset.

2 Answers

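Use pandas.cut:

https://pandas.pydata.org/docs/reference/api/pandas.cut.html

For example:

import pandas as pd

# Bins are right-closed: (1, 3], (3, 5], (5, 7]; values outside them are dropped.
groups = pd.cut(df["ColB"], [1, 3, 5, 7])
[d for _, d in df.groupby(groups)]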
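
To address the list-versus-dictionary part of the question, here is a minimal sketch that rebuilds the sample data from the question and stores each bin under a readable key. The names `bins`, `labels`, and `parts` are only illustrative, and `observed=True` is my addition so that empty bins are skipped. Either container holds the same sub-DataFrames, so the choice is mostly about whether you want to look a group up by its label.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Time": [1, 2, 3, 4, 5, 6],
    "ColA": [100, 105, 107, 106, 104, 103],
    "ColB": [1.1, 3.2, 7.7, 2.4, 5.2, 6.9],
    "ColC": [500, 600, 550, 750, 950, 450],
})

bins = [1, 3, 5, 7]              # edges of the right-closed intervals (1, 3], (3, 5], (5, 7]
labels = ["1-3", "3-5", "5-7"]   # illustrative names, one per interval

# Assign each row to a bin; ColB values outside 1..7 (here 7.7) become NaN.
groups = pd.cut(df["ColB"], bins, labels=labels)

# Dictionary of sub-DataFrames keyed by bin label.
# Rows whose bin is NaN are left out by groupby.
parts = {label: part for label, part in df.groupby(groups, observed=True)}

for label, part in parts.items():
    print(label)
    print(part, "\n")
```

Note that the row with ColB = 7.7 does not appear in any group, because it lies outside the last bin edge. Extending `bins` (and `labels`) would be one way to capture it.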

1 Comment

Thanks, this is a nice solution. I get a buffer size error when I replace the cut values with a predefined array, but I'll try to figure that one out. Apologies for the simple question!

You can try this:

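lst = [(1, 3), (3, 5), (5, 7)]
# between() includes both endpoints by default, so a value of exactly 3 or 5 lands in two groups.
result = [df[df['ColB'].between(a, b)] for a, b in lst]
for i in result:
    print(i, "\n")

Output:
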
   Time  ColA  ColB  ColC
0     1   100   1.1   500
3     4   106   2.4   750 

   Time  ColA  ColB  ColC
1     2   105   3.2   600 

   Time  ColA  ColB  ColC
4     5   104   5.2   950
5     6   103   6.9   450 
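
If values can fall exactly on a boundary, half-open ranges avoid putting the same row in two groups. The sketch below uses a small made-up DataFrame (not the data from the question) and assumes pandas 1.3 or later, where `between` accepts `inclusive="left"`.

```python
import pandas as pd

# Hypothetical data with a value sitting exactly on the shared boundary 3.0.
df = pd.DataFrame({"Time": [1, 2, 3], "ColB": [2.9, 3.0, 3.1]})
ranges = [(1, 3), (3, 5)]

# Default behaviour: both endpoints included, so the 3.0 row matches both ranges.
closed = [df[df["ColB"].between(a, b)] for a, b in ranges]

# Half-open [a, b): each row matches at most one range.
half_open = [df[df["ColB"].between(a, b, inclusive="left")] for a, b in ranges]

for part in half_open:
    print(part, "\n")
```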
