Is there a way to assign each value in a given column to one of 5 ranges and, for each dataframe, store the matching range label in a new column?

I have a list of dataframes. All dataframes have 2 columns and share the same first column, but differ in the second (header and values). For example:

>> df1
   GeneID   A
     1     0.3 
     2     0.0
     3     143
     4      9
     5     0.6

>> df2
   GeneID   B
     1     0.2 
     2     0.3
     3     0.1
     4     0.7
     5     0.4

  ....

I would like to:

  1. For each dataframe on the list, compute the probability that a value falls within each of the 5 ranges, and append a new column with those values;

  2. For each dataframe on the list, attach the respective range label in another new column.

Where the ranges are:

*Range_Values* -> *Range_Label*

   **[0]**     ->   'l1'

  **]0,1]**    ->   'l2'

 **]1,10]**    ->   'l3'

**]10,100]**   ->   'l4'

  **>100**     ->   'l5'

This two-step approach would lead to something like:

>> list_dfs[df1]
   GeneID    A    Prob_val     Exp_prof
      1     0.3     0.4         'l2'
      2     0.0     0.2         'l1'
      3     143     0.2         'l5'
      4      9      0.2         'l3'
      5     0.6     0.4         'l2'

2 Answers

1

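You first have to define the bins and labels. pd.cut() needs one more bin edge than labels, and its intervals are closed on the right by default, so a leading -inf edge lets the first interval capture the value 0 (assuming the values are never negative) -

import pandas as pd

bins = [float("-inf"), 0, 1, 10, 100, float("inf")]
labels = ['l1', 'l2', 'l3', 'l4', 'l5']

Then use pd.cut() -

pd.cut(df1['A'], bins)

There is a labels parameter in pd.cut() that you can use to get labels -

pd.cut(df1['A'], bins, labels=labels)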

You can use the generated bins to compute the probabilities; I leave that up to you.
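One possible way to get those probabilities, as a minimal sketch: treat the probability of a range as the share of rows whose value falls into it, which is the reading the expected output in the question suggests. The `-inf` lower edge is an assumption that the values are never negative, and the variable names are illustrative only.

```python
import pandas as pd

# Six edges define five right-closed intervals:
# (-inf, 0], (0, 1], (1, 10], (10, 100], (100, inf]
bins = [float("-inf"), 0, 1, 10, 100, float("inf")]
labels = ["l1", "l2", "l3", "l4", "l5"]

# Column A of df1 from the question.
values = pd.Series([0.3, 0.0, 143, 9, 0.6], name="A")
exp_prof = pd.cut(values, bins, labels=labels)

# Share of rows that fall into each range; ranges with no rows get 0.0.
probabilities = exp_prof.value_counts(normalize=True)
print(probabilities)
```

Here `l2` covers 2 of the 5 rows, so its share is 0.4, while `l4` is empty and gets 0.0.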

You can do this for the rest of the dfs by collecting them in a list -

list_dfs = [df1, df2, ...]

If you have a dynamic number of dfs, use a loop. Since the second column has a different name in each df, select it by position -

Framework

for df in list_dfs:
    col = df.columns[1]  # second column: 'A' for df1, 'B' for df2, ...
    df['bins'] = pd.cut(df[col], bins)
    df['label'] = pd.cut(df[col], bins, labels=labels)
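As a self-contained sketch of the whole loop, the snippet below rebuilds the two example dataframes from the question and adds both requested columns to each. It assumes that the probability is the relative frequency of a row's label within its own dataframe, and that values are never negative (hence the `-inf` lower edge). The name `value_col` is illustrative.

```python
import pandas as pd

bins = [float("-inf"), 0, 1, 10, 100, float("inf")]
labels = ["l1", "l2", "l3", "l4", "l5"]

# The two example dataframes from the question.
df1 = pd.DataFrame({"GeneID": [1, 2, 3, 4, 5], "A": [0.3, 0.0, 143, 9, 0.6]})
df2 = pd.DataFrame({"GeneID": [1, 2, 3, 4, 5], "B": [0.2, 0.3, 0.1, 0.7, 0.4]})
list_dfs = [df1, df2]

for df in list_dfs:
    value_col = df.columns[1]  # second column: "A" for df1, "B" for df2
    df["Exp_prof"] = pd.cut(df[value_col], bins, labels=labels)
    # Number of rows sharing each row's label, looked up from a plain dict.
    label_counts = df["Exp_prof"].value_counts().to_dict()
    counts = df["Exp_prof"].astype(object).map(label_counts)
    # Relative frequency of the label within this dataframe.
    df["Prob_val"] = counts / len(df)

print(list_dfs[0])
print(list_dfs[1])
```

The loop modifies the dataframes in place, so `list_dfs` holds the annotated versions afterwards.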

3 Comments

Although it is a good answer, something must be wrong with the labels. They are not working properly: they do not match the proper ranges.
@JoãoFernandes you were right. I have updated the answer to include l6. The reason is that I decided to include float("inf") to capture values greater than 100.
Following your code, the label l1 will be given to the bin [0,1], since the singleton [0] won't be taken into account. You could consider adding a new category for this in order to match the desired mapping.
1

For the labels and bins, you can use pandas.cut. Note that you can't use a singleton as a bin in this function. Therefore you will have to create it afterwards. Here is how you can do this.

First I recreate one of your dataframes:

import io
import numpy as np
import pandas as pd

temp = """
GeneID    A
      1     0.3
      2     0.0
      3     143
      4      9
      5     0.6"""
# sep=r"\s+" splits on any run of whitespace
foo = pd.read_csv(io.StringIO(temp), sep=r"\s+")

Then I create the new column and fill the NaN values with the label l1 which corresponds to the singleton [0].

foo['Exp_prof'] = pd.cut(foo.A,bins = [0,1,10,100,np.inf],labels = ['l2','l3','l4','l5'])
foo['Exp_prof'] = foo['Exp_prof'].cat.add_categories(['l1'])
foo['Exp_prof'] = foo['Exp_prof'].fillna('l1')
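To see why the fill step is needed, here is a small sketch showing that `pd.cut` leaves a value unlabeled (NaN) when it lies outside every interval. With a first edge of 0 and right-closed intervals, 0 itself is excluded. The extra value `-2.0` is made up for illustration: it shows that a blanket `fillna('l1')` would also label negative or missing values as `l1`, so the variant below assigns `l1` only to exact zeros.

```python
import numpy as np
import pandas as pd

# Column A from the question plus a made-up negative value (-2.0).
values = pd.Series([0.3, 0.0, 143, 9, 0.6, -2.0])

cut = pd.cut(values, bins=[0, 1, 10, 100, np.inf], labels=["l2", "l3", "l4", "l5"])
# Both 0.0 and -2.0 fall outside (0, 1], (1, 10], ... and come back as NaN.
print(cut.isna().tolist())

# Assign 'l1' only where the value is exactly 0, leaving other gaps as NaN.
exp_prof = cut.cat.add_categories(["l1"])
exp_prof[values == 0] = "l1"
print(exp_prof.tolist())
```

If the data are guaranteed to be non-negative and complete, the plain `fillna('l1')` and this masked assignment give the same result.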

And I use this new column to compute the probabilities:

foo['Prob_val'] = foo.Exp_prof.map((foo.Exp_prof.value_counts()/len(foo)).to_dict())

And the output is:

    GeneID  A       Exp_prof    Prob_val
0   1       0.3     l2          0.4
1   2       0.0     l1          0.2
2   3       143.0   l5          0.2
3   4       9.0     l3          0.2
4   5       0.6     l2          0.4

2 Comments

That works just fine, thank you! The probability calculation should be based on the frequency of the labels just added. In this case, for example, the label l2 has a probability value of 0.4, since it covers 2 of the 5 rows.
I edited my answer to add the computation of these probabilities.
