In R, one can easily add a running sequence within each combination of two (or more) condition variables using ave(), like this:

# create a dataframe
dat = data.frame(
    FactorA = c(rep('a1', 10), rep('a2', 10)),
    FactorB = c(rep('b1', 5), rep('b2', 5), rep('b1', 5), rep('b2', 5)),
    DependentVar = rnorm(20)
)

# add ordering given combination of two factors
dat$Order <- ave(dat$DependentVar, dat$FactorA, dat$FactorB,
    FUN=seq_along)

What would be an analogue in Python with pandas?


Addition on 22/06/2020:

Also, suppose the levels of FactorA and FactorB are interleaved by "shuffling" them, for example:

# a slightly "shuffled" dataframe
dat2 = data.frame(
    FactorA = c(rep('a1', 6), rep('a2', 6),
                rep('a1', 4), rep('a2', 4)),
    FactorB = c(rep('b1', 3), rep('b2', 3), rep('b1', 3), rep('b2', 3),
                rep('b1', 2), rep('b2', 2), rep('b1', 2), rep('b2', 2)),
    DependentVar = rnorm(20)
)

ave() continues the sequence for each combination across the interruptions:

dat2$Order <- ave(dat2$DependentVar, dat2$FactorA, dat2$FactorB,
    FUN=seq_along)
dat2
   FactorA FactorB DependentVar Order
1       a1      b1    1.3814360     1
2       a1      b1    1.0702582     2
3       a1      b1   -1.1974390     3
4       a1      b2   -1.1687711     1
5       a1      b2   -0.7584645     2
6       a1      b2   -0.5541912     3
7       a2      b1   -0.3083331     1
8       a2      b1    0.7707984     2
9       a2      b1    2.4709730     3
10      a2      b2    0.1768273     1
11      a2      b2    0.5687605     2
12      a2      b2    0.7360105     3
13      a1      b1    0.9253223     4
14      a1      b1   -0.3190011     5
15      a1      b2   -0.2657454     4
16      a1      b2   -0.1617810     5
17      a2      b1    0.9634501     4
18      a2      b1   -0.6749173     5
19      a2      b2    0.8138765     4
20      a2      b2   -1.1075720     5

Can Python also (1) mark which "appearance" of the combination each row belongs to and (2) reset the sequencing at each new appearance, like this:

   FactorA FactorB DependentVar Order OrderReset WhichAppearance
1       a1      b1    1.3814360     1          1               1
2       a1      b1    1.0702582     2          2               1
3       a1      b1   -1.1974390     3          3               1
4       a1      b2   -1.1687711     1          1               1
5       a1      b2   -0.7584645     2          2               1
6       a1      b2   -0.5541912     3          3               1
7       a2      b1   -0.3083331     1          1               1
8       a2      b1    0.7707984     2          2               1
9       a2      b1    2.4709730     3          3               1
10      a2      b2    0.1768273     1          1               1
11      a2      b2    0.5687605     2          2               1
12      a2      b2    0.7360105     3          3               1
13      a1      b1    0.9253223     4          1               2
14      a1      b1   -0.3190011     5          2               2
15      a1      b2   -0.2657454     4          1               2
16      a1      b2   -0.1617810     5          2               2
17      a2      b1    0.9634501     4          1               2
18      a2      b1   -0.6749173     5          2               2
19      a2      b2    0.8138765     4          1               2
20      a2      b2   -1.1075720     5          2               2

1 Answer

In Python with pandas, you can do this:

df_data['Order'] = df_data.groupby(['FactorA', 'FactorB']).cumcount() + 1

MVCE:

import pandas as pd
from io import StringIO
dat_text = StringIO("""   FactorA  FactorB  DependentVar
1       a1      b1   -1.1435908
2       a1      b1   -0.5799404
3       a1      b1    0.0680380
4       a1      b1    0.1143230
5       a1      b1    0.7673287
6       a1      b2    1.4769585
7       a1      b2   -1.3399984
8       a1      b2   -0.4832071
9       a1      b2   -2.3764355
10      a1      b2    0.2668480
11      a2      b1   -0.7376859
12      a2      b1   -0.4141878
13      a2      b1   -0.5159797
14      a2      b1   -1.3888258
15      a2      b1    0.1497270
16      a2      b2    0.1803052
17      a2      b2    0.8547880
18      a2      b2    0.2372080
19      a2      b2    0.3139455
20      a2      b2    0.7266356""")

# Raw string so that \s is passed to the regex engine unchanged
df_data = pd.read_csv(dat_text, sep=r'\s\s+', engine='python')

print(df_data)

Output:

   FactorA FactorB  DependentVar
1       a1      b1     -1.143591
2       a1      b1     -0.579940
3       a1      b1      0.068038
4       a1      b1      0.114323
5       a1      b1      0.767329
6       a1      b2      1.476958
7       a1      b2     -1.339998
8       a1      b2     -0.483207
9       a1      b2     -2.376435
10      a1      b2      0.266848
11      a2      b1     -0.737686
12      a2      b1     -0.414188
13      a2      b1     -0.515980
14      a2      b1     -1.388826
15      a2      b1      0.149727
16      a2      b2      0.180305
17      a2      b2      0.854788
18      a2      b2      0.237208
19      a2      b2      0.313945
20      a2      b2      0.726636

Use groupby with cumcount:

df_data['Order'] = df_data.groupby(['FactorA', 'FactorB']).cumcount() + 1

print(df_data)

Output:

   FactorA FactorB  DependentVar  Order
1       a1      b1     -1.143591      1
2       a1      b1     -0.579940      2
3       a1      b1      0.068038      3
4       a1      b1      0.114323      4
5       a1      b1      0.767329      5
6       a1      b2      1.476958      1
7       a1      b2     -1.339998      2
8       a1      b2     -0.483207      3
9       a1      b2     -2.376435      4
10      a1      b2      0.266848      5
11      a2      b1     -0.737686      1
12      a2      b1     -0.414188      2
13      a2      b1     -0.515980      3
14      a2      b1     -1.388826      4
15      a2      b1      0.149727      5
16      a2      b2      0.180305      1
17      a2      b2      0.854788      2
18      a2      b2      0.237208      3
19      a2      b2      0.313945      4
20      a2      b2      0.726636      5
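
For readers who would rather not paste text through StringIO, here is a minimal sketch that builds the same layout directly, assuming NumPy is available. np.repeat and np.tile stand in for R's rep(), and the seed is an arbitrary choice of mine, so the DependentVar values will not match the ones above.

```python
import numpy as np
import pandas as pd

# Rebuild the layout of the question's data frame.
# np.repeat plays the role of R's rep(); the seed is arbitrary.
rng = np.random.default_rng(0)
dat = pd.DataFrame({
    "FactorA": np.repeat(["a1", "a2"], 10),
    "FactorB": np.tile(np.repeat(["b1", "b2"], 5), 2),
    "DependentVar": rng.normal(size=20),
})

# cumcount() numbers the rows of each (FactorA, FactorB) group from 0,
# so add 1 to match R's seq_along().
dat["Order"] = dat.groupby(["FactorA", "FactorB"]).cumcount() + 1
print(dat)
```

The Order column should come out the same as in the output above, since it depends only on the factor columns and the row order.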

Update to answer "Addition on 22/06/2020":

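#Here df is the "shuffled" data frame from the question (dat2), with Order already added as above
#Create a helper column that starts a new group whenever the (FactorA, FactorB) pair changes from one row to the next
df['newgroup'] = (df[['FactorA', 'FactorB']] != df[['FactorA', 'FactorB']].shift()).any(axis=1).cumsum()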

#Use cumcount to count rows in groups
df['Order Reset'] = df.groupby('newgroup').cumcount() + 1

#Use factorize to count appearances of groups
df['Appearance'] = df.groupby(['FactorA', 'FactorB'])['newgroup'].transform(lambda x: x.factorize()[0]+1)

df

Output:

   FactorA FactorB  DependentVar  Order  newgroup       Order Reset  Appearance
1       a1      b1      1.381436      1         1                 1           1
2       a1      b1      1.070258      2         1                 2           1
3       a1      b1     -1.197439      3         1                 3           1
4       a1      b2     -1.168771      1         2                 1           1
5       a1      b2     -0.758465      2         2                 2           1
6       a1      b2     -0.554191      3         2                 3           1
7       a2      b1     -0.308333      1         3                 1           1
8       a2      b1      0.770798      2         3                 2           1
9       a2      b1      2.470973      3         3                 3           1
10      a2      b2      0.176827      1         4                 1           1
11      a2      b2      0.568761      2         4                 2           1
12      a2      b2      0.736010      3         4                 3           1
13      a1      b1      0.925322      4         5                 1           2
14      a1      b1     -0.319001      5         5                 2           2
15      a1      b2     -0.265745      4         6                 1           2
16      a1      b2     -0.161781      5         6                 2           2
17      a2      b1      0.963450      4         7                 1           2
18      a2      b1     -0.674917      5         7                 2           2
19      a2      b2      0.813877      4         8                 1           2
20      a2      b2     -1.107572      5         8                 2           2
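
The update can also be sketched as a self-contained script. I rebuild only the factor columns of the shuffled frame (DependentVar does not affect the counters), keep the run identifier in a local variable named run_id instead of a helper column, and use the column names from the question's desired output; those choices are mine, not part of the original answer.

```python
import pandas as pd

# Factor layout of the "shuffled" data frame (dat2) from the question.
df = pd.DataFrame({
    "FactorA": ["a1"] * 6 + ["a2"] * 6 + ["a1"] * 4 + ["a2"] * 4,
    "FactorB": (["b1"] * 3 + ["b2"] * 3) * 2 + (["b1"] * 2 + ["b2"] * 2) * 2,
})
keys = ["FactorA", "FactorB"]

# Running count per combination, ignoring interruptions (like ave + seq_along).
df["Order"] = df.groupby(keys).cumcount() + 1

# A new run starts wherever either key differs from the previous row.
run_id = (df[keys] != df[keys].shift()).any(axis=1).cumsum()

# Restart the count at the beginning of every run.
df["OrderReset"] = df.groupby(run_id).cumcount() + 1

# Within each combination, number its runs in order of first occurrence.
df["WhichAppearance"] = (
    run_id.groupby([df["FactorA"], df["FactorB"]])
    .transform(lambda s: pd.factorize(s)[0] + 1)
)
print(df)
```

Note that the run detection relies on row order, so the frame should be in its intended order before these steps are applied.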

7 Comments

I'm curious: how would you restart cumcount() when the rows of a two-factor combination are not consecutive? Currently, the code just keeps counting. Thanks so much!
@striatum I don't understand your question about two-factor levels. Can you create a sample dataset that illustrates what you're curious about?
Unfortunately, replies here don't allow full formatting. However, imagine exactly the same example as yours, but with rows 4 and 5 coming after row 10. df_data.groupby().cumcount() would just continue to count, as it should. But how would you reset it so that the new rows 11 and 12 (formerly 4 and 5) get ['Order'] values restarted at 1 and 2 instead of 4 and 5? I hope this is clear.
@striatum Modify the question; it would be easier.
@Scott Boston, I did it. Thanks!