Reordering values within row R dataframe using regex

Question

I have a data frame of paired values like this:

> df <- data.frame(int1 = c("A", "B", "Ci"), int2 = c("Ca", "Cg", "A"), value = c(3,6,2))
> df
  int1 int2 value
1    A   Ca     3
2    B   Cg     6
3   Ci    A     2

I would like to reorder the values in the first two columns row-wise, using a regex or %in%, so that all the values matching "C" are in one column and all the other values are in another.

I'm trying to get to this:

  C_int other_int value
1    Ca         A     3
2    Cg         B     6
3    Ci         A     2

akrun · Accepted Answer · 2022-08-15 16:52:02Z

2

Here is an option in base R: loop over the rows with apply (MARGIN = 1), order the values in each row by whether they contain 'C', and assign the reordered values back.

df[1:2] <- t(apply(df[1:2], 1, function(x) x[order(!grepl("C", x))]))

Output:

> df
  int1 int2 value
1   Ca    A     3
2   Cg    B     6
3   Ci    A     2
> str(df)
'data.frame':   3 obs. of  3 variables:
 $ int1 : chr  "Ca" "Cg" "Ci"
 $ int2 : chr  "A" "B" "A"
 $ value: num  3 6 2
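To make the one-liner above easier to follow, here is a minimal sketch that walks through a single row and then applies the same idea to the whole data frame. The helper names `has_c` and `idx` are illustrative only and not part of the answer, and the final renaming step is an assumption based on the column names shown in the question's desired output.

```r
df <- data.frame(int1 = c("A", "B", "Ci"),
                 int2 = c("Ca", "Cg", "A"),
                 value = c(3, 6, 2))

# One row, step by step (first row of df)
x <- c(int1 = "A", int2 = "Ca")
has_c <- grepl("C", x)   # FALSE TRUE: which elements contain "C"
idx <- order(!has_c)     # 2 1: FALSE sorts before TRUE, so "C" values come first
x[idx]                   # "Ca" first, then "A"

# Whole data frame: apply() returns one column per row, so transpose back
df[1:2] <- t(apply(df[1:2], 1, function(x) x[order(!grepl("C", x))]))

# Assumed extra step: rename to match the layout asked for in the question
names(df)[1:2] <- c("C_int", "other_int")
df
```

Because order() leaves ties in their original position, a row with two "C" values, or with none, should simply keep its original order.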

edited Aug 15, 2022 at 16:52

answered Aug 15, 2022 at 16:35

akrun


6 Comments

steve_b Over a year ago

To use %in% instead, this works (just to make this more generalisable): df[1:2] <- t(apply(df[1:2], 1, function(x) x[order(!(x %in% c("Ca", "Cg", "Ci")))])). Apologies, I couldn't edit the answer.

akrun Over a year ago

@steve_b But wouldn't that be narrower rather than more general? grepl will match wherever there is a 'C' substring.

steve_b Over a year ago

Absolutely! re: generalisable, I just meant this question thread on stackoverflow, not the solution itself. I meant to supplement the answer so that if someone (like me) sometimes has a list of specific matches, or sometimes has a substring, they now have examples of approaches suitable for each. I don't think it warrants a separate SO question and answer though as they're such similar tasks.

akrun Over a year ago

@steve_b Your code also works fine for this example. When I read the description in your post, it says "such that all the values matching "C" are in the same column". Suppose you have elements like 'Cf', 'Ce', etc. in different data; then you have to adjust the values passed to %in%, whereas grepl needs no change. Having said that, each has its own limitations.

akrun Over a year ago

@steve_b That could be the way to go if efficiency is a criterion.
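The trade-off discussed in these comments can be sketched on a small vector. The values in `vals` are made up for illustration ('Cf' and 'Ce' come from the comment above), and the anchored pattern "^C" is an extra variation that is not part of the original answer.

```r
vals <- c("Ca", "Cf", "A", "Ce")   # hypothetical data with unseen "C" values
known <- c("Ca", "Cg", "Ci")       # fixed list used in the %in% comment

by_regex <- grepl("C", vals)       # TRUE TRUE FALSE TRUE: any value containing "C"
by_set <- vals %in% known          # TRUE FALSE FALSE FALSE: exact matches only

# grepl("C", ...) matches "C" anywhere in the string; anchoring with "^C"
# restricts the match to values that start with "C"
anywhere <- grepl("C", "AC")       # TRUE
at_start <- grepl("^C", "AC")      # FALSE
```

With %in% the match list has to be updated whenever new labels appear, while the regex picks them up automatically but may also match values it was not meant to.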


shs · Accepted Answer · 2022-08-15 18:07:56Z

1

An alternative approach using split():

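library(magrittr)  # provides %>%
library(stringr)   # provides str_detect()

# Transpose so the values are read row by row, then split on "does not contain C"
df[1:2] <- df[1:2] %>%
  t() %>%
  split(!str_detect(., "C"))
df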
#>   int1 int2 value
#> 1   Ca    A     3
#> 2   Cg    B     6
#> 3   Ci    A     2

Created on 2022-08-15 by the reprex package (v2.0.1)

edited Aug 15, 2022 at 18:07

answered Aug 15, 2022 at 17:30

shs


1 Comment

steve_b Over a year ago

Elegant. Sadly, through no fault of the answerer, this doesn't immediately hold up when there is no match for "C". This was my fault in incompletely defining the question, however.
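The limitation mentioned in this comment can be sketched with a data frame whose first row has no "C" value. `df2` is made-up data, and base R's grepl() stands in for str_detect() so that the example needs no packages; both are assumptions for illustration.

```r
# Hypothetical data: row 1 ("A", "D") contains no "C" value
df2 <- data.frame(int1 = c("A", "B", "Ci"),
                  int2 = c("D", "Cg", "A"),
                  value = c(3, 6, 2))

# split() approach: the two groups no longer have one value per row
m <- t(as.matrix(df2[1:2]))
groups <- split(m, !grepl("C", m))
lengths(groups)   # FALSE has 2 values, TRUE has 4, but df2 has 3 rows

# apply()/order() approach: a row without a match keeps its original order
res <- df2
res[1:2] <- t(apply(df2[1:2], 1, function(x) x[order(!grepl("C", x))]))
res
```

Since the groups are uneven, they cannot be assigned back as two columns of three values each, whereas the row-wise version still returns exactly two values per row.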
