I have a data frame of paired values like this:

> df <- data.frame(int1 = c("A", "B", "Ci"), int2 = c("Ca", "Cg", "A"), value = c(3,6,2))
> df
  int1 int2 value
1    A   Ca     3
2    B   Cg     6
3   Ci    A     2

I would like to reorder the values in the first two columns row-wise, using a regex or %in%, so that all the values matching "C" are in one column and all the other values are in the other column.

I'm trying to get to this:

  C_int other_int value
1    Ca         A     3
2    Cg         B     6
3    Ci         A     2

2 Answers

Here is an option in base R: loop over the rows with apply (MARGIN = 1), order each row's values so that the elements containing 'C' come first, and assign the reordered elements back.

df[1:2] <- t(apply(df[1:2], 1, function(x) x[order(!grepl("C", x))]))

Output:

> df
  int1 int2 value
1   Ca    A     3
2   Cg    B     6
3   Ci    A     2
> str(df)
'data.frame':   3 obs. of  3 variables:
 $ int1 : chr  "Ca" "Cg" "Ci"
 $ int2 : chr  "A" "B" "A"
 $ value: int  3 6 2
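
For reference, the one-liner above can be run end to end as a small script. This is only a sketch: the renaming step at the end is not part of the answer and is added here on the assumption that the column names from the question (C_int, other_int) are wanted. stringsAsFactors = FALSE is set explicitly so the columns are character vectors on older R versions too.

```r
# Data from the question
df <- data.frame(
  int1 = c("A", "B", "Ci"),
  int2 = c("Ca", "Cg", "A"),
  value = c(3, 6, 2),
  stringsAsFactors = FALSE
)

# For each row, put the elements containing "C" first.
# order() is stable, so rows that are already in order are left unchanged.
df[1:2] <- t(apply(df[1:2], 1, function(x) x[order(!grepl("C", x))]))

# Assumption: rename the columns to the names used in the question
names(df)[1:2] <- c("C_int", "other_int")
df
```

Note that grepl("C", x) matches a "C" anywhere in the string; a pattern such as "^C" would restrict the match to values that start with "C".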

6 Comments

To use %in% instead, this works (just to make this more generalisable): df[1:2] <- t(apply(df[1:2], 1, function(x) x[order(!(x %in% c("Ca", "Cg", "Ci")))])). Apologies, I couldn't edit the answer.
@steve_b But wouldn't that be narrower rather than more general? grepl will find a 'C' substring wherever it occurs.
Absolutely! Re: generalisable, I just meant this question thread on Stack Overflow, not the solution itself. I meant to supplement the answer so that if someone (like me) sometimes has a list of specific matches and sometimes has a substring, they now have examples of approaches suitable for each. I don't think it warrants a separate SO question and answer, though, as they're such similar tasks.
@steve_b Your code also works fine for this example. When I read the description in your post, it says "such that all the values matching "C" are in the same column". Suppose you have elements like 'Cf', 'Ce', etc. in different data; then you have to adjust the code with %in%, whereas with grepl you don't. Having said that, each has its own limitations.
@steve_b That could be the way to go if efficiency is a criterion.
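
To make the difference discussed in these comments concrete, here is a small sketch comparing the two matching strategies. The data frame df2 and its 'Cf' value are made up for illustration and do not come from the question.

```r
# Hypothetical data: 'Cf' is not in the fixed list used with %in%
df2 <- data.frame(
  int1 = c("A", "B"),
  int2 = c("Ca", "Cf"),
  value = c(1, 2),
  stringsAsFactors = FALSE
)

known <- c("Ca", "Cg", "Ci")

# Substring match: any element containing "C" is moved to the first column
by_regex <- t(apply(df2[1:2], 1, function(x) x[order(!grepl("C", x))]))

# Exact match against a fixed set: 'Cf' is not recognised,
# so the second row keeps its original order
by_set <- t(apply(df2[1:2], 1, function(x) x[order(!(x %in% known))]))

by_regex
by_set
```

With grepl both rows are reordered, while with %in% only the row containing a listed value is; which behaviour is preferable depends on whether the matches are known in advance.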

An alternative approach using split():

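library(magrittr)  # provides %>%
library(stringr)   # provides str_detect()

# Transpose so the values are read row by row, then split them into
# the "C" group (FALSE) and the non-"C" group (TRUE)
df[1:2] <- df[1:2] %>%
  t() %>%
  split(!str_detect(., "C"))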
df
#>   int1 int2 value
#> 1   Ca    A     3
#> 2   Cg    B     6
#> 3   Ci    A     2

Created on 2022-08-15 by the reprex package (v2.0.1)

1 Comment

Elegant. Sadly, through no fault of the answerer, this doesn't immediately hold up when there is no match for "C". This was my fault in incompletely defining the question, however.
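
One way to cope with rows that contain no match is to swap the two columns only where a swap is actually needed. The following is a sketch in base R, not part of either answer; the fourth row ("D", "E") is a made-up example, and leaving such rows unchanged is an assumption about the desired behaviour, since the question does not define it.

```r
# Data from the question plus a hypothetical row with no "C" at all
df <- data.frame(
  int1 = c("A", "B", "Ci", "D"),
  int2 = c("Ca", "Cg", "A", "E"),
  value = c(3, 6, 2, 1),
  stringsAsFactors = FALSE
)

# Rows where the second column matches "C" but the first does not
swap <- grepl("C", df$int2) & !grepl("C", df$int1)

# Swap the two columns in those rows only; the data frame on the right
# is assigned by position, so int2 values go into int1 and vice versa
df[swap, c("int1", "int2")] <- df[swap, c("int2", "int1")]
df
```

Because the assignment is vectorised, rows with no match (or with a match in both columns) are simply left as they were.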
