3

I have the following:

a_aa a_ab a_ac b_aa b_ab b_ac
2    3    3    3     1    2
3    4    1    1     3    1

Desired outcome:

a_aa b_aa a_ab b_ab a_ac b_ac
2    3    3    1     3    2
3    1    4    3     1    1

Code with data:

d <- "a_aa a_ab a_ac b_aa   b_ab b_ac
2    3    3    3     1    2
3    4    1    1     3    1"
dd <- read.table(textConnection(object = d), header = T)

My current solution is manual:

    dd %>% select(a_aa, b_aa, a_ab, b_ab, a_ac, b_ac)

however, is onerous when number of columns is large. Any ideas how to do this kind of column ordering with grouping (e.g. sequence a_etc1, b_etc1, a_etc2, b_etc2)? Thank you!

4 Answers 4

6

Here is one way to solve your problem:

dd[order(gsub(".+_", "", names(dd)))]

# or

dd %>%
  select(order(gsub(".+_", "", names(.))))


  a_aa b_aa a_ab b_ab a_ac b_ac
1    2    3    3    1    3    2
2    3    1    4    3    1    1
Sign up to request clarification or add additional context in comments.

1 Comment

This is very elegant, worked on a large set. Thank you!
1

You may do something like this

library(tidyverse)

d <- "a_aa  a_ab    a_ac    b_aa    b_ab    b_ac
2   3   1   3   3   2
3   1   3   4   1   1"
dd <- read.table(textConnection(object = d), header = T)

colnames(dd) %>% 
  str_split("_") %>% 
  map_chr(~.x[2]) %>% 
  unique() -> vars

dd %>% 
  select(ends_with(all_of(vars)))
#>   a_aa b_aa a_ab b_ab a_ac b_ac
#> 1    2    3    3    3    1    2
#> 2    3    4    1    1    3    1

If you don't want to use other tidyverse libraries than dplyr, you can do

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

d <- "a_aa  a_ab    a_ac    b_aa    b_ab    b_ac
2   3   1   3   3   2
3   1   3   4   1   1"
dd <- read.table(textConnection(object = d), header = T)

colnames(dd) %>% 
  strsplit("_") %>% 
  sapply(\(.x) .x[2]) %>% 
  unique() -> vars

dd %>% 
  select(ends_with(all_of(vars)))
#>   a_aa b_aa a_ab b_ab a_ac b_ac
#> 1    2    3    3    3    1    2
#> 2    3    4    1    1    3    1

Created on 2022-07-10 by the reprex package (v2.0.1)

1 Comment

Good to see you here!
1

Here is an alternative way:

We first create a function using two additional packages:

  1. stringi for its str_reverse function (advantage:it is vectorized)
  2. gtools for its mixedsort function

Then we apply it to our dataframe df with rename_with:

library(dplyr)

# the function
my_function <- function(df){
  x <- stringi::stri_reverse(colnames(df))
  y <- gtools::mixedsort(x)
  stringi::stri_reverse(y)
}


df %>% 
  rename_with(., ~my_function(df))
  a_aa b_aa a_ab b_ab a_ac b_ac
1    2    3    3    3    1    2
2    3    4    1    1    3    1

Comments

1

Base R solution:

dd[,order(gsub(".*\\_(\\w+$)", "\\1", names(dd)))]

1 Comment

Excellent, thank you! What is the purpose of "+$" in your regex? This worked as well: dd[,order(gsub(".*\_(\\w)", "\\1", names(dd)))]. Interestingly, in a large dataset with "+" characters in some column names, the sorting was not accomplished when "+$" was used.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.