
I have a huge data set that I cannot upload here.

I have two types of columns: their names start with either T.H.L or T.H.L.varies..... Both types are numbered in the format So####, e.g., T.H.L.So1_P1_A2 through T.H.L.So10000_P1_A2.

For each T.H.L column there is a column named T.H.L.varies.... with the same ending.

I want to order the columns by the numbers after So, with first the T.H.L and then the corresponding T.H.L.varies.... version for each So number.

What I tried was to do

library(gtools)
mySorted <- df2[, mixedorder(colnames(df2))]

This is close: it sorts them correctly by number, but it puts all the T.H.L columns first and then all the T.H.L.varies columns, instead of alternating them.

I have posted the column names to Github:

  • This is all about column names. Rather than sharing any of your data frame rows, just share dput(names(df)), or maybe dput(names(df)[1:100]) if the first is too long. Commented Jun 27, 2016 at 20:25
  • Also try df2 = df[, grep(pattern = "^T\\.H\\.L\\.", x = names(df))]... it might be what you want. If that's not what you want, maybe you can clarify, the sentence "Means I want one T.H.L and one T.H.L.varies which both have the same end" doesn't make sense to me. Commented Jun 27, 2016 at 20:30
  • @Gregor I shared it above on github because I could not put it here Commented Jun 27, 2016 at 20:38
  • @Gregor if you look at the dput of the names that I sent, it shows that I have as many T.H.L columns as T.H.L.varies columns, one for each. Is it clear now? Commented Jun 27, 2016 at 20:41
  • Maybe more so. You already have the columns you want, and your only problem is sorting them? Is this correct? And you want them sorted by the number following the letters So in the column name? Commented Jun 27, 2016 at 22:43

1 Answer

Okay, let's call the names of your data frame (the names you want to reorder) x:

x = names(df2)

# first remove the ones without numbers
# because we want to use the numbers for ordering
no_numbers = c("T.H.L", "T.H.L.varies....")
x = x[! x %in% no_numbers]

# now extract the numbers so we can order them
library(stringr)
x_num = as.numeric(str_extract(string = x, pattern = "(?<=So)[0-9]+"))

# calculate the order first by number, then alphabetically to break ties
ord = order(x_num, x)

# verify it is working
head(c(no_numbers, x[ord]), 10)
#  [1] "T.H.L"                      "T.H.L.varies...."           "T.H.L.So1_P1_A1"           
#  [4] "T.H.L.varies.....So1_P1_A1" "T.H.L.So2_P1_A2"            "T.H.L.varies.....So2_P1_A2"
#  [7] "T.H.L.So3_P1_A3"            "T.H.L.varies.....So3_P1_A3" "T.H.L.So4_P1_A4"           
# [10] "T.H.L.varies.....So4_P1_A4"

# finally, reorder your data frame columns
df2 = df2[, c(no_numbers, x[ord])]

And you should be done.
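
For anyone who wants to try the idea without the real data, here is a minimal sketch on a toy data frame. The column names and values are made up for illustration, and it uses base R's sub() in place of stringr::str_extract(), which is a swapped-in technique and assumes every name contains exactly one "So" followed by digits.

```r
# Made-up column names in the same style as the question, deliberately shuffled
col_names <- c(
  "T.H.L.So10_P1_A2",
  "T.H.L.varies.....So2_P1_A2",
  "T.H.L.So2_P1_A2",
  "T.H.L.varies.....So10_P1_A2",
  "T.H.L.varies.....So1_P1_A2",
  "T.H.L.So1_P1_A2"
)

# One-row toy data frame; the values 1..6 only mark the original positions
df2 <- as.data.frame(matrix(seq_along(col_names), nrow = 1))
names(df2) <- col_names

x <- names(df2)

# Capture the digits that follow "So" and convert them to numbers
x_num <- as.numeric(sub(".*So([0-9]+).*", "\\1", x))

# Order by number first, then by name so T.H.L comes before T.H.L.varies
ord <- order(x_num, x)
df2 <- df2[, x[ord]]

names(df2)
```

The tie-break works because "T.H.L.S..." sorts before "T.H.L.v...", so within each So number the plain T.H.L column lands first.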


8 Comments

Maybe you could share dput(head(names(x), 10))? It works when I read in your github post, made it a character vector and called it x.
dput(head(x, 10))? because dput(head(names(x), 10)) gives me NULL
Yeah, dput(head(names(df2), 10)) maybe
I found where the problem is: "(?<=So)[0-9]+" does not match my real column names. The real column names are Ratio.H.L. and Ratio.H.L.variability.... Ratio.H.L.Mo1.... etc
It's regex. [0-9]+ matches one or more digits, (?<=So) says look before those digits for the characters "So", but don't include them in the extract. It works fine on the data I read in from github... as you can see. The output I show is real output based on the input you posted on github.
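
To make the lookbehind discussion above concrete, here is a small sketch of what happens when the pattern does not match. The helper extract_after() and both example names are hypothetical, not taken from the real data, and it uses base R's regexpr() with perl = TRUE rather than stringr.

```r
# Hypothetical helper: return the number that follows `prefix`, or NA if absent
extract_after <- function(x, prefix) {
  # (?<=prefix) is a lookbehind: the prefix must precede the digits
  # but is not part of the extracted text
  pattern <- paste0("(?<=", prefix, ")[0-9]+")
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)  # regmatches() returns only the matches
  as.numeric(out)
}

# Made-up names: one in the "So" style, one in the "Mo" style
x <- c("T.H.L.So12_P1_A2", "Ratio.H.L.Mo3_P1_A1")

extract_after(x, "So")  # 12 NA
extract_after(x, "Mo")  # NA  3
```

If every extracted value is NA, the numbers cannot drive the ordering, so the prefix in the pattern has to match whatever letters precede the digits in the actual column names.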