Sort strings based on number in part of string

Question

I have a huge data set that I cannot upload here.

I have two types of columns: their names start with either T.H.L or T.H.L.varies..... Both types are numbered in the format So####, e.g., T.H.L.So1_P1_A2 through T.H.L.So10000_P1_A2.

For each T.H.L column there is a column named T.H.L.varies.... with the same ending.

I want to order the columns by the number after So, with the T.H.L column first and then the corresponding T.H.L.varies.... column for each So number.

Here is what I tried:

library(gtools)
mySorted <- df2[, mixedorder(colnames(df2))]

This is close: it sorts the columns correctly by number, but it puts all the T.H.L columns first and then all the T.H.L.varies columns, instead of alternating them.

I have posted the column names to Github:

This is all about column names. Rather than sharing any of your data frame rows, just share dput(names(df)), or maybe dput(names(df)[1:100]) if the first is too long.
– Gregor Thomas, Commented Jun 27, 2016 at 20:25
Also try df2 = df[, grep(pattern = "^T\\.H\\.L\\.", x = names(df))]... it might be what you want. If that's not what you want, maybe you can clarify; the sentence "Means I want one T.H.L and one T.H.L.varies which both have the same end" doesn't make sense to me.
– Gregor Thomas, Commented Jun 27, 2016 at 20:30
@Gregor I shared it above on github because I could not put it here.
– nik, Commented Jun 27, 2016 at 20:38
@Gregor if you look at the dput of names that I sent, it shows that I have as many T.H.L columns as T.H.L.varies columns, one for each. Is it clear now?
– nik, Commented Jun 27, 2016 at 20:41
Maybe more so. You already have the columns you want, and your only problem is sorting them? Is this correct? And you want them sorted by the number following the letters So in the column name?
– Gregor Thomas, Commented Jun 27, 2016 at 22:43

Gregor Thomas · Accepted Answer · 2016-06-28 06:12:05Z

Okay, let's call the names of your data frame (the names you want to reorder) x:

x = names(df2)

# first remove the ones without numbers
# because we want to use the numbers for ordering
no_numbers = c("T.H.L", "T.H.L.varies....")
x = x[! x %in% no_numbers]

# now extract the numbers so we can order them
library(stringr)
x_num = as.numeric(str_extract(string = x, pattern = "(?<=So)[0-9]+"))

# calculate the order first by number, then alphabetically to break ties
ord = order(x_num, x)

# verify it is working
head(c(no_numbers, x[ord]), 10)
#  [1] "T.H.L"                      "T.H.L.varies...."           "T.H.L.So1_P1_A1"           
#  [4] "T.H.L.varies.....So1_P1_A1" "T.H.L.So2_P1_A2"            "T.H.L.varies.....So2_P1_A2"
#  [7] "T.H.L.So3_P1_A3"            "T.H.L.varies.....So3_P1_A3" "T.H.L.So4_P1_A4"           
# [10] "T.H.L.varies.....So4_P1_A4"

# finally, reorder your data frame columns
df2 = df2[, c(no_numbers, x[ord])]

And you should be done.
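For anyone who wants to try the idea without the original data, here is a minimal base-R sketch of the same approach. The column names are made-up stand-ins for the pattern in the question, and two details differ from the answer above: the number is pulled out with sub() instead of stringr, and ties are broken with an explicit "is this a varies column" flag rather than alphabetically, so the result does not depend on locale collation. Treat both as assumptions about what fits your names, not as the only way to do it.

```r
# Hypothetical column names mimicking the pattern in the question
x <- c("T.H.L.So10_P1_A2", "T.H.L.varies.....So2_P1_A2",
       "T.H.L.So2_P1_A2",  "T.H.L.varies.....So10_P1_A2",
       "T.H.L.So1_P1_A1",  "T.H.L.varies.....So1_P1_A1")

# Keep only the digits that follow "So" (assumes every name contains "So<digits>")
x_num <- as.numeric(sub(".*So([0-9]+).*", "\\1", x))

# FALSE sorts before TRUE, so the plain T.H.L column comes first within each number
is_varies <- grepl("varies", x, fixed = TRUE)

# Order by number first, then by the flag to break ties
sorted <- x[order(x_num, is_varies)]
print(sorted)
```

With a real data frame, the last step would be df2[, sorted] after adding back any columns that carry no number.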


8 Comments

Gregor Thomas Over a year ago

Maybe you could share dput(head(names(x), 10))? It works when I read in your github post, made it a character vector and called it x.

nik Over a year ago

dput(head(x, 10))? because dput(head(names(x), 10)) gives me NULL

Gregor Thomas Over a year ago

Yeah, dput(head(names(df2), 10)) maybe

nik Over a year ago

I found where the problem is: "(?<=So)[0-9]+" does not match my real column names. The real column names are Ratio.H.L., Ratio.H.L.variability...., Ratio.H.L.Mo1...., etc.

Gregor Thomas Over a year ago

It's regex. [0-9]+ matches one or more digits, (?<=So) says look before those digits for the characters "So", but don't include them in the extract. It works fine on the data I read in from github... as you can see. The output I show is real output based on the input you posted on github.
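
The mismatch described in the last two comments can be reproduced in a few lines. The sketch below uses a hypothetical helper, extract_after(), written in base R with perl = TRUE so the lookbehind is supported. The "Ratio.H.L.Mo1...." name is copied from the comment above; the full set of real column names is unknown, so swapping the prefix to "Mo" is only an assumption about what would fit them.

```r
# Hypothetical helper: digits that immediately follow `prefix`, NA when there is no match.
# `prefix` is pasted into the regex as-is, so it is assumed to be plain letters.
extract_after <- function(x, prefix) {
  pattern <- paste0("(?<=", prefix, ")[0-9]+")
  m <- regexpr(pattern, x, perl = TRUE)   # -1 where nothing matched
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)          # fill in only the matched elements
  as.numeric(out)
}

so_hit  <- extract_after("T.H.L.So12_P1_A2", "So")   # digits are preceded by "So"
so_miss <- extract_after("Ratio.H.L.Mo1....", "So")  # no "So" before the digits
mo_hit  <- extract_after("Ratio.H.L.Mo1....", "Mo")  # prefix changed to "Mo"
print(c(so_hit, so_miss, mo_hit))
```

An NA in the extracted numbers is a quick sign that the prefix in the pattern does not occur in that column name.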
