I have a very large data.table in which I want to summarise, by group, the columns whose names start with a certain pattern.
The columns I am interested in always have the same format, namely: f<X>_<Y>, m<X>_<Y>, f<X>, m<X>.
This is the list of all possible column names:
ageColsPossible <- c("m0_9", "m10_19", "m20_29", "m30_39", "m40_49", "m50_59", "m60_69",
"f0_9", "f10_19", "f20_29", "f30_39", "f40_49", "f50_59", "f60_69")
If there is not enough data available, my data.table will only have some of these columns. I would like to get a vector of the column names that are actually present in the data:
> names(myData)
[1] "clientID" "policyID" "startYear" "product" "NOplans" "grp"
[7] "policyid" "personid" "age" "gender" "dependant" "location"
[13] "region" "exposure" "startMonth" "cover_effective_date" "endexposuredate" "fromdate"
[19] "enddate" "planHistSufficiency" "productRank" "claim10month" "claim11month" "claim12month"
[25] "claim9month" "NA20_29" "NA30_39" "NA40_49" "NA50_59" "f0_9"
[31] "f10_19" "f20_29" "f30_39" "f40_49" "f50_59" "f60_69"
[37] "m0_9" "m10_19" "m20_29" "m30_39" "m40_49" "m50_59"
[43] "m60_69" "u0_9" "u10_19" "u20_29" "u30_39" "u40_49"
[49] "u50_59" "u60_69" "uNA"
I know of regex and was thinking of something along the lines of regex = "(m|f)(\\d+)_?(\\d+)?", but I have also seen a pattern() function somewhere. Unfortunately I can no longer find it.
Any ideas?
.SDcols accepts patterns(), so you can select columns for .SD using a regex.
grep("^[mf]\\d+(?:_\\d+)?$", names(myData), value=TRUE)?
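
To make the two suggestions concrete, here is a minimal sketch on a toy table. The toy myData, its values, the choice of sum as the summary, and the names agePattern, ageCols and ageSums are assumptions made for illustration; only grep(), .SDcols and patterns() come from the replies above. The regex uses a plain group instead of (?:...), which should match the same names.

```r
library(data.table)

# Toy stand-in for myData (hypothetical values, only a handful of columns)
myData <- data.table(
  grp    = c("a", "a", "b"),
  age    = c(25, 34, 61),
  m0_9   = c(1, 0, 2),
  f10_19 = c(0, 3, 1),
  u0_9   = c(5, 5, 5)
)

# Names that are "m" or "f", then digits, then an optional "_digits" suffix
agePattern <- "^[mf]\\d+(_\\d+)?$"

# 1) Character vector of the matching column names present in the data
ageCols <- grep(agePattern, names(myData), value = TRUE)

# 2) Summarise the matching columns by group, selecting them via patterns()
#    (sum is an assumed summary function; swap in whatever is needed)
ageSums <- myData[, lapply(.SD, sum), by = grp, .SDcols = patterns(agePattern)]

print(ageCols)
print(ageSums)
```

If the full list of possible names is already at hand, intersect(ageColsPossible, names(myData)) should give the same vector without any regex, provided every column of interest appears in that list.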