We have a string column in our database with values for sports teams. The names of these teams are occasionally prefixed with the team's ranking, like so: (13) Miami (FL). Here the 13 is Miami's rank, and the (FL) means this is Miami (Florida), not Miami of Ohio (Miami (OH)).

We need to clean up this string, removing (13) and keeping only Miami (FL). So far we've used gsub and tried the following:

> gsub("\\s*\\([^\\)]+\\)", "", "(13) Miami (FL)")
[1] " Miami"

This incorrectly removes the (FL) suffix as well, and it leaves the leading whitespace in place.

Edit

Here are a few additional school names to give a sense of the data we're working with. Note that not every school has the (##) prefix:

c("North Texas", "Southern Methodist", "Texas-El Paso", 
  "Brigham Young", "Winner", "(12) Miami (FL)", "Appalachian State", 
  "Arkansas State", "Army", "(1) Clemson", 
  "(14) Georgia Southern")

3 Answers

You can use sub to remove a number in parentheses followed by whitespace.

sub("\\(\\d+\\)\\s", "", "(13) Miami (FL)")
#[1] "Miami (FL)"

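The regex could be made stricter based on the pattern in the data, for example by anchoring it to the start of the string with ^ so that only a leading rank is removed.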
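
To illustrate what "stricter" could mean here, the sketch below compares the unanchored pattern with a variant anchored by ^. The input vector is hypothetical (in particular "Team (12) United" is made up to show a parenthesised number in the middle of a name), so treat this as an illustration rather than a statement about the actual data.

```r
# Hypothetical inputs: the second has a parenthesised number mid-string.
x <- c("(13) Miami (FL)", "Team (12) United", "(1) Clemson")

# Unanchored: strips the first "(digits) " wherever it occurs.
unanchored <- sub("\\(\\d+\\)\\s", "", x)

# Anchored with ^: strips it only at the very start of the string.
anchored <- sub("^\\(\\d+\\)\\s+", "", x)

print(unanchored)
print(anchored)
```

With the anchor, a name that merely contains a number in parentheses is left alone, while leading ranks are still removed.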

We can match the opening ( followed by one or more digits (\\d+), then the closing ) and one or more whitespace characters (\\s+), and replace the match with an empty string ("").

sub("\\(\\d+\\)\\s+", "",  "(13) Miami (FL)")
#[1] "Miami (FL)"

Using the OP's updated example (the vector of school names, stored here as v1)

sub("\\(\\d+\\)\\s+", "",  v1)
#[1] "North Texas"        "Southern Methodist" "Texas-El Paso"      "Brigham Young"      "Winner"             "Miami (FL)"        
#[7] "Appalachian State"  "Arkansas State"     "Army"               "Clemson"            "Georgia Southern"  

Another option is str_remove from stringr, which removes the first match of the pattern

library(stringr)
str_remove("(13) Miami (FL)", "\\(\\d+\\)\\s+")
#[1] "Miami (FL)"
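
Since the names live in a column, the same pattern can be applied to the whole column at once, because sub is vectorized. A minimal sketch, assuming a hypothetical data frame teams with a column team (neither name comes from the question) and adding a ^ anchor as a precaution:

```r
# Hypothetical data frame standing in for the database column.
teams <- data.frame(
  team = c("North Texas", "(12) Miami (FL)", "(1) Clemson", "Army"),
  stringsAsFactors = FALSE
)

# Strip a leading "(digits) " prefix; rows without one are left unchanged.
teams$team_clean <- sub("^\\(\\d+\\)\\s+", "", teams$team)

print(teams)
```

Writing the result to a new column (team_clean, also a made-up name) keeps the original values available for checking before anything is overwritten.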

Another solution, based on stringr, is this:

library(stringr)
str_extract(v1, "[A-Z].*")
 [1] "North Texas"        "Southern Methodist" "Texas-El Paso"      "Brigham Young"      "Winner"            
 [6] "Miami (FL)"         "Appalachian State"  "Arkansas State"     "Army"               "Clemson"           
[11] "Georgia Southern"

This extracts everything from the first upper case letter onward (thereby ignoring the unwanted rankings). It assumes every team name begins with an upper case letter.
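
The extraction approach depends on where the first upper case letter sits, which is worth checking against the real data. The sketch below uses made-up names (only "(12) Miami (FL)" appears in the question) to show how str_extract behaves when a name has no upper case letter or does not start with one.

```r
library(stringr)

# Hypothetical edge cases alongside one value from the question.
x <- c("(12) Miami (FL)", "(5) miami", "(7) 1st Cavalry")

# Keep everything from the first upper case letter to the end of the string.
extracted <- str_extract(x, "[A-Z].*")

print(extracted)
```

A value with no upper case letter comes back as NA, and a name that starts with a digit loses its leading part, so removing the "(digits) " prefix directly may be the safer choice if such names can occur.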
