
I have a data frame in R with a column that holds long strings. I want to use those strings to create a new column with specific values.

This is the sample dataframe:

dom <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2"   , "notest_Vanilla Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480")
)

If a value in the Banner column contains Watermelon or Vanilla, the new column label should hold just that word (Watermelon or Vanilla); otherwise it should be Default. How can I use grep, or anything else, to apply multiple conditions like this?

This is what the expected data frame should look like:

dom_output <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2"   , "notest_Vanilla Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480"),
  label  = c("Watermelon","Vanilla","Default","Default")
)

4 Answers

library(stringr)
dom$label <- str_extract(dom$Banner, "Watermelon|Vanilla")
dom$label[is.na(dom$label)] <- "Default"
dom
#      Site                              Banner      label
# 1   alpha  testing_Watermelon -DPI_300x250 v2 Watermelon
# 2    beta notest_Vanilla Latte-DPI_300x250 v2    Vanilla
# 3 charlie                         bottle :15s    Default
# 4   delta aaaa vvvv cccc Build_Mobile_320x480    Default
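The pattern above is case-sensitive, so a banner spelled "WATERMELON" or "vanilla" would fall through to Default. A minimal sketch of a case-insensitive variant follows, assuming stringr's `regex()` modifier and `str_to_title()`; the `banner` vector is made-up data for illustration, not part of the question.

```r
library(stringr)

# Hypothetical banners with inconsistent casing (not from the question)
banner <- c("testing_WATERMELON -DPI_300x250 v2",
            "notest_vanilla Latte-DPI_300x250 v2",
            "bottle :15s")

# regex(..., ignore_case = TRUE) makes the alternation case-insensitive
hit <- str_extract(banner, regex("watermelon|vanilla", ignore_case = TRUE))

# str_extract() returns the text as it appears in the string,
# so normalise the casing before using it as a label
label <- str_to_title(hit)

# Rows without a match are NA; give them the fallback label
label[is.na(label)] <- "Default"
label
# [1] "Watermelon" "Vanilla"    "Default"
```

Note that `str_extract()` returns only the first match in each string, so a banner containing both words gets whichever appears first.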

Here's a simple solution using Base R:

#Sample data:
dom <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2"   , "notest_Vanilla Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480")
)


# Check each keyword in turn (case-insensitively); fall back to "Default"
dom$label <- ifelse(grepl("watermelon", dom$Banner, ignore.case = TRUE), "Watermelon",
                    ifelse(grepl("vanilla", dom$Banner, ignore.case = TRUE), "Vanilla", "Default"))
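Nested `ifelse()` calls get hard to read once there are more than two or three keywords. One possible generalisation is sketched below in base R; `label_by_keyword` is a hypothetical helper name, not something from this answer, and each keyword is treated as a regular expression because `grepl()` ignores `ignore.case` when `fixed = TRUE`.

```r
# Hypothetical helper: label each string with the first keyword it contains
label_by_keyword <- function(x, keywords, default = "Default") {
  out <- rep(default, length(x))
  # Walk the keywords in reverse so that earlier keywords overwrite
  # later ones, mirroring the priority order of nested ifelse()
  for (kw in rev(keywords)) {
    out[grepl(kw, x, ignore.case = TRUE)] <- kw
  }
  out
}

banner <- c("testing_Watermelon -DPI_300x250 v2",
            "notest_Vanilla Latte-DPI_300x250 v2",
            "bottle :15s",
            "aaaa vvvv cccc Build_Mobile_320x480")

label_by_keyword(banner, c("Watermelon", "Vanilla"))
# [1] "Watermelon" "Vanilla"    "Default"    "Default"
```

Adding a new label then only requires extending the keyword vector rather than nesting another `ifelse()`.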


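One base R possibility (shown here with "Orange" in place of "Vanilla", to match the sample data at the end of this answer) could be: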

labels <- paste(c("Watermelon", "Orange"), collapse = "|")

dom$label <- sapply(regmatches(dom$Banner, regexec(labels, dom$Banner)), "[", 1)
dom$label[is.na(dom$label)] <- "Default"

     Site                              Banner      label
1   alpha  testing_Watermelon -DPI_300x250 v2 Watermelon
2    beta  notest_Orange Latte-DPI_300x250 v2     Orange
3 charlie                         bottle :15s    Default
4   delta aaaa vvvv cccc Build_Mobile_320x480    Default

The same approach also works with dplyr and tidyr:

library(dplyr)
library(tidyr)

dom %>%
 mutate(label = sapply(regmatches(Banner, regexec(labels, Banner)), "[", 1),
        label = replace_na(label, "Default"))

Sample data:

dom <- data.frame(
 Site = c("alpha", "beta", "charlie", "delta"),
 Banner = c("testing_Watermelon -DPI_300x250 v2"   , "notest_Orange Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480")
)
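The `regexec()` route above returns a list, which is why `sapply(..., "[", 1)` is needed to pull out the first element (or `NA` when there is no match). A slightly flatter sketch, assuming only one match per string is wanted, uses `regexpr()`, which returns -1 for non-matching elements. The keywords here are the ones from the question (Watermelon and Vanilla), not the Orange variant used in this answer.

```r
banner <- c("testing_Watermelon -DPI_300x250 v2",
            "notest_Vanilla Latte-DPI_300x250 v2",
            "bottle :15s",
            "aaaa vvvv cccc Build_Mobile_320x480")

pattern <- paste(c("Watermelon", "Vanilla"), collapse = "|")

# regexpr() gives the position of the first match, or -1 if there is none
m <- regexpr(pattern, banner)

# Start with the fallback everywhere, then fill in the matched rows;
# regmatches() on a regexpr() result returns matches for matching rows only
label <- rep("Default", length(banner))
label[m != -1] <- regmatches(banner, m)
label
# [1] "Watermelon" "Vanilla"    "Default"    "Default"
```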

library(dplyr)
library(stringi)

dom %>% mutate(label = case_when(stri_detect_fixed(Banner, "Watermelon") ~ "Watermelon",
                                 stri_detect_fixed(Banner, "Vanilla")    ~ "Vanilla",
                                                                   TRUE  ~ "Default"))
#>      Site                              Banner          label
#> 1   alpha  testing_Watermelon -DPI_300x250 v2     Watermelon
#> 2    beta notest_Vanilla Latte-DPI_300x250 v2        Vanilla
#> 3 charlie                         bottle :15s        Default
#> 4   delta aaaa vvvv cccc Build_Mobile_320x480        Default

Data:

dom <- data.frame(Site = c("alpha", "beta", "charlie", "delta"),
                  Banner = c("testing_Watermelon -DPI_300x250 v2",
                             "notest_Vanilla Latte-DPI_300x250 v2",
                             "bottle :15s",
                             "aaaa vvvv cccc Build_Mobile_320x480"))
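This answer uses `stri_detect_fixed()`, which matches the keyword literally rather than as a regular expression. That makes no difference for plain words such as Watermelon, but it can matter once a keyword contains regex metacharacters. The base R sketch below illustrates the distinction with `grepl(..., fixed = TRUE)`; the banners and the "1.5L" keyword are made up for illustration.

```r
# Hypothetical banners (not from the question)
banner <- c("promo_1.5L bottle", "promo_105L drum")

# As a regular expression, "." matches any single character,
# so "1.5L" also matches "105L"
regex_hits <- grepl("1.5L", banner)
regex_hits
# [1] TRUE TRUE

# With fixed = TRUE the keyword is matched literally
fixed_hits <- grepl("1.5L", banner, fixed = TRUE)
fixed_hits
# [1]  TRUE FALSE
```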
