
I would like to create a new variable, unsure, that contains the word "unsure" if either of the following phrases appears in the freetext column: "too soon" or "to tell". The freetext column itself should stay unchanged, and the new column should be NA when freetext contains neither phrase. Currently the data look like:

   id               freetext date
1   1           its too soon    1
2   2           I'm not sure    2
3   3                   pink   12
4   4                 yellow   15
5   5       too soon to tell   20
6   6 I think it is too soon    2
7   7                 5 days    6
8   8                    red    7
9   9        its been 2 days    3
10 10       too soon to tell   11

The data:

df <- structure(list(id = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"),
                     freetext = c("its too soon", "I'm not sure", "pink", "yellow",
                                  "too soon to tell", "I think it is too soon", "5 days", "red",
                                  "its been 2 days", "too soon to tell"),
                     date = c("1", "2", "12", "15", "20", "2", "6", "7", "3", "11")),
                class = "data.frame", row.names = c(NA, -10L))

And I would like it to look like:

    id               freetext unsure date
1   1           its too soon unsure    1
2   2           I'm not sure   <NA>    2
3   3                   pink   <NA>   12
4   4                 yellow   <NA>   15
5   5       too soon to tell unsure   20
6   6 I think it is too soon unsure    2
7   7                 5 days   <NA>    6
8   8                    red   <NA>    7
9   9        its been 2 days   <NA>    3
10 10       too soon to tell unsure   11

You can use if_else() together with str_detect() for the pattern matching:

library(tidyverse)
df %>% mutate(unsure = if_else(str_detect(freetext, 'too soon|to tell'), 'unsure', NA_character_))
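If the list of trigger phrases may grow, or if you want the new column placed before date as in the requested output, a variation along these lines may help. This is a sketch, not the only way to do it: the phrases vector is a hypothetical name, and the case-insensitive matching via stringr::regex(ignore_case = TRUE) is an optional assumption beyond what the question asks for. It also relies on dplyr's relocate(), available in dplyr 1.0.0 and later.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  id = as.character(1:10),
  freetext = c("its too soon", "I'm not sure", "pink", "yellow",
               "too soon to tell", "I think it is too soon", "5 days", "red",
               "its been 2 days", "too soon to tell"),
  date = c("1", "2", "12", "15", "20", "2", "6", "7", "3", "11"),
  stringsAsFactors = FALSE
)

# Hypothetical vector of trigger phrases; extend as needed
phrases <- c("too soon", "to tell")
# Join into a single alternation regex ("too soon|to tell").
# ignore_case = TRUE is an optional assumption (also matches "Too soon").
pattern <- regex(paste(phrases, collapse = "|"), ignore_case = TRUE)

out <- df %>%
  mutate(unsure = if_else(str_detect(freetext, pattern), "unsure", NA_character_)) %>%
  # Move the new column in front of `date` to match the requested layout
  relocate(unsure, .before = date)

print(out)
```

Building the pattern with paste(collapse = "|") keeps the phrase list in one place instead of hand-editing the regex string.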

#   id               freetext date unsure
#1   1           its too soon    1 unsure
#2   2           I'm not sure    2   <NA>
#3   3                   pink   12   <NA>
#4   4                 yellow   15   <NA>
#5   5       too soon to tell   20 unsure
#6   6 I think it is too soon    2 unsure
#7   7                 5 days    6   <NA>
#8   8                    red    7   <NA>
#9   9        its been 2 days    3   <NA>
#10 10       too soon to tell   11 unsure

In base R:

transform(df, unsure = ifelse(grepl('too soon|to tell', freetext), 'unsure', NA))
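A base R sketch that treats each phrase as a literal string (fixed = TRUE) rather than a regex, which can matter if a phrase ever contains characters such as "." or "(". The phrases and hits names are hypothetical, and reordering the columns by name assumes the data frame has exactly the columns shown in the question.

```r
df <- data.frame(
  id = as.character(1:10),
  freetext = c("its too soon", "I'm not sure", "pink", "yellow",
               "too soon to tell", "I think it is too soon", "5 days", "red",
               "its been 2 days", "too soon to tell"),
  date = c("1", "2", "12", "15", "20", "2", "6", "7", "3", "11"),
  stringsAsFactors = FALSE
)

# Hypothetical list of phrases to look for
phrases <- c("too soon", "to tell")

# Test each phrase literally, then OR the logical vectors together
hits <- Reduce(`|`, lapply(phrases, grepl, x = df$freetext, fixed = TRUE))

df$unsure <- ifelse(hits, "unsure", NA_character_)

# Put `unsure` before `date`, as in the desired output
df <- df[, c("id", "freetext", "unsure", "date")]
print(df)
```

Note that fixed = TRUE cannot be combined with ignore.case = TRUE in grepl(), so this version is case-sensitive.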