I am trying to create a new column based on an existing column, using pattern matching. The existing column is a user agent field such as

"Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B367 Safari/531.21.10"

I want to create a new column that uses pattern matching to identify what the device is.

- If user_agent like '%iPad%' and user_agent like '%WebKit%', then the device is iPad.
- If user_agent like '%Android%' and user_agent not like '%Mobile%', then the device is android.
- If user_agent like '%Silk%' and user_agent like '%WebKit%', then the device is kindle.
- If user_agent like '%Playbook%', then the device is Other.

I want to use the mutate function in dplyr to create the new column, but I need help with how to structure the regular expression,

i.e. mutate(data, device = ....)

  • As you present your data, this is not a column but a character vector with one element. I am lost with this unclear explanation. Commented Apr 10, 2015 at 16:31
  • The user agent field is a column with rows that represent different user agents. So for each row I want to create a new column that identifies the device from the user agent field. Commented Apr 10, 2015 at 16:36
  • But this is not quite what you wrote ... you just put a random string, not assigned to any data.frame ... Commented Apr 10, 2015 at 16:41
  • Oh I didn't see that. It looks like my data format was transformed into lines of code. What I meant to say was the user agent is a column in a dataframe Commented Apr 10, 2015 at 16:42
  • ..and ....can you....tada....reformat your data :) ? Commented Apr 10, 2015 at 16:43
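
One way to post the column so that it survives the site's formatting is dput(). The following is only a sketch: sampleData and its two rows are hypothetical stand-ins (the first string is a shortened version of the example above), and only the column name user_agent comes from the question.

```r
# Hypothetical stand-in for the real data; only the column name comes from the question
sampleData <- data.frame(
  user_agent = c("Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10",
                 "stuff Playbook more stuff"),
  stringsAsFactors = FALSE
)

# dput() prints an R expression that recreates the object; paste that output into the post
dput(sampleData)

# Check that the printed text rebuilds an equivalent data frame
dput_text <- paste(capture.output(dput(sampleData)), collapse = "\n")
restored <- eval(parse(text = dput_text))
```

Anyone answering can then assign the pasted expression to a name and work with the same data frame.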

1 Answer

Something like this?

x <- c("Mozilla/5.0 (iPad; stuff AppleWebKit more stuff",
        "Android",
        "stuff Silk more stuff and WebKit",
        "stuff Playbook more stuff", 
        "unknown")

y <- ifelse(grepl("iPad", x) & grepl("WebKit", x), "iPad", 
        ifelse(grepl("Android", x) & !grepl("Mobile", x), "android", 
                ifelse(grepl("Silk", x) & grepl("WebKit", x), "kindle", 
                        ifelse(grepl("Playbook", x), "other", 
                                "don't know")
                )
        )
)

data.frame(x, y)
                                                x          y
1 Mozilla/5.0 (iPad; stuff AppleWebKit more stuff       iPad
2                                         Android    android
3                stuff Silk more stuff and WebKit     kindle
4                       stuff Playbook more stuff      other
5                                         unknown don't know

EDIT

Or perhaps this is easier:

device <- rep(NA_character_, length(x))

device[grepl("iPad", x) & grepl("WebKit", x)] <-  "iPad"
device[grepl("Android", x) & !grepl("Mobile", x)] <-  "android"
device[grepl("Silk", x) & grepl("WebKit", x)] <-  "kindle"
device[grepl("Playbook", x)] <-  "other"

data.frame(x, device)

                                                x  device
1 Mozilla/5.0 (iPad; stuff AppleWebKit more stuff    iPad
2                                         Android android
3                stuff Silk more stuff and WebKit  kindle
4                       stuff Playbook more stuff   other
5                                         unknown    <NA>
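
Since the question asked about mutate, the same rules could also be written with dplyr's case_when(). This is a minimal sketch that assumes dplyr is installed; the names data and user_agent follow the question, and the rows are the made-up strings used above. Like the nested ifelse() version, case_when() keeps the first condition that matches, whereas the sequential assignments above let a later rule overwrite an earlier one.

```r
library(dplyr)

# Hypothetical sample data; `data` and `user_agent` follow the names used in the question
data <- data.frame(
  user_agent = c("Mozilla/5.0 (iPad; stuff AppleWebKit more stuff",
                 "Android",
                 "stuff Silk more stuff and WebKit",
                 "stuff Playbook more stuff",
                 "unknown"),
  stringsAsFactors = FALSE
)

# case_when() checks the conditions in order and keeps the first match;
# rows that match no rule fall through to NA
data <- mutate(data, device = case_when(
  grepl("iPad", user_agent) & grepl("WebKit", user_agent) ~ "iPad",
  grepl("Android", user_agent) & !grepl("Mobile", user_agent) ~ "android",
  grepl("Silk", user_agent) & grepl("WebKit", user_agent) ~ "kindle",
  grepl("Playbook", user_agent) ~ "other",
  TRUE ~ NA_character_
))
```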

3 Comments

Thanks for the help Jeff. I'm new to grepl, but it seems to let me combine multiple conditions, unlike grep. Do you folks know how I can post sample data sets on Stack Overflow? Every time I try to do so, they come out as one-line items and not a dataframe.
grepl returns a logical vector so it can be useful for what you are trying to do (I think). Usually best to get a small sample of your data and simply copy/paste the output of dput(sampleData)
How can this be modified to search for "Kit", instead of "WebKit", and then return "kindle"?
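
For the last comment, one possible reading is that the kindle rule should match any string containing "Kit" rather than only "WebKit". A minimal sketch under that assumption follows; the "SomeKit" string is invented purely for illustration.

```r
x <- c("Mozilla/5.0 (iPad; stuff AppleWebKit more stuff",
       "stuff Silk more stuff and WebKit",
       "stuff Silk more stuff and SomeKit",  # hypothetical string, for illustration only
       "unknown")

device <- rep(NA_character_, length(x))

device[grepl("iPad", x) & grepl("WebKit", x)] <- "iPad"
# fixed = TRUE treats "Kit" as a literal substring, so "WebKit" still matches too
device[grepl("Silk", x) & grepl("Kit", x, fixed = TRUE)] <- "kindle"

data.frame(x, device)
```

Because "Kit" is a substring of "WebKit", loosening the pattern only widens the set of matches; the requirement that "Silk" is also present keeps the iPad row from being relabelled.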
