1

Let's say I have a dataframe:

word <- c("good", "great", "bad", "poor", "eh")
userid <- c(1, 2, 3, 4, 5)
d <- data.frame(userid, word)

I want to add a dataframe column, sentiment, that is a factor and depends on what word is:

words_pos <- c("good", "great")
words_neg <- c("bad", "poor")
calculate_sentiment <- function(x) {
     if (x %in% words_pos) {
         return("pos")
     } else if (x %in% words_neg) {
         return("neg")
     }
     return(NA)
}
d$sentiment <- apply(d, 1, function(x) calculate_sentiment(x['word']))

However, now d$sentiment is of type "character". How do I make it a factor with the right levels? pos, neg, NA -- I'm not even sure if NA should be a factor level, as I'm just learning R.

Thanks!

2
  • 3
    Try: d$sentiment<-factor(d$sentiment) Commented Jul 26, 2016 at 2:21
  • Don't use apply if you only need it for a single column. This is both dangerous (because of the matrix conversion) and very inefficient. And I think you are looking for addNA instead of factor, something like addNA(sapply(word, calculate_sentiment)). Not to mention that you could probably vectorize this easily too. Commented Jul 26, 2016 at 5:49
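
A minimal sketch of what the second comment describes, assuming the question's d, words_pos, and calculate_sentiment (redefined here so the snippet runs on its own). Setting stringsAsFactors = FALSE is an assumption added so that word stays a character column on older R versions too.

```r
words_pos <- c("good", "great")
words_neg <- c("bad", "poor")
calculate_sentiment <- function(x) {
  if (x %in% words_pos) {
    return("pos")
  } else if (x %in% words_neg) {
    return("neg")
  }
  return(NA)
}

d <- data.frame(userid = c(1, 2, 3, 4, 5),
                word = c("good", "great", "bad", "poor", "eh"),
                stringsAsFactors = FALSE)

# apply() coerces the data frame to a matrix first. With mixed column
# types that matrix is character, so userid stops being numeric inside
# the applied function.
m <- as.matrix(d)
is.character(m)

# sapply() over just the column that is needed avoids that conversion.
sentiment_chr <- sapply(d$word, calculate_sentiment, USE.NAMES = FALSE)

# addNA() builds a factor and adds NA as an explicit level.
d$sentiment <- addNA(sentiment_chr)
levels(d$sentiment)
```

Whether NA should be a level at all depends on what you do next: with addNA it is counted as its own category, without it the value is simply treated as missing.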

2 Answers

4

This may not be the simplest way to do it, but it is very readable and, in my opinion, preferable to using an abstracted function. Use dplyr's mutate along with case_when:

library(dplyr)
d2 <- mutate(d, sentiment = factor(case_when(word %in% words_pos ~ "pos",
                                             word %in% words_neg ~ "neg",
                                             TRUE                ~ NA_character_)))

glimpse(d2)
#> Observations: 5
#> Variables: 3
#> $ userid    <dbl> 1, 2, 3, 4, 5
#> $ word      <fctr> good, great, bad, poor, eh
#> $ sentiment <fctr> pos, pos, neg, neg, NA

I've spaced it out a bit so it's clearer, but this will:

  • take the data.frame d then
  • mutate (change a column) 'sentiment' to be equal to a factor, defined by
  • a case statement with logicals on the LHS, results on the RHS (NA_character_ required so that everything is the same type).

Output confirms that this is a factor column with the desired values.
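
For comparison, here is a rough base R sketch of the same vectorized idea without dplyr. It is not part of the answer above; it assumes the question's d, words_pos, and words_neg (redefined so the snippet is self-contained) and sets the level order explicitly to pos, neg, which is an assumption about what "the right levels" means.

```r
words_pos <- c("good", "great")
words_neg <- c("bad", "poor")
d <- data.frame(userid = c(1, 2, 3, 4, 5),
                word = c("good", "great", "bad", "poor", "eh"),
                stringsAsFactors = FALSE)

# %in% is already vectorized, so no row-wise loop is needed.
sentiment_chr <- ifelse(d$word %in% words_pos, "pos",
                 ifelse(d$word %in% words_neg, "neg", NA_character_))

# Passing levels explicitly fixes their order instead of relying on
# the default alphabetical sort.
d$sentiment <- factor(sentiment_chr, levels = c("pos", "neg"))

str(d$sentiment)
```

Nested ifelse calls get hard to read beyond two or three branches, which is where case_when tends to be the nicer option.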

1

You can wrap the last line of your code in as.factor, which gives a factor with the levels neg and pos. By the way, NA is not a factor level by default; it stays a missing value.

d$sentiment <-as.factor(apply(d, 1, function(x) calculate_sentiment(x['word'])))
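
A small sketch to check how the resulting factor treats NA. The vector x below is a stand-in (an assumption) for the character values the line above produces for the question's data.

```r
# Stand-in for the character result of calculate_sentiment on each row.
x <- c("pos", "pos", "neg", "neg", NA)

f <- as.factor(x)

# Levels are sorted alphabetically and do not include NA.
levels(f)

# The fifth element is still there, just recorded as missing.
is.na(f)

# table() drops missing values unless asked to count them.
tab <- table(f, useNA = "ifany")
tab
```

If you want the levels in a specific order, factor(x, levels = c("pos", "neg")) lets you state it directly.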
