
I am a beginner in R and wanted to know if there is any way to convert multiple vectors/variables into a desired 'class' (e.g. 3 variables within a dataset are factors, and I want to convert these 3 into numerical variables in one go).

Below is the structure of the dataset. "Product" is character and the remaining columns are factors; however, I want "Product" and "Month" as character and "Sales" and "Profit" as numeric.

str(Conditional_function_IVY)

'data.frame':   100 obs. of  4 variables:
 $ Product: chr  "Bellen" "Bellen" "Sunshine" "Sunset" ...
 $ Month  : Factor w/ 12 levels "April","August",..: 5 5 5 5 5 5 5 5 4 4 ...
 $ Sales  : Factor w/ 88 levels " ? 501.00 "," ? 504.00 ",..: 8 13 64 16 55 78 81 29 2 52 ...
 $ Profit : Factor w/ 65 levels " ? 100.00 "," ? 101.00 ",..: 44 34 5 15 39 16 37 38 65 56 ...

I've done it the following way, but converting each column separately takes a lot of time, so I am wondering whether there is a way to do this in one go.

> Conditional_function_IVY$Month=as.character(Conditional_function_IVY$Month)
> Conditional_function_IVY$Sales=as.numeric(Conditional_function_IVY$Sales)
> Conditional_function_IVY$Profit=as.numeric(Conditional_function_IVY$Profit)
> str(Conditional_function_IVY)
'data.frame':   100 obs. of  4 variables:
 $ Product: chr  "Bellen" "Bellen" "Sunshine" "Sunset" ...
 $ Month  : chr  "January" "January" "January" "January" ...
 $ Sales  : num  8 13 64 16 55 78 81 29 2 52 ...
 $ Profit : num  44 34 5 15 39 16 37 38 65 56 ...



The best way to fix this is when the data frame is created or imported. More modern tidyverse tools such as readr and tibble do a good job of guessing column types and don't automatically convert strings to factors.
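Here is a minimal base R sketch of fixing the types at import time. It assumes the data come from a CSV file whose Sales and Profit columns hold plain numbers. The inline `csv_text` below is made-up stand-in data, not the asker's file. The `colClasses` argument of `read.csv()` sets each column's class as it is read, so no factors are ever created.

```r
# Made-up stand-in for the real CSV file (assumption, not the asker's data)
csv_text <- "Product,Month,Sales,Profit
Bellen,January,501.00,100.00
Sunshine,January,504.00,101.00"

# colClasses gives one class per column, in column order
sales_df <- read.csv(
  text = csv_text,
  colClasses = c("character", "character", "numeric", "numeric")
)

str(sales_df)
```

If the file really contains strings like `" ? 501.00 "`, read those two columns as "character" instead and clean them with gsub() afterwards.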

If that is not an option for you, you can convert the columns quite simply with dplyr::mutate.

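library(magrittr) # for the %<>% assignment pipe
library(dplyr)    # for mutate()

# convert all three columns in a single mutate() call
Conditional_function_IVY %<>%
  mutate(
    Month = as.character(Month),
    Sales = as.numeric(as.character(Sales)),
    Profit = as.numeric(as.character(Profit))
  )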
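The inner as.character() call matters. Called on a factor, as.numeric() returns the factor's internal level codes, not the labels. That is why Sales and Profit in the question's second str() output show small integers such as 8 13 64 instead of sales figures. Here is a small illustration with a made-up factor (not the asker's data):

```r
# Made-up factor whose labels look like numbers
f <- factor(c("10.5", "2.0", "10.5"))

# Levels are sorted as text, so "10.5" gets code 1 and "2.0" gets code 2
codes  <- as.numeric(f)                # internal level codes: 1 2 1
values <- as.numeric(as.character(f))  # the labels themselves, as numbers
```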

However, your str() output shows that Sales and Profit are stored as strings such as " ? 501.00 ", so as.numeric(as.character(...)) on its own will return NA for them. The extra characters can be stripped with gsub() before converting.

e.g. as.numeric(gsub("[^0-9.]", "", " ? 501.00 ")) # [1] 501
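One caveat is worth checking against your data. The pattern "[^0-9.]" also deletes minus signs, so a loss such as " ? -25.00 " would come back as a positive 25. If Profit can be negative, one option is to add "-" to the allowed characters. The sample values below are made up:

```r
raw <- c(" ? 501.00 ", " ? -25.00 ", " ? 1,250.50 ")

# Original pattern keeps only digits and "." (the sign is lost: 501, 25, 1250.5)
unsigned <- as.numeric(gsub("[^0-9.]", "", raw))

# "-" placed last inside the brackets is a literal, so negatives survive
cleaned <- as.numeric(gsub("[^0-9.-]", "", raw))
```

This assumes a hyphen only ever appears as a minus sign. A stray "-" used elsewhere in a value would still produce NA.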

With two rows of your data

Here is the same approach applied to two rows reconstructed from the values shown in your question.

# stringsAsFactors = TRUE reproduces the factor columns from the question
Conditional_function_IVY <- data.frame(
  Product = rep("Bellen", 2),
  Month = c("April", "August"),
  Sales = c(" ? 501.00 ", " ? 504.00 "),
  Profit = c(" ? 100.00 ", " ? 101.00 "),
  stringsAsFactors = TRUE
)

Conditional_function_IVY %>%
  mutate(
    Month = as.character(Month),
    Sales = as.numeric(gsub("[^0-9.]", "", as.character(Sales))),
    Profit = as.numeric(gsub("[^0-9.]", "", as.character(Profit)))
  )

#   Product  Month Sales Profit
# 1  Bellen  April   501    100
# 2  Bellen August   504    101 
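If you would rather not load dplyr, base R's transform() can also do the conversion in one call. This alternative is not part of the answer above. The sketch below uses the same two rows, built with stringsAsFactors = TRUE so that every column is a factor, as in the question:

```r
df <- data.frame(
  Product = rep("Bellen", 2),
  Month = c("April", "August"),
  Sales = c(" ? 501.00 ", " ? 504.00 "),
  Profit = c(" ? 100.00 ", " ? 101.00 "),
  stringsAsFactors = TRUE
)

# transform() evaluates each argument inside df and replaces the matching columns
converted <- transform(
  df,
  Product = as.character(Product),
  Month = as.character(Month),
  Sales = as.numeric(gsub("[^0-9.]", "", as.character(Sales))),
  Profit = as.numeric(gsub("[^0-9.]", "", as.character(Profit)))
)

str(converted)
```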

1 Comment

Excellent Kevin - Much appreciated

I like Kevin's approach, except that I dislike the copy/paste/editing of as.numeric(gsub("[^0-9.]", "", as.character(...))). With even 10 columns this would be tedious; with 100 columns it would be utterly impractical. I would define a little utility function and do something like this:

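# define helper function: keep only digits and ".", then convert to numeric
sub_convert = function(x) as.numeric(gsub("[^0-9.]", "", as.character(x)))

# Month is also a factor; make it character first so is.factor() doesn't select it
Conditional_function_IVY$Month = as.character(Conditional_function_IVY$Month)

# using base R
to_convert = names(Conditional_function_IVY)[sapply(Conditional_function_IVY, is.factor)]
Conditional_function_IVY[to_convert] = lapply(
    Conditional_function_IVY[to_convert],
    sub_convert
)

# or using dplyr (mutate_if() is superseded since dplyr 1.0.0;
# mutate(across(where(is.factor), sub_convert)) is the current equivalent)
library(dplyr)
Conditional_function_IVY = mutate_if(
    Conditional_function_IVY,
    is.factor,
    sub_convert
)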

This scales better and also has the advantage that if you need to tweak the sub_convert function you only need to edit it in one place, instead of every time it is used.
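The question's original goal was to give each column its own target class in one go. This sketch goes a step further toward that: sub_convert() stays the single place where the cleaning rule lives, and a named list pairs each column with its conversion function. The name `converters` is hypothetical, chosen for illustration, and the two rows are the sample from the first answer:

```r
# Helper from the answer above, with the argument passed through as x
sub_convert <- function(x) as.numeric(gsub("[^0-9.]", "", as.character(x)))

# Two sample rows; stringsAsFactors = TRUE mimics the factor columns in the question
df <- data.frame(
  Product = rep("Bellen", 2),
  Month = c("April", "August"),
  Sales = c(" ? 501.00 ", " ? 504.00 "),
  Profit = c(" ? 100.00 ", " ? 101.00 "),
  stringsAsFactors = TRUE
)

# Hypothetical lookup: column name -> conversion function
converters <- list(
  Product = as.character,
  Month = as.character,
  Sales = sub_convert,
  Profit = sub_convert
)

# Apply each function to its own column and write all results back at once
df[names(converters)] <- Map(function(f, col) f(col), converters, df[names(converters)])

str(df)
```

Adding a column then means adding one entry to the list. Changing the cleaning rule still means editing only sub_convert().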

1 Comment

I like your function to generalise the text cleaning and agree my approach may be tedious with many columns. However, I also tend to follow Hadley's rule of thumb of writing a function once the same code has been copied more than twice (i.e. three copies).
