I have a data frame df. Most of its columns should be factors, apart from a few numeric ones.

I want to create a data quality report and everything is being read in as integers. So I captured the following column indexes and wanted to convert those columns to type factor:

n_cols = c(1,3,4,9:17,28:35)

for (x in length(df)) {
  if (x %in% n_cols == FALSE) {
    df[,x] = as.factor(df[,x])
  }
}

The code runs, but when I call str(df) the columns have not been converted.

I come from a Python background, so some of this syntax is newer to me.

  • 3
    Your problem is in the setup of your for loop: for (x in length(df)) only visits the last column, because length(df) is a single number. You should use something like for (x in 1:length(df)) or for (x in seq_along(df)). Commented Aug 31, 2018 at 2:48
  • 4
    No need for any loops: df[n_cols] <- lapply(df[n_cols], as.factor) Commented Aug 31, 2018 at 2:50
  • 2
    @RonakShah, since the OP has if (x %in% n_cols == FALSE), shouldn't your example use the complement of n_cols? Something like: df[!(1:length(df) %in% n_cols)] Commented Aug 31, 2018 at 2:56
  • 2
    @JosephWood yes, most probably. But if that is the case, I would suggest that the OP make n_cols the columns to turn into factors instead, which would be simpler. Or they can use setdiff to get them. Also, "I captured the following column indexes and wanted to convert those columns to type factor" is confusing. Commented Aug 31, 2018 at 3:02
  • 2
    @JosephWood just do df[-n_cols] Commented Aug 31, 2018 at 3:54
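
The points raised in the comments above can be checked on a small made-up data frame. This is only a minimal sketch: the data frame toy and the index vector num_cols are hypothetical stand-ins for the OP's df and n_cols, which are not shown in the question.

```r
# Hypothetical stand-in for the OP's data: every column is read in as integer
toy <- data.frame(id  = 1:4,
                  grp = c(1L, 2L, 1L, 2L),
                  lvl = c(3L, 3L, 5L, 5L),
                  n   = 11:14)

# Columns to leave numeric (plays the role of n_cols)
num_cols <- c(1, 4)

# length(toy) is a single number, so this loop body runs once, for the last column
visited_wrong <- c()
for (x in length(toy)) visited_wrong <- c(visited_wrong, x)

# seq_along(toy) is 1, 2, 3, 4, so every column is visited
visited_right <- c()
for (x in seq_along(toy)) visited_right <- c(visited_right, x)

# Loop-free version: convert every column that is NOT in num_cols
fact_cols <- setdiff(seq_along(toy), num_cols)
toy[fact_cols] <- lapply(toy[fact_cols], as.factor)

str(toy)
```

Here toy[-num_cols] would select the same columns as toy[fact_cols], but only as long as num_cols is not empty: negative indexing with an empty vector selects no columns at all, which is one reason setdiff may be the safer choice.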

2 Answers

To show how to convert selected columns of a data frame to factors inside a for-loop, here is a reproducible example using the mtcars dataset.

Note: This relies on a vector of the column numbers that you do want coerced to factors. To invert the logic (convert every column except the listed ones), negate the condition inside the if() statement: if (!(x %in% to_fact)).

# example data
data(mtcars)

# columns to go to factors
to_fact <- c(1, 3, 5, 7)

for(x in seq_along(mtcars)) {
  if(x %in% to_fact){
    mtcars[,x] <- as.factor(mtcars[,x]) 
  }
}

str(mtcars)
#> 'data.frame':    32 obs. of  11 variables:
#>  $ mpg : Factor w/ 25 levels "10.4","13.3",..: 16 16 19 17 13 12 3 20 19 14 ...
#>  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
#>  $ disp: Factor w/ 27 levels "71.1","75.7",..: 13 13 6 16 23 15 23 12 10 14 ...
#>  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
#>  $ drat: Factor w/ 22 levels "2.76","2.93",..: 16 16 15 5 6 1 7 11 17 17 ...
#>  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
#>  $ qsec: Factor w/ 30 levels "14.5","14.6",..: 6 10 22 24 10 29 5 27 30 19 ...
#>  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
#>  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
#>  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
#>  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

Created on 2018-08-31 by the reprex package (v0.2.0).
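
For the inverted case mentioned in the note (convert every column except the listed ones, which is closer to what the question asks for), a sketch could look like the following. The names mt and keep_num are made up for illustration, and the choice of columns to keep numeric is arbitrary.

```r
# Work on a copy so the built-in dataset is left untouched
mt <- datasets::mtcars

# Hypothetical choice of columns to keep numeric: mpg, disp, hp, drat, wt, qsec
keep_num <- c(1, 3, 4, 5, 6, 7)

for (x in seq_along(mt)) {
  # Negated test: convert only the columns that are NOT listed
  if (!(x %in% keep_num)) {
    mt[, x] <- as.factor(mt[, x])
  }
}

# Names of the columns that ended up as factors
factor_names <- names(mt)[vapply(mt, is.factor, logical(1))]
factor_names
```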

To do this more succinctly, you can also use the purrr package for functional programming (map_df() additionally needs dplyr installed):

mtcars[to_fact] <- purrr::map_df(mtcars[to_fact], as.factor)

2 Comments

for(x in seq_along(mtcars)) { ... if(x %in% to_fact){ is totally unnecessary. Just do for (factorCol in to_fact) { ... directly!
Yeah, I agree; I was just trying to mirror the syntax the OP was having trouble with. It is questionable whether a for-loop is needed at all.

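1) You can do it in a one-liner with lapply (not sapply, which would simplify the result to a matrix and drop the factor class); factorCols is the index vector defined in 2) below: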

mtcars[,factorCols] <- lapply(mtcars[,factorCols], as.factor)

2) Longer alternative: there is no need for the nested for/if. You already know the indices of the columns you want to convert, so iterate over them directly:

data(mtcars)
factorCols <- c(1,3,5,7)

for (factorCol in factorCols) {
  mtcars[, factorCol] <- as.factor(mtcars[, factorCol])
}

which is essentially a one-liner.
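
One way to confirm that each column was converted on its own, rather than having all levels lumped together, is to inspect the classes and level counts afterwards. This is a minimal sketch on a copy of mtcars; the name mt2 is made up for illustration.

```r
# Copy of the built-in data, so mtcars itself is not modified
mt2 <- datasets::mtcars
factorCols <- c(1, 3, 5, 7)

# lapply() applies as.factor() to one column at a time and returns a list,
# which is assigned back to the selected columns
mt2[factorCols] <- lapply(mt2[factorCols], as.factor)

# TRUE for the selected columns, FALSE for the rest
is_fact <- vapply(mt2, is.factor, logical(1))
is_fact

# Each factor carries only the distinct values of its own column
nlevels(mt2$mpg) == length(unique(datasets::mtcars$mpg))
nlevels(mt2$disp) == length(unique(datasets::mtcars$disp))
```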

2 Comments

Why not just mtcars[factorCols] <- lapply(mtcars[factorCols], as.factor) and do away with the anonymous function? Not entirely sure, but I think it also means [ is run just once rather than length(factorCols) times.
@thelatemail: thanks, that's simpler; I used your version. I thought it lumped all the column levels together, as as.factor(mtcars[,factorCols]) does, which gives garbage.
