2

I have a list of 59 data frames that I want to merge together. Unfortunately, because I scraped many of them, the columns in the data frames have different classes. They all have the column "Name", some as a factor and some as character. I want to change all of them to character. I tried the following:

dts <- c("Alabama","Alaska","Arizona","Arkansas","California","Colorado","Connecticut","Delaware","Florida",
               "Georgia","Hawaii","Idaho","Illinois","Indiana","Iowa","Kansas","Kentucky","Louisiana","Maine",
               "Maryland","Massachusetts","Michigan","Minnesota","Mississippi","Missouri","Montana","Nebraska",
               "Nevada","New_Hampshire","New_Jersey","New_Mexico","New_York","North_Carolina","North_Dakota",
               "Ohio","Oklahoma","Oregon","Pennsylvania","Rhode_Island","South_Carolina","South_Dakota","Tennessee",
               "Texas","Utah","Vermont","Virginia","Washington","West_Virginia","Wisconsin","Wyoming","Federal",
               "CCJail","DC","LAJail","NOLA","NYCJail","OCJail","PhilJail","TXJail")


for(i in 1:length(dts)){
        dts[i]$Name <- as.character(dts[i]$Name)
}

but it only gave me the error "Error: $ operator is invalid for atomic vectors". Does anyone know of a good work-around? Thanks in advance for the help!

My ultimate goal is to run

dta <-dplyr::bind_rows(Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,Florida,
       Georgia,Hawaii,Idaho,Illinois,Indiana,Iowa,Kansas,Kentucky,Louisiana,Maine,
       Maryland,Massachusetts,Michigan,Minnesota,Mississippi,Missouri,Montana,Nebraska,
       Nevada,New_Hampshire,New_Jersey,New_Mexico,New_York,North_Carolina,North_Dakota,
       Ohio,Oklahoma,Oregon,Pennsylvania,Rhode_Island,South_Carolina,South_Dakota,Tennessee,
       Texas,Utah,Vermont,Virginia,Washington,West_Virginia,Wisconsin,Wyoming,Federal,CCJail,
       DC,LAJail,NOLA,NYCJail,OCJail,PhilJail,TXJail)

But I get the error "Error: Can't combine ..1$Residents.Confirmed and ..2$Residents.Confirmed ." There are a ton of columns in each data frame, and the same column often has a different class from one data frame to the next. If anyone has a more elegant solution, I would also be open to that instead! Thanks!

3
  • Do you need to change only the 'Name' column class or all the columns? Commented Jun 27, 2020 at 4:37
  • @arkun I need to change all the columns (some to character, some to numeric). Thanks! Commented Jun 27, 2020 at 4:39
  • That is not clear, because it may also vary across the datasets. I have added a solution that converts all columns to character and then binds them. Commented Jun 27, 2020 at 4:40

3 Answers

2

We can load the datasets into a list with mget (assuming the dataset objects already exist in the global environment), loop over the list with map, and convert the 'Name' column to character with mutate. The _dfr suffix in map_dfr row-binds the results into a single data frame.

library(dplyr)
library(purrr)
out <- map_dfr(mget(dts), ~ .x %>% 
                  mutate(Name = as.character(Name)))
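
To illustrate why the original loop failed and what mget returns, here is a minimal base R sketch. The two toy data frames stand in for the scraped ones (their contents are made up for illustration), and lapply is used in place of map so that no packages are needed.

```r
# Toy stand-ins for two of the scraped data frames (hypothetical data)
Alabama <- data.frame(Name = factor(c("a", "b")), Residents.Confirmed = c(0, 1))
Alaska  <- data.frame(Name = c("c", "d"), Residents.Confirmed = c(2, 3),
                      stringsAsFactors = FALSE)
dts <- c("Alabama", "Alaska")

# dts[1] is just the string "Alabama", not the data frame,
# which is why `$` fails on it
class(dts[1])

# mget() looks the names up and returns a named list of the objects
dfs <- mget(dts)
class(dfs[["Alabama"]])

# Convert Name in every element of the list
dfs <- lapply(dfs, function(d) {
  d$Name <- as.character(d$Name)
  d
})
sapply(dfs, function(d) class(d$Name))
```

Working on the list returned by mget, rather than on the vector of names, is what makes the per-data-frame conversion possible.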

If many columns differ in class, it may be better to convert every column to a single class (character), bind the data frames, and then let type.convert infer the column types again

out <- map_dfr(mget(dts), ~ .x %>%
                   mutate(across(everything(), as.character)))
out <- type.convert(out, as.is = TRUE)
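
The same idea can be sketched in base R on toy data, which makes the intermediate steps easier to inspect. This is an illustration under assumptions, not the code above: the data frames are invented, to_chr is a hypothetical helper, and rbind requires that all data frames share the same column names (bind_rows does not).

```r
# Residents.Confirmed is numeric in d1 but character in d2 (made-up data)
d1 <- data.frame(Name = factor(c("name1", "name2")),
                 Residents.Confirmed = c(0, 1))
d2 <- data.frame(Name = c("name3", "name4"),
                 Residents.Confirmed = c("2", "3"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: convert every column to character, keep the data frame
to_chr <- function(d) {
  d[] <- lapply(d, as.character)
  d
}

# Bind the all-character data frames, then re-infer the column types
combined <- do.call(rbind, lapply(list(d1, d2), to_chr))
combined <- type.convert(combined, as.is = TRUE)
str(combined)
```

With as.is = TRUE, columns that cannot be parsed as numbers or logicals stay character instead of being turned into factors.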

If the dplyr version is < 1.0.0, use mutate_all

out <- map_dfr(mget(dts), ~ .x %>%
               mutate_all(as.character))

7 Comments

Thanks for the help! Right now, when I run code you suggested, it gives the following error: Error: Problem with mutate() input ..1. x Can't recycle ..1 (size 507) to match ..20 (size 22). ℹ Input ..1 is across(everything(), as.character).
and when I run rlang::last_error() I get <error/dplyr_error> Problem with mutate() input ..1. x Can't recycle ..1 (size 507) to match ..20 (size 22). ℹ Input ..1 is across(everything(), as.character). Backtrace: 1. purrr::map_dfr(...) 2. purrr::map(.x, .f, ...) 3. global::.f(.x[[i]], ...) 12. dplyr::mutate(., across(everything(), as.character)) 14. dplyr:::mutate_cols(.data, ...)
@babybonobo What is your version of dplyr? I used 1.0.0.
@babybonobo Can you try with mutate_all, as in the update?
I'm using 1.0.0. Thanks for the help! I tried the mutate_all option, but now it gives me the error: Error: Invalid index: out of bounds
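
When errors like these appear, one way to narrow things down is to tabulate the class of every column in every data frame and look for columns that disagree. The following base R sketch does this on toy data. It assumes the problem is related to column types, which the thread does not confirm, and the names dfs, col_classes, and mixed are introduced only for illustration.

```r
# Made-up data: both columns differ in class between d1 and d2
d1 <- data.frame(Name = factor(c("name1", "name2")),
                 Residents.Confirmed = c(0, 1))
d2 <- data.frame(Name = c("name3", "name4"),
                 Residents.Confirmed = c("2", "3"),
                 stringsAsFactors = FALSE)
dfs <- list(d1 = d1, d2 = d2)

# One row per (data frame, column) pair, recording the column's first class
col_classes <- do.call(rbind, lapply(names(dfs), function(nm) {
  data.frame(data_frame = nm,
             column = names(dfs[[nm]]),
             class = vapply(dfs[[nm]], function(col) class(col)[1], character(1)),
             row.names = NULL,
             stringsAsFactors = FALSE)
}))

# Columns whose class is not the same in every data frame
n_classes <- tapply(col_classes$class, col_classes$column,
                    function(cl) length(unique(cl)))
mixed <- names(n_classes)[n_classes > 1]
col_classes[col_classes$column %in% mixed, ]
```

With the real data, dfs could be built with mget(dts). Unexpected classes such as list or data.frame in the output would point to columns that need special handling before binding.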
1
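library(dplyr)  # for bind_rows()

# Example data: 'Name' is a factor in d1 and character in d2 (R >= 4.0 defaults)
d1 <- data.frame(
  Name = as.factor(c("name1", "name2")),
  Residents.Confirmed = c(0,1)
  )
d2 <- data.frame(
  Name = c("name3", "name4"),
  Residents.Confirmed = c(2,3)
)
dataframes_list <- list(d1, d2)
# Index the list with [[i]] so that each element is the data frame itself
for(i in seq_along(dataframes_list)){
  dataframes_list[[i]]$Name <- as.character(dataframes_list[[i]]$Name)
}
bind_rows(dataframes_list)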


-1

Base R solution:

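# Convert every column to character, row-bind, then re-infer the column types
type.convert(
  do.call("rbind",
          Map(function(x) data.frame(lapply(x, as.character),
                                     stringsAsFactors = FALSE),
              dataframes_list)),
  as.is = TRUE)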

Data thanks @chase171:

d1 <- data.frame(
  Name = as.factor(c("name1", "name2")),
  Residents.Confirmed = c(0,1)
)
d2 <- data.frame(
  Name = c("name3", "name4"),
  Residents.Confirmed = c(2,3)
)
dataframes_list <- list(d1, d2)
