
I have multiple .csv files in my directory that have no column names, so reading them without a header and combining them gives an error:

Error in match.names(clabs, names(xi)) : names do not match previous names.

For that reason, I want to add column names to those csv files and combine them all into one single data frame, but I'm not able to add column names to the files while reading them. The file names look like test_abc.csv, test_pqr.csv, test_xyz.csv, etc. Here is what I tried:

temp = list.files(pattern="*.csv")
read_csv_filename <- function(filename){
  ret <- read.csv(filename,header = F)
  ret$city <- gsub(".*[_]([^.]+)[.].*", "\\1", filename) 
  ret
}

df_all <- do.call(rbind,lapply(temp,read_csv_filename))

How do I add a header to every file while reading it?

These are the names that I want to add while reading:

colnames = c("Age","Gender","height","weight")

Any suggestions?

    Maybe read.csv(..., col.names = c("Age","Gender","height","weight"))? Or do I get your question wrong? Commented Nov 17, 2017 at 6:07

2 Answers


Using tidyverse packages, you can do this nicely with the purrr::map_dfr function, which iterates over a list, applies a function that returns a data frame to each element, and then row-binds all those data frames together.


library(readr)
library(purrr)
library(dplyr) # only used in example set up

# Setting up some example csv files to work with

mtcars_slim <- select(mtcars, 1:3)

write_csv(slice(mtcars_slim, 1:4), "mtcars_1.csv", col_names = FALSE)
write_csv(slice(mtcars_slim, 5:10), "mtcars_2.csv", col_names = FALSE)
write_csv(slice(mtcars_slim, 11:1), "mtcars_3.csv", col_names = FALSE)


# get file paths, read them all, and row-bind them all

dir(pattern = "mtcars_\\d+\\.csv") %>% 
  map_dfr(read_csv, col_names = c("mpg", "cyl", "disp"))

#> Parsed with column specification:
#> cols(
#>   mpg = col_double(),
#>   cyl = col_integer(),
#>   disp = col_integer()
#> )

#> # A tibble: 21 x 3
#>      mpg   cyl  disp
#>    <dbl> <int> <dbl>
#>  1  21.0     6 160.0
#>  2  21.0     6 160.0
#>  3  22.8     4 108.0
#>  4  21.4     6 258.0
#>  5  18.7     8 360.0
#>  6  18.1     6 225.0
#>  7  14.3     8 360.0
#>  8  24.4     4 146.7
#>  9  22.8     4 140.8
#> 10  19.2     6 167.6
#> # ... with 11 more rows
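
Applied to the files from the question, the same approach might look like the sketch below. It assumes the readr and purrr packages (and dplyr, which map_dfr needs for row-binding) are installed. The sample rows and the temporary directory are made up for illustration; only the file name pattern and the column names come from the question. Naming the input vector and passing `.id = "city"` is one possible way to get the city column, not the only one.

```r
library(readr)
library(purrr)

# Hypothetical sample files, written to a temporary directory for illustration
tmp <- tempfile("csv_demo_")
dir.create(tmp)
writeLines(c("34,male,180,75", "29,female,165,60"), file.path(tmp, "test_abc.csv"))
writeLines("41,female,170,68", file.path(tmp, "test_pqr.csv"))

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# City is the part of the file name between "_" and ".", e.g. "abc"
cities <- gsub(".*[_]([^.]+)[.].*", "\\1", basename(files))

# Name each path by its city so that .id can turn the names into a column
df_all <- files %>%
  set_names(cities) %>%
  map_dfr(read_csv,
          col_names = c("Age", "Gender", "height", "weight"),
          .id = "city")
```

Here `df_all` has one row per line of the input files, with `city` as the first column followed by the four named columns.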

1 Comment

I really like the tidyverse and use it myself, too. But why load three packages just for this task if the asker obviously does not use any of them?

You can set the column names inside the function itself, like this:

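temp <- list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob
read_csv_filename <- function(filename) {
  ret <- read.csv(filename, header = FALSE)
  # Extract the part of the file name between "_" and ".", e.g. "abc" from test_abc.csv
  ret$city <- gsub(".*[_]([^.]+)[.].*", "\\1", filename)
  # Name all five columns: the four read from the file plus the added city column
  colnames(ret) <- c("Age", "Gender", "height", "weight", "city")
  ret
}

df_all <- do.call(rbind, lapply(temp, read_csv_filename))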

1 Comment

read.csv already has this option if you set col.names = c("Age","Gender","height","weight"), hence this additional line is not needed. And if you really want to use colnames(), you should at least do it before you add another column.
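
The col.names option mentioned in the comment above can be sketched in base R as follows. The sample rows and the temporary directory are invented for illustration; the function name, the regular expression, and the column names are taken from the question. Treat it as a minimal sketch rather than a definitive solution.

```r
# Hypothetical sample files, written to a temporary directory for illustration
tmp <- tempfile("csv_demo_")
dir.create(tmp)
writeLines(c("34,male,180,75", "29,female,165,60"), file.path(tmp, "test_abc.csv"))
writeLines("41,female,170,68", file.path(tmp, "test_pqr.csv"))

col_names <- c("Age", "Gender", "height", "weight")

read_csv_filename <- function(filename) {
  # col.names assigns the header while reading, so no separate colnames() call is needed
  ret <- read.csv(filename, header = FALSE, col.names = col_names)
  # City is the part of the file name between "_" and ".", e.g. "abc"
  ret$city <- gsub(".*[_]([^.]+)[.].*", "\\1", basename(filename))
  ret
}

# list.files() takes a regular expression, so "\\.csv$" matches names ending in .csv
files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
df_all <- do.call(rbind, lapply(files, read_csv_filename))
```

Because every data frame now has the same five column names, rbind can combine them without the "names do not match previous names" error.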
