Adding column names to dataframe while reading a csv in r

Question

I have multiple .csv files in my directory that have no header row. Reading them without a header gives this error:

Error in match.names(clabs, names(xi)) : names do not match previous names.

For that reason, I want to add column names to those CSV files and combine them all into a single data frame, but I can't work out how to add column names to the files while reading them. The file names look like test_abc.csv, test_pqr.csv, test_xyz.csv, etc. Here is what I tried:

temp = list.files(pattern="*.csv")
read_csv_filename <- function(filename){
  ret <- read.csv(filename,header = F)
  ret$city <- gsub(".*[_]([^.]+)[.].*", "\\1", filename) 
  ret
}

df_all <- do.call(rbind,lapply(temp,read_csv_filename))

How do I add a header to every file while reading it?

These are the names I want to add while reading:

colnames = c("Age","Gender","height","weight")

Any suggestions?

Maybe read.csv(..., col.names = c("Age","Gender","height","weight"))? Or am I misunderstanding your question?
Tino, commented Nov 17, 2017 at 6:07

austensen · Accepted Answer · 2017-11-17 06:23:08Z

2

Using tidyverse packages, you can do this nicely with the purrr::map_dfr function, which iterates over a list, applies a function that returns a data frame to each element, and then row-binds all those data frames together.

library(readr)
library(purrr)
library(dplyr) # only used in example set up

# Setting up some example csv files to work with

mtcars_slim <- select(mtcars, 1:3)

write_csv(slice(mtcars_slim, 1:4), "mtcars_1.csv", col_names = FALSE)
write_csv(slice(mtcars_slim, 5:10), "mtcars_2.csv", col_names = FALSE)
write_csv(slice(mtcars_slim, 11:21), "mtcars_3.csv", col_names = FALSE)


# get file paths, read them all, and row-bind them all

dir(pattern = "mtcars_\\d+\\.csv") %>% 
  map_dfr(read_csv, col_names = c("mpg", "cyl", "disp"))

#> Parsed with column specification:
#> cols(
#>   mpg = col_double(),
#>   cyl = col_integer(),
#>   disp = col_integer()
#> )

#> # A tibble: 21 x 3
#>      mpg   cyl  disp
#>    <dbl> <int> <dbl>
#>  1  21.0     6 160.0
#>  2  21.0     6 160.0
#>  3  22.8     4 108.0
#>  4  21.4     6 258.0
#>  5  18.7     8 360.0
#>  6  18.1     6 225.0
#>  7  14.3     8 360.0
#>  8  24.4     4 146.7
#>  9  22.8     4 140.8
#> 10  19.2     6 167.6
#> # ... with 11 more rows
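The same pattern can be adapted to the column names and the city column from the question. The following is only a sketch under stated assumptions: the helper name read_with_city, the temporary directory, and the demo file contents are all hypothetical, and the regular expression is the one from the question.

```r
library(readr)
library(purrr)
library(dplyr)

# Hypothetical demo files, written to a temporary directory
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
writeLines(c("31,Male,175,70", "28,Female,162,55"),
           file.path(demo_dir, "test_abc.csv"))
writeLines("45,Female,168,61", file.path(demo_dir, "test_pqr.csv"))

# Hypothetical helper: read one headerless file with the desired names,
# then add the city parsed from the file name ("test_abc.csv" -> "abc")
read_with_city <- function(path) {
  read_csv(path, col_names = c("Age", "Gender", "height", "weight")) %>%
    mutate(city = gsub(".*[_]([^.]+)[.].*", "\\1", basename(path)))
}

# Read every matching file and row-bind the results
df_all <- dir(demo_dir, pattern = "^test_.*\\.csv$", full.names = TRUE) %>%
  map_dfr(read_with_city)

df_all
```

Using basename() keeps directory names out of the regular expression match when full paths are passed in.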

edited Nov 17, 2017 at 6:23

answered Nov 17, 2017 at 6:03


Tino Over a year ago

I really like the tidyverse and use it myself, too. But why load three packages just for this task when the asker evidently doesn't use any of them?

Hardik Gupta · Accepted Answer · 2017-11-17 07:23:34Z

0

You can set the column names inside the reading function itself, like this:

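# "\\.csv$" is a regular expression matching file names that end in .csv
temp <- list.files(pattern = "\\.csv$")
read_csv_filename <- function(filename) {
  ret <- read.csv(filename, header = FALSE)
  # Extract the part between "_" and ".", e.g. "abc" from "test_abc.csv"
  ret$city <- gsub(".*[_]([^.]+)[.].*", "\\1", filename)
  # Name the four data columns plus the city column just added
  colnames(ret) <- c("Age", "Gender", "height", "weight", "city")
  ret
}

df_all <- do.call(rbind, lapply(temp, read_csv_filename))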

answered Nov 17, 2017 at 7:23


Tino Over a year ago

read.csv already has this option: set col.names = c("Age","Gender","height","weight") and the additional line is not needed. And if you really want to use colnames(), you should at least call it before you add another column.
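The col.names approach suggested in the comments could look roughly like the sketch below, which uses base R only. The helper name read_csv_with_city, the temporary directory, and the demo file contents are hypothetical; the column names and the regular expression come from the question.

```r
# Column names from the question
col_names <- c("Age", "Gender", "height", "weight")

# Hypothetical helper: read one headerless file and tag it with its city
read_csv_with_city <- function(filename) {
  # col.names assigns the header at read time, so every file yields
  # identically named columns and rbind() can stack them
  ret <- read.csv(filename, header = FALSE, col.names = col_names,
                  stringsAsFactors = FALSE)
  # "test_abc.csv" -> "abc"
  ret$city <- gsub(".*[_]([^.]+)[.].*", "\\1", basename(filename))
  ret
}

# Hypothetical demo files, written to a temporary directory
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
writeLines(c("31,Male,175,70", "28,Female,162,55"),
           file.path(demo_dir, "test_abc.csv"))
writeLines("45,Female,168,61", file.path(demo_dir, "test_pqr.csv"))

# Read every .csv file in the directory and row-bind the results
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
df_all <- do.call(rbind, lapply(files, read_csv_with_city))
df_all
```

Because the names are set before the city column is added, no separate colnames() call is needed.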
