Error in opening a CSV file with specifying row.name using R

Question

I have a big df (CSV format) that looks like:

miRNAs <- c('mmu_mir-1-3p','mmu_mir-1-5p','mmu-mir-6-5p','mmu-mir-6-3p')
cca <- c('12854','5489','54485','2563')
ccb <- c('124','589','5465','25893')
taa <- c('12854','589','5645','763')
df <- data.frame(miRNAs,cca,ccb,taa)

and I want to use this df in a DESeq2 analysis. I removed duplicate rows with unique(df) and tried to read the file with countData <- as.matrix(read.csv(file="df.csv", row.name="miRNAs", sep = ",")), but it gives this error:

Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed

Since I made the df unique, I don't understand why this error keeps appearing. I want to read my df this way so that colnames(df) returns only my column headers, without the first column. I need that list for a TRUE/FALSE check of whether the headers match the row names of another file, phenotype.csv: all(rownames(phenotype) == colnames(countData))

If there are duplicated entries in the first column of df, you will get this kind of error. Can you check table(duplicated(df$miRNAs))?
– StupidWolf, Commented Feb 19, 2020 at 13:07
Yeah, so in your df there are actually duplicated miRNAs (based on the miRNAs column), and they have different counts, which is why unique() does not treat those rows as duplicates.
– StupidWolf, Commented Feb 19, 2020 at 13:16
I used this to remove duplicates in the first column: new_df <- df[!duplicated(df$miRNAs),,drop=FALSE]. Is this correct?
– Apex, Commented Feb 19, 2020 at 13:17
Yes, this is OK. Then write.csv(new_df,"new_df.csv",row.names=FALSE)
– StupidWolf, Commented Feb 19, 2020 at 13:17
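
The checks suggested in these comments can be strung together as follows. This is only a sketch: the toy data frame, in which one miRNA appears twice with different counts, and the temporary file path are assumptions made up for illustration, not the asker's real data.

```r
# Toy data: 'mmu_mir-1-3p' appears twice with different counts (hypothetical rows)
df <- data.frame(
  miRNAs = c("mmu_mir-1-3p", "mmu_mir-1-3p", "mmu-mir-6-5p", "mmu-mir-6-3p"),
  cca = c(12854, 5489, 54485, 2563),
  ccb = c(124, 589, 5465, 25893),
  taa = c(12854, 589, 5645, 763)
)

# unique() compares whole rows, so both 'mmu_mir-1-3p' rows survive
nrow(unique(df))
#> [1] 4

# Count repeated identifiers in the first column
# (table(duplicated(df$miRNAs)) gives the same information as a table)
sum(duplicated(df$miRNAs))
#> [1] 1

# Keep only the first occurrence of each miRNA
new_df <- df[!duplicated(df$miRNAs), , drop = FALSE]

# Round trip through a CSV file; row.names = "miRNAs" is now accepted
csv_path <- tempfile(fileext = ".csv")
write.csv(new_df, csv_path, row.names = FALSE)
countData <- as.matrix(read.csv(csv_path, row.names = "miRNAs"))
colnames(countData)
#> [1] "cca" "ccb" "taa"
```

Note that this keeps only the first row for each miRNA and discards the counts in the later rows.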

shs · Accepted Answer · 2020-02-19 12:20:46Z


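In read.csv(), a single string passed as row.names is treated as the name of the column that holds the row names, so row.name="miRNAs" does point at the right column. The error means that this column contains repeated values: unique(df) only drops rows that are identical in every column, so the same miRNA with different counts survives. Import without the row.names argument, resolve the duplicates, and if you really want that variable as row names instead of a column, set it after the import: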

df <- data.frame(
  miRNAs = c('mmu_mir-1-3p','mmu_mir-1-5p','mmu-mir-6-5p','mmu-mir-6-3p'),
  cca = c('12854','5489','54485','2563'),
  ccb = c('124','589','5465','25893'),
  taa = c('12854','589','5645','763')
  )

rownames(df) <- df$miRNAs
df$miRNAs <- NULL
df
#>                cca   ccb   taa
#> mmu_mir-1-3p 12854   124 12854
#> mmu_mir-1-5p  5489   589   589
#> mmu-mir-6-5p 54485  5465  5645
#> mmu-mir-6-3p  2563 25893   763

Created on 2020-02-19 by the reprex package (v0.3.0)
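
The same steps can be run starting from a file rather than from an in-memory data frame. A minimal sketch, assuming a temporary CSV that stands in for the real df.csv; the explicit uniqueness check is an addition for illustration and not part of the original answer.

```r
# Write the example data to a temporary CSV (stand-in for the real df.csv)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "miRNAs,cca,ccb,taa",
  "mmu_mir-1-3p,12854,124,12854",
  "mmu_mir-1-5p,5489,589,589",
  "mmu-mir-6-5p,54485,5465,5645",
  "mmu-mir-6-3p,2563,25893,763"
), csv_path)

# Import without row.names, so the identifiers arrive as an ordinary column
df <- read.csv(csv_path)

# Row names must be unique; fail early if the identifier column has repeats
stopifnot(anyDuplicated(df$miRNAs) == 0)

# Promote the identifier column to row names, then drop the column
rownames(df) <- df$miRNAs
df$miRNAs <- NULL

# DESeq2-style count matrix: samples in columns, miRNAs in row names
countData <- as.matrix(df)
colnames(countData)
#> [1] "cca" "ccb" "taa"
```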

edited Feb 19, 2020 at 12:20

answered Feb 19, 2020 at 12:03


7 Comments

Apex Over a year ago

I don't want miRNAs to be in the list when I type colnames(countData); I need just cca, ccb and taa.

Apex Over a year ago

I get the error again; it seems this is not compatible with my original CSV file. Let me put it this way: I want the CSV file opened so that when I type colnames(countData), the header of the first column (miRNAs) is not in the list.

Apex Over a year ago

I tried this code: names(countData)[1] <- "" but when I type colnames(countData), "" appears in the list as the header of the first column.

Apex Over a year ago

There is no error message; "" just appears in my list when I type colnames(countData).

shs Over a year ago

I get that this happens when you just use an empty string. But in the comment before, you said you got an error message when you tried the solution I proposed.
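
The difference between renaming the first column and removing it can be seen in a small sketch. The two-row data frame and the phenotype table below are hypothetical stand-ins; the real phenotype.csv is not shown in the question.

```r
df <- data.frame(
  miRNAs = c("mmu_mir-1-3p", "mmu_mir-1-5p"),
  cca = c(12854, 5489),
  ccb = c(124, 589),
  taa = c(12854, 589)
)

# Renaming keeps the column; only its label changes
renamed <- df
names(renamed)[1] <- ""
colnames(renamed)
#> [1] ""    "cca" "ccb" "taa"

# Moving the identifiers into the row names removes the column itself
rownames(df) <- df$miRNAs
df$miRNAs <- NULL
countData <- as.matrix(df)
colnames(countData)
#> [1] "cca" "ccb" "taa"

# Hypothetical phenotype table with one row per sample
phenotype <- data.frame(
  condition = c("control", "control", "treated"),
  row.names = c("cca", "ccb", "taa")
)
all(rownames(phenotype) == colnames(countData))
#> [1] TRUE
```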
