
I have a large data frame (stored as a CSV file) that looks like this:

miRNAs <- c('mmu_mir-1-3p','mmu_mir-1-5p','mmu-mir-6-5p','mmu-mir-6-3p')
cca <- c('12854','5489','54485','2563')
ccb <- c('124','589','5465','25893')
taa <- c('12854','589','5645','763')
df <- data.frame(miRNAs,cca,ccb,taa)

I want to use this data frame in a DESeq2 analysis. I made it unique with unique(df) and then tried to read it with countData <- as.matrix(read.csv(file="df.csv", row.name="miRNAs", sep = ",")), but it gives this error:

Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed

Since I made the df unique, I don't understand why this error keeps appearing. The reason I want to read the file this way is that I want colnames(countData) to return only my sample headers (every column except the first). I need them for a TRUE/FALSE test of whether they match the row names of another file, phenotype.csv: all(rownames(phenotype) == colnames(countData))
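A minimal sketch of why unique(df) does not prevent the error, using a small hypothetical data frame (toy and its counts are made up for illustration): unique() only drops rows that are identical in every column, so the same miRNA name with different counts survives, and read.csv(..., row.names = "miRNAs") then rejects the repeated name.

```r
# Hypothetical toy data: 'mmu-mir-6-5p' appears twice with different counts
toy <- data.frame(
  miRNAs = c("mmu-mir-1-3p", "mmu-mir-6-5p", "mmu-mir-6-5p"),
  cca    = c(12854, 54485, 100),
  ccb    = c(124, 5465, 7)
)

# unique() compares whole rows, so both 'mmu-mir-6-5p' rows are kept
toy_unique <- unique(toy)
nrow(toy_unique)                       # still 3 rows

# The name column itself still contains a repeat
anyDuplicated(toy_unique$miRNAs) > 0   # TRUE

# Writing the file and reading it back with row.names = "miRNAs"
# reproduces the error from the question
tmp <- tempfile(fileext = ".csv")
write.csv(toy_unique, tmp, row.names = FALSE)
err <- tryCatch(
  read.csv(tmp, row.names = "miRNAs"),
  error = function(e) conditionMessage(e)
)
err
```

In an English locale err holds the same "duplicate 'row.names' are not allowed" message as in the question.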

  • If you have duplicated entries in the first column of df, you will get this kind of error. Can you check table(duplicated(df$miRNAs))? Commented Feb 19, 2020 at 13:07
  • I have 1978 FALSE and 44 TRUE Commented Feb 19, 2020 at 13:12
  • Yeah, so your df really does contain duplicated miRNAs (in the miRNAs column); because they have different counts, unique(df) does not treat those rows as duplicates. Commented Feb 19, 2020 at 13:16
  • I used this to remove the first column duplicates new_df <- df[!duplicated(df$miRNAs),,drop=FALSE] is this correct? Commented Feb 19, 2020 at 13:17
  • yes, this is ok, then write.csv(new_df,"new_df.csv",row.names=FALSE) Commented Feb 19, 2020 at 13:17
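Keeping only the first row per miRNA, as suggested above, silently discards the counts of the later duplicates. If the repeated names should instead be merged, base R's rowsum() adds up the counts per name. A sketch on hypothetical toy data; which option suits a DESeq2 analysis depends on why the names repeat in the first place, so that choice is an assumption to check against the data.

```r
# Hypothetical toy data with one repeated miRNA name
toy <- data.frame(
  miRNAs = c("mmu-mir-1-3p", "mmu-mir-6-5p", "mmu-mir-6-5p"),
  cca    = c(12854L, 54485L, 100L),
  ccb    = c(124L, 5465L, 7L)
)

# Count how many names are repeats of an earlier row
table(duplicated(toy$miRNAs))

# Option 1 (from the comments): keep only the first row for each name
first_only <- toy[!duplicated(toy$miRNAs), , drop = FALSE]

# Option 2: sum the counts of all rows that share a name.
# rowsum() returns a matrix whose row names are the group labels.
summed <- rowsum(as.matrix(toy[, -1]), group = toy$miRNAs)
summed
```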

1 Answer


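The row.name="miRNAs" argument itself is not the problem: read.csv() passes it on to read.table(), which treats a single character string as the name of the column that holds the row names. The error means that this column contains repeated names. unique(df) only removes rows that are identical in every column, so a miRNA that appears twice with different counts survives it (those are the 44 TRUE values from table(duplicated(df$miRNAs))). Remove or merge those duplicates first. After that you can keep the argument, or import without it and, if you really want that variable as row names instead of a column, set them after the import (this step also needs unique names):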

df <- data.frame(
  miRNAs = c('mmu_mir-1-3p','mmu_mir-1-5p','mmu-mir-6-5p','mmu-mir-6-3p'),
  cca = c('12854','5489','54485','2563'),
  ccb = c('124','589','5465','25893'),
  taa = c('12854','589','5645','763')
  )

rownames(df) <- df$miRNAs
df$miRNAs <- NULL
df
#>                cca   ccb   taa
#> mmu_mir-1-3p 12854   124 12854
#> mmu_mir-1-5p  5489   589   589
#> mmu-mir-6-5p 54485  5465  5645
#> mmu-mir-6-3p  2563 25893   763

Created on 2020-02-19 by the reprex package (v0.3.0)
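Once the names in the first column are unique, the original read.csv() call should behave as intended: row.names = "miRNAs" moves that column into the row names, so colnames() returns only the sample columns. A sketch of the round trip with the example data (counts stored as numbers here, and a temporary file standing in for df.csv); as.matrix() then gives a numeric matrix, which is the form DESeq2's DESeqDataSetFromMatrix() takes as countData.

```r
# The example data, with counts stored as numbers
df <- data.frame(
  miRNAs = c('mmu_mir-1-3p','mmu_mir-1-5p','mmu-mir-6-5p','mmu-mir-6-3p'),
  cca = c(12854, 5489, 54485, 2563),
  ccb = c(124, 589, 5465, 25893),
  taa = c(12854, 589, 5645, 763)
)

# The name column must have no repeats before it can become row names
stopifnot(!anyDuplicated(df$miRNAs))

# Write without R's own row names, as in the comments
csv_path <- tempfile(fileext = ".csv")   # stands in for "df.csv"
write.csv(df, csv_path, row.names = FALSE)

# row.names = "miRNAs" turns that column into the row names
countData <- as.matrix(read.csv(csv_path, row.names = "miRNAs"))

colnames(countData)   # "cca" "ccb" "taa"
rownames(countData)   # the four miRNA names
```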


Comments

I don't want miRNAs to be in the list when I type colnames(countData); I need just cca, ccb and taa.
I get an error again; it seems this is not compatible with my original CSV file. To put it another way: I want the CSV file to be read so that colnames(countData) does not include the header of the first column, miRNAs.
I tried names(countData)[1] <- "", but when I type colnames(countData), "" still appears in the list as the header of the first column.
There is no error message; "" just shows up in my list when I type colnames(countData).
I see why that happens when you just use an empty string. But in your previous comment you said you got an error message when you tried the solution I proposed.
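Renaming the first column to "" only blanks its header; the column is still there, which is why "" shows up in colnames(). The column has to be moved into the row names (or dropped) instead. A sketch of that, followed by the sample check from the question, using a hypothetical phenotype data frame whose row names are the sample IDs (its condition column and row order are made up for illustration):

```r
# Counts as they might come back from read.csv() without row.names
raw <- data.frame(
  miRNAs = c("mmu_mir-1-3p", "mmu_mir-1-5p"),
  cca = c(12854, 5489), ccb = c(124, 589), taa = c(12854, 589)
)

# Move the name column into the row names instead of blanking its header
countData <- as.matrix(raw[, -1])
rownames(countData) <- raw$miRNAs

# Hypothetical sample table: same samples, but in a different order
phenotype <- data.frame(
  condition = c("treated", "control", "control"),
  row.names = c("taa", "cca", "ccb")
)

all(rownames(phenotype) == colnames(countData))   # FALSE: order differs

# Reorder the count columns to follow the phenotype rows, then re-check
countData <- countData[, rownames(phenotype)]
all(rownames(phenotype) == colnames(countData))   # TRUE
```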