
I am pretty new to R and I have some data that I need to tidy before I can use it. I have a dataframe with many rows and columns, and every cell holds a string of 20 space-separated ones and zeroes ("0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0").

I am trying to split every cell so that each number ends up in its own column (one original column would become 20 columns). After that I would like to convert the separated strings into numbers. Here is a small sample of the data; for it I would need the numbers separated into 40 columns and 3 rows:

df<-data.frame(
"V1" = c("0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 ","0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ","1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "),
"V2" = c("0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 ","0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 ","0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 "))

As you can see, a natural way to separate the numbers would be to treat the space as a delimiter, but I am not having any luck with that. I tried df<-lapply(strsplit(df, " "), as.numeric), but strsplit does not accept a dataframe. I then tried df<-lapply(strsplit(as.character(df), " "), as.numeric). That does split on spaces, but converting the whole dataframe to character messes up the data.

I suspect this is easier than I think, but I am still not skilled enough with R.

  • Try read.fwf or read.table(text = as.character(df$V1), header = FALSE) Commented Dec 9, 2019 at 23:37

3 Answers

An easier option is read.table (no packages needed):

read.table(text = as.character(df$V1), header = FALSE)

For multiple columns, use lapply, which returns a list with one data frame per original column:

lapply(df, function(x) read.table(text = as.character(x), header = FALSE))
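
A minimal sketch of the whole pipeline on the sample data from the question, assuming every cell holds the same number of space-separated values. The names pieces and wide are only illustrative. When cbind combines the named list, it prefixes each block with its original column name, so the result has columns such as V1.V1 and V2.V20.

```r
# Sample data from the question: each cell holds 20 space-separated 0/1 flags
df <- data.frame(
  V1 = c("0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 ",
         "0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ",
         "1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "),
  V2 = c("0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 ",
         "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 ",
         "0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ")
)

# Parse each original column into its own 3 x 20 data frame of integers
pieces <- lapply(df, function(x) read.table(text = as.character(x), header = FALSE))

# Bind the per-column blocks side by side into one data frame
wide <- do.call(cbind, pieces)

dim(wide)            # number of rows and columns of the combined result
sapply(wide, class)  # class of each new column
```

Because read.table guesses column types, the 0/1 values arrive as integers and no separate as.numeric step should be needed.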

2 Comments

I tried this and it turns the dataframe "df" into a list of 2 data frames instead of a single dataframe with all the data in order. To put it back together in a simple way I wrote df<-as.data.frame(df), and I think that is fine. This should give me a dataframe with every number stored as an integer in its own cell, right? Thank you very much, it worked pretty well!
@Marc It would be do.call(cbind, lapply(df, function(x) read.table(text = as.character(x), header = FALSE)))

You can use cSplit from splitstackshape to split several delimited columns into separate columns at once.

splitstackshape::cSplit(df, names(df), " ")

#   V1_01 V1_02 V1_03 V1_04 V1_05 V1_06 V1_07 V1_08 V1_09 V1_10 V1_11
#1:     0     0     0     0     0     0     0     0     0     0     0
#2:     0     0     0     1     0     0     0     0     0     0     0
#3:     1     0     0     0     0     0     0     0     0     0     0

#   V1_12 V1_13 V1_14 V1_15 V1_16 V1_17 V1_18 V1_19 V1_20 V2_01 V2_02
#1:     0     0     0     1     0     0     0     0     0     0     0
#2:     0     0     0     0     0     0     0     0     0     0     0
#3:     0     0     0     0     0     0     0     0     0     0     0

#   V2_03 V2_04 V2_05 V2_06 V2_07 V2_08 V2_09 V2_10 V2_11 V2_12 V2_13
#1:     0     0     0     0     1     0     0     0     0     0     0
#2:     0     0     0     0     0     0     0     0     0     0     0
#3:     0     0     0     0     0     0     0     1     0     0     0

#   V2_14 V2_15 V2_16 V2_17 V2_18 V2_19 V2_20
#1:     0     0     0     0     0     0     0
#2:     0     0     0     0     0     1     0
#3:     0     0     0     0     0     0     0

Note that I have used names(df) here since you want to split all the columns. If you have additional columns and want to split only a few of them, you can name them explicitly:

splitstackshape::cSplit(df, c("V1", "V2"), " ")

2 Comments

I tried your code and it works like a charm, but the result now has class "data.table data.frame", even when I check the class of a single cell. If I want to use the values as integers, is it fine to leave it this way or should I change its class? I tried df<-lapply(df,as.numeric) and then put it back together as a dataframe with df<-as.data.frame(df). This gives me a dataframe with numeric cells. Is this correct, or is there a better way?
@Marc It is okay to keep it as a data.table as well, but if you want a data.frame, df<-as.data.frame(df) should be enough.

I found both answers equally good, but I think using cSplit made the subsequent processing easier. This is what I finally did to obtain the result:

library(splitstackshape)  # provides cSplit
df<-cSplit(df, names(df), " ")
df<-lapply(df,as.numeric)
df<-as.data.frame(df)

I suppose this can be done with fewer lines of code, but this way is easier for me to understand. Thank you very much for your answers!
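
For comparison, here is a sketch of a package-free variant built on strsplit, which also shows why the original attempt failed: strsplit needs a character vector, so it has to be applied one column at a time. The helper split_col and the names pieces and wide are hypothetical, and the sketch assumes every cell contains the same number of values.

```r
# Sample data from the question
df <- data.frame(
  V1 = c("0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 ",
         "0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ",
         "1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "),
  V2 = c("0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 ",
         "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 ",
         "0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ")
)

# Split one column (a character vector) into a numeric matrix:
# one row per original cell, one column per number
split_col <- function(x) {
  parts <- strsplit(trimws(as.character(x)), " +")
  do.call(rbind, lapply(parts, as.numeric))
}

# Apply the helper to every column, then bind the blocks side by side
pieces <- lapply(df, split_col)
wide <- as.data.frame(do.call(cbind, pieces))

# Label columns as <original name>_<position>, e.g. V1_01
names(wide) <- unlist(lapply(names(pieces), function(n) {
  sprintf("%s_%02d", n, seq_len(ncol(pieces[[n]])))
}))

str(wide)
```

The column labels imitate the cSplit naming scheme purely for readability; any unique names would do.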
