
Starting from a dataframe in R like the following (df):

year_1 <- c('James','Mike','Jane', NA)
year_2 <- c('Evelyn', 'Jackson', 'James', 'Avery')
year_3 <- c('Harper', 'Avery', NA, NA)
df <- data.frame(year_1, year_2, year_3)

...I would like to convert it into something like df_1 (I have hundreds of elements in my original dataframe, so I can't do it manually):

names <- c('James','Mike','Jane','Evelyn', 'Jackson', 'Avery', 'Harper')
year_1 <- c('YES','YES','YES', 'NO', 'NO', 'NO', 'NO')
year_2 <- c('YES','NO','NO', 'YES', 'YES', 'YES', 'NO')
year_3 <- c('NO','NO','NO', 'NO', 'NO', 'YES', 'YES')
df_1 <- data.frame(year_1, year_2, year_3)
rownames(df_1) <- names

I have tried to:

  1. convert all elements of df into a string vector with unique elements
  2. construct the structure of df_1 using the names from step 1
  3. fill df_1 with a loop (this is where I get stuck: I can't build a loop that does the trick)
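
The three steps above can be sketched roughly as follows. This is only one possible reading of the plan, assuming the columns of df are character (not factor); the name all_names is introduced here for illustration and is not part of the original post.

```r
df <- data.frame(
  year_1 = c("James", "Mike", "Jane", NA),
  year_2 = c("Evelyn", "Jackson", "James", "Avery"),
  year_3 = c("Harper", "Avery", NA, NA),
  stringsAsFactors = FALSE
)

# Step 1: every distinct, non-missing name across all columns
vals <- unlist(df, use.names = FALSE)
all_names <- unique(vals[!is.na(vals)])  # all_names is a hypothetical helper name

# Step 2: an empty result with one row per name, pre-filled with "NO"
df_1 <- as.data.frame(
  matrix("NO", nrow = length(all_names), ncol = ncol(df),
         dimnames = list(all_names, names(df))),
  stringsAsFactors = FALSE
)

# Step 3: for each year, mark the names that appear in that column
for (col in names(df)) {
  df_1[all_names %in% df[[col]], col] <- "YES"
}

df_1
```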

Any idea?

Thanks!!

  • You can do something like as.data.frame.matrix(table(stack(df))). Commented Dec 17, 2020 at 20:24
  • "Error in stack.data.frame(df) : no vector columns were selected" Commented Dec 17, 2020 at 20:26
  • Make sure your columns are characters, not factors. Commented Dec 17, 2020 at 20:30
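
A minimal sketch of what these comments describe, assuming the error comes from factor columns (the data.frame() default before R 4.0). The stringsAsFactors = TRUE argument below is only there to reproduce that situation, and counts is a name chosen for illustration.

```r
df <- data.frame(
  year_1 = c("James", "Mike", "Jane", NA),
  year_2 = c("Evelyn", "Jackson", "James", "Avery"),
  year_3 = c("Harper", "Avery", NA, NA),
  stringsAsFactors = TRUE  # assumption: mimics the factor columns behind the error
)

# stack() only accepts vector columns, so convert factors to character first
df[] <- lapply(df, as.character)

# Cross-tabulate name by year, then turn the table into a wide data frame of counts
counts <- as.data.frame.matrix(table(stack(df)))
counts
```

The result holds 0/1 counts rather than YES/NO labels; NA entries are dropped by table() by default.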

4 Answers

A base R option using stack + table

> as.data.frame(ifelse(table(stack(df)) == 1, "YES", "NO"))
        year_1 year_2 year_3
Avery       NO    YES    YES
Evelyn      NO    YES     NO
Harper      NO     NO    YES
Jackson     NO    YES     NO
James      YES    YES     NO
Jane       YES     NO     NO
Mike       YES     NO     NO

2 Comments

+1 but I'd probably use lapply(df, as.character) given their comment under their question, and [] replacement instead of ifelse. Something like: x <- table(stack(lapply(df, as.character))) + 1; x[] <- c("NO", "YES")[x]; x.
@A5C1D2H2I1M1N2O1R2T1 Yes, it makes sense, and we can use type.convert(df,as.is = TRUE) if possible
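
A sketch of the [] replacement idea from the first comment, written out in full. It is a variation rather than the commenter's exact code: it indexes on counts > 0 instead of counts + 1, on the assumption that a name could appear more than once in a column. The names counts and res are illustrative.

```r
df <- data.frame(
  year_1 = c("James", "Mike", "Jane", NA),
  year_2 = c("Evelyn", "Jackson", "James", "Avery"),
  year_3 = c("Harper", "Avery", NA, NA),
  stringsAsFactors = TRUE  # assumption: factor columns, as in the question's comments
)

# lapply(df, as.character) yields a list of character vectors that stack() accepts
counts <- table(stack(lapply(df, as.character)))

# res[] <- ... replaces the values but keeps the dimensions and dimnames
res <- counts
res[] <- c("NO", "YES")[(counts > 0) + 1]

res <- as.data.frame.matrix(res)
res
```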

What about this?

sapply(df, function(x) sapply(na.omit(unique(unlist(df))), `%in%`, x))
#         year_1 year_2 year_3
# James     TRUE   TRUE  FALSE
# Mike      TRUE  FALSE  FALSE
# Jane      TRUE  FALSE  FALSE
# Evelyn   FALSE   TRUE  FALSE
# Jackson  FALSE   TRUE  FALSE
# Avery    FALSE   TRUE   TRUE
# Harper   FALSE  FALSE   TRUE
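
If the YES/NO labels from the question are needed, the logical matrix can be relabelled afterwards. A possible sketch, assuming character columns; people, m and res are names introduced here for illustration.

```r
df <- data.frame(
  year_1 = c("James", "Mike", "Jane", NA),
  year_2 = c("Evelyn", "Jackson", "James", "Avery"),
  year_3 = c("Harper", "Avery", NA, NA),
  stringsAsFactors = FALSE
)

# Distinct non-missing names, in order of first appearance
vals <- unlist(df, use.names = FALSE)
people <- unique(vals[!is.na(vals)])

# Logical matrix: one row per name, one column per year
m <- sapply(df, function(x) sapply(people, `%in%`, x))

# ifelse() keeps the dimensions and dimnames of the logical matrix
res <- as.data.frame(ifelse(m, "YES", "NO"))
res
```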

Here is an option with the tidyverse: reshape the data into 'long' format with pivot_longer, keep the distinct rows, create a column of 'YES', and reshape back to 'wide' with pivot_wider.

library(dplyr)
library(tidyr)
library(tibble)
df %>%
  pivot_longer(cols = everything(), values_drop_na = TRUE) %>%
  distinct %>%
  mutate(new = 'YES') %>% 
  pivot_wider(names_from = name, values_from = new, values_fill = 'NO') %>%
  column_to_rownames("value")

Output:

#          year_1 year_2 year_3
#James      YES    YES     NO
#Evelyn      NO    YES     NO
#Harper      NO     NO    YES
#Mike       YES     NO     NO
#Jackson     NO    YES     NO
#Avery       NO    YES    YES
#Jane       YES     NO     NO

1 Comment

Super answer. Very fast and clean!! Thank you so much

To offer another option, first we can extract the unique names from df using a nested for loop. We test if the name is already in our list, and further test if we're looking at an NA.

people <- c()
for (i in seq_along(df)) {          # each year column
  for (j in seq_len(nrow(df))) {    # each row
    pers <- df[j, i]
    # keep only names that are not already collected and not NA
    if (!(pers %in% people)) {
      if (!is.na(pers)) {
        people <- c(people, toString(pers))
      }
    }
  }
}

From here, we can run a simple %in% check for each year and bind the results together column by column. Note that cbind on plain vectors produces a character matrix rather than a data frame, so the TRUE/FALSE values end up as strings. The other answers are probably more straightforward, but I've found code like this useful if you need to make other small changes to the data as it passes through the script.

for (i in 1:length(colnames(df))){
  colname<-colnames(df)[i]
  peoplein<-people %in% df[,i]
  if (i == 1){
    df1<-cbind(people,peoplein)
    colnames(df1)[i+1]<-colname
  } else {
    df1<-cbind(df1,peoplein)
    colnames(df1)[i+1]<-colname
  }
}

The resulting df1 is shown below.

     people    year_1  year_2  year_3 
[1,] "James"   "TRUE"  "TRUE"  "FALSE"
[2,] "Mike"    "TRUE"  "FALSE" "FALSE"
[3,] "Jane"    "TRUE"  "FALSE" "FALSE"
[4,] "Evelyn"  "FALSE" "TRUE"  "FALSE"
[5,] "Jackson" "FALSE" "TRUE"  "FALSE"
[6,] "Avery"   "FALSE" "TRUE"  "TRUE" 
[7,] "Harper"  "FALSE" "FALSE" "TRUE" 
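
Since df1 here is a character matrix, one way to get the YES/NO data frame from the question is to convert it afterwards. A rough sketch, which rebuilds an equivalent matrix in a compact way so that it runs on its own; flags and out are names introduced for illustration.

```r
df <- data.frame(
  year_1 = c("James", "Mike", "Jane", NA),
  year_2 = c("Evelyn", "Jackson", "James", "Avery"),
  year_3 = c("Harper", "Avery", NA, NA),
  stringsAsFactors = FALSE
)

# Rebuild a character matrix shaped like df1 above (names plus "TRUE"/"FALSE" strings)
vals <- unlist(df, use.names = FALSE)
people <- unique(vals[!is.na(vals)])
df1 <- cbind(people, sapply(df, function(col) people %in% col))

# Compare the year columns against the string "TRUE" to recover logical values
flags <- df1[, -1, drop = FALSE] == "TRUE"

# Relabel and move the names into the row names
out <- as.data.frame(ifelse(flags, "YES", "NO"))
rownames(out) <- df1[, "people"]
out
```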
