I have two data frames: one is a map with over 20,000 entries, and the other contains 3 columns and 30,000 rows of data. I need to use the map to figure out the correct name for each row. Here is a simple example of what I need:

For instance,

data <- data.frame(
  V1 = c('baa','bb','aa','cc','dd','ee','caa'),
  V2 = c('ff','gg','hh','yy','jj','kk','hh')
)
# V1 V2
# baa ff
# bb gg
# aa hh
# cc yy
# dd jj
# ee kk
# caa hh

map <- data.frame(
  V1 = c('aa','gg','cc','jj','kk'), 
  V2  = c(1:5)
) 
# V1 V2 
# aa 1
# gg 2
# cc 3
# jj 4
# kk 5

>what.I.need
V1 V2 V3
baa ff 1
bb gg 2
aa hh 1
cc yy 3
dd jj 4
ee kk 5
caa hh 1

I tried using grep, but I can't figure out how to make it work with a map of 20,000 possibilities and have it populate the third column (V3) in "what.I.need". Thank you in advance.

  • Could you at least share a sample of your data? Commented Jul 3, 2018 at 21:34
  • It is optimal for those helping if you present your example data in a format easy to input into R. As in df1 <- data.frame(data), shown in the link above. Commented Jul 3, 2018 at 21:41
  • I can't share the data, even a sample. The columns in "data" have some matching entries. I'll try to put something together or find something similar. Commented Jul 3, 2018 at 21:42

3 Answers

df1 <- read.table(text = "
V1 V2
aa ff
bb gg
aa hh
cc yy
dd jj
ee kk
aa hh", header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = "
V1 V3 
aa 1
gg 2
cc 3
jj 4
kk 5", header = TRUE, stringsAsFactors = FALSE)


library(tidyr)
library(dplyr)

df1 %>% 
  gather(V2, V1, V1, V2) %>% 
  full_join(df2) %>% 
  filter(!is.na(V3)) %>% 
  full_join(df1) -> df1

df1$V3 <- c(df1$V3[!is.na(df1$V3)])

df1 %>% 
  filter(!V2 %in% c("V1","V2")) %>% 
  select(V1,V2,V3)

  V1 V2 V3
1 aa ff  1
2 bb gg  1
3 aa hh  3
4 cc yy  1
5 dd jj  2
6 ee kk  4
7 aa hh  5

I have the feeling it could get more concise than this. :)

1 Comment

Note the OP changed the sample data so they are not always exact matches any more.
library(dplyr)
library(tidyr)

df1 <- data.frame(V1 = c("aa", "bb", "aa", "cc", "dd", "ee", "aa"), V2 = c("ff", "gg", "hh", "yy", "jj", "kk", "hh"), stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("aa", "gg", "cc", "jj", "kk"), V2 = c(1,2,3,4,5), stringsAsFactors = FALSE)

left_join(df1, df2, by = c("V2" = "V1")) %>% 
  left_join(., df2, by = "V1") %>% 
  mutate(V3 = ifelse(is.na(V2.y), V2.y.y, V2.y)) %>% 
  select(-V2.y, -V2.y.y)

This creates this table, then drops V2.y and V2.y.y.

  V1 V2.x V2.y V2.y.y V3
1 aa   ff   NA      1  1
2 bb   gg    2     NA  2
3 aa   hh   NA      1  1
4 cc   yy   NA      3  3
5 dd   jj    4     NA  4
6 ee   kk    5     NA  5
7 aa   hh   NA      1  1

Which gives you this:

  V1 V2.x V3
1 aa   ff  1
2 bb   gg  2
3 aa   hh  1
4 cc   yy  3
5 dd   jj  4
6 ee   kk  5
7 aa   hh  1

3 Comments

Note the OP changed the sample data so they are not always exact matches any more.
Thank you for your help! I couldn't figure out this route. I tried doing for loops, but it didn't work either. Eventually I found another way to deal with the data that didn't involve pattern matching.
@GulnazHoxmeier-Omasheva can you edit your question to include your solution, or post it as an answer? To work with your edits, I'd have to change my answer to maybe a fuzzy join/pattern matching.
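
The pattern matching mentioned in the comment above could be sketched roughly as follows in base R, using the edited sample data from the question. This is only a sketch under stated assumptions: `lookup_substring` is a hypothetical helper (not from any package), a key counts as a match when it occurs anywhere inside the string, and when several keys occur the first one in map order wins. Whether those rules suit the real data is something only the asker can judge.

```r
# Edited sample data from the question (V1 now contains partial matches).
data <- data.frame(
  V1 = c("baa", "bb", "aa", "cc", "dd", "ee", "caa"),
  V2 = c("ff", "gg", "hh", "yy", "jj", "kk", "hh"),
  stringsAsFactors = FALSE
)
map <- data.frame(
  V1 = c("aa", "gg", "cc", "jj", "kk"),
  V2 = 1:5,
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each string in x, return the value of the first
# key (in the order given) that occurs anywhere inside it, or NA if none does.
lookup_substring <- function(x, keys, values) {
  out <- rep(NA, length(x))
  for (i in seq_along(keys)) {
    # fixed = TRUE treats the key as a literal string, not a regular expression.
    hit <- is.na(out) & grepl(keys[i], x, fixed = TRUE)
    out[hit] <- values[i]
  }
  out
}

from_v1 <- lookup_substring(data$V1, map$V1, map$V2)
from_v2 <- lookup_substring(data$V2, map$V1, map$V2)

# Prefer a hit in V1; fall back to a hit in V2.
data$V3 <- ifelse(is.na(from_v1), from_v2, from_v1)
data
```

The loop calls `grepl` once per map key, each time over the whole column, so the work grows with the number of keys times the number of rows. No timing was measured on data of the size described in the question.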

You may try this:

data <- data.frame(
  V1 = c('aa','bb','aa','cc','dd','ee','aa'),
  V2 = c('ff','gg','hh','yy','jj','kk','hh'),
  stringsAsFactors = FALSE
)

map <- data.frame(
  V1 = c('aa','gg','cc','jj','kk'),
  V2 = 1:5,
  stringsAsFactors = FALSE
)

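# Look up each column in the map separately (NA where there is no exact match).
data$V3.1 <- map$V2[match(data$V1, map$V1)]
data$V3.2 <- map$V2[match(data$V2, map$V1)]
# Prefer the match from V1; fall back to the match from V2.
data$V3 <- ifelse(!is.na(data$V3.1), data$V3.1, data$V3.2)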
data
#   V1 V2 V3.1 V3.2 V3
# 1 aa ff    1   NA  1
# 2 bb gg   NA    2  2
# 3 aa hh    1   NA  1
# 4 cc yy    3   NA  3
# 5 dd jj   NA    4  4
# 6 ee kk   NA    5  5
# 7 aa hh    1   NA  1
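
Since the question mentions three data columns rather than two, the same `match` idea can be generalized to any number of columns. The sketch below assumes exact matches only. `first_match` is a hypothetical helper name, and columns are tried in the order they are listed.

```r
data <- data.frame(
  V1 = c("aa", "bb", "aa", "cc", "dd", "ee", "aa"),
  V2 = c("ff", "gg", "hh", "yy", "jj", "kk", "hh"),
  stringsAsFactors = FALSE
)
map <- data.frame(
  V1 = c("aa", "gg", "cc", "jj", "kk"),
  V2 = 1:5,
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up every listed column in the map and keep,
# row by row, the first non-NA result.
first_match <- function(data, cols, keys, values) {
  hits <- lapply(cols, function(col) values[match(data[[col]], keys)])
  Reduce(function(a, b) ifelse(is.na(a), b, a), hits)
}

data$V3 <- first_match(data, c("V1", "V2"), map$V1, map$V2)
data
```

Adding a third column only means extending the `cols` vector; no intermediate `V3.1`/`V3.2` columns are left behind in `data`.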
