I have a data frame with two columns, GL and GLDESC, and want to add a third column called KIND based on the contents of GLDESC.

DF:

      GL                             GLDESC
1 515100                        Payroll-ISL
2 515900                        Payroll-ICA
3 532300                           Bulk Gas
4 551000                          Supply AB
5 551000                        Supply XPTO
6 551100                          Supply AB
7 551300                             Intern

For each row of the data table:

  • If GLDESC contains the word Payroll anywhere in the string then I want KIND to be Payroll.

  • If GLDESC contains the word Supply anywhere in the string then I want KIND to be Supply.

  • In all other cases I want KIND to be Other.

Then, I found this:

DF$KIND <- ifelse(grepl("supply", DF$GLDESC, ignore.case = TRUE), "Supply", 
         ifelse(grepl("payroll", DF$GLDESC, ignore.case = TRUE), "Payroll", "Other"))

But with that, every row that matches (every Supply row, for example) gets classified. As in rows 4 and 5 of DF, the same GL can have two Supply entries, and I don't need both. When the same kind repeats within a GL, I need only one GLDESC to be matched.

Edit: I cannot delete any rows. I want this as output:

GL  GLDESC   KIND

A   Supply1  Supply
A   Supply2  N/A
A   Supply3  N/A
A   Supply4  N/A
A   Supply5  N/A
A   Supply6  N/A
A   Payroll1 Payroll
B   Supply2  Supply
B   Payroll  Payroll
  • Can you show the expected output? Commented Sep 25, 2019 at 18:37
  • What do you mean by you only need one? What should happen to the other one? The row is removed from the data frame? Commented Sep 25, 2019 at 18:38
  • @Akrun just did it! Commented Sep 25, 2019 at 19:45
  • @IceCreamToucan I put a sample of output. I think I can show better what I want. Commented Sep 25, 2019 at 19:45
  • @akrun AMAZING! Thanks a lot! For real, you saved me! Commented Sep 25, 2019 at 20:22

1 Answer

If the repeated elements should be NA, use duplicated on 'GLDESC' to get a logical vector, then assign NA to the matching elements of 'KIND' (the column already created with ifelse):

DF$KIND[duplicated(DF$GLDESC)] <- NA_character_
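As a small illustration of what duplicated returns, here is a sketch on a toy vector. The vectors x and kind are made up for the demo and are not taken from the question's data.

```r
# Hypothetical toy data, for illustration only
x    <- c("Supply AB", "Supply XPTO", "Supply AB", "Intern")
kind <- c("Supply", "Supply", "Supply", "Other")

# duplicated() is TRUE only for the second and later occurrences of a value
dup <- duplicated(x)
print(dup)

# Blank out KIND wherever the description has already been seen
kind[dup] <- NA_character_
print(kind)
```

Note that this looks at the description alone, so a value repeated under a different GL would also be set to NA.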

If the values need to change within each group (here, per 'GL'):

library(dplyr)
DF  %>%
    group_by(GL) %>%
    mutate(KIND = replace(KIND, duplicated(KIND) & KIND == "Supply", NA_character_))

# A tibble: 9 x 3
# Groups:   GL [2]
#  GL    GLDESC   KIND   
#  <chr> <chr>    <chr>  
#1 A     Supply1  Supply 
#2 A     Supply2  <NA>   
#3 A     Supply3  <NA>   
#4 A     Supply4  <NA>   
#5 A     Supply5  <NA>   
#6 A     Supply6  <NA>   
#7 A     Payroll1 Payroll
#8 B     Supply2  Supply 
#9 B     Payroll  Payroll

Or, doing all the steps in one pipeline:

library(stringr)  # provides str_remove
DF1 %>%
    mutate(KIND = str_remove(GLDESC, "\\d+"),  # drop the digits: "Supply1" -> "Supply"
    KIND = replace(KIND, !KIND %in% c("Supply", "Payroll"), "Other")) %>% 
    group_by(GL) %>% 
    mutate(KIND = replace(KIND, duplicated(KIND) & KIND == "Supply", NA_character_))

data

DF1 <- structure(list(GL = c("A", "A", "A", "A", "A", "A", "A", "B", 
"B"), GLDESC = c("Supply1", "Supply2", "Supply3", "Supply4", 
"Supply5", "Supply6", "Payroll1", "Supply2", "Payroll")), row.names = c(NA, 
-9L), class = "data.frame")
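For reference, the same idea can be sketched in base R, without dplyr or stringr. This is only one possible variant: it assumes the classification comes from the grepl/ifelse approach in the question, and the helper name repeat_supply is made up here.

```r
# Same sample data as above, rebuilt so the block runs on its own
DF1 <- data.frame(
  GL     = c("A", "A", "A", "A", "A", "A", "A", "B", "B"),
  GLDESC = c("Supply1", "Supply2", "Supply3", "Supply4", "Supply5",
             "Supply6", "Payroll1", "Supply2", "Payroll"),
  stringsAsFactors = FALSE
)

# Classify each row by keyword, as in the question
DF1$KIND <- ifelse(grepl("supply", DF1$GLDESC, ignore.case = TRUE), "Supply",
            ifelse(grepl("payroll", DF1$GLDESC, ignore.case = TRUE), "Payroll", "Other"))

# duplicated() on the (GL, KIND) pair flags repeats within the same GL;
# only repeated "Supply" rows are set to NA (repeat_supply is a hypothetical name)
repeat_supply <- duplicated(DF1[c("GL", "KIND")]) & DF1$KIND == "Supply"
DF1$KIND[repeat_supply] <- NA_character_

print(DF1)
```

Using duplicated on the two columns together plays the role of group_by(GL) in the dplyr version: a Supply row is blanked only if an earlier row in the same GL was already Supply.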