I'm trying to write a custom function that adds new binary variables to an existing dataframe. The idea is to pass the function a diagnosis description (string), one or more ICD9 diagnosis codes (stored as strings), and the patient database. The function would then create a new variable for each diagnosis of interest and assign 1 if the patient (row or observation) has the diagnosis and 0 otherwise.

Below are the function variables:

x<-c("2851") #ICD9 for Anemia
y<-c("diag_1") #Primary diagnosis 
z<-"Anemia"  #Name of new binary variable for patient dataframe
i<-patient_db #patient dataframe

patient<-c("a","b","c")
diag_1<-c("8661", "2851","8651")
diag_2<-c("8651","8674","2866")
diag_3<-c("2430","3456","9089")

patient_db<-data_frame(patient,diag_1,diag_2,diag_3)

  patient  diag_1 diag_2 diag_3
1       a  8661   8651   2430
2       b  2851   8674   3456
3       c  8651   2866   9089

Below is the function:

diagnosis_func<-function(x,y,z,i){

pattern = paste("^(", paste0(x, collapse = "|"), ")", sep = "")

i$z<-ifelse(rowSums(sapply(i[y], grepl, pattern = pattern)) != 0,"1","0")

}

This is what I would like to get at after running the function:

  patient  diag_1 diag_2 diag_3  Anemia
1       a  8661   8651   2430      0
2       b  2851   8674   3456      1
3       c  8651   2866   9089      0

The lines within the function have been tested outside the function and are working. Where I'm stuck is trying to get the function working. Any help would be greatly appreciated.

Happy New Year

Albit

  • I think you are missing a return value in your function. Simply adding the line return(i) to your function should solve the problem. Commented Jan 4, 2017 at 2:57
  • Thanks for your prompt reply, Raymkchow. I just tried return(i); it prints the dataframe in the console but does not add the new variable. Commented Jan 4, 2017 at 3:04
  • R modifies a copy of i inside the function (copy-on-modify), so you cannot change patient_db by passing it in as if by reference. You have to assign the result back, like patient_db <- diagnosis_func(x,y,z,i). Also, the fourth line of your code (i<-patient_db #patient dataframe) should come after the declaration of patient_db to get the correct i. Commented Jan 4, 2017 at 3:13
  • Yes, the order was for explanatory purposes only, in the actual code only variables x and y are defined. "Anemia" and patient_db (already in global environment) are defined as function arguments. Commented Jan 4, 2017 at 3:27
  • This is how I'm passing the arguments to the function: 'diagnosis_func(x,y,"Anemia",patient_db)' Commented Jan 4, 2017 at 4:43

1 Answer

If you are intending to only work with one diagnosis at a time, this will work. I took the liberty of renaming arguments to be a little easier to work with in the code.

diagnosis_func <- function(data, target_col, icd, new_col){
  pattern <- sprintf("^(%s)", 
                     paste0(icd, collapse = "|"))

  data[[new_col]] <- grepl(pattern = pattern, 
                           x = data[[target_col]]) + 0L
  data
}

diagnosis_func(patient_db, "diag_1", "2851", "Anemia")

# Multiple codes for a single diagnosis
diagnosis_func(patient_db, "diag_1", c("8661", "8651"), "Dx")
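Two details in this version are easy to miss: the new column is created with `[[new_col]]` rather than `$`, and the function returns a modified copy that has to be assigned back. The sketch below illustrates both, assuming the example data from the question (rebuilt here with base `data.frame` so that it runs without extra packages).

```r
# Rebuild the example data from the question (base R, no packages needed)
patient_db <- data.frame(
  patient = c("a", "b", "c"),
  diag_1  = c("8661", "2851", "8651"),
  diag_2  = c("8651", "8674", "2866"),
  diag_3  = c("2430", "3456", "9089"),
  stringsAsFactors = FALSE
)

diagnosis_func <- function(data, target_col, icd, new_col) {
  pattern <- sprintf("^(%s)", paste0(icd, collapse = "|"))
  # data[[new_col]] uses the *value* of new_col as the column name;
  # data$new_col would create a column literally called "new_col"
  data[[new_col]] <- grepl(pattern = pattern, x = data[[target_col]]) + 0L
  data  # return the modified copy
}

# Calling the function on its own only prints the result;
# patient_db itself is unchanged
diagnosis_func(patient_db, "diag_1", "2851", "Anemia")
"Anemia" %in% names(patient_db)   # FALSE

# Assign the result back to keep the new column
patient_db <- diagnosis_func(patient_db, "diag_1", "2851", "Anemia")
patient_db$Anemia                 # 0 1 0
```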

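If you want to spruce it up a little to prevent inadvertent mistakes, you can install the checkmate package and use this version. This will check that each argument has the expected type and length, and report all failed checks together before any work is done.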

diagnosis_func <- function(data, target_col, icd, new_col){

  coll <- checkmate::makeAssertCollection()

  checkmate::assert_class(x = data,
                          classes = "data.frame",
                          add = coll)

  checkmate::assert_character(x = target_col,
                              len = 1,
                              add = coll)

  checkmate::assert_character(x = icd,
                              add = coll)

  checkmate::assert_character(x = new_col,
                              len = 1,
                              add = coll)

  checkmate::reportAssertions(coll)

  pattern <- sprintf("^(%s)", 
                     paste0(icd, collapse = "|"))

  data[[new_col]] <- grepl(pattern = pattern, 
                           x = data[[target_col]]) + 0L
  data
}

diagnosis_func(patient_db, "diag_1", "2851", "Anemia")

4 Comments

Benjamin, sorry, I was traveling. When I finally tried the two versions, neither worked. When you ran them, did the number of columns in patient_db increase by one?
After looking at this again, I'm getting the exact output described in your question. Could you describe in more detail what "neither worked" looks like? Are you getting error messages? Are you saving your result to an object (patient_db <- diagnosis_func(patient_db, ...))?
I was not saving it as an object. It works now! Thank you Benjamin.
Benjamin, I've been trying to use the same code to check two or more target columns, e.g. diagnosis_func(patient_db, c("diag_1","diag_2"), "2851", "Anemia"), but I'm getting the following error: Error in .subset2(x, i, exact = exact) : subscript out of bounds. Do you know how I could get it to work? Thanks
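The error in the last comment most likely comes from `data[[target_col]]`, which expects a single column name; given a vector of two names, `[[` attempts recursive indexing and fails. One possible way to handle several columns is sketched below. The name `diagnosis_func_multi` is hypothetical and not part of the original answer, and the sketch assumes a row should be flagged when any of the listed columns matches.

```r
# Hypothetical multi-column variant (illustration only)
diagnosis_func_multi <- function(data, target_cols, icd, new_col) {
  pattern <- sprintf("^(%s)", paste0(icd, collapse = "|"))
  # One logical vector per target column
  hits <- lapply(data[target_cols], function(col) grepl(pattern, col))
  # TRUE where at least one of the columns matched, stored as 0/1
  data[[new_col]] <- Reduce(`|`, hits) + 0L
  data
}

# Example data from the question, rebuilt with base R
patient_db <- data.frame(
  patient = c("a", "b", "c"),
  diag_1  = c("8661", "2851", "8651"),
  diag_2  = c("8651", "8674", "2866"),
  diag_3  = c("2430", "3456", "9089"),
  stringsAsFactors = FALSE
)

res <- diagnosis_func_multi(patient_db, c("diag_1", "diag_2"), "2851", "Anemia")
res$Anemia   # 0 1 0

res2 <- diagnosis_func_multi(patient_db, c("diag_1", "diag_2"), "8651", "Dx")
res2$Dx      # 1 0 1
```

As with the single-column version, the result has to be assigned to an object for the new column to persist.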
