
I am trying to create indicator variables for different races/ethnicities in my data. My data frame ('mydata') has a variable called "Race". This variable contains every box a person checked under race on a questionnaire, separated by commas, so it looks something like this:

ID   Race
6    American Indian or Alaska Native, Black or African American, Hispanic or Latino
7    Hispanic or Latino
10   Native Hawaiian or Other Pacific Islander
11   Hispanic or Latino, White
29   White
30   Black or African American
31   American Indian or Alaska Native, Hispanic or Latino, White

I want to create a variable so that if someone said they were Hispanic, regardless of what else they selected, the new "Hispanic" variable is 1, and if they did not say they were Hispanic, "Hispanic" is 0.

I know this involves partial string matching, but I am having difficulty getting the output I want. I have made multiple attempts with little luck. Here is the code for my last attempt:

if (mydata[grep("Hispanic", mydata$Race)]) {
  Hispanic <- 1
} else {
  Hispanic <- 0
}
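A minimal sketch of why this attempt does not produce the column: mydata[grep(...)] with a single index selects columns of the data frame rather than rows, if () expects a single TRUE/FALSE value rather than one per row, and Hispanic <- 1 creates a standalone object instead of a column of mydata. A vectorized ifelse() makes the 1/0 choice once per row. The small mydata below is an assumed stand-in built from a few of the rows shown above.

```r
# Stand-in data using some rows from the question (assumption: Race is character)
mydata <- data.frame(
  ID = c(6, 7, 10, 11),
  Race = c(
    "American Indian or Alaska Native, Black or African American, Hispanic or Latino",
    "Hispanic or Latino",
    "Native Hawaiian or Other Pacific Islander",
    "Hispanic or Latino, White"
  ),
  stringsAsFactors = FALSE
)

# grepl() gives one TRUE/FALSE per row; ifelse() then picks 1 or 0 row by row,
# and the result is stored as a new column of mydata
mydata$Hispanic <- ifelse(grepl("Hispanic", mydata$Race), 1, 0)
mydata$Hispanic
```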

2 Answers


You can use grepl, which "returns a logical vector (match or not for each element of x)" (from ?grepl). The resulting logical vector can then be converted to 0 (FALSE) or 1 (TRUE) with as.integer:

mydata$Hispanic <- as.integer(grepl(pattern = "Hispanic", x = mydata$Race))
mydata$Hispanic
# [1] 1 1 0 1 0 0 1
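The same one-liner extends to every category on the questionnaire. A minimal sketch, assuming the labels are exactly those in the sample data; the short column names (AIAN, Black, Hispanic, NHPI, White) are hypothetical choices for illustration.

```r
# Stand-in data: the seven rows from the question
mydata <- data.frame(
  ID = c(6, 7, 10, 11, 29, 30, 31),
  Race = c(
    "American Indian or Alaska Native, Black or African American, Hispanic or Latino",
    "Hispanic or Latino",
    "Native Hawaiian or Other Pacific Islander",
    "Hispanic or Latino, White",
    "White",
    "Black or African American",
    "American Indian or Alaska Native, Hispanic or Latino, White"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from new column names to the questionnaire labels
categories <- c(
  AIAN     = "American Indian or Alaska Native",
  Black    = "Black or African American",
  Hispanic = "Hispanic or Latino",
  NHPI     = "Native Hawaiian or Other Pacific Islander",
  White    = "White"
)

# One 0/1 column per category; fixed = TRUE treats each label as literal text
for (col in names(categories)) {
  mydata[[col]] <- as.integer(grepl(categories[[col]], mydata$Race, fixed = TRUE))
}

mydata[, c("ID", names(categories))]
```

Using fixed = TRUE avoids any surprises from regular-expression metacharacters in the labels and is usually a little faster.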

1 Comment

Well, Henrik, I am ashamed at how simple the code is. It is exactly what I needed. Thank you for taking the time to answer my question!

Another option:

Re-create the data:

# text = reads the table straight from a string, so no connection
# has to be opened and closed by hand
mydata <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
ID Race
6 "American Indian or Alaska Native, Black or African American, Hispanic or Latino"
7 "Hispanic or Latino"
10 "Native Hawaiian or Other Pacific Islander"
11 "Hispanic or Latino, White"
29 "White"
30 "Black or African American"
31 "American Indian or Alaska Native, Hispanic or Latino, White"
')

Then use data.table:

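library(data.table)
setDT(mydata); setkey(mydata, Race)  # convert to data.table; setkey() also sorts the rows by Race
mydata[grep("hispanic", Race, ignore.case = TRUE), Race_x := 1]  # 1 for matching rows, NA elsewhere
mydata[is.na(Race_x), Race_x := 0][]  # replace the remaining NAs with 0 and print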

   ID                                                                            Race Race_x
1:  6 American Indian or Alaska Native, Black or African American, Hispanic or Latino      1
2: 31                     American Indian or Alaska Native, Hispanic or Latino, White      1
3: 30                                                       Black or African American      0
4:  7                                                              Hispanic or Latino      1
5: 11                                                       Hispanic or Latino, White      1
6: 10                                       Native Hawaiian or Other Pacific Islander      0
7: 29                                                                           White      0
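Both answers match "Hispanic" anywhere in the string, which works here because no label is contained inside another. If that might not hold, a stricter sketch splits each response into its checked boxes and tests for an exact label. It assumes boxes are always separated by commas, as in the sample; the helper name has_label is hypothetical.

```r
# Stand-in data: a subset of rows from the question
race <- c(
  "American Indian or Alaska Native, Black or African American, Hispanic or Latino",
  "Hispanic or Latino",
  "Native Hawaiian or Other Pacific Islander",
  "White"
)

# Hypothetical helper: 1 if `label` is exactly one of the comma-separated boxes, else 0
has_label <- function(x, label) {
  parts <- lapply(strsplit(x, ",", fixed = TRUE), trimws)  # one vector of boxes per row
  as.integer(vapply(parts, function(p) label %in% p, logical(1)))
}

has_label(race, "Hispanic or Latino")
has_label(race, "Hispanic")  # partial text no longer counts as a match
```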

1 Comment

Interesting, this would be useful for some of my other data cleaning as well. Thanks for the response.
