Create variable based on value in multiple columns?

Question

There is a rather large Stata dataset (education) with 60+ variables devoted to 'exam taken' information and a few others on student gender, age, demographics, etc. There are tens of thousands of students (rows). Unfortunately, the grades on various tests are not standard: they are a combination of letters and numbers and may appear in any of the 60+ columns for each student, depending on when they took the relevant exam. I'm trying to create a new variable identifying all those who took some variation of the G40 or G41 exam at this time. The grade columns are all named dx followed by a number, so I've started by trying the following:

    gen byte event = 0 
    replace event = 1 if dx1 == "G40" | dx1 == "G41"| dx2 == "G40" | dx2 == "G41" | dx3 == "G40" | dx3 == "G41" | dx4 == "G40" | dx4 == "G41" | dx5 == "G40" | dx5 == "G41" & age < 12

I don't want to write out every single one of the 60+ columns each time I'm making a new variable for a new exam. Is there a faster way of doing this?

Nick Cox · Accepted Answer · 2020-06-23 06:43:12Z

I am going to show two techniques, as one is good for the smaller code example you give and one is better for 60+ "columns" (variables!).

For just your example, I would tend to write a single command:

    gen byte event = (inlist("G40", dx1, dx2, dx3, dx4, dx5) | ///
        inlist("G41", dx1, dx2, dx3, dx4, dx5)) & age < 12
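A note on why the outer parentheses matter: in Stata, & is evaluated before |, so in an unparenthesized chain of | tests that ends in & age < 12, the age restriction attaches only to the final test. The following is a minimal sketch of that difference on toy data; the values and the variable names loose and strict are invented for illustration and are not from the dataset in the question.

```stata
* Toy data: two code variables and an age (all values invented)
clear
input str3 dx1 str3 dx2 byte age
"G40" "A10" 15
"A10" "G41" 15
"A10" "G41" 10
end

* No parentheses: & binds tighter than |, so age < 12 limits only the dx2 test
gen byte loose = dx1 == "G40" | dx2 == "G41" & age < 12

* Parentheses: the age restriction applies to the whole set of code tests
gen byte strict = (dx1 == "G40" | dx2 == "G41") & age < 12

* The first observation (G40 in dx1, age 15) is flagged by loose but not by strict
list
```

If the intention is that only students under 12 should ever be flagged, the parenthesized form is the one that expresses it.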

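For 60+ such variables I would write a loop, not least because inlist() accepts at most 10 arguments when they are strings, so all the dx variables cannot be listed in one call.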

    gen byte event = 0

    foreach v of var dx* {
        display "`v' " _c
        replace event = 1 if inlist(`v', "G40", "G41") & age < 12
    }

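In the loop, the display line echoes each variable name as it is processed. That is useful for debugging, or just understanding, but the output is noisier than would be customary once the operations seem routine, at which point that line can be dropped.

A standard trick with inlist() is to note that a test of the form foo == whatever is the same as a test of whatever == foo, so there is often a choice about which argument comes first and which other argument(s) follow. The one-command version puts the code first and the variables after it; the loop puts the variable first and the codes after it.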
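As a check on that equivalence, here is a small sketch, assuming three toy dx variables with invented values, that builds the indicator both ways and asserts that the results agree. The names event_a and event_b are hypothetical and used only for this comparison.

```stata
* Toy data standing in for the real dataset (all values invented)
clear
input str3 dx1 str3 dx2 str3 dx3 byte age
"G40" "A10" "B20" 10
"A10" "B20" "G41" 11
"A10" "B20" "C30" 9
"G41" "A10" "B20" 14
end

* Code first, variables after: is "G40" (or "G41") among dx1, dx2, dx3?
gen byte event_a = (inlist("G40", dx1, dx2, dx3) | ///
    inlist("G41", dx1, dx2, dx3)) & age < 12

* Variable first, codes after: one pass of the loop per dx variable
gen byte event_b = 0
foreach v of varlist dx* {
    replace event_b = 1 if inlist(`v', "G40", "G41") & age < 12
}

* Stops with an error if the two versions ever disagree
assert event_a == event_b
```

If "some variation" of an exam means longer codes that merely start with G40 or G41, one possible adaptation, offered as an assumption about the data rather than something the question confirms, is to test substr(`v', 1, 3) in place of `v' inside the loop.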

edited Jun 23, 2020 at 6:43

answered Jun 23, 2020 at 5:13

Nick Cox
