How do I conditionally select variables in PROC SQL?

Question

I have calculated a frequency table in a previous step. (An excerpt was posted as an image and is not reproduced here.)

I want to automatically drop all variables from this table where the frequency is missing. In the excerpt above, that would mean the variables "Exkl_UtgUtl_Taxi_kvot" and "Exkl_UtgUtl_Driv_kvot" would need to be dropped.

I tried the following step in PROC SQL (ideally I would repeat it for every variable in the table):

PROC SQL;
CREATE TABLE test3 as
SELECT (CASE WHEN Exkl_UtgUtl_Flyg_kvot!=. THEN Exkl_UtgUtl_Flyg_kvot ELSE NULL END)
FROM  stickprovsstorlekar;
quit;

This fails, however, since SAS does not like NULL values. How do I do this?

I tried just writing:

PROC SQL;
CREATE TABLE test3 as
SELECT (CASE WHEN Exkl_UtgUtl_Flyg_kvot!=. THEN Exkl_UtgUtl_Flyg_kvot)
FROM  stickprovsstorlekar;
quit;

But that just generates a variable with an automatically generated name (like DATA_007). I want all variables containing missing values to be totally excluded from the results.

Please show example input and output data. "frequency table" is not enough of a description to understand what your data looks like and how to determine which variables to exclude. – Tom, commented Sep 29, 2022 at 13:31
Like so? Checking out for the day, but thanks for the feedback! – Magnus, commented Sep 29, 2022 at 13:43
Better, but it is very hard to code from photographs of data. – Tom, commented Sep 29, 2022 at 14:28
How exactly would I go about creating a reprex in SAS? I've mostly coded in R before. – Magnus, commented Sep 30, 2022 at 6:13
Just post the data step that creates the data from in-line data. data have; input var1 var2 .... – Tom, commented Sep 30, 2022 at 12:50

Stu Sztukowski · Accepted Answer · 2022-09-29 14:48:11Z

Let's say you have 10 variables, where var1, var3, var5, var7, and var9 have missing values in the first observation. We want to select only the variables that have no missing values.

var1    var2    var3    var4    var5    var6    var7    var8    var9    var10
.       8       .       9       .       6       .       1       .       4
5       1       2       7       2       7       2       9       7       7
5       9       7       7       6       8       5       6       4       9
...
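To make the walkthrough reproducible with in-line data, here is a minimal sketch that builds have from only the three rows shown above; the elided rows are not reconstructed. With three rows, PROC MEANS would report _FREQ_ = 3 rather than the 10 shown below, but the per-variable missing counts, and therefore the final list var2,var4,var6,var8,var10, stay the same.

```sas
/* Hypothetical reproducible input: only the three rows shown in the
   table are included; the rows elided with "..." are not reconstructed. */
data have;
    input var1-var10;   /* list input of ten numeric variables; "." reads as missing */
    datalines;
. 8 . 9 . 6 . 1 . 4
5 1 2 7 2 7 2 9 7 7
5 9 7 7 6 8 5 6 4 9
;
run;
```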

First, let's count the missing values in every numeric variable with PROC MEANS:

proc means data=have noprint;
    var _NUMERIC_;
    output out=missing nmiss=;
run;

Then transpose this output table so it's easier to work with:

proc transpose data=missing out=missing_tpose;
run;

We now have a table that looks like this:

_NAME_  COL1
_TYPE_  0
_FREQ_  10
var1    1
var2    0
var3    1
var4    0
var5    1
var6    0
var7    1
var8    0
var9    1
var10   0

When COL1 is > 0 and the name is not _TYPE_ or _FREQ_, the variable has missing values. We want the opposite, so let's extract the names of the variables where COL1 = 0 (excluding _TYPE_ and _FREQ_) from _NAME_ into a comma-separated list.

proc sql noprint;
    select _NAME_
    into :vars separated by ','
    from missing_tpose
    where COL1 = 0 AND _NAME_ NOT IN('_TYPE_', '_FREQ_')
    ;
quit;

Run %put &vars; and the log shows the names of all variables with no missing values, ready to be passed into SQL:

var2,var4,var6,var8,var10

Now we have a dynamic way to select variables with only non-missing values.

proc sql;
    create table want as
        select &vars
        from have
    ;
quit;
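Since the question asks to drop the incomplete variables, a variant (not part of the answer above) is to collect the names where COL1 > 0 into a space-separated list and pass it to the DROP= data set option. This sketch assumes missing_tpose and have exist from the steps above and that at least one variable has a missing value: if the SELECT returns no rows, the macro variable dropvars is not created (or keeps an earlier value).

```sas
/* Variant, not from the answer: drop the variables that HAVE missing
   values instead of keeping the ones that don't.
   Assumes missing_tpose was built by the PROC MEANS / PROC TRANSPOSE steps
   and that at least one variable has COL1 > 0. */
proc sql noprint;
    select _NAME_
        into :dropvars separated by ' '   /* DROP= expects a space-separated list */
        from missing_tpose
        where COL1 > 0 and _NAME_ not in ('_TYPE_', '_FREQ_');
quit;

%put &=dropvars;   /* writes DROPVARS=<list> to the log */

data want;
    set have(drop=&dropvars);
run;
```

With the example data, dropvars would hold var1 var3 var5 var7 var9, so want ends up with the same five columns as the PROC SQL version.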

edited Sep 29, 2022 at 14:48

answered Sep 29, 2022 at 14:37

Stu Sztukowski

2 Comments

Tom Over a year ago

?? Why use ODS?? Why not just use the OUTPUT statement in PROC MEANS? output out=summary nmiss= ;

Stu Sztukowski Over a year ago

I dunno, something I learned from a tip I got back in the day. I updated it to use the output table instead.
