
I have calculated a frequency table in a previous step. Excerpt below:

[Image: excerpt of the frequency table]

I want to automatically drop all variables from this table where the frequency is missing. In the excerpt above, that would mean the variables "Exkl_UtgUtl_Taxi_kvot" and "Exkl_UtgUtl_Driv_kvot" would need to be dropped.

I tried the following step in PROC SQL (which ideally I would repeat for every variable in the table):

PROC SQL;
CREATE TABLE test3 as
SELECT (CASE WHEN Exkl_UtgUtl_Flyg_kvot!=. THEN Exkl_UtgUtl_Flyg_kvot ELSE NULL END)
FROM  stickprovsstorlekar;
quit;

This fails, however, since SAS does not like NULL values. How do I do this?

I tried just writing:

PROC SQL;
CREATE TABLE test3 as
SELECT (CASE WHEN Exkl_UtgUtl_Flyg_kvot!=. THEN Exkl_UtgUtl_Flyg_kvot END)
FROM  stickprovsstorlekar;
quit;

But that just generates a variable with an automatically generated name (like DATA_007). I want all variables containing missing values to be totally excluded from the results.

  • Please show example input and output data. "frequency table" is not enough of a description to understand what your data looks like and how to determine which variables to exclude. Commented Sep 29, 2022 at 13:31
  • Like so? Checking out for the day, but thanks for the feedback! Commented Sep 29, 2022 at 13:43
  • Better, but it is very hard to code from photographs of data. Commented Sep 29, 2022 at 14:28
  • How exactly would I go about creating a reprex in SAS? I've mostly coded in R before. Commented Sep 30, 2022 at 6:13
  • Just post the data step that creates the data from in-line data. data have; input var1 var2 .... Commented Sep 30, 2022 at 12:50

1 Answer

Let's say you have 10 variables, where var1, var3, var5, var7, and var9 have missing values in the first observation. We want to select only the variables with no missing values.

var1    var2    var3    var4    var5    var6    var7    var8    var9    var10
.      8       .       9       .       6       .       1       .       4
5      1       2       7       2       7       2       9       7       7
5      9       7       7       6       8       5       6       4       9
...
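
If you want to follow along, a minimal sketch of a DATA step that builds such a table from in-line data could look like this. It only includes the three rows shown above (the remaining rows are not reproduced here), so with this sample _FREQ_ in the later output would be 3 rather than 10.

```sas
/* Hypothetical sample data: only the three rows shown above. */
data have;
    /* var1-var10 is a numbered range list: ten numeric variables. */
    input var1-var10;
    datalines;
. 8 . 9 . 6 . 1 . 4
5 1 2 7 2 7 2 9 7 7
5 9 7 7 6 8 5 6 4 9
;
run;
```

A lone period in the in-line data is read as a missing numeric value, which is what the steps below count.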

First, let's count the missing values in each numeric variable:

proc means data=have noprint;
    var _NUMERIC_;
    output out=missing nmiss=;
run;

Then transpose this output table so it's easier to work with:

proc transpose data=missing out=missing_tpose;
run;

We now have a table that looks like this:

_NAME_  COL1
_TYPE_  0
_FREQ_  10
var1    1
var2    0
var3    1
var4    0
var5    1
var6    0
var7    1
var8    0
var9    1
var10   0

When COL1 is > 0 and the name is not _TYPE_ or _FREQ_, the variable has missing values; when COL1 is 0, it has none. Let's extract the names of the variables with no missing values from _NAME_ into a comma-separated list.

proc sql noprint;
    select _NAME_
    into :vars separated by ','
    from missing_tpose
    where COL1 = 0 AND _NAME_ NOT IN('_TYPE_', '_FREQ_')
    ;
quit;

Run %put &vars; and you'll see the list of variables with no missing values, ready to be passed into SQL:

var2,var4,var6,var8,var10

Now we have a dynamic way to select variables with only non-missing values.

proc sql;
    create table want as
        select &vars
        from have
    ;
quit;
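
As a possible variation, here is a sketch that builds a list of the variables to drop instead and applies it with the DROP= data set option. The names dropvars and want2 are made up for illustration. Because PROC MEANS only looked at _NUMERIC_ variables, the SELECT approach above leaves out any character variables, whereas dropping by name would keep them.

```sas
/* Assumes missing_tpose from the PROC TRANSPOSE step above. */
proc sql noprint;
    select _NAME_
    into :dropvars separated by ' '   /* space-separated for DROP= */
    from missing_tpose
    where COL1 > 0 and _NAME_ not in ('_TYPE_', '_FREQ_')
    ;
quit;

/* Copy HAVE without the variables that contain missing values. */
data want2;
    set have(drop=&dropvars);
run;
```

Note that if no variable has missing values, the query returns no rows and dropvars is not created, so the DROP= option would fail; add a guard if that case can occur in your data.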

2 Comments

?? Why use ODS?? Why not just use the OUTPUT statement in PROC MEANS? output out=summary nmiss= ;
I dunno, something I learned from a tip I got back in the day. I updated it to use the output table instead.
