I have a dataset which looks like this:

Account Number  6m      7m      8m      9m      10m     11m
1               Better  X < 10  X < 10  Better  X < 30  X < 30
2               X < 10  X < 20  X < 30  X < 20  X < 20  X < 20
3               Better  Better  Better  Better  X < 10  X < 20
4               X < 10  Better  Same    Same    Same    Same
5               Same    Better  Same    Same    Same    Same
6               Same    Same    Same    Better  Better  Better
7               Same    X < 10  X < 10  X < 10  X < 10  Better
8               Better  Better  Better  Better  Better  Better
9               X < 10  X < 10  X < 10  X < 20  X < 30  Better
10              X < 20  X < 30  X < 30  X < 30  X < 30  X < 30

Each cell tells me what happened 6 to 11 months later for each account number. I want to turn this into a dataset that I can create graphs from, so I would like to transpose it to look like this:

Result  6m  7m  8m  9m  10m 11m
X < 10  3   3   3   1   2   0
X < 20  1   1   0   2   1   2
X < 30  0   1   1   1   2   1
Same    3   1   3   2   2   2
Better  1   2   1   2   2   4

It would be even better if there were a way to turn each count into a percentage of its column.

data have;
    infile datalines dlm='|';
    input "Account Number"n "6m"n$ "7m"n$ "8m"n$ "9m"n$ "10m"n$ "11m"n$;
    datalines;
1|Better|X < 10|X < 10|Better|X < 30|X < 30
2|X < 10|X < 20|X < 30|X < 20|X < 20|X < 20
3|Better|Better|Better|Better|X < 10|X < 20
4|X < 10|Better|Same|Same|Same|Same
5|Same|Better|Same|Same|Same|Same
6|Same|Same|Same|Better|Better|Better
7|Same|X < 10|X < 10|X < 10|X < 10|Better
8|Better|Better|Better|Better|Better|Better
9| X < 10|X < 10|X < 10|X < 20|X < 30|Better
10| X < 20|X < 30|X < 30|X < 30|X < 30|X < 30
;
run;

1 Comment

The counts in your second table do not match the values in your input. For example the value BETTER appears 4 times in the "7m" column, not 3 times. Commented May 10, 2024 at 18:07

4 Answers

You are actually looking for a way to count these values, not transpose them. You can use direct addressing with arrays:

data want;
  set have end=eof;

  *Allocating;
  array _par_[6] "6m"n--"11m"n;
  array _lab_[5]$ _temporary_('X < 10','X < 20','X < 30','Same','Better');
  array _cnt_[5,6]_temporary_;

  *Computing;
  do i=1 to dim(_par_);
    _cnt_[whichc(_par_[i],of _lab_[*]),i]+1;
  end;

  *Restructuring;
  if eof then do i=1 to dim(_lab_);
    result=_lab_[i];
    do j=1 to dim(_par_);
      _par_[j]=cats(max(_cnt_[i,j],0));
    end;
    output;
  end;
  drop i j "Account Number"n;
run;

This is what the want dataset looks like:

Obs    result    6m    7m    8m    9m    10m    11m
1      X < 10    3     3     3     1      2      0
2      X < 20    1     1     0     2      1      2
3      X < 30    0     1     2     1      3      2
4      Same      3     1     3     2      2      2
5      Better    3     4     2     4      2      4
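
The Computing step relies on WHICHC returning the position of the first matching label, which is then used directly as the row subscript of _cnt_. Here is a minimal sketch of that lookup in isolation. The variable names value and idx, and the status 'Other', are hypothetical and only illustrate the no-match case:

```sas
data _null_;
  /* Same labels as in the answer; length 8 matches the default length of the status columns */
  array _lab_[5] $ 8 _temporary_ ('X < 10','X < 20','X < 30','Same','Better');
  length value $ 8;
  /* 'Other' is a hypothetical value that is not in the label list */
  do value = 'Same', 'X < 20', 'Other';
    idx = whichc(value, of _lab_[*]);
    if idx > 0 then put value= 'maps to row ' idx;
    else put value= 'has no matching label';
  end;
run;
```

WHICHC returns 0 when nothing matches, so a status outside the five labels would send the subscript in the step above out of range. Checking for a positive index before incrementing is one way to guard against that.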

Are you asking for something like this?

First, read the data into a normalized (long) structure, with one row per account and month.

data have;
  infile datalines dsd dlm='|' truncover;
  input Account @;
  do month=6 to 11;
    input status $ @ ;
    output;
  end;
datalines;
1|Better|X < 10|X < 10|Better|X < 30|X < 30
2|X < 10|X < 20|X < 30|X < 20|X < 20|X < 20
3|Better|Better|Better|Better|X < 10|X < 20
4|X < 10|Better|Same|Same|Same|Same
5|Same|Better|Same|Same|Same|Same
6|Same|Same|Same|Better|Better|Better
7|Same|X < 10|X < 10|X < 10|X < 10|Better
8|Better|Better|Better|Better|Better|Better
9| X < 10|X < 10|X < 10|X < 20|X < 30|Better
10| X < 20|X < 30|X < 30|X < 30|X < 30|X < 30
;

Then it is easier to use for analysis or graphics.

Example:

proc sgplot data=have;
  vbar month /group=status stat=percent seglabel;
run;

[Image: chart produced by the PROC SGPLOT step above]
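
The same long structure also gives the requested table of counts and column percentages without any reshaping. This is a sketch, assuming have is the normalized dataset created in this answer, with the variables status and month:

```sas
/* Cross-tabulate status by month on the long dataset.           */
/* NOROW and NOPERCENT suppress the row and overall percentages, */
/* leaving the frequency and the column percentage in each cell. */
proc freq data=have;
  tables status*month / norow nopercent;
run;
```

Each column of the resulting table is one month, so the column percentages are the share of accounts in each status for that month.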

1 Comment

Talent thought!

First, stack the data so we can do some counting:

data stack;
    set have;
    array charvars[*] _CHARACTER_;

    do i = 1 to dim(charvars);
        result = charvars[i];
        var    = vname(charvars[i]);
        output;
    end;

    keep result var;
run;

This gets you:

result  var
Better  6m
X < 10  7m
X < 10  8m
Better  9m
X < 30  10m
X < 30  11m
...     ...

I am certain with this data you can do something really cool with proc report, but that's not an area I know particularly well. Instead, we'll create the dataset in a few other steps.

We can collapse this and count the number of values within each result, var combination, then calculate a percentage of each var within that:

proc sql;
    create table pct as
        select result, var, total, total / sum(total) as pct format=percent8.1
            from (select result, var, count(*) as total
                  from stack
                  group by result, var
                 )
            group by var
            order by result, var
    ;
quit;

Which gets us this:

result  var total pct
Better  10m 2     20.0%
Better  11m 4     40.0%
Better  6m  3     30.0%
Better  7m  4     40.0%
Better  8m  2     20.0%
Better  9m  4     40.0%
...     ... ... ...

Now we have everything we need to transpose it into the format that we want. The id statement in proc transpose will allow us to use var as the name of each transposed column. We'll do this by result.

proc transpose data=pct out=pct_tpose(drop=_NAME_);
    by result;
    id var;
    var pct;
run;

Which gets us almost what we want:

result  10m     11m     6m       7m     8m      9m
Better  20.0%   40.0%   30.0%   40.0%   20.0%   40.0%
Same    20.0%   20.0%   30.0%   10.0%   30.0%   20.0%
X < 10  20.0%   .       30.0%   30.0%   30.0%   10.0%
X < 20  10.0%   20.0%   10.0%   10.0%   .       20.0%
X < 30  30.0%   20.0%   .       10.0%   20.0%   10.0%

Now we just need to clean it up by:

  1. Filling in missing values with 0
  2. Reordering columns to the desired order
  3. Reordering result to the desired order

/* Replace missing with 0 */
proc stdize data=pct_tpose 
            out=want 
            missing=0 
            reponly;
run;

/* Fix sort order */
data want_sorted;
    
    /* Set variable order */
    length Result $10.
           "6m"n "7m"n "8m"n "9m"n "10m"n "11m"n 8.
    ;

    set want;
    
    select(result);
        when('X < 10') order = 1;
        when('X < 20') order = 2;
        when('X < 30') order = 3;
        when('Same')   order = 4;
        otherwise      order = 5;
    end;
run;

proc sort data=want_sorted out=want_sorted_final(drop=order);
    by order;
run;

Which gets us our final result that we want:

Result  6m      7m      8m      9m      10m     11m
X < 10  30.0%   30.0%   30.0%   10.0%   20.0%   0.0%
X < 20  10.0%   10.0%   0.0%    20.0%   10.0%   20.0%
X < 30  0.0%    10.0%   20.0%   10.0%   30.0%   20.0%
Same    30.0%   10.0%   30.0%   20.0%   20.0%   20.0%
Better  30.0%   40.0%   20.0%   40.0%   20.0%   40.0%
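
If a printed report is enough and no output dataset is needed, a procedure can do the counting and the percentages in one step. The answer mentions proc report; the sketch below swaps in PROC TABULATE instead, which is an assumption on my part rather than part of the approach above. It reads the stack dataset created earlier:

```sas
/* One row per result, one column group per var (month).            */
/* N is the count; COLPCTN is that count as a percentage of the     */
/* column total, i.e. of all accounts in that month.                */
proc tabulate data=stack;
  class result var;
  table result, var*(n colpctn);
run;
```

The class levels appear in sorted order by default, so the months and results will not follow the custom order built in the steps above.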

1 Comment

Thanks for this, it's exactly what I needed. I've also learnt a lot by the way you've set it out. Thanks again :)

This is fairly dynamic and gets you pretty close but you may need to re-order columns.

  1. Transpose to a long format
  2. Utilize PROC FREQ + SPARSE option to get your counts + percentages
  3. Transpose to a wide format (if needed, not sure that's actually needed for graphing)

data have;
    infile datalines dlm='|';
    input "Account Number"n "6m"n$ "7m"n$ "8m"n$ "9m"n$ "10m"n$ "11m"n$;
    datalines;
1|Better|X < 10|X < 10|Better|X < 30|X < 30
2|X < 10|X < 20|X < 30|X < 20|X < 20|X < 20
3|Better|Better|Better|Better|X < 10|X < 20
4|X < 10|Better|Same|Same|Same|Same
5|Same|Better|Same|Same|Same|Same
6|Same|Same|Same|Better|Better|Better
7|Same|X < 10|X < 10|X < 10|X < 10|Better
8|Better|Better|Better|Better|Better|Better
9| X < 10|X < 10|X < 10|X < 20|X < 30|Better
10| X < 20|X < 30|X < 30|X < 30|X < 30|X < 30
;
run;

proc transpose data=have out=long prefix=Result;
by "Account Number"n;
var "6m"n--"11m"n;
run;

proc freq data=long;
table result1*_name_ / out=want outpct sparse;
run;

proc transpose data=want out=want_count;
by result1;
id _name_;
var count;
run;

proc transpose data=want out=want_pct;
by result1;
id _name_;
var pct_col;
run;
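
To re-order the columns, one common approach is a RETAIN statement placed before SET, which fixes the position of the variables in the output. This is a sketch, assuming VALIDVARNAME=ANY is in effect (as the name literals in the question already require) so that the transposed columns are named "6m" through "11m". The dataset name want_pct_ordered is hypothetical:

```sas
data want_pct_ordered;
  /* Listing the variables before SET sets their order in the output dataset */
  retain result1 "6m"n "7m"n "8m"n "9m"n "10m"n "11m"n;
  set want_pct;
run;
```

The same step works for want_count by changing the dataset named in the SET statement.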
