
I have a data frame with 99 columns dx1-dx99, 99 columns px1-px99, and one column mort:

dx1 dx2 dx3 .   dx99    px1 px2 .   px99    mort
E10 I12 E10 N18 R18     0FY 0TY 0DN 0DN      1
E10 I12 I31 E44 N17     0FY 0TY 0FT 5A1      0
E10 I12 N17 T86 T86     0TY 0FY 0DT          0
I12 E10 N18 A04         0TY 0FY 0DT 0T7      1
E10 I12 E10 N18 Z99     0TY 0FY              0
E10 N18 Z76             0FY 0TY 04Q 0D1      1
E10 N18 Z99 N25 E78     0TY 0FY 0WP          0

I want to keep the values in dx1-dx99 and px1-px99 only in rows where mort is 1, and set them to zero in all other rows. After that, I want to count how often each remaining code occurs.

I tried this:

dx = df.loc[:,'dx1':'dx99']
X1pr = df.loc[:,'px1':'px99']
dx = dx.fillna(0)    
X1p = X1pr.fillna(0)
death = df.loc[:,'mort']
df1 = pd.concat([dx, X1p, death], axis=1)

N = len(df1.columns)
keep = df1.iloc[:,-(N-1):].isin(["1"]).values

df1.iloc[:,:N-1] = df1.iloc[:,:N-1].where(keep, 0)
X1d = df1[df1.columns[0:N-1]]

mat = X1d.as_matrix(columns=None)
values, counts = np.unique(mat.astype(str), return_counts=True)
matrix = []
for v,c in zip(values, counts):
    matrix.append( [v,c])

icd9_counted_d = pd.DataFrame(matrix, columns = ['ICD_code', 'DEATHS'])

I am getting nothing in the DEATHS column. Any idea why?

  • Can you post your desired data set? Commented Apr 10, 2017 at 18:20

1 Answer

IIUC:

In [31]: x.loc[x.mort != 1, x.columns != 'mort'] = ''

In [32]: x
Out[32]:
   dx1  dx2  dx3  dx4 dx99  px1  px2  px3 px99  mort
0  E10  I12  E10  N18  R18  0FY  0TY  0DN  0DN     1
1                                                  0
2                                                  0
3  I12  E10  N18  A04  NaN  0TY  0FY  0DT  0T7     1
4                                                  0
5  E10  N18  Z76  NaN  NaN  0FY  0TY  04Q  0D1     1
6                                                  0
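
The question also asks for the frequency of each remaining code. A minimal sketch of that step, assuming mort is numeric and using a small made-up frame (the toy data and the names code_cols, died and icd_counted are illustrative, not from the question): select only the rows with mort == 1 instead of blanking the others, flatten the code columns, and count.

```python
import pandas as pd

# Toy data (hypothetical): two dx columns, two px columns, numeric mort.
x = pd.DataFrame({
    "dx1": ["E10", "E10", "I12"],
    "dx2": ["I12", "N18", "E10"],
    "px1": ["0FY", "0FY", "0TY"],
    "px2": ["0TY", None, None],
    "mort": [1, 0, 1],
})

# Every column except mort holds a code.
code_cols = [c for c in x.columns if c != "mort"]

# Keep only the rows where mort == 1, then flatten all codes into one Series.
died = x.loc[x["mort"] == 1, code_cols]
flat = pd.Series(died.to_numpy().ravel())

# value_counts() ignores missing values by default.
counts = flat.value_counts()
icd_counted = counts.rename_axis("ICD_code").reset_index(name="DEATHS")
print(icd_counted)
```

Filtering the rows first avoids having to count and then discard a placeholder such as 0 or ''.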

4 Comments

Even after I converted mort to a numeric type, it doesn't work for me. After this operation, every column other than mort is empty.
@Sanoj, this is how I understood your question. Please read how to make good reproducible pandas examples and update your question accordingly.
I appreciate your answer. In my case, 'mort' had dtype 'object', so I thought the x.mort != 1 condition was failing. I therefore converted x.mort to a numeric type using the convert_objects function, and I can see that it now has a numeric dtype. The condition x.mort != 1 still fails: rows 0, 3 and 5 do not keep their codes as in your example above. Everything comes out empty.
@Sanoj, if mort is of object dtype, you can simply use x.mort != '1' as the condition; it's not a big deal. But the question is whether the output in my answer is your desired data set or not.
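
One possible explanation for the "all empty" result discussed above, shown as a sketch on made-up data rather than a diagnosis of the asker's actual frame: when mort holds strings, comparing it with the integer 1 marks every row as different, so every row is blanked. Converting with pd.to_numeric before comparing avoids this.

```python
import pandas as pd

# Toy data (hypothetical): mort stored as strings, i.e. object dtype.
x = pd.DataFrame({"dx1": ["E10", "E10", "I12"], "mort": ["1", "0", "1"]})

# The string '1' is not equal to the integer 1, so every row "differs".
all_differ = bool((x["mort"] != 1).all())

# Convert to numbers first; values that cannot be parsed become NaN.
mort_num = pd.to_numeric(x["mort"], errors="coerce")
mask = mort_num == 1

# Blank the code columns only where mort is not 1.
x.loc[~mask, x.columns != "mort"] = ""
print(x)
```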
