I have a voting dataset like this:

republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y

All of the values are strings, so I want to convert them to an integer matrix and compute some statistics:

import pandas as pd

hou_dat = pd.read_csv("house.data", header=None)

for i in range (0, hou_dat.shape[0]):
    for j in range (0, hou_dat.shape[1]):
        if hou_dat[i, j] == "republican":
            hou_dat[i, j] = 2
        if hou_dat[i, j] == "democrat":
            hou_dat[i, j] = 3
        if hou_dat[i, j] == "y":
            hou_dat[i, j] = 1
        if hou_dat[i, j] == "n":
            hou_dat[i, j] = 0
        if hou_dat[i, j] == "?":
            hou_dat[i, j] = -1

hou_sta = hou_dat.apply(pd.value_counts)
print(hou_sta)

However, it raises an error. How can I solve it?

Exception has occurred: KeyError
(0, 0)

2 Answers

IIUC, you need map and stack. Here df is the DataFrame holding your data (hou_dat in the question):

map_dict = {'republican' : 2,
           'democrat' : 3,
           'y' : 1,
           'n' : 0,
           '?' : -1}

df1 = df.stack().map(map_dict).unstack()

print(df1)

   0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
0   2   0   1   0   1   1   1   0   0   0   1  -1   1   1   1   0   1
1   2   0   1   0   1   1   1   0   0   0   0   0   1   1   1   0  -1
2   3  -1   1   1  -1   1   1   0   0   0   0   1   0   1   1   0   0
3   3   0   1   1   0  -1   1   0   0   0   0   1   0   1   0   0   1
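
For anyone who wants to run this end to end, here is a minimal sketch of the same stack/map/unstack chain followed by the per-column counts the question asks for. It assumes the four sample rows from the question, read from an in-memory string (io.StringIO) as a stand-in for "house.data". The names sample, hou_int and hou_sta are only illustrative. The counts use Series.value_counts per column, because the top-level pd.value_counts is deprecated in recent pandas versions.

```python
import io

import pandas as pd

# The four sample rows from the question, standing in for "house.data".
sample = """republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
"""
hou_dat = pd.read_csv(io.StringIO(sample), header=None)

map_dict = {"republican": 2, "democrat": 3, "y": 1, "n": 0, "?": -1}

# stack() flattens the frame into one long Series so Series.map can be used;
# unstack() restores the original rows-by-columns shape.
hou_int = hou_dat.stack().map(map_dict).unstack().astype(int)

# Count how often each code occurs in every column; codes that never
# appear in a column come back as NaN, so fill those with 0.
hou_sta = hou_int.apply(lambda col: col.value_counts()).fillna(0).astype(int)
print(hou_sta)
```

With the real file, replacing io.StringIO(sample) with "house.data" should be the only change needed.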

6 Comments

Thanks again, but where do I add the path?
hou_dat = pd.read_clipboard("house.data", sep=" ", header=None) is not correct.
@4daJKong You can ignore that; it was only there to reproduce your data from above.
But my data comes from a dataset file named "house.data". How do I import that?
hou_dat = pd.read_csv("house.data", header=None): do you mean adding pd.read_clipboard directly under this?
If you're working with data loaded from a CSV file, it is better to use pandas' built-in methods than to loop over cells. In this case, the replace method does exactly what you asked for:

hou_dat.replace(to_replace={'republican':2, 'democrat':3, 'y':1, 'n':0, '?':-1}, inplace=True)

You can read more about it in the pandas documentation for DataFrame.replace.
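
As for the KeyError itself: as far as I can tell, hou_dat[i, j] asks pandas for a column labelled with the tuple (i, j) rather than for the cell at row i, column j, which would explain the (0, 0) in the message. Positional cell access goes through .iat (or .iloc). The sketch below shows both points, using two of the sample rows from the question as an in-memory stand-in for "house.data". The names sample, codes and hou_int are only illustrative.

```python
import io

import pandas as pd

# Two sample rows from the question, standing in for "house.data".
sample = """republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
"""
hou_dat = pd.read_csv(io.StringIO(sample), header=None)

# hou_dat[0, 0] is treated as a column lookup for the label (0, 0),
# which does not exist, so pandas raises KeyError.
try:
    hou_dat[0, 0]
    raised = False
except KeyError:
    raised = True

# Positional access to a single cell works through .iat instead.
first_cell = hou_dat.iat[0, 0]

# No loop is needed for the conversion: replace handles every cell at once.
codes = {"republican": 2, "democrat": 3, "y": 1, "n": 0, "?": -1}
hou_int = hou_dat.replace(codes).astype(int)
print(hou_int)
```

This version returns a new DataFrame instead of using inplace=True, which keeps the original strings around for checking; either style should give the same values.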
