70

I am trying to get the following script to work. The input file consists of 3 columns: gene association type, gene name, and disease name.

cols = ['Gene type', 'Gene name', 'Disorder name']
no_headers = pd.read_csv('orphanet_infoneeded.csv', sep=',',header=None,names=cols)

gene_type = no_headers.iloc[1:,[0]]
gene_name = no_headers.iloc[1:,[1]]
disease_name = no_headers.iloc[1:,[2]]

query = 'Disease-causing germline mutation(s) in' ###add query as required

orph_dict = {}

for x in gene_name:
    if gene_name[x] in orph_dict:
        if gene_type[x] == query:
            orph_dict[gene_name[x]]=+ 1
        else:
            pass
    else:
        orph_dict[gene_name[x]] = 0

I keep getting an error that says:

Series objects are mutable and cannot be hashed

Any help would be dearly appreciated!

1
  • 2
    show us the full traceback so we can see the line on which the error is being thrown. my guess is it's orph_dict[gene_name[x]] = 0. the traceback would also show us the class of error being thrown. Commented Apr 17, 2015 at 13:50

2 Answers 2

39

Shortly: gene_name[x] is a mutable object so it cannot be hashed. To use an object as a key in a dictionary, python needs to use its hash value, and that's why you get an error.

Further explanation:

Mutable objects are objects which value can be changed. For example, list is a mutable object, since you can append to it. int is an immutable object, because you can't change it. When you do:

a = 5;
a = 3;

You don't change the value of a, you create a new object and make a point to its value.

Mutable objects cannot be hashed. See this answer.

To solve your problem, you should use immutable objects as keys in your dictionary. For example: tuple, string, int.

Sign up to request clarification or add additional context in comments.

Comments

13
gene_name = no_headers.iloc[1:,[1]]

This creates a DataFrame because you passed a list of columns (single, but still a list). When you later do this:

gene_name[x]

you now have a Series object with a single value. You can't hash the Series.

The solution is to create Series from the start.

gene_type = no_headers.iloc[1:,0]
gene_name = no_headers.iloc[1:,1]
disease_name = no_headers.iloc[1:,2]

Also, where you have orph_dict[gene_name[x]] =+ 1, I'm guessing that's a typo and you really mean orph_dict[gene_name[x]] += 1 to increment the counter.

3 Comments

How could I apply this technique of creating the Series from the start when I am splitting into a training and testing dataset? X_train, X_test, y_train, y_test = train_test_split(training_feature_set, training_feature_label, test_size = 0.1, random_state=42) @stackoverflow.com/users/639792/jkitchen
@Alvis, if your function returns DataFrames, you can still select individual items from those. Read the docs for indexing. .loc or .iloc are probably what you want.
Thank you @jkitchen I'll check out the documentation :-)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.