I have the following dataframe, the Positive Samples dataframe, loaded from a txt file using pandas.

This dataframe has a column called Gene Set, in which each entry is a list of genes. When I run postive_samples["Gene Set"] I get the following output:

['YAL004W', 'YLL024C']
['YAL005C', 'YLL024C']
['YAL005C', 'YMR006C']
['YAL005C', 'YOL090W']
['YAL009W', 'YBR074W']
['YAL009W', 'YER162C']
['YAL009W', 'YHL024W']
['YAL009W', 'YJL187C']
['YAL009W', 'YKR003W']

I also have another dataframe called new_expression_df (the New Expression dataframe), which uses the genes from postive_samples["Gene Set"] as its index.

What I am trying to do is take the values stored in postive_samples["Gene Set"] as they are and look them up in the new_expression_df index with loc, inside a loop.

samples_column_list = ["GSM144760", "GSM144761", "GSM144762", "GSM144763", "GSM144764"]

for gene_class_column in postive_samples[['Gene Set']]:
    # Select column contents by column name using the [] operator
    geneSeriesObj = postive_samples[gene_class_column]
    gene_pairs = geneSeriesObj.values

# Get the gene pairs and locate their expression in the given samples
for gene_pair in gene_pairs:
    new_expression_df.loc[gene_pair, samples_column_list]

I get a KeyError on the first iteration when I do this in a loop. Ideally, I want to take each gene set as a list and locate its rows in the other dataframe through its index.

However, when I plug in a set by hand, as below, without a loop, it works fine for the same values that raise the KeyError. What am I doing wrong?

new_expression_df.loc[['YAL002W','YBL001C'],samples_column_list]

I want to supply the row argument of loc dynamically, from the values of another dataframe's column, where each value is stored as a list.
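For reference, here is a minimal sketch of the pattern the question is aiming for. It assumes each Gene Set entry is a genuine Python list and that every gene is present in the index; the toy frames and their values are hypothetical stand-ins for the real data, and only two of the sample columns are used.

```python
import pandas as pd

samples_column_list = ["GSM144760", "GSM144761"]

# Hypothetical toy data standing in for the real dataframes.
new_expression_df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    index=["YAL004W", "YLL024C", "YAL005C"],
    columns=samples_column_list,
)
# Variable name spelled as in the question.
postive_samples = pd.DataFrame(
    {"Gene Set": [["YAL004W", "YLL024C"], ["YAL005C", "YLL024C"]]}
)

# Iterating over the column yields one list of gene IDs per row,
# which can be passed to .loc as the row selector.
results = []
for gene_pair in postive_samples["Gene Set"]:
    results.append(new_expression_df.loc[gene_pair, samples_column_list])
```

If this works on toy data but the real loop fails, the difference most likely lies in what the real column values are, not in the loop itself.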

1 Answer

Can you print gene_pairs just before the second for loop?

5 Comments

When I print gene_pairs I get the same output as shown in the question, one list per iteration: ['YAL004W', 'YLL024C'] ['YAL005C', 'YLL024C'] ['YAL005C', 'YMR006C'] ['YAL005C', 'YOL090W'] ['YAL009W', 'YBR074W'] ['YAL009W', 'YER162C'] ['YAL009W', 'YHL024W'] ['YAL009W', 'YJL187C'] ['YAL009W', 'YKR003W']
Can you tell me the output of the following? for gene_pair in gene_pairs: print(gene_pair); new_expression_df.loc[gene_pair, samples_column_list]
That prints each pair. ['YAL004W', 'YLL024C'] is one output of print(gene_pair). A gene pair is the two elements of one of the lists I gave above. In a loop it prints all the other pairs too, as in my earlier comment. What I want is to pass each iteration's pair to loc and run it without getting the KeyError.
Try using .at instead of .loc. Also, can you pass the value to print? for gene_pair in gene_pairs: print(new_expression_df.loc[gene_pair, samples_column_list])
I think .loc is a better fit than .at, since I am trying to return values for more than one row and column. I tried .at anyway, and it gives me a TypeError because I am passing it a list. With .loc wrapped in print the result is the same: I still get the KeyError. What I am wondering is why there is a KeyError in the loop but no error when I plug the values into .loc by hand. It does not make any sense 🤷‍♂️. I am literally plugging in the same values with or without the loop 😒
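One possible explanation, which is an assumption and not confirmed anywhere in this thread: because the dataframe was loaded from a txt file, each Gene Set value may be a string such as "['YAL004W', 'YLL024C']" that merely prints like a list. In that case .loc would look up the whole string as a single label and raise a KeyError. The sketch below uses hypothetical toy data to check the value type and parse the strings with ast.literal_eval before indexing.

```python
import ast
import pandas as pd

samples_column_list = ["GSM144760", "GSM144761"]

# Hypothetical toy data standing in for the real dataframes.
new_expression_df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    index=["YAL004W", "YLL024C"],
    columns=samples_column_list,
)
# Assumption: the text file stored each pair as a string, not a list.
postive_samples = pd.DataFrame({"Gene Set": ["['YAL004W', 'YLL024C']"]})

first = postive_samples["Gene Set"].iloc[0]
print(type(first))  # <class 'str'> for this toy data

# Passing the raw string treats it as one row label, which is not in the index.
try:
    new_expression_df.loc[first, samples_column_list]
    raised = False
except KeyError:
    raised = True

# Convert each string into a real list of gene IDs.
parsed = postive_samples["Gene Set"].apply(ast.literal_eval)

for gene_pair in parsed:
    # Keep only genes that exist in the index, since a missing label
    # in a list selector also raises a KeyError.
    present = [g for g in gene_pair if g in new_expression_df.index]
    subset = new_expression_df.loc[present, samples_column_list]
```

If type(first) turns out to be list on the real data, this diagnosis does not apply, and the next thing to check would be whether every gene in the failing pair actually appears in new_expression_df.index.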