I have several files in the same format that I need to filter using thresholds on three columns of each data frame. In the end, I need to save the filtered results as separate files.

The example dataframe looks as follows:

    ID  Mean    log2FoldChange  SE  stat    pvalue  padj
0   ENSG2   0.737466    -0.434579   0.484389    -0.897170   0.369628    0.607709
1   ENSG32  321.467787  -0.405760   0.170955    -2.373484   0.017621    0.097636
2   ENSG85  0.000000    NaN NaN NaN NaN NaN

I defined the following function to filter the dataframe and extract a subset,

def DEfilter(df):
    Up_regulted    = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    Down_regulated = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    #Frames         = [Up_regulted,Down_regulated]
    DE             = pd.concat(Up_regulted,Down_regulated)
    return df

and when I try to apply it to one of the dataframes,

Patient_pairs.apply(DEfilter,axis=1)

it throws the following error:

 AttributeError: ("'Series' object has no attribute 'query'", 'occurred at index 0')

This is what I have tried so far to save the filtered results as new files:

    path       = '/home/pathtofile'
    files      = os.listdir(path)

    results    = [os.path.join(path,i) for i in files if i.startswith('DE')]

    for filename in results:
        name       = os.path.basename(os.path.normpath(filename))
        df         = pd.read_csv(filename, sep=sep, header=0)
        Up         = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
        Down       = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
        DE         = pd.concat(Up,Down)
        DE.to_csv('Filtered_set_' + name, sep='\t',index=False)

Any help or suggestions would be great.

  • Could you provide what you expect the output of your sample data to be? Also, it looks to me like what you are really trying to do is: filteredDf = DEfilter(Patient_pairs), assuming you really mean to return DE and not df. (Commented Jun 22, 2016 at 16:59)

1 Answer

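You are calling a data-frame-level method on a Series. DataFrame.apply applies a function to each row or column of a dataframe, so with axis=1 your function receives one row at a time as a Series, and a Series has no query method. Do not pass the function to DataFrame.apply. Simply call it directly and pass the whole data frame as the argument. Note also that pd.concat expects a list of data frames, and that the function should return DE rather than df: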

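import os
import pandas as pd

path = '/home/pathtofile'
files = os.listdir(path)
# Keep only the files whose names start with 'DE'
results = [os.path.join(path, i) for i in files if i.startswith('DE')]

# Assumption: the inputs are tab-delimited; the question never defines sep
sep = '\t'

def DEfilter(df):
    # query is a DataFrame method, so df must be the whole data frame
    Up_regulated = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    Down_regulated = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    # pd.concat takes a list of data frames
    DE = pd.concat([Up_regulated, Down_regulated])
    return DE

for filename in results:
    df = pd.read_csv(filename, sep=sep, header=0)
    DE = DEfilter(df)

    # Write each filtered set to its own file, named after the input file
    name = os.path.basename(os.path.normpath(filename))
    DE.to_csv('Filtered_set_' + name, sep='\t', index=False)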
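
To see the difference without touching any files, here is a minimal sketch that rebuilds the three sample rows from the question in memory. The fourth row (ENSG_hypothetical) is made up purely so that something passes the thresholds; it is not from the question's data. The sketch shows that apply with axis=1 hands each row to the function as a Series, which is why query is missing, while calling the function on the data frame works.

```python
import numpy as np
import pandas as pd

# Three rows copied from the question's example, plus one hypothetical row
# (ENSG_hypothetical) invented here so that the filter has something to keep.
df = pd.DataFrame({
    "ID": ["ENSG2", "ENSG32", "ENSG85", "ENSG_hypothetical"],
    "Mean": [0.737466, 321.467787, 0.0, 50.0],
    "log2FoldChange": [-0.434579, -0.405760, np.nan, 1.2],
    "SE": [0.484389, 0.170955, np.nan, 0.2],
    "stat": [-0.897170, -2.373484, np.nan, 6.0],
    "pvalue": [0.369628, 0.017621, np.nan, 0.001],
    "padj": [0.607709, 0.097636, np.nan, 0.01],
})

def DEfilter(df):
    up = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    down = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    return pd.concat([up, down])

# With axis=1, apply passes each row to the function as a Series.
row_types = df.apply(lambda row: type(row).__name__, axis=1)
print(row_types.unique())

# Passing DEfilter to apply therefore fails: a Series has no query method.
try:
    df.apply(DEfilter, axis=1)
    apply_error = None
except AttributeError as exc:
    apply_error = str(exc)
print(apply_error)

# Calling the function on the whole data frame works.
filtered = DEfilter(df)
print(filtered["ID"].tolist())
```

Rows with NaN in the tested columns, such as ENSG85, are dropped by the filter, because comparisons against NaN evaluate to False.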