I have several files in the same format that I need to filter using thresholds on three columns of each dataframe. In the end, I need to save each filtered result as a separate file.
The example dataframe looks like this:
       ID        Mean  log2FoldChange        SE       stat    pvalue      padj
0   ENSG2    0.737466       -0.434579  0.484389  -0.897170  0.369628  0.607709
1  ENSG32  321.467787       -0.405760  0.170955  -2.373484  0.017621  0.097636
2  ENSG85    0.000000             NaN       NaN        NaN       NaN       NaN
I defined the following function to filter the dataframe and extract the subset I want to save:
def DEfilter(df):
    Up_regulted = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    Down_regulated = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    #Frames = [Up_regulted,Down_regulated]
    DE = pd.concat(Up_regulted,Down_regulated)
    return df
When I try to apply it to one of the dataframes,
Patient_pairs.apply(DEfilter,axis=1)
it throws the following error:
AttributeError: ("'Series' object has no attribute 'query'", 'occurred at index 0')
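For context, the error comes from how apply works: with axis=1 the function is called once per row, and each row arrives as a Series, which has no query method. A minimal sketch that shows this, using a made-up two-row frame with the same column names as the example (the values are only illustrative):

```python
import pandas as pd

# Made-up two-row frame reusing the column names from the example above.
df = pd.DataFrame({
    "ID": ["ENSG2", "ENSG32"],
    "log2FoldChange": [-0.434579, -0.405760],
    "pvalue": [0.369628, 0.017621],
    "padj": [0.607709, 0.097636],
})

# With axis=1, apply hands each row to the function as a Series.
row_types = df.apply(lambda row: type(row).__name__, axis=1).tolist()
print(row_types)

# query is defined on DataFrame but not on Series.
print(hasattr(df, "query"), hasattr(pd.Series(dtype=float), "query"))
```

So a function that calls query needs to receive the whole dataframe rather than being passed through apply row by row.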
This is what I have tried so far to save the filtered results as new files:
path = '/home/pathtofile'
files = os.listdir(path)
results = [os.path.join(path,i) for i in files if i.startswith('DE')]
for filename in results:
    name = os.path.basename(os.path.normpath(filename))
    df = pd.read_csv(filename, sep=sep, header=0)
    Up = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    Down = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    DE = pd.concat(Up,Down)
    DE.to_csv('Filtered_set_' + name, sep='\t',index=False)
Any help or suggestions would be appreciated.
Call the function on the whole dataframe instead of going through apply: filteredDf = DEfilter(Patient_Pairs), assuming you really mean to return DE and not df.
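Building on that suggestion, here is a minimal sketch of how the function and the file loop might look once corrected. Several things here are assumptions rather than part of the original post: the names THRESHOLD and filter_files and the prefix parameter are hypothetical, the input files are assumed to be tab-separated (the original sep variable is not shown), and the output is written into the same folder as the input rather than the current working directory.

```python
import os
import pandas as pd

# Shared significance cutoffs (hypothetical constant name).
THRESHOLD = "pvalue <= 0.05 and padj <= 0.05"

def DEfilter(df):
    """Return the rows that pass the up- or down-regulation thresholds."""
    up = df.query("log2FoldChange >= 0.58 and " + THRESHOLD)
    down = df.query("log2FoldChange <= -0.58 and " + THRESHOLD)
    # pd.concat takes a list of frames as its first argument,
    # not the frames as separate positional arguments.
    return pd.concat([up, down])

def filter_files(path, sep="\t", prefix="DE"):
    """Filter each file in path whose name starts with prefix.

    Hypothetical helper: writes 'Filtered_set_<name>' next to the input
    file and returns the list of paths written.
    """
    written = []
    for name in sorted(os.listdir(path)):
        if not name.startswith(prefix):
            continue
        df = pd.read_csv(os.path.join(path, name), sep=sep, header=0)
        out = os.path.join(path, "Filtered_set_" + name)
        DEfilter(df).to_csv(out, sep="\t", index=False)
        written.append(out)
    return written
```

Rows with NaN in any of the three columns fail both comparisons, so they are dropped without extra handling. Usage would be something like filteredDf = DEfilter(Patient_Pairs) for a single frame, or filter_files('/home/pathtofile') for the whole folder.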