I'm trying to create lists from multiple opened files and am having some issues. I need to create two separate lists for each file, but right now my code only creates two lists for the last file iterated. Any suggestions on how to fix this and create unique 'sample_genes' and 'sample_values' lists for each file in 'file_list'?
Alternatively, creating a single unified list of 'gene_names' from all files and another of 'sample_values' from all files would work as well.
# Parse csv files for samples, creating lists of gene names and expression values.
file_list = ['CRPC_278.csv', 'PCaP_470.csv', 'CRPC_543.csv', 'PCaN_5934.csv', 'PCaN_6102.csv', 'PCaP_17163.csv']
des_list = ['a', 'b', 'c', 'd', 'e', 'f']
for idx, (f_in, des) in enumerate(zip(file_list, des_list)):
    with open(f_in) as des:
        cread = list(csv.reader(des, delimiter = '\t'))
        sample_genes = [i for i, j in (sorted([x for x in {i: float(j)
            for i, j in cread}.items()], key = lambda v: v[1]))]
        sample_values = [j for i, j in (sorted([x for x in {i: float(j)
            for i, j in cread}.items()], key = lambda v: v[1]))]
# Compute row means.
mean_values = [((a + b + c + d + e + f)/len(file_list)) for i, (a, b, c, d, e, f) in enumerate(zip(sample_1_values, sample_2_values, sample_3_values, sample_4_values, sample_5_values, sample_6_values))]
# Provide proper gene names for mean values and replace original data values by corresponding means.
sample_genes_list = [i for i in sample_1_genes, sample_2_genes, sample_3_genes, sample_4_genes, sample_5_genes, sample_6_genes]
sample_final_list = [sorted(zip(sg, mean_values)) for sg in sample_genes_list]
The new code below:
import csv

# Parse csv files for samples, creating lists of gene names and expression values.
file_list = ['CRPC_278.csv', 'PCaP_470.csv', 'CRPC_543.csv', 'PCaN_5934.csv', 'PCaN_6102.csv', 'PCaP_17163.csv']
full_dict = {}
for path in file_list:
    with open(path) as stream:
        data = list(csv.reader(stream, delimiter = '\t'))
        # Sort the (gene, value) pairs by expression value.
        data = sorted([(i, float(j)) for i, j in data], key = lambda v: v[1])
        sample_genes = [i for i, j in data]
        sample_values = [j for i, j in data]
        # Keep both lists for this file, keyed by its path.
        full_dict[path] = (sample_genes, sample_values)
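With every sample held in one dictionary, the row means from the earlier snippet no longer need six separately named lists. Here is a minimal sketch, assuming every file has the same number of rows and that 'full_dict' has the shape built above. The two tiny samples, their file names, and the name 'sample_final' are made up for illustration:

```python
# Hypothetical stand-in for the full_dict built from the csv files:
# path -> (gene names in value order, values sorted ascending).
full_dict = {
    'sample_a.csv': (['g2', 'g1', 'g3'], [1.0, 2.0, 6.0]),
    'sample_b.csv': (['g1', 'g3', 'g2'], [3.0, 4.0, 9.0]),
}

# Collect the sorted value lists, one per file.
value_lists = [values for genes, values in full_dict.values()]

# zip(*value_lists) yields one tuple per row (rank) across all files.
mean_values = [sum(row) / len(row) for row in zip(*value_lists)]

# Pair each sample's genes (in rank order) with the row means,
# then sort by gene name, as in the original snippet.
sample_final = {
    path: sorted(zip(genes, mean_values))
    for path, (genes, values) in full_dict.items()
}
print(mean_values)
```

Because the divisor is 'len(row)', the same code works for any number of files.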
Unpacking the values stored in the dictionary shows a deeply nested structure:
for key in full_dict:
    value = full_dict[key]
    for key in full_dict[key]:
        for idx, items in enumerate(key):
            print idx
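The nesting is only two levels deep: each dictionary value is a tuple of two parallel lists. The loop above looks deeper because it reuses 'key' for the tuple members and then enumerates each list on its own. Here is a sketch of unpacking the tuple directly in the loop header, using made-up data in place of the real files (the 'rows' list is only there to collect the result):

```python
# Hypothetical stand-in for full_dict: path -> (sample_genes, sample_values).
full_dict = {
    'sample_a.csv': (['g2', 'g1'], [1.0, 2.0]),
}

rows = []
for path, (sample_genes, sample_values) in full_dict.items():
    # Walk the two parallel lists together instead of nesting loops.
    for gene, value in zip(sample_genes, sample_values):
        rows.append((path, gene, value))
        print(path, gene, value)
```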
About the 'des' variable name: you use it twice in the loop scope, first to unpack the zipped list, and then for the opened file object. Also, don't use 'file' as a variable name, since it is a built-in name in Python.

And you also have a syntax problem: {i: float(j)... What is the curly bracket for? And the colon? What are you trying to do?

{a: b for ...} is a dictionary comprehension, e.g., {i: i for i in range(3)} produces {0: 0, 1: 1, 2: 2}. Of course, the use in the OP's question is pretty odd: [x for x in dictcomp.items()], where dictcomp is {i: float(j) for i, j in ...}.
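To make the effect of that dictionary comprehension concrete: it keeps a single value per gene name (the last one read), and wrapping its items in another list comprehension adds nothing, since sorted() accepts the items directly. Here is a small sketch with made-up tab-separated rows read from an in-memory string rather than a real file:

```python
import csv
import io

# Made-up tab-separated rows standing in for one sample file.
stream = io.StringIO("geneB\t2.5\ngeneA\t0.5\ngeneB\t1.5\n")
cread = list(csv.reader(stream, delimiter='\t'))

# The dict comprehension keeps one value per gene name (the last one seen).
as_dict = {name: float(value) for name, value in cread}

# Sort the (name, value) pairs by value, as in the question.
pairs = sorted(as_dict.items(), key=lambda item: item[1])
sample_genes = [name for name, value in pairs]
sample_values = [value for name, value in pairs]
print(sample_genes, sample_values)
```

If duplicate gene names should be kept rather than collapsed, sorting a list of (name, float(value)) tuples directly avoids the dictionary step.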