1

I have a file that looks like this:

1,var1
2,var2
3,var3
4,var1_val1
5,var2_val2
6,var1_val2
7,var3_val1
8,var2_val1
9,var3_val2

Output file should look like:

var1 1 4 6 
var2 2 8 5
var3 3 7 9

My code is quite complicated. It works, but it's very inefficient. Can this be done more efficiently:

def findv(var):
    # Scans the whole input file three times for a single variable.
    with open(inputfile) as f:
        for line in f:
            elems=line.strip().split(',')
            name=elems[1]
            if var!=name:
                continue
            field=elems[0]
        f.seek(0)
        for line in f:
            elems2=line.strip().split(',')
            if elems2[1].endswith(var+'_val1'):
                first=elems2[0]
        f.seek(0)
        for line in f:
            elems3=line.strip().split(',')
            if elems3[1].endswith(var+'_val2'):
                second=elems3[0]
    return var,field,first,second

Main part of the code:

with open(inputfile) as f:
    with open(outputfile,'w') as fout:
        for line in f:
            tmp=line.strip().split(',')
            # skip the varx_val1 / varx_val2 lines; findv() looks those up
            if tmp[1].endswith('val1') or tmp[1].endswith('val2'):
                continue
            v=tmp[1]
            result=findv(v)
            fout.write(' '.join(result)+'\n')

My function findv(var) is called for every line of the input file whose second field is a bare variable name (varx). Each call then rescans the file several times until it finds the fields that correspond to varx_val1 and varx_val2.

EDIT: I need to preserve the order of the input file, so var1 has to appear first in the output file, then var2, then var3 etc.

2 Answers

4

Use a dictionary, with the keys being your labels and a list to store your values. This way, you only have to loop over your file once.

from collections import defaultdict

results = defaultdict(list)

with open('somefile.txt') as f:
    for line in f:
        line = line.strip()  # drop the newline so 'var1' and 'var1_val1' share a key
        if line:
            value, key = line.split(',')
            if '_' in key:
                key = key.split('_')[0] # returns var1 from var1_val1
            results[key].append(value)

for k, v in results.items():
    print('{} {}'.format(k, ' '.join(v)))

Here is a version that includes the below comments:

from collections import OrderedDict

results = OrderedDict()

with open('somefile.txt') as f:
    for line in f:
        line = line.strip()
        if line:
            value, key = line.split(',')
            key = key.split('_')[0] # returns var1 from var1_val1
            results.setdefault(key, []).append(value)

for k, v in results.items():
    print('{} {}'.format(k, ' '.join(v)))

8 Comments

I clarified my question. I need to preserve the order of input file, so the output file has to be in order var1, var2,var3
@Anastasia: Then make results be an OrderedDict. Change results[key].append(value) to results.setdefault(key, []).append(value).
Also, the names of variables are words, they don't end with a number, so I can't simply re-order the file based on the numerical value.
No need to guard key = key.split('_')[0] with if '_' in key: because "nounderscore" == "nounderscore".split('_')[0].
@Anastasia: No need to sort the result. collections.OrderedDict preserves the insertion order of keys.
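The two points made in these comments can be checked in isolation. The snippet below is only a sketch: base_name and group_in_order are hypothetical helper names that do not appear in the answer, and the sample pairs are a shortened version of the data in the question. It shows that split('_')[0] leaves a label without an underscore unchanged, and that an OrderedDict returns keys in the order they were first inserted.

```python
from collections import OrderedDict


def base_name(label):
    """Return the part of a label before the first underscore (hypothetical helper)."""
    return label.split('_')[0]


def group_in_order(pairs):
    """Group values by base label, keeping keys in first-seen order (hypothetical helper)."""
    results = OrderedDict()
    for value, label in pairs:
        results.setdefault(base_name(label), []).append(value)
    return results


# Shortened sample taken from the question's input file.
sample = [('1', 'var1'), ('2', 'var2'), ('4', 'var1_val1'), ('5', 'var2_val2')]
grouped = group_in_order(sample)

for key, values in grouped.items():
    print('{} {}'.format(key, ' '.join(values)))
```

On Python 3.7 and later a plain dict also keeps insertion order, so OrderedDict mainly matters for older interpreters or to make the intent explicit.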
0

I have written a Python program that iterates over the file only once, reads all the important data into a dict, and then writes the dict to the output file.

#!/usr/bin/env python3
import collections

output = collections.OrderedDict()

with open(inputfile, 'r') as infile:
    for line in infile:
        dat, tmp = line.strip().split(',')
        if '_val' in tmp:
            key, idxstr = tmp.split('_val')
            idx = int(idxstr)
        else:
            key = tmp
            idx = 0
        output.setdefault(key, ["", "", ""])[idx] = dat

with open(outputfile, 'w') as outfile:
    for var in output:
        v = output[var]
        outfile.write('{} {}\n'.format(var, ' '.join(v)))

Update: modified according to comments
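To try the slot-based idea without touching any files, here is a minimal sketch that runs on the sample data from the question held in memory. The names SAMPLE and regroup are assumptions made for this illustration, and the code assumes every label is either a bare name or ends in _val1 or _val2. Because each value goes into a fixed slot, var2 comes out as 2 8 5 even though the file lists var2_val2 before var2_val1; appending in file order would give 2 5 8 instead.

```python
import io
from collections import OrderedDict

# Sample input copied from the question.
SAMPLE = """1,var1
2,var2
3,var3
4,var1_val1
5,var2_val2
6,var1_val2
7,var3_val1
8,var2_val1
9,var3_val2
"""


def regroup(lines):
    """Return one output row per variable, in first-seen order (hypothetical helper)."""
    slots = OrderedDict()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        dat, label = line.split(',')
        if '_val' in label:
            key, idxstr = label.split('_val')
            idx = int(idxstr)    # 'var1_val2' goes to slot 2
        else:
            key, idx = label, 0  # a bare name goes to slot 0
        slots.setdefault(key, ["", "", ""])[idx] = dat
    return ['{} {}'.format(key, ' '.join(vals)) for key, vals in slots.items()]


rows = regroup(io.StringIO(SAMPLE))
print('\n'.join(rows))
```

Passing any iterable of lines works, so an open file object can be handed to regroup in place of the StringIO.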

1 Comment

Don't use naked excepts. Use except ValueError. No need to specify 'r' mode, since this is the default for open(). Use dict.setdefault() to assign to keys that may have missing values.
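As a rough illustration of the advice about exceptions, the sketch below catches only ValueError, which is what tuple unpacking raises when a line does not split into exactly two fields. The parse_line name and the choice to return None for bad lines are assumptions for this example, not part of the answer.

```python
def parse_line(line):
    """Split 'value,label' into a tuple; return None for blank or malformed lines."""
    line = line.strip()
    if not line:
        return None
    try:
        value, label = line.split(',')
    except ValueError:
        # Raised when the line has no comma or more than one comma.
        return None
    return value, label


print(parse_line('4,var1_val1\n'))
print(parse_line('not a valid line'))
```

Catching the specific exception means unrelated problems, such as a KeyboardInterrupt or a typo that raises NameError, still surface instead of being silently swallowed.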
