
I want to process a tab-delimited input data file with a header row and generate a tab-delimited output file according to a template.

Here is a small example:

Data file:

A B C
1 4 7
2 5 8
3 6 9

Template file that defines columns in the output:

A:A
BC:B+C
HC:C/2, precision:2

The template file contains these operations: creation of a new column, summation and division operations on columns, and definition of precision of rational numbers in a column.

Output file:

A BC HC
1 11 3.50
2 13 4.00
3 15 4.50

Where should I start if I want to write such an interpreter in Python? The interpreter should parse the template file and then generate the output data from the input data according to the parsed template.

  • Look into the csv module. Commented Aug 25, 2014 at 11:14
  • My answer below produces the correct output based on your rules. If the rules change, you can modify template.txt to suit your needs. Good luck! Commented Aug 25, 2014 at 11:48
  • Some scientific packages can parse expressions. You might also want to take a look at ast, as you might need much more complex expressions involving loops or so. That being said, since you're on the way to writing some kind of "batch spreadsheet", maybe you should investigate how to wrap an existing library into your application. Take a look at libqalculate, for example. It supports all you need and much more -- so you don't have to reinvent the wheel. Commented Aug 25, 2014 at 11:57
  • Isn't this the same as the question you asked earlier which was closed? Re-asking the same question is frowned upon. Commented Aug 25, 2014 at 12:01
  • Are +, /, and precision the only possible operators in a template file? Are there any rules for the template grammar that are not shown in the example? Commented Aug 25, 2014 at 12:09

1 Answer

Updated for variable data.txt length

Maybe you should investigate using exec. This would allow you to have actual Python code in your template.

data.txt:

A B C D
1 4 7 2
2 5 8 5
3 6 9 8

template.txt:

headers = ['A', 'BC', 'HC', '3/D']
process = [ lambda params: int(params[0]), 
            lambda params: int(params[1]+params[2]),  
            lambda params: float('%.2f' % (params[2]/2)), 
            lambda params: float('%.2f' % (3. / params[3]))]

report_gen.py:

# Read the data rows and the template source.
with open("data.txt", "r") as my_data:
    data = my_data.readlines()

with open("template.txt", "r") as my_template:
    template = my_template.read()

# exec runs arbitrary Python code, so only use templates you trust.
# The template is expected to define `headers` and `process`.
exec(template)

with open("output.txt", "w") as output:
    for line in data:
        params = line.split()  # splits on spaces as well as tabs
        if not params:
            continue  # skip blank lines
        if params[0].isdigit():
            # data row: convert every field to float
            params = [float(p) for p in params]
            # this is where we call our lambdas from the exec'd template,
            # which calculate the columns based on the data for this row
            results = [str(func(params)) for func in process]
            output.write(" ".join(results) + "\n")
        else:
            # header row: write the headers defined in the template
            output.write(" ".join(headers) + "\n")

Now, running python report_gen.py in the directory containing both data.txt and template.txt will generate:

output.txt:

A BC HC 3/D
1 11 3.5 1.5
2 13 4.0 0.6
3 15 4.5 0.38
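
If exec is too permissive for your setting, the template format from the question (name:expression, optionally followed by ", precision:N") can also be interpreted directly with the csv and ast modules mentioned in the comments. What follows is only a minimal sketch, not a definitive implementation. It assumes that column names are valid Python identifiers, that every data field is numeric, that only +, -, * and / are needed, and that precision is the only option. The function names (parse_template, evaluate, format_value, render) are made up for illustration.

```python
import ast
import csv
import io
import operator

# Only these binary operators are accepted in a template expression (assumption).
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}


def evaluate(node, row):
    """Evaluate a parsed expression against one row (dict: column name -> float)."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, row)
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](evaluate(node.left, row), evaluate(node.right, row))
    if isinstance(node, ast.Name):
        return row[node.id]  # a column reference such as B or C
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    raise ValueError("unsupported expression element: %r" % node)


def parse_template(text):
    """Turn lines like 'HC:C/2, precision:2' into (name, tree, precision) tuples."""
    columns = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, rest = line.partition(":")
        expr, _, option = rest.partition(",")
        precision = None
        if option.strip():
            key, _, value = option.partition(":")
            if key.strip() != "precision":
                raise ValueError("unknown option: %r" % key.strip())
            precision = int(value)
        tree = ast.parse(expr.strip(), mode="eval")
        columns.append((name.strip(), tree, precision))
    return columns


def format_value(value, precision):
    """Format one cell; without a precision, whole numbers lose the '.0' (assumption)."""
    if precision is not None:
        return "%.*f" % (precision, value)
    return str(int(value)) if float(value).is_integer() else str(value)


def render(template_text, data_text):
    """Apply the template to tab-delimited data and return tab-delimited output."""
    columns = parse_template(template_text)
    reader = csv.DictReader(io.StringIO(data_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([name for name, _, _ in columns])
    for record in reader:
        row = {key: float(value) for key, value in record.items()}
        writer.writerow([format_value(evaluate(tree, row), precision)
                         for _, tree, precision in columns])
    return out.getvalue()


template = "A:A\nBC:B+C\nHC:C/2, precision:2\n"
data = "A\tB\tC\n1\t4\t7\n2\t5\t8\n3\t6\t9\n"
print(render(template, data), end="")
```

With this approach, adding a column only requires a new line in the template file. Anything that is not a column name, a number or one of the listed operators is rejected with a ValueError instead of being executed.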

2 Comments

Dear @AlexanderBrevig, this is a great solution. However, I cannot figure out how to make adding a new column easier, without changing report_gen.py and with minimal changes to template.txt. The real input file has hundreds of columns, and tens of new columns must be added each week.
I've updated the code to allow for variable input row length. This will probably make your lambdas a bit harder to read, but you could also define constants for the column indices so the lambdas read better than using params[2] to get C.
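
One way to get readable column access is to hand each lambda a dictionary keyed by the input header names instead of a positional list. Note that this is a variation on the constants idea, not the code from the answer. It is a rough sketch: make_row is a hypothetical helper, and it assumes whitespace-separated numeric fields.

```python
def make_row(header_line, data_line):
    """Map each input column name to its float value for one data line."""
    names = header_line.split()
    values = [float(v) for v in data_line.split()]
    return dict(zip(names, values))


# Template contents: one header and one lambda per output column.
headers = ['A', 'BC', 'HC']
process = [lambda r: int(r['A']),
           lambda r: int(r['B'] + r['C']),
           lambda r: '%.2f' % (r['C'] / 2)]

row = make_row("A B C D", "3 6 9 8")
results = [str(func(row)) for func in process]
print(" ".join(results))
```

Adding a column then means appending one entry to headers and one lambda to process, and the lambdas keep working if input columns are reordered.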
