I need to read a SQL output file in the following format into a Python data structure or a pandas DataFrame. What would be the best approach?

-[ RECORD 1 ]--------------------------------
a    |    test
b    |    test
c    |    test
-[ RECORD 2 ]--------------------------------
a    |    test
b    |    test
c    |    test

1 Answer

This code transforms the input file into a regular CSV file. It is not general purpose: your example is probably artificial (you may not really have columns called a, b, c with values that are all test), so it may need some tweaks, but it is a start. It is loosely inspired by sed, so take it with a grain of salt!

1) Transform the file into a regular CSV file

import csv

def transform_to_csv(in_file_path, out_file_path):
    column_names = []
    values = []
    header_written = False
    with open(in_file_path) as infile, open(out_file_path, "w", newline="") as outfile:
        # csv.writer quotes values that contain commas or quotes
        writer = csv.writer(outfile, lineterminator="\n")

        def flush_record():
            # write the accumulated record, preceded by the header the first time
            nonlocal header_written
            if not values:
                return
            if not header_written:
                writer.writerow(column_names)
                header_written = True
            writer.writerow(values)
            values.clear()

        for line in infile:
            line = line.rstrip("\n")
            if line.startswith("-[ RECORD"):
                # a separator line finishes the previous record, if there is one
                flush_record()
            elif "|" in line:
                # accumulating fields for the next record; split only on the
                # first "|" so values may contain that character
                name, value = line.split("|", 1)
                values.append(value.strip())
                if not header_written:
                    column_names.append(name.strip())
        # write the last record
        flush_record()

For the sample input, this produces a new file in CSV format:

a,b,c
test,test,test
test,test,test

2) Now load the CSV with pandas as usual

import pandas as pd

infile = "in.txt"
outfile = "out.csv"
transform_to_csv(infile, outfile)
df = pd.read_csv(outfile)
print(df.head())
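
If the intermediate CSV file is not needed, the same record-by-record idea can be sketched in memory. This is only a rough alternative, assuming the file looks like the sample in the question (which resembles psql's expanded display): every record starts with a "-[ RECORD" line and every field line has the form "name | value". The function name parse_expanded_records is made up for illustration.

```python
import io

def parse_expanded_records(lines):
    """Parse 'name | value' lines grouped by '-[ RECORD n ]' separators
    into a list of dicts, one dict per record (hypothetical helper)."""
    records = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("-[ RECORD"):
            # separator line: start collecting a new record
            current = {}
            records.append(current)
        elif "|" in line and current is not None:
            # split only on the first "|" so values may contain that character
            name, value = line.split("|", 1)
            current[name.strip()] = value.strip()
    return records

# The sample from the question, used here in place of a real file
sample = """\
-[ RECORD 1 ]--------------------------------
a    |    test
b    |    test
c    |    test
-[ RECORD 2 ]--------------------------------
a    |    test
b    |    test
c    |    test
"""

records = parse_expanded_records(io.StringIO(sample))
print(records)
```

Since pandas accepts a list of dicts, pd.DataFrame(records) should then give one row per record with the field names as columns. For a real file, pass the open file object to the function instead of the io.StringIO wrapper.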