Parsing output (text) into dataframe with columns and rows

Question

Complete newby here. I'm trying to parse a simulation output with Python into a frame and write it into an excel sheet.

The output looks like this:

F100.T,557.9567856878748,F,F,F,F,
F100.Tv,557.9567856878748,F,F,F,F,
F100.Tl,557.9567856878748,F,F,F,F,
F100.Duty,-106382.60618934222,F,F,F,T,1
... 
F200.T,557.9567856878748,F,F,F,F,
F200.Tv,557.9567856878748,F,F,F,F,
F200.Tl,557.9567856878748,F,F,F,F,
F200.Duty,-37798.28473117316,F,F,F,T,1
... and so on

How it should look like at the end:

    | F100               | F200           |
    | -------------------|----------------| 
 T  | 557.9567856878748  | 100            |
 Tv | 557.9567856878748  | 5.550847203    |
 T1 |-106382.60618934222 | 3.798721561    |

... and so on

What I've tried so far:

import itertools
import pandas as pd


def read_lines(file_object) -> list:
    return [
        parse_line(line) for line in file_object.readlines() if line.strip()
    ]


def parse_line(line: str) -> list:
    return [
        i.split(",")[1]
        for i in line.strip().split()
        if i.startswith(("F100", "F200"))
    ]


def flatten(parsed_lines: list) -> list:
    return list(itertools.chain.from_iterable(parsed_lines))


def cut_into_pieces(flattened_lines: list, piece_size: int = 2) -> list:
    return [
        flattened_lines[i:i + piece_size] for i
        in range(0, len(flattened_lines), piece_size)
    ]


with open("sim.txt") as data:
    df = pd.DataFrame(
        cut_into_pieces(flatten(read_lines(data))),
        columns=["F100", "F200"],
    )
    print(df)
    df.to_excel("table.xlsx", index=False)

So far, the result looks like this:

                    F100                 F200
1      557.9567856878748    100
2      557.9567856878748    5.550847203
3    -106382.60618934222    3.798721561
..                   ...                  ...

As you see, the rows are not named (T, Tv, T1 etc.). Im hitting a wall and don't know how to continue from here.
Thanks in advance

Please accept my answer if it answers your question @MaJak

tozCSS
– tozCSS

2021-11-07 18:31:14 +00:00
Commented Nov 7, 2021 at 18:31 — tozCSS
– tozCSS, Commented Nov 7, 2021 at 18:31

tozCSS · Accepted Answer · 2021-11-07 18:34:21Z

1

import pandas as pd
from io import StringIO
txt = """\
F100.T,557.9567856878748,F,F,F,F,
F100.Tv,557.9567856878748,F,F,F,F,
F100.Tl,557.9567856878748,F,F,F,F,
F100.Duty,-106382.60618934222,F,F,F,T
F200.T,557.9567856878748,F,F,F,F,
F200.Tv,557.9567856878748,F,F,F,F,
F200.Tl,557.9567856878748,F,F,F,F,
F200.Duty,-37798.28473117316,F,F,F,T
"""
df = pd.read_csv(StringIO(txt),header=None)
# create two new columns from the first column
df[['a','b']] = df[0].str.split('.',expand=True)
df.pivot('b','a',1)
# note: if index/column values are not unique
# then you can aggregate them like this:
# df.pivot_table(index='b',columns='a',values=1, aggfunc='mean')

	F100	F200
Duty	-106383	-37798.3
T	557.957	557.957
Tl	557.957	557.957
Tv	557.957	557.957

edited Nov 7, 2021 at 18:34

answered Nov 7, 2021 at 15:50

tozCSS

6,2343 gold badges37 silver badges34 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Parsing output (text) into dataframe with columns and rows

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related