I have a CSV file that is formatted like below.

@QWERTY
@Equipment01
@Datetime;A;B;C;D
21/02/2005 17:55;23;451;42;31;
21/02/2005 17:50;24;143;24;54;
21/02/2005 17:45;25;513;31;31;
@Equipment02
@Datetime;A;B;C;D
21/02/2005 17:55;43;1;42;58;
21/02/2005 17:50;14;3;65;51;
21/02/2005 17:45;3;3;91;53;
21/02/2005 17:40;31;35;13;31;
21/02/2005 17:35;34;54;61;5;
@PersonalGear01
@Datetime;A;B;C;D;E;F
21/02/2005 17:55;41;23;2;16;0;6;
21/02/2005 17:50;3;95;51;14;0;6;
21/02/2005 17:45;3;2;91;53;0;6;
@Equipment00
@Datetime;A;B;C;D
@PersonalGear02
@Datetime;A;B;C;D;E;F
21/02/2005 17:55;41;23;2;16;0;6;
21/02/2005 17:50;3;95;51;14;0;6;
21/02/2005 17:45;3;2;91;53;0;6;

Each equipment and personal gear section is followed by semicolon-delimited datetime data rows. In some cases there may be no data rows at all (e.g. @Equipment00), and the number of rows may vary (e.g. @Equipment02 has more rows than @Equipment01).

I would like to create multiple dataframes, one per equipment or personal gear section. The expected result for the above example is 4 dataframes (@Equipment01, @Equipment02, @PersonalGear01, @Equipment00).

Is there a pandas way of doing this?

1 Answer

You can use:

import pandas as pd

dfs = {}
name, headers, data = None, None, []
with open('data.dat') as fp:
    next(fp)  # skip first line (@QWERTY)
    for row in fp:
        row = row.strip()
        if not row:
            continue  # ignore blank lines
        if row.startswith('@Datetime'):
            # Parse column names
            headers = row[1:].split(';')
        elif row.startswith('@'):
            # New section: store the previous one first
            if name is not None:
                dfs[name] = pd.DataFrame(data, columns=headers)
            name, data = row[1:], []
        else:
            # Accumulate data, dropping the trailing ';'
            data.append(row.rstrip(';').split(';'))
# Store the last section
if name is not None:
    dfs[name] = pd.DataFrame(data, columns=headers)

Output (for a sample whose last section is the empty @Equipment00):

>>> dfs
{'Equipment01':            Datetime   A    B   C   D
 0  21/02/2005 17:55  23  451  42  31
 1  21/02/2005 17:50  24  143  24  54
 2  21/02/2005 17:45  25  513  31  31,
 'Equipment02':            Datetime   A   B   C   D
 0  21/02/2005 17:55  43   1  42  58
 1  21/02/2005 17:50  14   3  65  51
 2  21/02/2005 17:45   3   3  91  53
 3  21/02/2005 17:40  31  35  13  31
 4  21/02/2005 17:35  34  54  61   5,
 'PersonalGear01':            Datetime   A   B   C   D  E  F
 0  21/02/2005 17:55  41  23   2  16  0  6
 1  21/02/2005 17:50   3  95  51  14  0  6
 2  21/02/2005 17:45   3   2  91  53  0  6,
 'Equipment00': Empty DataFrame
 Columns: [Datetime, A, B, C, D]
 Index: []}

>>> dfs.keys()
dict_keys(['Equipment01', 'Equipment02', 'PersonalGear01', 'Equipment00'])

>>> dfs['Equipment02']
           Datetime   A   B   C   D
0  21/02/2005 17:55  43   1  42  58
1  21/02/2005 17:50  14   3  65  51
2  21/02/2005 17:45   3   3  91  53
3  21/02/2005 17:40  31  35  13  31
4  21/02/2005 17:35  34  54  61   5
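
Note that splitting lines by hand leaves every value as a string. As a possible follow-up, here is a minimal sketch of converting one of the resulting dataframes to proper dtypes. The helper name convert_types and the small stand-in dataframe are hypothetical (not part of the answer above), and the sketch assumes the day/month/year format shown in the sample and that all non-Datetime columns are numeric.

```python
import pandas as pd

# Hypothetical stand-in for one entry of `dfs`: every value is a string
df = pd.DataFrame(
    [["21/02/2005 17:55", "43", "1", "42", "58"],
     ["21/02/2005 17:50", "14", "3", "65", "51"]],
    columns=["Datetime", "A", "B", "C", "D"],
)

def convert_types(frame):
    """Return a copy with Datetime parsed and the other columns numeric."""
    out = frame.copy()
    # Assumes day/month/year with 24-hour time, as in the sample data
    out["Datetime"] = pd.to_datetime(out["Datetime"], format="%d/%m/%Y %H:%M")
    # Assumes every remaining column holds numbers
    value_cols = out.columns.drop("Datetime")
    out[value_cols] = out[value_cols].apply(pd.to_numeric)
    return out

typed = convert_types(df)
print(typed.dtypes)
```

If that fits your data, it could be applied to every section with something like {k: convert_types(v) for k, v in dfs.items()}.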

3 Comments

Thanks for your reply, Corralien. I get an error when using it on my original CSV: ValueError: 32 columns passed, passed data had 33 columns. It appears to happen on line 15 of the code (the dfs[name] = pd.DataFrame(data, columns=headers) call inside the loop).
Can you share your real data?
It turned out that each datetime row has a semicolon at the end, which caused the ValueError. I have updated the sample in my post accordingly.
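
To illustrate the point above: a trailing semicolon makes str.split produce one extra, empty field, so each data row ends up one column wider than the header. The sketch below shows the mismatch on a row from the sample and one possible way around it. The helper name split_row is hypothetical, and it assumes at most one trailing separator per line should be ignored.

```python
header = "@Datetime;A;B;C;D"
row = "21/02/2005 17:55;23;451;42;31;\n"

def split_row(line, sep=";"):
    """Split one data line, ignoring a single trailing separator."""
    line = line.strip()
    if line.endswith(sep):
        line = line[:-len(sep)]
    return line.split(sep)

columns = header[1:].split(";")
naive = row.strip().split(";")  # the trailing ';' adds an empty last field
cleaned = split_row(row)

print(len(columns), len(naive), len(cleaned))  # 5 6 5
```

Removing only one trailing separator (rather than calling rstrip(';')) keeps genuinely empty values at the end of a row, if the real data has any.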
