Creating multiple dataframes from a single CSV

Question

I have a CSV file formatted as shown below.

@QWERTY
@Equipment01
@Datetime;A;B;C;D
21/02/2005 17:55;23;451;42;31;
21/02/2005 17:50;24;143;24;54;
21/02/2005 17:45;25;513;31;31;
@Equipment02
@Datetime;A;B;C;D
21/02/2005 17:55;43;1;42;58;
21/02/2005 17:50;14;3;65;51;
21/02/2005 17:45;3;3;91;53;
21/02/2005 17:40;31;35;13;31;
21/02/2005 17:35;34;54;61;5;
@PersonalGear01
@Datetime;A;B;C;D;E;F
21/02/2005 17:55;41;23;2;16;0;6;
21/02/2005 17:50;3;95;51;14;0;6;
21/02/2005 17:45;3;2;91;53;0;6;
@Equipment00
@Datetime;A;B;C;D
@PersonalGear02
@Datetime;A;B;C;D;E;F
21/02/2005 17:55;41;23;2;16;0;6;
21/02/2005 17:50;3;95;51;14;0;6;
21/02/2005 17:45;3;2;91;53;0;6;

Each equipment and personal gear block contains semicolon-delimited datetime data rows. In some cases, a block may have no data rows at all (e.g. @Equipment00). The number of data rows may vary (e.g. @Equipment02 has more entries than @Equipment01).

I would like to create multiple dataframes, one per equipment or personal gear block. The expected result for the above example is 4 dataframes (@Equipment01, @Equipment02, @PersonalGear01, @Equipment00).

Is there a pandas way of doing this?

Corralien · Accepted Answer · 2022-02-28 10:43:47Z

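You can read the file line by line and build one DataFrame per block, stored in a dictionary keyed by the block name: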

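import pandas as pd

dfs = {}
with open('data.dat') as fp:
    next(fp)  # skip first line (@QWERTY)
    name, headers, data = None, None, []
    for row in fp:
        row = row.strip()
        if not row:
            continue  # ignore blank lines
        if row.startswith('@Datetime'):
            # Parse column names (header lines start with @Datetime in the sample)
            headers = row[1:].split(';')
        elif row.startswith('@'):
            # New block: store the previous one, even if it has no data rows
            if name is not None:
                dfs[name] = pd.DataFrame(data, columns=headers)
            name, data = row[1:], []
        else:
            # Accumulate data; drop the trailing ';' so the number of
            # fields matches the number of column names
            if row.endswith(';'):
                row = row[:-1]
            data.append(row.split(';'))
    # Store the last block once the file is exhausted
    if name is not None:
        dfs[name] = pd.DataFrame(data, columns=headers)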

Output:

>>> dfs
{'Equipment01':            Datetime   A    B   C   D
 0  21/02/2005 17:55  23  451  42  31
 1  21/02/2005 17:50  24  143  24  54
 2  21/02/2005 17:45  25  513  31  31,
 'Equipment02':            Datetime   A   B   C   D
 0  21/02/2005 17:55  43   1  42  58
 1  21/02/2005 17:50  14   3  65  51
 2  21/02/2005 17:45   3   3  91  53
 3  21/02/2005 17:40  31  35  13  31
 4  21/02/2005 17:35  34  54  61   5,
 'PersonalGear01':            Datetime   A   B   C   D  E  F
 0  21/02/2005 17:55  41  23   2  16  0  6
 1  21/02/2005 17:50   3  95  51  14  0  6
 2  21/02/2005 17:45   3   2  91  53  0  6,
 'Equipment00': Empty DataFrame
 Columns: [Datetime, A, B, C, D]
 Index: []}

>>> dfs.keys()
dict_keys(['Equipment01', 'Equipment02', 'PersonalGear01', 'Equipment00'])

>>> dfs['Equipment02']
           Datetime   A   B   C   D
0  21/02/2005 17:55  43   1  42  58
1  21/02/2005 17:50  14   3  65  51
2  21/02/2005 17:45   3   3  91  53
3  21/02/2005 17:40  31  35  13  31
4  21/02/2005 17:35  34  54  61   5
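
Note that every value in these DataFrames is still a string, because the rows come from str.split. If typed columns are needed, a conversion step along the following lines may help. This is only a sketch: convert_types is a hypothetical helper name, the small df below is a stand-in for dfs['Equipment02'], and the day-first format string is an assumption based on the sample data.

```python
import pandas as pd

# Stand-in for dfs['Equipment02']: all values are strings after splitting.
df = pd.DataFrame(
    [["21/02/2005 17:55", "43", "1", "42", "58"],
     ["21/02/2005 17:50", "14", "3", "65", "51"]],
    columns=["Datetime", "A", "B", "C", "D"],
)


def convert_types(frame):
    """Return a copy with Datetime parsed and all other columns numeric."""
    out = frame.copy()
    # Assumed format: day/month/year hour:minute, as in the sample file.
    out["Datetime"] = pd.to_datetime(out["Datetime"], format="%d/%m/%Y %H:%M")
    value_cols = out.columns.drop("Datetime")
    out[value_cols] = out[value_cols].apply(pd.to_numeric)
    return out


typed = convert_types(df)
print(typed.dtypes)
```

The same helper could then be applied to each value of dfs in turn.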

3 Comments

user3118602 Over a year ago

Thanks for your reply, Corralien. I get an error when using it on my original CSV: ValueError: 32 columns passed, passed data had 33 columns. It appears to come from the dfs[name] = pd.DataFrame(data, columns=headers) line inside the loop.

Corralien Over a year ago

Can you share your real data?

user3118602 Over a year ago

It turned out that each datetime row has a semicolon at the end, which caused the ValueError. I have updated the sample in my post.
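
For anyone hitting the same ValueError, the mismatch can be reproduced on a single row from the sample: splitting a line that ends in a semicolon yields one extra, empty field. The snippet below is a minimal illustration using one header and one data row copied from the question; the variable names are arbitrary.

```python
header = "@Datetime;A;B;C;D"
row = "21/02/2005 17:55;23;451;42;31;"

# Column names: drop the leading '@' and split on ';'.
columns = header[1:].split(";")

# Splitting the raw row keeps an empty string after the last ';'.
raw_fields = row.split(";")

# Removing the single trailing ';' first gives matching lengths.
trimmed = row[:-1] if row.endswith(";") else row
clean_fields = trimmed.split(";")

print(len(columns), len(raw_fields), len(clean_fields))
print(raw_fields[-1] == "")
```

Removing only one trailing semicolon, rather than all of them, keeps a genuinely empty last value intact if one ever appears.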
