0

Here is the csv that i have as an input:

enter image description here

I would like to create 3 dataframes for each variable: BEL AIR - feeder_30 - MANGUIER, BEL_TC_30_TR3_MW and BEL AIR - feeder_30 - MTOA. It will have Date value as an index and Valeur as values.

I tried this :

data_adms = pd.read_csv(full_path,sep=';',index_col=False, header=6)

But it gets rid of the name of nom du point distant. And the timestamp is not the same between rows.

Does somebody have an idea how can I properly proceed?

2
  • Hi, is it safe to assume that the number of columns between each dataset (from 'source' to the next 'one' is the same? Also, the header you want to retain is on the sixth line, right? Commented Oct 19, 2021 at 4:52
  • Thanks for the edit ! Yes the number of columns between each dataset is the same. On the 7th line yes. But I will need to pass the name of each dataset (in cell B2, F2, ....) Commented Oct 20, 2021 at 9:43

1 Answer 1

1

So, given the following csv file:

                     0                               1   ...          10           11
0                Source                PUISSANCE ACTIVE  ...         NaN          NaN
1  Nom du point distant  BEL AIR - feeder_30 - MANGUIER  ...         NaN          NaN
2          Propriétaire                Puissance active  ...         NaN          NaN
3                 Unité                              Kw  ...         NaN          NaN
4                 Phase                        L1 L2 L3  ...         NaN          NaN
5       Echantillonnage      Moyenne (Interval : 01:00)  ...         NaN          NaN
6                  Date                          Valeur  ...     Qualité  Indicateurs
7      01/09/2020 13:53                      5189,60325  ...  Discutable          NaN
8      02/09/2020 13:54                      5043,68066  ...  Discutable          NaN
9      03/09/2020 13:55                      4805,71191  ...  Discutable          NaN

You could get the names and col numbers like this:

raw_data = pd.read_csv(
    filepath_or_buffer="./file.csv", sep=";", header=None, engine="python"
)
data = {raw_data.loc[1, x]: x for x in range(1, raw_data.shape[1], 4)}
print(data)
# Outputs
{'BEL AIR - feeder_30 - MANGUIER': 1, 'BEL_TC_30_TR3_MW': 5, 'BEL AIR - feeder_30 - MTOA': 9}

Then, you could import dataframes and pair them with their names like this:

dfs = []
for name, col in data.items():
    df = pd.read_csv(
        filepath_or_buffer="./scripts/test.csv",
        sep=";",
        header=0,
        skiprows=6,
        usecols=[col - 1, col],
        engine="python",
    )
    df.columns = ["Date", "Valeur"]
    df.set_index("Date", inplace=True)
    df.columns.name = name
    dfs.append(df)

In which case:

for df in dfs:
    print(df)

# Outputs

BEL AIR - feeder_30 - MANGUIER      Valeur
Date
01/09/2020 13:53                5189,60325
02/09/2020 13:54                5043,68066
03/09/2020 13:55                4805,71191

BEL_TC_30_TR3_MW        Valeur
Date
01/09/2020 13:53   -47,3029671
02/09/2020 13:54  -5,829510403
03/09/2020 13:55    1,52590215

BEL AIR - feeder_30 - MTOA      Valeur    
Date                                      
01/09/2020 13:53            5189,60325    
02/09/2020 13:54            5043,68066    
03/09/2020 13:55            4805,71191 
Sign up to request clarification or add additional context in comments.

1 Comment

Sorry for the late reply, I just came back on the projet. It's exactly what I was looking for and works perfectly. Thank you very much !!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.