0

I have the following dataset Excel Dummy DataSet that consists of a concatenation of several Tables in an Excel sheet. They are all stacked vertically. The columns of the different tables are same, col_x, col_y, col_t, except for the column Y that varies as the tables change (See the figure below).

I somehow manage to get the output. However, I wonder if there is a simpler|more efficient way to do this?

enter image description here

This is what I have tried

import pandas as pd

# Import Data
path = r"/content/test_data.xlsx"
df_original = pd.read_excel(path, skiprows=4, usecols= range(0,4), header=None)
df_original.columns=["col_x","col_y","col_z","col_t"]

# Begining of the code
mask_col_x = df_original["col_x"] == "col_x"
df_break = df_original[mask_col_x]
index_break_list = df_break.index

range_list = []

for i, val in enumerate(index_break_list):

    if i < len(index_break_list)-1:

        span1 = (val+1,index_break_list[i+1],df_original["col_y"][val])

        range_list.append(span1)

    span1 = (val+1,len(df_original),df_original["col_y"][val])

range_list.append(span1)

dataframe_list = []

for elt in range_list:

    df_sub = df_original.iloc[elt[0]:elt[1]].copy()

    df_sub["Value y"] = elt[2]

    dataframe_list.append(df_sub)
 

new_df = pd.concat(dataframe_list,axis=0)

new_df.to_csv("test_data_result_combined.csv")
1
  • Can U share d file? Commented Jan 11, 2023 at 8:16

1 Answer 1

2

You can create column Value y by mask with Series.where and then forward filling missing values by ffill and last filter out rows by invert mask by ~:

path = "test_data.xlsx"
df_original = pd.read_excel(path, skiprows=4, usecols= range(0,4), header=None)
df_original.columns=["col_x","col_y","col_z","col_t"]

mask_col_x = df_original["col_x"] == "col_x"
df_original['Value y'] = df_original["col_y"].where(mask_col_x).ffill()

new_df = df_original[~mask_col_x]
print (new_df)
      col_x          col_y          col_z          col_t Value y
1    index1  val_y1_table1  val_z1_table1  val_t1_table1      y1
2    index2  val_y2_table1  val_z2_table1  val_t2_table1      y1
3    index3  val_y3_table1  val_z3_table1  val_t3_table1      y1
4    index4  val_y4_table1  val_z4_table1  val_t4_table1      y1
6    index5  val_y1_table2  val_z1_table2  val_t1_table2      y2
7    index6  val_y2_table2  val_z2_table2  val_t2_table2      y2
8    index7  val_y3_table2  val_z3_table2  val_t3_table2      y2
10   index8  val_y1_table3  val_z1_table3  val_t1_table3      y3
11   index9  val_y2_table3  val_z2_table3  val_t2_table3      y3
13  index10  val_y1_table4  val_z1_table4  val_t1_table4      y4
15  index11  val_y1_table5  val_z1_table5  val_t1_table5      y5
16  index12  val_y2_table5  val_z2_table5  val_t2_table5      y5
17  index13  val_y3_table5  val_z3_table5  val_t3_table5      y5
18  index14  val_y4_table5  val_z4_table5  val_t4_table5      y5
19  index15  val_y5_table5  val_z5_table5  val_t5_table5      y5
20  index16  val_y6_table5  val_z6_table5  val_t6_table5      y5
21  index17  val_y7_table5  val_z7_table5  val_t7_table5      y5
Sign up to request clarification or add additional context in comments.

1 Comment

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.