
I am trying to create a new concatenated-name column for each row of my merged dataframe. Each name is built from the table names by following the table index and the parent index: the parent index acts as a map back into the table index. Here's what my tables look like:

df0:

TableIndex  ParentIndex  TableName
    0           -1       ingredient
    1            0       salt
    2            0       pepper
    3            1       butter

df1:

FieldIndex  TableIndex  FieldName
    0           1       afield
    1           3       anotherfield
    2           2       afield

I have merged the dataframes on TableIndex. The desired output would be something like this:

TableIndex   ParentIndex    FieldIndex    FieldName     ConcatNames
    1             0             0         afield        ingredient.salt.afield
    3             1             1         anotherfield  ingredient.salt.butter.anotherfield
    2             0             2         afield        ingredient.pepper.afield

As you can see, ParentIndex works like a function applied repeatedly to TableIndex: each table points to its parent, and the chain is followed until it reaches -1 (which does not need to appear in the final output). I am not sure how to go about this. Could this be achieved with something like df.index.map or pd.IntervalIndex? This is also not the only file, and the table names vary from file to file.
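One way to read the "repeated function" idea is as a loop of lookups in a parent map. Below is a minimal pure-Python sketch, assuming df0 has been turned into two plain dicts, `parent` (TableIndex → ParentIndex) and `name` (TableIndex → TableName); these dict names and the `concat_name` helper are hypothetical and used only for illustration.

```python
def concat_name(table_index, field_name, parent, name):
    """Follow parent pointers from table_index up to the root (-1),
    then join the table names root-first, followed by the field name."""
    parts = [field_name]
    seen = set()  # guards against malformed files whose parent links form a cycle
    idx = table_index
    while idx != -1:
        if idx in seen:
            raise ValueError(f"cycle in parent chain at TableIndex {idx}")
        seen.add(idx)
        parts.append(name[idx])
        idx = parent[idx]
    # parts was collected leaf-to-root, so reverse it before joining
    return ".".join(reversed(parts))


# Data from df0, written out as dicts (hypothetical names)
parent = {0: -1, 1: 0, 2: 0, 3: 1}
name = {0: "ingredient", 1: "salt", 2: "pepper", 3: "butter"}

print(concat_name(1, "afield", parent, name))        # ingredient.salt.afield
print(concat_name(3, "anotherfield", parent, name))  # ingredient.salt.butter.anotherfield
print(concat_name(2, "afield", parent, name))        # ingredient.pepper.afield
```

With pandas, the two dicts can be built from df0 with `dict(zip(df0["TableIndex"], df0["ParentIndex"]))` and `dict(zip(df0["TableIndex"], df0["TableName"]))`, so nothing in the loop depends on the specific table names of a given file.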

Comment: This is more like a network problem. You may want to have a look at networkx. (Commented Jun 8, 2020 at 19:31)
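In the spirit of this comment, the tables form a tree (or a forest, if a file has several roots with ParentIndex -1). The sketch below does not use networkx; it is a plain-Python depth-first traversal that builds every table's dotted path in one top-down pass from the roots. The `all_table_paths` helper and the `parent`/`name` dicts are hypothetical names, with the data copied from df0.

```python
from collections import defaultdict


def all_table_paths(parent, name):
    """Return {TableIndex: "root.child...table"} for every table reachable
    from a root (ParentIndex == -1), visiting the tree top-down."""
    children = defaultdict(list)
    for idx, par in parent.items():
        children[par].append(idx)

    paths = {}
    # Iterative DFS; each stack entry is (TableIndex, dotted path of its parent)
    stack = [(root, "") for root in children[-1]]
    while stack:
        idx, prefix = stack.pop()
        path = prefix + "." + name[idx] if prefix else name[idx]
        paths[idx] = path
        for child in children[idx]:
            stack.append((child, path))
    return paths


parent = {0: -1, 1: 0, 2: 0, 3: 1}
name = {0: "ingredient", 1: "salt", 2: "pepper", 3: "butter"}
paths = all_table_paths(parent, name)
print(paths[3] + ".anotherfield")  # ingredient.salt.butter.anotherfield
```

Each table is visited once, so shared ancestors are not re-walked for every row, and tables caught in a cycle (unreachable from any root) are simply left out. If you do want networkx, it offers functions such as `nx.shortest_path` on a `DiGraph` built from parent-to-child edges.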

2 Answers

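import pandas as pd

# Merge the field table (df1) with the table definitions (df0) on TableIndex
df = pd.merge(df1, df0, on='TableIndex')
for index, row in df.iterrows():
    pidx = row.ParentIndex
    # Start with the row's own table name and field name
    table_names = [row.TableName, row.FieldName]
    # Walk up the parent chain, prepending each ancestor's name, until the root (-1)
    while pidx != -1:
        p_row = df0[df0['TableIndex'] == pidx]
        insert_name = p_row.TableName.iloc[0]
        table_names.insert(0, insert_name)
        pidx = p_row.ParentIndex.iloc[0]
    df.at[index, "ConcatName"] = ".".join(table_names)
print(df[['TableIndex', 'ParentIndex', 'FieldIndex', 'FieldName', 'ConcatName']])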
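Instead of filtering df0 inside the loop for every ancestor of every row, the paths can be computed once per table and attached with `Series.map`, which is close to the `df.index.map` idea in the question. A sketch assuming df0 and df1 are laid out as in the question; `table_path` is a hypothetical helper name.

```python
import pandas as pd

df0 = pd.DataFrame({
    "TableIndex": [0, 1, 2, 3],
    "ParentIndex": [-1, 0, 0, 1],
    "TableName": ["ingredient", "salt", "pepper", "butter"],
})
df1 = pd.DataFrame({
    "FieldIndex": [0, 1, 2],
    "TableIndex": [1, 3, 2],
    "FieldName": ["afield", "anotherfield", "afield"],
})

# Lookup tables: TableIndex -> ParentIndex and TableIndex -> TableName
parent = dict(zip(df0["TableIndex"], df0["ParentIndex"]))
name = dict(zip(df0["TableIndex"], df0["TableName"]))


def table_path(idx):
    """Dotted path of table names from the root down to table idx."""
    parts = []
    while idx != -1:
        parts.append(name[idx])
        idx = parent[idx]
    return ".".join(reversed(parts))


# One path per table, computed once, then mapped onto the merged rows
paths = {idx: table_path(idx) for idx in df0["TableIndex"]}
df = pd.merge(df1, df0, on="TableIndex")
df["ConcatName"] = df["TableIndex"].map(paths) + "." + df["FieldName"]
print(df[["TableIndex", "ParentIndex", "FieldIndex", "FieldName", "ConcatName"]])
```

The per-row `iterrows` loop is replaced by one dictionary lookup per row plus a vectorized string concatenation.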

1 Comment

Please provide some comments and/or a description with your answer, not just plain code.

I tried to solve it like this; I hope it helps.

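import pandas as pd

df = pd.merge(df0, df1)  # merges on the shared TableIndex column
# Name of the root table (the one whose ParentIndex is -1)
table_name = df0[df0["ParentIndex"] == -1]["TableName"].iloc[0]
for index, row in df.iterrows():
    # NOTE: this collects every table that shares the row's ParentIndex (its siblings),
    # not the row's chain of ancestors
    table_names = df0[df0["ParentIndex"] == row["ParentIndex"]]["TableName"].to_list()
    str_table_names = ".".join(table_names)
    df.at[index, "ConcatName"] = table_name + "." + str_table_names + "." + row["FieldName"]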

1 Comment

That's close; the only issue is that it joined all table names with the same parent index, so I got ingredient.salt.pepper.afield for two different variables. Thanks for your help.
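The problem described in this comment goes away if each row follows its own chain of TableIndex → ParentIndex links instead of grouping by ParentIndex. A hedged sketch of that idea using recursion with `functools.lru_cache`, so that shared prefixes (here `ingredient.salt`) are computed only once; `make_path_fn` is a hypothetical helper name, and the data is copied from the question.

```python
from functools import lru_cache


def make_path_fn(parent, name):
    """Build a cached function mapping TableIndex -> dotted table path."""

    @lru_cache(maxsize=None)
    def path(idx):
        par = parent[idx]
        if par == -1:
            return name[idx]  # root table: just its own name
        # A table's path is its parent's path plus its own name
        return path(par) + "." + name[idx]

    return path


parent = {0: -1, 1: 0, 2: 0, 3: 1}
name = {0: "ingredient", 1: "salt", 2: "pepper", 3: "butter"}
path = make_path_fn(parent, name)

# (TableIndex, FieldName) pairs from df1
for table_idx, field in [(1, "afield"), (3, "anotherfield"), (2, "afield")]:
    print(path(table_idx) + "." + field)
```

Recursion keeps the code short, but very deep hierarchies would run into Python's recursion limit, and a cycle in the parent links would raise `RecursionError`; the iterative loop in the first answer has neither limitation.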
