I am new to data analysis in Python. The following is an example dataset:

import pandas as pd

d2 = {'Index': [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1], 'journey_time':[95.546,132.945,147.538,301.307,42.907,129.008,102.900,112.620,234.334,103.321,82.337,154.817,20.076,85.717,94.362,45.032],'edge':['s_b','c_d','b_d','c_e','d_f','s_a','a_c','d_c','c_e','a_c','d_c','s_a','d_f','s_b','b_d','c_d']}
df2=pd.DataFrame(data=d2)

I want to create a new data frame with one row per index and a set of new columns. The rules for the new columns are as follows:

se1 = s_a + a_c + c_e
se2 = s_b + b_d + d_c + c_e
sf1 = s_b + b_d + d_f
sf2 = s_a + a_c + c_d + d_f 

I also have further variations in my calculations, such as:

eq_time1 = (200/(s_a + a_c)) + c_e
eq_time2 = (200/(s_b + b_d + d_c)) + c_e 

Each edge name in the rules stands for that edge's journey time within a given index. I am not sure how to write this with a pandas DataFrame. The following is my expected output:

df3 = {'Index':[0,1],'se1':[129.008+102.900+301.307,154.817+103.321+234.334],'se2':[95.546+147.538+112.620+301.307,85.717+94.362+82.337+234.334],'sf1':[95.546+147.538+42.907,85.717+94.362+20.076],'sf2':[129.008+102.900+132.945+42.907,154.817+103.321+45.032+20.076 ],'eq_time1':[(200/(129.008+102.900))+301.307,(200/(154.817+103.321))+234.334   ], 'eq_time2' : [(200/(95.546+147.538+112.620))+301.307,(200/(85.717+94.362+82.337))+234.334]}

Please help!

1 Answer

If you have just those 4 paths in your data, you can calculate the times in pandas as follows:

paths = {
  'se1': ['s_a', 'a_c', 'c_e'],
  'se2': ['s_b', 'b_d', 'd_c', 'c_e'],
  'sf1': ['s_b', 'b_d', 'd_f'],
  'sf2': ['s_a', 'a_c', 'c_d', 'd_f']
}

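# Start with one row per unique Index value.
df3 = pd.DataFrame({'Index': df2['Index'].unique()}).set_index('Index')

for k, v in paths.items():
  # Total journey time over all edges on path k, per Index.
  df3[k] = df2[df2.edge.isin(v)].groupby('Index')['journey_time'].sum()
  # Journey time of the path's final edge, aligned on Index.
  last_edge_times = df2[df2.edge==v[-1]].set_index('Index')
  # 200 / (time of all edges except the last) + time of the last edge.
  df3['eq_time_'+k] = 200.0/(df3[k] - last_edge_times.journey_time) + last_edge_times.journey_time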

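For any path p, the eq_time_p column stores 200 divided by the summed time of every edge except the last, plus the time of the last edge. For se1 and se2 this matches your eq_time1 and eq_time2 equations. Note that the sum assumes each edge appears at most once per Index.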
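If the formulas vary more than the loop above allows, one option is to reshape the data so each edge becomes its own column and then write every rule out directly. The following is only a sketch of that idea, not part of the original answer: the name `wide` is an assumption introduced here, and it relies on each edge occurring exactly once per Index (`pivot` raises a `ValueError` on duplicate Index/edge pairs).

```python
import pandas as pd

d2 = {
    'Index': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    'journey_time': [95.546, 132.945, 147.538, 301.307, 42.907, 129.008,
                     102.900, 112.620, 234.334, 103.321, 82.337, 154.817,
                     20.076, 85.717, 94.362, 45.032],
    'edge': ['s_b', 'c_d', 'b_d', 'c_e', 'd_f', 's_a', 'a_c', 'd_c',
             'c_e', 'a_c', 'd_c', 's_a', 'd_f', 's_b', 'b_d', 'c_d'],
}
df2 = pd.DataFrame(data=d2)

# Reshape to one row per Index and one column per edge.
# `wide` is a hypothetical helper name used only in this sketch.
wide = df2.pivot(index='Index', columns='edge', values='journey_time')

# Write each rule from the question as plain column arithmetic.
df3 = pd.DataFrame(index=wide.index)
df3['se1'] = wide['s_a'] + wide['a_c'] + wide['c_e']
df3['se2'] = wide['s_b'] + wide['b_d'] + wide['d_c'] + wide['c_e']
df3['sf1'] = wide['s_b'] + wide['b_d'] + wide['d_f']
df3['sf2'] = wide['s_a'] + wide['a_c'] + wide['c_d'] + wide['d_f']
df3['eq_time1'] = 200 / (wide['s_a'] + wide['a_c']) + wide['c_e']
df3['eq_time2'] = 200 / (wide['s_b'] + wide['b_d'] + wide['d_c']) + wide['c_e']

# Turn Index back into an ordinary column, as in the expected output.
df3 = df3.reset_index()
print(df3)
```

Because every rule is an ordinary expression over named columns, a new variation only needs one more line, at the cost of spelling each formula out by hand.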

3 Comments

This technique is useful. However, my calculations have further variations that are not this straightforward; I have just added them by editing the question. Please check and suggest!
This was really useful. Could you suggest some resources for learning more about such operations?
