I have an output file (output.txt) from one of my HPC applications, in the following format:

Data(D) Number_of_Processors(P) Process_per_node(ppn) mode time

2048 4 1 0 0.001220
2048 4 1 1 0.000858
32768 4 1 0 0.008137
32768 4 1 1 0.032052
262144 4 1 0 0.078899
262144 4 1 1 0.103439
2048 4 8 0 0.118370
2048 4 8 1 0.064003
32768 4 8 0 0.197745
32768 4 8 1 0.116132
262144 4 8 0 0.502012
262144 4 8 1 0.717104

I have shown only 12 lines here, but the file has 240. As you can see, there are only 3 data sizes. I am proficient in C++ but I don't know Python. For my employer, I have to make a bar plot of this data, like the one in this image:

[image: example bar plot]

The Y-axis is the time and the X-axis is the number of processes, that is, P*ppn. There are 3 separate plots, one per data size: 2048, 32768, and 262144.

I have been given example code that I need to modify so that it plots a graph like the one above.

Here is the example code for the graph:

#!/usr/bin/env python
# coding: utf-8

# In[47]:


import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

sns.set()


# In[48]:


demo_input_format = pd.DataFrame.from_dict({
    "D": [],
    "P": [],
    "ppn": [],
    "mode": [],  # 1 --> optimized, 0 --> standard
    "time": [],
})


# In[49]:


# DataFrame.append was removed in pandas 2.0, so collect the rows in a list
# and build the frame once at the end.
rows = []
for execution in range(10):
    for P in [4, 16]:
        for ppn in [1, 8]:
            for D in [16, 256, 2048]:
                # Change with the actual data
                rows.append({"D": D, "P": P, "ppn": ppn, "mode": 1, "time": np.random.rand() / 10})
                rows.append({"D": D, "P": P, "ppn": ppn, "mode": 0, "time": np.random.rand()})

demo_input_format = pd.DataFrame(rows)

# Label such as "(4, 8)" used on the x-axis
demo_input_format["(P, ppn)"] = (
    "(" + demo_input_format["P"].astype(str) + ", " + demo_input_format["ppn"].astype(str) + ")"
)

print(demo_input_format)

# In[50]:


sns.catplot(x="(P, ppn)", y="time", data=demo_input_format, kind="box", col="D", hue="mode")
plt.show()

# In[ ]:

How can I modify this to take input from output.txt file and plot a bar plot like the above? Thank you. Please do help. :)

1 Answer

First create the columns P * ppn and (P, ppn):

df['P * ppn'] = df.P * df.ppn
# zip over the integer columns: df.apply(axis=1) would upcast each row to float and break integer formatting
df['(P, ppn)'] = [f'({p}, {n})' for p, n in zip(df.P, df.ppn)]

#          D  P  ppn  mode      time  P * ppn (P, ppn)
# 0     2048  4    1     0  0.001220        4   (4, 1)
# 1     2048  4    1     1  0.000858        4   (4, 1)
# 2    32768  4    1     0  0.008137        4   (4, 1)
# ...
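
The snippet above assumes that df already holds the contents of output.txt. Here is a minimal sketch of one way to load it, assuming the file is whitespace-separated and that its first line is the descriptive header shown in the question. The helper name load_results and the short column names are illustrative choices, and an in-memory sample stands in for the real file:

```python
import io
import pandas as pd

# Stand-in for output.txt so the sketch runs on its own; rows copied from the question.
sample = io.StringIO("""Data(D) Number_of_Processors(P) Process_per_node(ppn) mode time

2048 4 1 0 0.001220
2048 4 1 1 0.000858
32768 4 8 0 0.197745
32768 4 8 1 0.116132
""")

def load_results(source):
    """Read whitespace-separated results into a DataFrame with short column names."""
    return pd.read_csv(
        source,
        sep=r"\s+",   # columns are separated by runs of whitespace
        skiprows=1,   # assumption: the first line is the descriptive header
        names=["D", "P", "ppn", "mode", "time"],
    )

df = load_results(sample)  # for the real file: load_results("output.txt")
print(df.dtypes)
```

If the real file has no header line, dropping skiprows=1 should be enough. Blank lines are skipped by read_csv by default.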

Then melt() the dataframe into "long" form:

melted = df.melt(id_vars=['time', 'D', '(P, ppn)', 'mode'], value_vars='P * ppn')

#         time       D (P, ppn)  mode variable  value
# 0   0.001220    2048   (4, 1)     0  P * ppn      4
# 1   0.000858    2048   (4, 1)     1  P * ppn      4
# 2   0.008137   32768   (4, 1)     0  P * ppn      4
# ...
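
To see what the melt step does, here is a small self-contained sketch on four rows taken from the question. Since only one column is melted, the row count stays the same. The id columns carry over unchanged, and P and ppn are dropped because they are listed in neither id_vars nor value_vars:

```python
import pandas as pd

df = pd.DataFrame({
    "D": [2048, 2048, 32768, 32768],
    "P": [4, 4, 4, 4],
    "ppn": [1, 1, 8, 8],
    "mode": [0, 1, 0, 1],
    "time": [0.001220, 0.000858, 0.197745, 0.116132],
})
df["P * ppn"] = df["P"] * df["ppn"]
df["(P, ppn)"] = [f"({p}, {n})" for p, n in zip(df["P"], df["ppn"])]

melted = df.melt(
    id_vars=["time", "D", "(P, ppn)", "mode"],
    value_vars=["P * ppn"],
)

# The melted column's name ends up in "variable" and its numbers in "value".
print(melted.columns.tolist())
print(melted["value"].tolist())
```

The columns that a plot of time against (P, ppn) needs (time, D, mode and the label) are all id columns here, so they look the same before and after melting.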

And finally catplot() the melted dataframe with the data size D along the grid columns:

sns.catplot(
    x='(P, ppn)',
    y='time',
    col='D',
    hue='mode',
    data=melted,
    kind='bar',
)

[image: processes catplot]
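
With kind='bar', seaborn by default draws each bar at the mean of the y values in its group and adds an error bar for a confidence interval, so repeated runs of the same configuration collapse into one bar. As a sanity check, the bar heights can be reproduced with a groupby. The sketch below uses a few hypothetical repeated measurements, made up purely for illustration:

```python
import pandas as pd

# Hypothetical repeated measurements (two runs per mode), for illustration only.
df = pd.DataFrame({
    "D":    [2048, 2048, 2048, 2048],
    "P":    [4, 4, 4, 4],
    "ppn":  [1, 1, 1, 1],
    "mode": [0, 0, 1, 1],
    "time": [0.10, 0.30, 0.05, 0.15],
})
df["P * ppn"] = df["P"] * df["ppn"]

# Mean time per data size, total process count and mode:
# these are the values a default bar plot would show.
bar_heights = (
    df.groupby(["D", "P * ppn", "mode"])["time"]
      .mean()
      .reset_index()
)
print(bar_heights)
```

If the x-axis should show the total number of processes, as the question describes, passing x='P * ppn' to catplot with a frame that still has that column (for example the unmelted df) should work the same way.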
