What I want to achieve is a more elegant and direct method of annotating points whose x and y positions come from a pandas DataFrame, using a label from the same row.

This example works and produces what I want, but I feel there must be a more elegant solution that does not require storing individual columns in separate lists first and iterating over them.

My concern is that keeping these separate lists could result in labels becoming misaligned with the data in larger, more complicated datasets with missing values, NaNs, etc.

In this example, x = Temperature, y = Sales and the label is the Date.

import pandas as pd
import matplotlib.pyplot as plt

d = {'Date': ['15-08-24', '16-08-24', '17-08-24'], 'Temperature': [24, 26, 20], 'Sales': [100, 150, 90]}
df = pd.DataFrame(data=d)

Which gives:

       Date  Temperature  Sales
0  15-08-24           24    100
1  16-08-24           26    150
2  17-08-24           20     90

Then:

temperature_list = df['Temperature'].tolist()
sales_list = df['Sales'].tolist()
labels_list = df['Date'].tolist()

fig, axs = plt.subplots()
axs.scatter(data=df, x='Temperature', y='Sales')
for i, label in enumerate(labels_list):
    axs.annotate(label, (temperature_list[i], sales_list[i]))
plt.show()

What I aim for - but does not work - is something along the lines of:

fig, axs = plt.subplots()
axs.scatter(data=df, x='Temperature', y='Sales')
axs.annotate(data=df, x='Temperature', y='Sales', text='Date') # this is invalid
plt.show()

Suggestions welcome. If there is no way around the iterative process, perhaps there is at least a fail-safe method to ensure that labels are attributed to the correct data points.

2 Answers

You probably can't avoid the iteration, but you can remove the need to create lists by using df.iterrows(). This has the added benefit that you are not decoupling any data from your DataFrame.

import pandas as pd
import matplotlib.pyplot as plt

d = {'Date': ['15-08-24', '16-08-24', '17-08-24'], 'Temperature': [24, 26, 20], 'Sales': [100, 150, 90]}
df = pd.DataFrame(data=d)

fig, axs = plt.subplots()
axs.scatter(data=df, x='Temperature', y='Sales')

for i, row in df.iterrows():
    axs.annotate(row["Date"], (row["Temperature"], row["Sales"]))
    
plt.show()
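If missing values are the main worry, one possible variation is sketched below. It assumes you want to skip any row where the x value, y value, or label is missing. The helper name annotate_points is hypothetical (it is not part of matplotlib or pandas), and the fourth row with a missing temperature is made up purely to illustrate the case. All three columns are read from the same filtered DataFrame, so the values stay row-aligned.

```python
import pandas as pd
import matplotlib.pyplot as plt


def annotate_points(ax, df, x, y, label):
    """Annotate each complete row of df at (x, y) with its label text.

    Hypothetical helper for illustration only. Rows with a missing
    x, y, or label value are dropped before annotating.
    """
    complete = df.dropna(subset=[x, y, label])
    annotations = []
    # All three columns come from the same filtered frame, so zip()
    # walks them in matching row order.
    for xi, yi, text in zip(complete[x], complete[y], complete[label]):
        annotations.append(ax.annotate(str(text), (xi, yi)))
    return annotations


# Same data as above, plus one invented row with a missing temperature.
df = pd.DataFrame({
    'Date': ['15-08-24', '16-08-24', '17-08-24', '18-08-24'],
    'Temperature': [24, 26, 20, None],
    'Sales': [100, 150, 90, 120],
})

fig, ax = plt.subplots()
ax.scatter(data=df, x='Temperature', y='Sales')
labels = annotate_points(ax, df, 'Temperature', 'Sales', 'Date')
```

Call plt.show() afterwards to display the figure. This still loops over the rows; it only narrows what the loop sees.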

1 Comment

Great suggestion. What you mention as 'added benefit' is exactly what I was looking for in my last line of the original question (correct attribution of corresponding labels), but you worded it better. Thank you!

I'm not aware of a simple way to achieve this other than looping, which I agree is painful. If you can use Plotly Express instead of matplotlib/seaborn, this becomes much easier:

import pandas as pd
import plotly.express as px

d = {'Date': ['15-08-24', '16-08-24', '17-08-24'], 'Temperature': [24, 26, 20], 'Sales': [100, 150, 90]}
df = pd.DataFrame(data=d)
# text='Date' draws each row's Date next to its marker
fig = px.scatter(df, x='Temperature', y='Sales', text='Date')
fig.update_traces(textposition='top center')
fig.show()

1 Comment

Thanks, great to know that this package exists and does exactly what I asked for in one line.
