
I have a DataFrame and would like to make a scatter plot with the time it took to complete a request (Days) on the y-axis and the day the request was filed (Received, which is a datetime column) on the x-axis.

Some values of 'Received' appear twice because sometimes two requests were filed on the same day.

Here are some of my data and the code I have tried:

Received          Days
2012-08-01        41.0 
2014-12-31       692.0
2015-02-25       621.0
2015-10-15       111.0

sns.regplot(x=simple_denied["Received"], y=simple_denied["days"], marker="+", fit_reg=False)


plt.plot('Received','days', simple_denied, color='black')
  • I think you may want to use a bar plot, line plot or heatmap instead of a scatter plot, since a scatter plot would require two continuous variables. If there are duplicates in Received, try to aggregate the Days first before plotting, for example by taking the mean. Commented Feb 15, 2019 at 3:19
  • jakevdp.github.io/PythonDataScienceHandbook/… Commented Feb 15, 2019 at 3:23
  • youtube.com/watch?v=jV24N7SPXEU Commented Feb 15, 2019 at 3:30
  • I would like to use a scatter plot to avoid having to aggregate the data. The variables have the same x-axis variable but different y-axis variables. Commented Feb 15, 2019 at 3:46
  • I don't want a line graph by grouping. I think bar plots grouped by month would complement the scatter plots, but that is a separate question. Commented Feb 15, 2019 at 3:47
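The aggregation suggested in the first comment could look like the following sketch. It assumes the column names from the sample table (`Received`, `Days`) and uses the mean as the aggregate; the frame is a hypothetical rebuild of the question's rows plus one extra request on an already-used date.

```python
import pandas as pd

# Hypothetical sample: the question's rows plus a second request on 2014-12-31
simple_denied = pd.DataFrame({
    "Received": pd.to_datetime(["2012-08-01", "2014-12-31", "2014-12-31",
                                "2015-02-25", "2015-10-15"]),
    "Days": [41.0, 692.0, 50.0, 621.0, 111.0],
})

# Collapse duplicate dates into one row each, averaging their Days values
per_day = simple_denied.groupby("Received", as_index=False)["Days"].mean()
print(per_day)
```

As the asker points out, a scatter plot does not need this step; it only matters for plots that expect one value per date, such as bar or line charts.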

2 Answers


You hit two cases that don't work. sns.regplot does not work with dates. And plt.plot needs to be told where the data comes from: given only the column names as strings, it cannot know which DataFrame to read them from unless you pass it via the data keyword (data=simple_denied).

Any of the following should give you a scatter plot of the data:

  • sns.scatterplot(x="Received", y="days", data=simple_denied, marker="+")
  • sns.scatterplot(x=simple_denied["Received"], y=simple_denied["days"], marker="+")

  • plt.scatter(simple_denied["Received"].values, simple_denied["days"].values, marker="+")

  • plt.plot(simple_denied["Received"].values, simple_denied["days"].values, marker="+", ls="")

  • plt.plot("Received", "days", data=simple_denied, marker="+", ls="")
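A self-contained version of two of these options is sketched below with the question's sample rows: `plt.scatter` on the underlying arrays (the variant the asker later confirmed works) and `plot` with column names plus `data=`. The lowercase `days` column name follows the code above (the question's table shows `Days`), and the non-interactive `Agg` backend is an assumption so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

simple_denied = pd.DataFrame({
    "Received": pd.to_datetime(["2012-08-01", "2014-12-31",
                                "2015-02-25", "2015-10-15"]),
    "days": [41.0, 692.0, 621.0, 111.0],
})

fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)

# Option 1: pass the actual arrays, not the column names
points = ax1.scatter(simple_denied["Received"].values,
                     simple_denied["days"].values, marker="+")

# Option 2: column names work only together with data=;
# ls="" turns off the connecting line so only the markers are drawn
lines = ax2.plot("Received", "days", data=simple_denied, marker="+", ls="")
```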


3 Comments

Thank you, plt.scatter(simple_denied["Received"].values, simple_denied["days"].values, marker="+") works. The sns plots both give me the error AttributeError: module 'seaborn' has no attribute 'scatterplot', even though I pip installed the latest seaborn package. The other two plt.plot() calls create blank graphs. Please let me know if there is something I am missing.
Concerning seaborn: yes, scatterplot is quite new (it was added in seaborn 0.9.0), so your update was probably unsuccessful. As for the plot commands, maybe your matplotlib version is also too old?
Do the numbers on the axes correspond to the range of values you would expect from the data? Can you try reducing the dataset to see if that makes a difference (e.g. using df.head() instead of df)? Can you try a different marker? Also make sure you are actually using the versions you think you are: print them from within the code with print(<package>.__version__) and compare them to what you expect.
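The version check suggested in the last comment can be sketched as follows. It reports the versions imported by the interpreter actually running the code (a `pip install` into a different environment would not show up here) and whether `sns.scatterplot` is available. The helper name `report_versions` is hypothetical.

```python
import importlib


def report_versions(names=("numpy", "pandas", "matplotlib", "seaborn")):
    """Return {package: version or None} for the running interpreter."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed in this environment
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


versions = report_versions()
for name, version in versions.items():
    print(f"{name}: {version}")

# sns.scatterplot only exists in seaborn 0.9.0 and later
try:
    import seaborn as sns
    print("seaborn has scatterplot:", hasattr(sns, "scatterplot"))
except ImportError:
    print("seaborn is not installed")
```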

Let's start by setting up your data. I actually added another date '2014-12-31' to your example dataset, so that we can verify that our plotting routine works when we have multiple requests received on the same day:

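import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Style used in the original answer; matplotlib 3.6 renamed it 'seaborn-v0_8'
# and the old name no longer works in recent releases
plt.style.use('seaborn')

# Five sample dates; '2014-12-31' appears twice on purpose
dates = np.array(['2012-08-01', '2014-12-31',
                  '2014-12-31', '2015-02-25',
                  '2015-10-15'], dtype='datetime64')

days = np.array([41, 692, 50, 621, 111])

df = pd.DataFrame({'Received': dates, 'Days': days})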
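If the `Received` column in the real data holds strings or generic Python objects rather than a `datetime64` dtype, converting it first with `pd.to_datetime` should give the same kind of column as the `np.array(..., dtype='datetime64')` construction above. A sketch, where `raw` is a hypothetical stand-in for the original frame:

```python
import pandas as pd

# Hypothetical raw frame with dates stored as strings
raw = pd.DataFrame({
    "Received": ["2012-08-01", "2014-12-31", "2014-12-31",
                 "2015-02-25", "2015-10-15"],
    "Days": [41, 692, 50, 621, 111],
})

# Convert the column to datetime64 without modifying raw
df = raw.assign(Received=pd.to_datetime(raw["Received"]))
print(df.dtypes)

# Both requests on the duplicated date keep their own row
print(df["Received"].value_counts().max())
```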

The DataFrame created should hopefully approximate what you have. Producing the scatter plot you want is now straightforward:

fig, ax = plt.subplots(1, 1)

# One point per request: the duplicated date simply gives two points at the same x
ax.scatter(df['Received'], df['Days'], marker='+')
ax.set_xlabel("Received")
ax.set_ylabel("Days")
plt.show()

This gave me the following plot:

[plot image not preserved: scatter of Days against Received]

As noted by @ImportanceOfBeingErnest in the comments below, you need a recent version of pandas for this routine to work (I ran it with pandas 0.24.1, numpy 1.16.1 and matplotlib 3.0.2).
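For older library combinations like those reported in the comments, where `ax.scatter` on the datetime column fails with `TypeError: invalid type promotion`, one possible workaround (not from the original answer, and not tested against those exact versions) is to convert the dates to matplotlib's float date numbers yourself with `matplotlib.dates.date2num`, then tell the axis to format them as dates with `ax.xaxis_date()`. A sketch on the same sample data, with the `Agg` backend as an assumption for headless use:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dates = np.array(['2012-08-01', '2014-12-31', '2014-12-31',
                  '2015-02-25', '2015-10-15'], dtype='datetime64')
df = pd.DataFrame({'Received': dates, 'Days': [41, 692, 50, 621, 111]})

# Plain datetime.datetime objects -> matplotlib float date numbers,
# so no datetime64 values ever reach ax.scatter
x = mdates.date2num([ts.to_pydatetime() for ts in df['Received']])

fig, ax = plt.subplots()
ax.scatter(x, df['Days'], marker='+')
ax.xaxis_date()  # interpret the float x values as dates for ticks and labels
ax.set_xlabel("Received")
ax.set_ylabel("Days")
```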

11 Comments

Interesting. In which versions of numpy and matplotlib does this work?
I tested with (matplotlib 3.0.2, numpy 0.15.4), (2.2.3, 0.15.2), (2.0.2, 0.14.5) and it fails with a TypeError: invalid type promotion error.
I'm running matplotlib 3.0.2 and numpy 1.16.1. I'm also running pandas 0.24.1. I think this is to do with how pandas converts dates between pandas and matplotlib.
Your numpy versions seem way off. Did you mean you tested on numpy 1.15.4, 1.15.2,... rather than 0.15.4, 0.15.2,... ?
Yes, replace each zero with a one. This is great; hopefully 1.16.1 will be stable enough to be added to the conda default channel soon.
