Scatter plots matplotlib: Dealing with pandas datetime as index

Question

For some background, I would like to create a scatter plot of several dataframes (each one read from a CSV file) where the x value is the date and the y value is the water 'level'.

I've been trying to work out how to make a scatter plot where the x value is the date, i.e. the index. After trying a number of options, I feel this is the 'best' error I have got so far:

    KeyError: "None of [DatetimeIndex(['2017-11-04 00:00:00', '2017-11-04 01:00:00',
    ... '2018-02-26 11:00:00', '2018-02-26 12:00:00'],
    dtype='datetime64[ns]', name='date', length=2749, freq=None)] are in the [columns]"

I'm importing my data from a CSV file that looks something like this:

    date,               level
    2017-10-26 14:00:00, 700.1
    2017-10-26 15:00:00, 500.5
    2017-10-26 16:00:00, NaN
               ...

And I'm reading in the file like so:

    df = pd.read_csv("data.csv", parse_dates=['date'], sep=r'\s*,\s*')
    df.set_index('date', inplace=True)
    df = df.loc['2017-11-04 00:00:00':]
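A minimal sketch of what these three reading lines do, assuming a few rows from data1.csv (shown further down) inlined through io.StringIO so it runs without the file. After set_index, 'date' moves from the columns into the index, so any later lookup of a column named 'date' fails; passing index_col='date' to read_csv is an equivalent one-step alternative.

```python
import io

import pandas as pd

# A few rows from data1.csv; io.StringIO stands in for the real file.
CSV = """date,level
2017-10-26 14:00:00,500.1
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 17:00:00,NaN
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=["date"])
df.set_index("date", inplace=True)

# "date" is now the index, so only "level" is left as a column.
print(list(df.columns))         # ['level']
print(type(df.index).__name__)  # DatetimeIndex

# Date-string slicing with .loc works because the index is a DatetimeIndex.
df = df.loc["2017-10-26 15:00:00":]

# One-step equivalent: parse the column and make it the index while reading.
df_alt = pd.read_csv(io.StringIO(CSV), parse_dates=["date"], index_col="date")
```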

Then this is my attempt at trying to plot the scatter plot:

    ax = df.plot()
    ax1 = df.plot(kind='scatter', x=df.index, y='level', color='r')

    # ... my other dataframes I'd like to plot on the same graph...
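Because x in DataFrame.plot(kind='scatter') is looked up as a column label, passing df.index makes pandas search the columns for every timestamp, which is what the KeyError above reports. One way around it, sketched here under the assumption that the index is the parsed date column, is to skip the pandas scatter wrapper and hand the index values straight to matplotlib's Axes.scatter. The inlined rows, the Agg backend and dropping NaN rows are choices made only for this sketch.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Rows from data1.csv, inlined in place of the real file.
CSV = """date,level
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 17:00:00,NaN
2017-10-26 18:00:00,NaN
2017-10-26 19:00:00,600.5
2017-10-26 20:00:00,600.5
2017-10-26 21:00:00,700.0
2017-10-26 22:00:00,700.0
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=["date"], index_col="date")

# Keep only real measurements so each point is an actual reading.
points = df["level"].dropna()

fig, ax = plt.subplots()
# x comes from the DatetimeIndex; matplotlib's date converter handles the
# datetime64 array, so no "date" column is needed.
ax.scatter(points.index.to_numpy(), points.to_numpy(), color="r")
ax.set_xlabel("date")
ax.set_ylabel("level")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```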

I've only just started using pandas, so apologies for my lack of understanding. I've been fiddling with different ways of importing the CSV (the sep='\s*,\s*' was one attempt), but to no avail. I'd greatly appreciate any advice, thank you.
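If the only irregularity in the file is whitespace after each comma, as in the first sample above, read_csv's skipinitialspace=True may be enough and keeps the default C parser. A regex separator such as r'\s*,\s*' remains an option when spaces can also appear before the commas, but pandas then needs engine='python', and the pattern should be a raw string so Python does not treat \s as a string escape. A sketch using the question's first sample, inlined:

```python
import io

import pandas as pd

# First sample from the question: spaces follow the commas.
CSV = """date,               level
2017-10-26 14:00:00, 700.1
2017-10-26 15:00:00, 500.5
2017-10-26 16:00:00, NaN
"""

# skipinitialspace=True discards whitespace right after each delimiter,
# so the header becomes "level" and " NaN" is read as a missing value.
df = pd.read_csv(io.StringIO(CSV), parse_dates=["date"], skipinitialspace=True)

# Regex alternative if spaces may also appear before the commas:
# pd.read_csv(path, sep=r"\s*,\s*", engine="python", parse_dates=["date"])
```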

Edit: More thorough code

data1.csv:

    date,level
    2017-10-26 14:00:00,500.1
    2017-10-26 15:00:00,600.5
    2017-10-26 16:00:00,NaN
    2017-10-26 17:00:00,NaN
    2017-10-26 18:00:00,NaN
    2017-10-26 19:00:00,600.5
    2017-10-26 20:00:00,600.5
    2017-10-26 21:00:00,700.0
    2017-10-26 22:00:00,700.0

data2.csv:

    date,level
    2017-10-26 15:00:00,600.5
    2017-10-26 16:00:00,NaN
    2017-10-26 17:00:00,NaN
    2017-10-26 18:00:00,NaN
    2017-10-26 19:00:00,600.5
    2017-10-26 20:00:00,600.5
    2017-10-26 21:00:00,900.0
    2017-10-26 22:00:00,900.0
    2017-10-26 23:00:00,NaN

code:

    import pandas as pd
    import warnings
    import matplotlib.pyplot as plt
    warnings.filterwarnings("ignore")
    plt.style.use('fivethirtyeight')

    df = pd.read_csv("data1.csv", parse_dates=['date'], sep=r'\s*,\s*')
    df.set_index('date', inplace=True)
    df = df.loc['2017-10-26 15:00:00':]

    df2 = pd.read_csv("data2.csv", parse_dates=['date'], sep=r'\s*,\s*')
    df2.set_index('date', inplace=True)
    df2 = df2.loc[:'2017-10-26 22:00:00']

    ax1 = df.plot(kind='scatter', x='date', y='level', color='r')
    ax2 = df2.plot(kind='scatter', x='date', y='level', color='g', ax=ax1)

    plt.show()
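A sketch of this script with both files on one set of axes, assuming the date stays as the index: each frame's index and 'level' values go to Axes.scatter on the same ax, so no 'date' column is required. The data is inlined from data1.csv and data2.csv via io.StringIO; the legend labels 'data1' and 'data2', the Agg backend and dropping NaN rows are assumptions for the sketch.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless; drop this line and call plt.show() for a window
import matplotlib.pyplot as plt
import pandas as pd

DATA1 = """date,level
2017-10-26 14:00:00,500.1
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 17:00:00,NaN
2017-10-26 18:00:00,NaN
2017-10-26 19:00:00,600.5
2017-10-26 20:00:00,600.5
2017-10-26 21:00:00,700.0
2017-10-26 22:00:00,700.0
"""

DATA2 = """date,level
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 17:00:00,NaN
2017-10-26 18:00:00,NaN
2017-10-26 19:00:00,600.5
2017-10-26 20:00:00,600.5
2017-10-26 21:00:00,900.0
2017-10-26 22:00:00,900.0
2017-10-26 23:00:00,NaN
"""

df = pd.read_csv(io.StringIO(DATA1), parse_dates=["date"], index_col="date")
df = df.loc["2017-10-26 15:00:00":]

df2 = pd.read_csv(io.StringIO(DATA2), parse_dates=["date"], index_col="date")
df2 = df2.loc[:"2017-10-26 22:00:00"]

fig, ax = plt.subplots()
# Draw every frame on the same Axes; the DatetimeIndex supplies x directly.
for frame, color, label in [(df, "r", "data1"), (df2, "g", "data2")]:
    points = frame["level"].dropna()  # keep only real measurements
    ax.scatter(points.index.to_numpy(), points.to_numpy(), color=color, label=label)

ax.set_ylabel("level")
ax.legend()
fig.autofmt_xdate()
```

Adding a third dataframe only means adding another (frame, color, label) tuple to the list.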

x should be a column name of your dataframe. Does x="date" not work? Or what about removing that argument completely?
– ImportanceOfBeingErnest, Commented Feb 18, 2019 at 22:10
That was originally what I was trying to do, but unfortunately it doesn't work: with x='date' I get KeyError: 'date'. When I remove the x argument, I get a complaint that it is required. Also, when I enter df.columns I only get 'level' back, which may be because I made the date my index?
– chromestone, Commented Feb 18, 2019 at 22:23
Oh yes. Keep "date" as a column so that you can use it in the x argument.
– ImportanceOfBeingErnest, Commented Feb 18, 2019 at 22:29
So this may turn into a different question, but if I remove df.set_index('date', inplace=True) I get this error: ValueError: view limit minimum -36837.575000000004 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units. Could you shed some light on this? I'm not sure how to do both at the same time. The error shows up at the ax = df1.plot() line.
– chromestone, Commented Feb 18, 2019 at 22:36
Note that it's really cumbersome without a minimal reproducible example, so I can only comment on individual steps instead of just providing a working answer. It could well be that pandas is not able to draw scatter plots with dates. What you can always do instead is plt.scatter(df["date"].values, df['level'].values).
– ImportanceOfBeingErnest, Commented Feb 18, 2019 at 22:43
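The suggestion in this comment can be sketched as follows, assuming 'date' is kept as an ordinary column (no set_index). Date-string slicing with .loc relies on a DatetimeIndex, so a boolean mask on the column does the filtering here instead; the inlined rows and the Agg backend exist only so the sketch runs on its own.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# A few rows from data1.csv, inlined for the sketch.
CSV = """date,level
2017-10-26 14:00:00,500.1
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 19:00:00,600.5
"""

# No set_index: "date" stays an ordinary datetime64 column.
df = pd.read_csv(io.StringIO(CSV), parse_dates=["date"])

# Boolean mask in place of df.loc["2017-10-26 15:00:00":].
df = df[df["date"] >= pd.Timestamp("2017-10-26 15:00:00")]

fig, ax = plt.subplots()
# .values yields numpy arrays (datetime64 for x) that matplotlib can plot;
# the NaN level is not drawn.
ax.scatter(df["date"].values, df["level"].values, color="r")
fig.autofmt_xdate()
```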

chromestone · Accepted Answer · 2019-02-21 17:34:14Z


In case anyone runs into the same problem, I found a workaround as described here: pandas scatter plotting datetime

I just added style='o' as seen below:

    df = pd.read_csv("data1.csv", parse_dates=['date'], sep=r'\s*,\s*')
    df.set_index('date', inplace=True)
    df = df.loc['2017-10-26 15:00:00':]
    ax = df.plot(style='o')

    df2 = pd.read_csv("data2.csv", parse_dates=['date'], sep=r'\s*,\s*')
    df2.set_index('date', inplace=True)
    df2 = df2.loc[:'2017-10-26 22:00:00']
    df2.plot(ax=ax, style='o')

    plt.show()
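For reference, style='o' is handed on to matplotlib as a format string meaning circle markers with no connecting line, so the result is a line plot that looks like a scatter plot while pandas handles the datetime x-axis itself. A small check under the same assumption of data inlined from data1.csv (the Agg backend is only for headless runs):

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import pandas as pd

CSV = """date,level
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 17:00:00,NaN
2017-10-26 18:00:00,NaN
2017-10-26 19:00:00,600.5
2017-10-26 20:00:00,600.5
2017-10-26 21:00:00,700.0
2017-10-26 22:00:00,700.0
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=["date"], index_col="date")

# "o" is a matplotlib format string: circle markers, no connecting line.
ax = df.plot(style="o")

# The plotted "level" series is a Line2D whose line style is switched off.
line = ax.get_lines()[0]
```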
