For some background, I would like to create a scatter plot of several dataframes (each dataframe has been read from a csv) where the x value is the date and the y value is the water 'level'.

I've been trying to work out how to make a scatter plot where the x value is the date, i.e. the index. After trying a number of options, this is the 'best' error I have got so far:

    KeyError: "None of [DatetimeIndex(['2017-11-04 00:00:00',    
    '2017-11-04 01:00:00',\n ... '2018-02-26 11:00:00', '2018-02-26 
    12:00:00'],\n dtype='datetime64[ns]', name='date', length=2749, 
    freq=None)] are in the [columns]" .   

I'm importing my data from a csv file that looks something like this:

    date,               level
    2017-10-26 14:00:00, 700.1
    2017-10-26 15:00:00, 500.5
    2017-10-26 16:00:00, NaN
               ...

And I'm reading in the file like so:

    df = pd.read_csv("data.csv", parse_dates=['date'],sep='\s*,\s*')
    df.set_index('date', inplace=True)
    df = df.loc['2017-11-04 00:00:00':]

Then this is my attempt at trying to plot the scatter plot:

    ax = df.plot()
    ax1 = df.plot(kind='scatter', x=df.index, y='level', color='r')

    # ... my other dataframes I'd like to plot on the same graph...

I've only just started using pandas, so apologies for my lack of understanding. I've been experimenting with different ways of importing the csv (the sep='\s*,\s*' was one attempt), but to no avail. I'd greatly appreciate any advice, thank you.
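
As a side note on the import step, here is a minimal sketch of two ways to read a csv with padding after the commas. It assumes the file looks like the sample above and feeds the text through `io.StringIO` instead of a real file; the names `CSV_TEXT`, `df_skip` and `df_regex` are only for illustration. `skipinitialspace=True` tells `read_csv` to ignore spaces after each delimiter, while a regex separator needs `engine="python"` and is best written as a raw string.

```python
import io

import pandas as pd

# Sample text in the same shape as the csv shown above (assumed layout).
CSV_TEXT = """date,               level
2017-10-26 14:00:00, 700.1
2017-10-26 15:00:00, 500.5
2017-10-26 16:00:00, NaN
"""

# Option 1: keep the default comma separator and skip spaces after it.
df_skip = pd.read_csv(
    io.StringIO(CSV_TEXT), parse_dates=["date"], skipinitialspace=True
)

# Option 2: a regex separator; this requires the slower python parser engine.
df_regex = pd.read_csv(
    io.StringIO(CSV_TEXT), parse_dates=["date"], sep=r"\s*,\s*", engine="python"
)

print(df_skip.dtypes)
print(df_regex.dtypes)
```

Either way the columns should come out as `date` and `level`, so the import itself is unlikely to be the cause of the KeyError.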

Edit: More thorough code

data1.csv:

    date,level
    2017-10-26 14:00:00,500.1
    2017-10-26 15:00:00,600.5
    2017-10-26 16:00:00,NaN
    2017-10-26 17:00:00,NaN
    2017-10-26 18:00:00,NaN
    2017-10-26 19:00:00,600.5
    2017-10-26 20:00:00,600.5
    2017-10-26 21:00:00,700.0
    2017-10-26 22:00:00,700.0

data2.csv:

    date,level
    2017-10-26 15:00:00,600.5
    2017-10-26 16:00:00,NaN
    2017-10-26 17:00:00,NaN
    2017-10-26 18:00:00,NaN
    2017-10-26 19:00:00,600.5
    2017-10-26 20:00:00,600.5
    2017-10-26 21:00:00,900.0
    2017-10-26 22:00:00,900.0
    2017-10-26 23:00:00,NaN

code:

    import pandas as pd
    import warnings
    import matplotlib.pyplot as plt
    warnings.filterwarnings("ignore")
    plt.style.use('fivethirtyeight')

    df = pd.read_csv("data1.csv", parse_dates=['date'],sep='\s*,\s*')
    df.set_index('date', inplace=True)
    df = df.loc['2017-10-26 15:00:00':]

    df2 = pd.read_csv("data2.csv", parse_dates=['date'],sep='\s*,\s*')
    df2.set_index('date', inplace=True)
    df2 = df2.loc[:'2017-10-26 22:00:00']

    ax1 = df.plot(kind='scatter', x='date', y='level', color='r')
    ax2 = df2.plot(kind='scatter', x='date', y='level', color='g', ax=ax1)

    plt.show()
  • x should be a column name of your dataframe. Does x="date" not work? Or removing that argument completely? Commented Feb 18, 2019 at 22:10
  • That was originally what I was trying to do, but unfortunately it doesn't work: with x='date' I get KeyError: 'date', and when I remove the x argument I get a complaint that it has to be there. Also, df.columns only returns 'level', which may be because I made the date my index. Commented Feb 18, 2019 at 22:23
  • Oh yes. Keep "date" as a column to be able to use it in the x argument. Commented Feb 18, 2019 at 22:29
  • So this may turn into a different question, but if I remove df.set_index('date', inplace=True) I get this error at the ax = df1.plot() line: ValueError: view limit minimum -36837.575000000004 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units. Would you be able to shed some light on this? I'm not sure how I can do both at the same time. Commented Feb 18, 2019 at 22:36
  • Note that it's really cumbersome without a minimal reproducible example, so I can only comment on individual steps instead of providing a working answer. It could well be that pandas is not able to plot scatter plots with dates. What you can always do is plt.scatter(df["date"].values, df['level'].values) instead. Commented Feb 18, 2019 at 22:43
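
The two points raised in these comments can be sketched as follows. The sketch assumes a few inline rows shaped like data1.csv (read through `io.StringIO` rather than from disk) and a non-interactive `Agg` backend so it runs without a display. It shows that `set_index('date')` removes `date` from the columns, which is why `x='date'` raises a KeyError, and then uses the matplotlib fallback suggested in the last comment. Names such as `CSV_TEXT`, `indexed`, `restored` and `clean` are illustrative only.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# A few rows in the same shape as data1.csv from the question.
CSV_TEXT = """date,level
2017-10-26 14:00:00,500.1
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 19:00:00,600.5
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])

# After set_index, "date" is the index name and no longer a column label.
indexed = df.set_index("date")
columns_after_set_index = list(indexed.columns)

# reset_index() turns the index back into an ordinary column.
restored = indexed.reset_index()
columns_after_reset = list(restored.columns)

# Drop missing levels so every remaining row is a drawable point.
clean = restored.dropna(subset=["level"])

# Fallback from the comment: hand the raw values to matplotlib directly.
fig, ax = plt.subplots()
points = ax.scatter(clean["date"].values, clean["level"].values, color="r")
ax.set_xlabel("date")
ax.set_ylabel("level")
```

Call `plt.show()` (with an interactive backend) or `fig.savefig(...)` to view the result. Further dataframes can be drawn onto the same `ax` with additional `ax.scatter` calls.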

1 Answer

In case anyone runs into the same problem, I found a workaround, as described here: pandas scatter plotting datetime

I just added style='o' as seen below:

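    import pandas as pd
    import matplotlib.pyplot as plt

    # A regex separator needs the python parser engine; the raw string avoids escape warnings.
    df = pd.read_csv("data1.csv", parse_dates=['date'], sep=r'\s*,\s*', engine='python')
    df.set_index('date', inplace=True)
    df = df.loc['2017-10-26 15:00:00':]
    # style='o' draws markers only, so the line plot looks like a scatter plot.
    ax = df.plot(style='o')

    df2 = pd.read_csv("data2.csv", parse_dates=['date'], sep=r'\s*,\s*', engine='python')
    df2.set_index('date', inplace=True)
    df2 = df2.loc[:'2017-10-26 22:00:00']
    # Reuse the first Axes so both dataframes share one graph.
    df2.plot(ax=ax, style='o')

    plt.show()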
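
For anyone who wants to try this without the csv files, here is a self-contained sketch of the same workaround. It assumes the sample rows from the question pasted inline and read through `io.StringIO`, plus a non-interactive `Agg` backend. The helper `load_levels`, the labels, and the `'ro'` / `'g^'` marker styles are my own illustrative choices, used so the two dataframes are distinguishable on the shared axes.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

DATA1 = """date,level
2017-10-26 14:00:00,500.1
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 17:00:00,NaN
2017-10-26 18:00:00,NaN
2017-10-26 19:00:00,600.5
2017-10-26 20:00:00,600.5
2017-10-26 21:00:00,700.0
2017-10-26 22:00:00,700.0
"""

DATA2 = """date,level
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 17:00:00,NaN
2017-10-26 18:00:00,NaN
2017-10-26 19:00:00,600.5
2017-10-26 20:00:00,600.5
2017-10-26 21:00:00,900.0
2017-10-26 22:00:00,900.0
2017-10-26 23:00:00,NaN
"""


def load_levels(csv_text):
    """Hypothetical helper: read a date/level csv and index it by timestamp."""
    frame = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
    return frame.set_index("date")


df1 = load_levels(DATA1).loc["2017-10-26 15:00:00":]
df2 = load_levels(DATA2).loc[:"2017-10-26 22:00:00"]

# A marker-only style string gives a scatter-like plot against the datetime index.
ax = df1["level"].plot(style="ro", label="data1")
# Passing ax= draws the second series onto the same Axes.
df2["level"].plot(ax=ax, style="g^", label="data2")
ax.set_ylabel("level")
ax.legend()
```

Rows where the level is NaN are simply not drawn. Call `plt.show()` with an interactive backend to view the figure.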