For some background info, I would like to create a scatter plot of several dataframes (each read from its own CSV file) where the x value is the date and the y value is the water 'level'.
I've been trying to work out how to make a scatter plot where the x value is the date or the index. After trying a number of options, I feel this is the 'best' error I have got so far:
KeyError: "None of [DatetimeIndex(['2017-11-04 00:00:00', '2017-11-04 01:00:00',\n ... '2018-02-26 11:00:00', '2018-02-26 12:00:00'],\n dtype='datetime64[ns]', name='date', length=2749, freq=None)] are in the [columns]"
I'm importing my data from a CSV file that looks something like this:
date, level
2017-10-26 14:00:00, 700.1
2017-10-26 15:00:00, 500.5
2017-10-26 16:00:00, NaN
...
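Because the sample has a space after every comma, a regex separator is one option. A possibly simpler alternative, sketched below under the assumption that the only padding is a single run of spaces after each comma, is `skipinitialspace=True`, which strips those spaces (header included) while keeping the plain "," separator and the default C parser. The `io.StringIO` wrapper just stands in for the real file.

```python
import io

import pandas as pd

# The sample rows from above, with a space after each comma
raw = """date, level
2017-10-26 14:00:00, 700.1
2017-10-26 15:00:00, 500.5
2017-10-26 16:00:00, NaN
"""

# skipinitialspace=True drops the blanks after each delimiter, so the
# header becomes "date" / "level" and " NaN" is read as a missing value.
df = pd.read_csv(io.StringIO(raw), parse_dates=["date"], skipinitialspace=True)

print(df.columns.tolist())  # ['date', 'level']
```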
And I'm reading in the file like so:
df = pd.read_csv("data.csv", parse_dates=['date'], sep=r'\s*,\s*', engine='python')
df.set_index('date', inplace=True)
df = df.loc['2017-11-04 00:00:00':]
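A quick check of what `set_index` does to the frame helps explain the errors further down: once 'date' becomes the index it is no longer listed in `df.columns`, so anything that looks up 'date' as a column label will not find it. The sketch below uses the first rows of data1.csv and slices from a timestamp that exists in that sample (the original slice start is outside it).

```python
import io

import pandas as pd

raw = """date,level
2017-10-26 14:00:00,500.1
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
"""
df = pd.read_csv(io.StringIO(raw), parse_dates=["date"])
df.set_index("date", inplace=True)

# 'date' has moved from the columns into the index
print(df.columns.tolist())  # ['level']
print(df.index.name)        # date

# Partial-string slicing on the DatetimeIndex; the start point is inclusive
df = df.loc["2017-10-26 15:00:00":]
print(len(df))              # 2
```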
Then this is my attempt at the scatter plot:
ax = df.plot()
ax1 = df.plot(kind='scatter', x=df.index, y='level', color='r')
# ... my other dataframes I'd like to plot on the same graph...
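The KeyError message suggests what happens here: with `kind='scatter'`, pandas treats `x` as a column label, so passing `df.index` makes it search `df.columns` for those timestamps. A minimal sketch of one workaround, assuming plain matplotlib is acceptable, is to call `Axes.scatter`, which takes the x values themselves, so the DatetimeIndex can be passed directly. Here `index_col="date"` is used as a shorthand for reading and then calling `set_index('date')`.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# First rows of data1.csv from the question
raw = """date,level
2017-10-26 14:00:00,500.1
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
"""
df = pd.read_csv(io.StringIO(raw), parse_dates=["date"], index_col="date")

fig, ax = plt.subplots()
points = df["level"].dropna()  # leave out the NaN reading
# Matplotlib takes the x values directly, so the date index works as-is
ax.scatter(points.index, points.values, color="r")
fig.autofmt_xdate()  # tilt the date tick labels so they do not overlap
```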
I've only just started using pandas, so apologies for my lack of understanding. I've been fiddling with different ways of importing the CSV (the sep='\s*,\s*' was one attempt), but to no avail. I'd greatly appreciate any advice, thank you.
Edit: More thorough code
data1.csv:
date,level
2017-10-26 14:00:00,500.1
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 17:00:00,NaN
2017-10-26 18:00:00,NaN
2017-10-26 19:00:00,600.5
2017-10-26 20:00:00,600.5
2017-10-26 21:00:00,700.0
2017-10-26 22:00:00,700.0
data2.csv:
date,level
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 17:00:00,NaN
2017-10-26 18:00:00,NaN
2017-10-26 19:00:00,600.5
2017-10-26 20:00:00,600.5
2017-10-26 21:00:00,900.0
2017-10-26 22:00:00,900.0
2017-10-26 23:00:00,NaN
code:
import pandas as pd
import warnings
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
df = pd.read_csv("data1.csv", parse_dates=['date'], sep=r'\s*,\s*', engine='python')
df.set_index('date', inplace=True)
df = df.loc['2017-10-26 15:00:00':]
df2 = pd.read_csv("data2.csv", parse_dates=['date'], sep=r'\s*,\s*', engine='python')
df2.set_index('date', inplace=True)
df2 = df2.loc[:'2017-10-26 22:00:00']
ax1 = df.plot(kind='scatter', x='date', y='level', color='r')
ax2 = df2.plot(kind='scatter', x='date', y='level', color='g', ax=ax1)
plt.show()
x should be a column name of your dataframe. Does x="date" not work? Or removing that argument completely?
With x='date' I get KeyError: 'date'. And when I remove the x argument I get a complaint saying that it has to be there. Also, when I enter df.columns I only get 'level' back, which could be due to the fact that I made the date my index, maybe?
x argument.
df.set_index('date', inplace=True) I get this error: ValueError: view limit minimum -36837.575000000004 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units
Do you think you'd be able to help me by shedding some light on this? As in, I'm not sure how I can do both at the same time. This error shows up at the ax = df1.plot() line.
plt.scatter(df["date"].values, df['level'].values) instead.
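Following the plt.scatter suggestion in the comments, a sketch of how both dataframes might share one graph: keep 'date' as the index (so the slicing still works) and call `Axes.scatter` once per frame on the same Axes, instead of needing 'date' as a column and as the index at the same time. The data is the two CSVs from the edit, inlined with `io.StringIO`; the helper name `load`, the legend labels and the NaN-dropping are illustrative choices, not part of the original code.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; call plt.show() instead when running interactively
import matplotlib.pyplot as plt
import pandas as pd

DATA1 = """date,level
2017-10-26 14:00:00,500.1
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 17:00:00,NaN
2017-10-26 18:00:00,NaN
2017-10-26 19:00:00,600.5
2017-10-26 20:00:00,600.5
2017-10-26 21:00:00,700.0
2017-10-26 22:00:00,700.0
"""
DATA2 = """date,level
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 17:00:00,NaN
2017-10-26 18:00:00,NaN
2017-10-26 19:00:00,600.5
2017-10-26 20:00:00,600.5
2017-10-26 21:00:00,900.0
2017-10-26 22:00:00,900.0
2017-10-26 23:00:00,NaN
"""


def load(text):
    """Hypothetical helper: read one CSV and index it by its parsed 'date' column."""
    return pd.read_csv(io.StringIO(text), parse_dates=["date"], index_col="date")


# Same slices as in the question's code
df = load(DATA1).loc["2017-10-26 15:00:00":]
df2 = load(DATA2).loc[:"2017-10-26 22:00:00"]

fig, ax = plt.subplots()
for frame, color, label in [(df, "r", "data1"), (df2, "g", "data2")]:
    points = frame["level"].dropna()  # skip the hours with no reading
    ax.scatter(points.index, points.values, color=color, label=label)

ax.set_xlabel("date")
ax.set_ylabel("level")
ax.legend()
fig.autofmt_xdate()  # rotate the date tick labels
```

Every extra dataframe is just another entry in the list that the loop runs over.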