I'm having an issue with Matplotlib v3.1.3 from conda-forge on Python 3.7. I have all of the dependencies required for Matplotlib. When I run the code below, which should work, I get splatter art. It's based on this YouTube tutorial: https://www.youtube.com/watch?v=LWjaAiKaf8&list=PL-osiE80TeTvipOqomVEeZ1HRrcEvtZB&index=8

import matplotlib.pyplot as plt
import pandas as pd

df_train = pd.read_csv('mydata.csv', date_parser=True)
df_train.columns = ['date', 'col1', 'col2', 'col3', 'col4', 'col5']
df_train['date'] = pd.to_datetime(df_train['date'])
df_train.set_index(['date'])

x_value = df_train['date']
y_value = df_train['col4']
plt.plot_date(x_value, y_value )
plt.gcf().autofmt_xdate()
plt.show()

The chart rendered from this code looks like this: [image not shown]

I tried another approach using the Matplotlib DateFormatter and DayLocator. That gave me something resembling a line chart underneath a child's scribbling, but at least it had dates:

import matplotlib.dates as mpl_dates
import matplotlib.pyplot as plt
import pandas as pd

df_train = pd.read_csv('mydata.csv', date_parser=True)
df_train.columns = ['date', 'col1', 'col2', 'col3', 'col4', 'col5']
df_train['date'] = pd.to_datetime(df_train['date'])
df_train.set_index(['date'])

# Visualize data
x_values = df_train['date']
y_values = df_train['col4']
ax = plt.gca()
plt.figure(figsize=(16, 8))
formatter = mpl_dates.DateFormatter("%Y-%m-%d")
ax.xaxis.set_major_formatter(formatter)
locator = mpl_dates.DayLocator()
ax.xaxis.set_major_locator(locator)
plt.plot(x_values, y_values)
plt.show()

Finally, if I change the code to exclude the dates, I get a perfectly rendered chart, but with no dates:

import matplotlib.pyplot as plt
import pandas as pd

df_train = pd.read_csv('mydata.csv', date_parser=True)
df_train.columns = ['date', 'col1', 'col2', 'col3', 'col4', 'col5']
df_train['date'] = pd.to_datetime(df_train['date'])
df_train.set_index(['date'])

x_value = df_train['date']
y_value = df_train['col4']
plt.plot(df_train['col4'])
plt.gcf().autofmt_xdate()
plt.show()

I've tried closing the plots at the end, to no avail. I checked the Matplotlib docs and followed them to a T, including using the wheel build, creating the conda channel, installing the dependencies, and setting the path and includes per the documentation. I'm at my wits' end. Can someone more educated on the subject give me a hand? Thanks in advance.

1 Answer

It seems that plot_date() draws a scatter plot by default in newer versions of Matplotlib: its default format string is 'o', which means markers with no connecting line (see https://www.geeksforgeeks.org/matplotlib-pyplot-plot_date-in-python/).

To get a continuous line over the dates, pass a line-style format string as the third argument: plt.plot_date(x_value, y_value, '-').

This code works for me:

import matplotlib.pyplot as plt
import pandas as pd

df_train = pd.read_csv('test.csv', date_parser=True)
df_train.columns = ['date', 'col1', 'col2', 'col3', 'col4', 'col5', 'col6']
df_train['date'] = pd.to_datetime(df_train['date'])
df_train.set_index(['date'])

x_value = df_train['date']
y_value = df_train['col4']
plt.plot_date(x_value, y_value, '-')
plt.gcf().autofmt_xdate()
plt.show()

Output:

Graph

Defaulting to markers rather than a line is questionable, given that the plot switches from a scatter plot to a line plot as soon as you pass only a color: plt.plot_date(x_value, y_value, 'g').

This might just be a bug in the current versions of Matplotlib.
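
The switch from markers to a line can be checked by inspecting the line objects that Matplotlib creates for each format string. The sketch below is only an illustration under a few assumptions: it uses plt-style `ax.plot()` with datetime x-values instead of plot_date() (which newer Matplotlib releases discourage), the sample dates and values are made up, and the Agg backend is selected so it runs without a display. If the format string handling is the same for both functions, as it appears to be, a marker-only string such as 'o' leaves no line style, while a color-only string such as 'g' falls back to the default solid line.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical sample data: five consecutive days
dates = [dt.datetime(2020, 1, 1) + dt.timedelta(days=i) for i in range(5)]
values = [1.0, 3.0, 2.0, 5.0, 4.0]

fig, ax = plt.subplots()
styles = {}
for fmt in ("o", "-", "g"):
    # ax.plot returns a list with one Line2D per plotted series
    (line,) = ax.plot(dates, values, fmt)
    # Record (linestyle, marker) as Matplotlib resolved them from the format string
    styles[fmt] = (line.get_linestyle(), line.get_marker())
    print(repr(fmt), "->", styles[fmt])
plt.close(fig)
```

Seen this way, 'g' produces a line because it names neither a marker nor a line style, so the default line style applies.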

5 Comments

Thanks Chuck. I added the '-' to the plot_date() function, now I'm getting, "ValueError: Length mismatch: Expected axis has 6 elements, new values have 7 elements"
I tested the code above with the data from here. That dataframe had 7 columns, which is why line 5 of my code lists 7 names. You might need to adapt this line depending on the dataset you are using.
It works perfectly on your test data, but something must be wrong with my csv because I'm still getting child scribbling over chart as in chart #2 above.
Try sorting by the date before plotting. If that doesn't help, look for outliers or incorrectly formatted values in the date column.
I had too many columns in df_train.columns = [...]. My data is working now.
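
The sorting and date-checking suggestion from the comments could look roughly like the sketch below. It is not the asker's actual data: the in-memory CSV, its two columns, and the names `bad_rows` and `clean` are made up for illustration. The idea is that a line plot connects points in row order, so unsorted dates make the line double back on itself, and a malformed date can distort the axis.

```python
import io

import pandas as pd

# Hypothetical CSV standing in for 'mydata.csv': rows out of order, one bad date
raw = io.StringIO(
    "date,col4\n"
    "2020-01-03,30\n"
    "2020-01-01,10\n"
    "not-a-date,99\n"
    "2020-01-02,20\n"
)
df = pd.read_csv(raw)

# errors='coerce' turns unparseable dates into NaT instead of raising
df["date"] = pd.to_datetime(df["date"], errors="coerce")

# Rows whose date could not be parsed, for manual inspection
bad_rows = df[df["date"].isna()]
print(bad_rows)

# Drop the bad rows, then sort so the line is drawn left to right in time
clean = df.dropna(subset=["date"]).sort_values("date").reset_index(drop=True)
print(clean)
```

Plotting `clean['date']` against `clean['col4']` should then give a single line that moves forward in time.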
