I have a dataframe like below:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

period0 = pd.date_range('2011-01-01', periods=50, freq='D')
period1 = pd.date_range('2012-05-18', periods=50, freq='D')
period2 = pd.date_range('2014-07-11', periods=50, freq='D')
# ignore_index=True gives a clean 0..149 index instead of 0..49 repeated three times
df = pd.concat((pd.DataFrame(period0), pd.DataFrame(period1), pd.DataFrame(period2)),
               axis=0, ignore_index=True)

df['y'] = np.random.rand(len(df))

These dates and periods are arbitrarily chosen to create gaps between the dates.

When I plot the dataframe, matplotlib draws a straight line across each date gap:

plt.plot(df[0], df['y'])

Result: (plot image not available)

I also tried a dotted line style, but the gaps were still bridged:

plt.plot(df[0], df['y'], ':')

Result: (plot image not available)

I also found a related question, but unfortunately it didn't solve my problem.

So, how can I stop matplotlib from connecting the points across the gaps?

  • Have you considered using a scatterplot instead of a line plot? Commented Jan 6, 2019 at 21:20
  • Thanks for the suggestion. I'll try it. Commented Jan 6, 2019 at 21:32
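A sketch of the scatterplot suggestion, assuming the same three date ranges as the question (rebuilt here in ISO date form). The `':'` format used in the question is matplotlib's dotted *line* style, so consecutive points are still joined; a marker-only format such as `'.'` sets a marker with no line, which has the same effect as a scatterplot.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

period0 = pd.date_range('2011-01-01', periods=50, freq='D')
period1 = pd.date_range('2012-05-18', periods=50, freq='D')
period2 = pd.date_range('2014-07-11', periods=50, freq='D')
dates = period0.append([period1, period2])
y = np.random.rand(len(dates))

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)

# Option 1: a true scatterplot, no connecting segments at all
points = ax1.scatter(dates, y, s=4)

# Option 2: plot() with a marker-only format string; '.' means "point marker,
# no linestyle", unlike ':' which is the dotted line style
(line,) = ax2.plot(dates, y, '.')
```

Call `plt.show()` in an interactive session to display the figure. The trade-off is that neither option draws lines within a continuous period either.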

2 Answers

If you can't modify your existing index, you could try:

df.groupby(pd.Grouper(key=0, freq='1D'))['y'].last().plot()
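Why this works, as a minimal sketch assuming the question's data with column `0` holding the dates: grouping with a `pd.Grouper` of frequency `'1D'` creates one bin for every calendar day between the first and last date, including days with no rows, and `last()` of an empty bin is NaN. Plotting that series therefore leaves breaks at the gaps.

```python
import numpy as np
import pandas as pd

period0 = pd.date_range('2011-01-01', periods=50, freq='D')
period1 = pd.date_range('2012-05-18', periods=50, freq='D')
period2 = pd.date_range('2014-07-11', periods=50, freq='D')
df = pd.DataFrame({0: period0.append([period1, period2])})
df['y'] = np.random.rand(len(df))

# One bin per calendar day from the first to the last date; days without
# rows still get a bin, and last() of an empty bin is NaN
daily = df.groupby(pd.Grouper(key=0, freq='1D'))['y'].last()

print(len(daily), daily.notna().sum())
# daily.plot() now draws three separate line pieces
```

`last()` keeps one value per day; if a date could occur more than once, pick the aggregation (`mean()`, `sum()`, ...) that fits the data.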

1 Comment

Wow! It miraculously did what I wanted. Thanks for the answer man.

You should set the values you don't want drawn to NaN; matplotlib leaves a break in the line wherever a value is NaN:

https://matplotlib.org/examples/pylab_examples/nan_test.html
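The same NaN idea also works without building a full daily index, which may help when the timestamps are irregular. This is a sketch under stated assumptions: `insert_gap_nans` is a hypothetical helper (not a pandas function), and it treats any step longer than `max_step` (1 day here) as a gap, inserting a single NaN point inside each one.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def insert_gap_nans(s, max_step=pd.Timedelta('1D')):
    """Hypothetical helper: return a copy of the datetime-indexed Series `s`
    with one NaN point placed inside every gap longer than `max_step`."""
    s = s.sort_index()
    # True on each point that follows a gap (the first diff is NaT -> False)
    after_gap = (s.index.to_series().diff() > max_step).to_numpy()
    # Put the NaN one max_step before each post-gap point, i.e. inside the gap
    nan_times = s.index[after_gap] - max_step
    return pd.concat([s, pd.Series(np.nan, index=nan_times)]).sort_index()


period0 = pd.date_range('2011-01-01', periods=50, freq='D')
period1 = pd.date_range('2012-05-18', periods=50, freq='D')
period2 = pd.date_range('2014-07-11', periods=50, freq='D')
dates = period0.append([period1, period2])
s = pd.Series(np.random.rand(len(dates)), index=dates)

gapped = insert_gap_nans(s)
fig, ax = plt.subplots()
ax.plot(gapped.index, gapped.to_numpy())  # three separate line pieces
```

Only one extra point per gap is added, so the series stays small even when the gaps span years.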

For example:

df = df.set_index(0)
#using column 0 (already datetime64) as the index

idx = pd.date_range(start=df.index.min(), end=df.index.max(), freq='D')
#creating a new daily index covering the whole range, gaps included

df = df.reindex(idx)
#reindexing df - existing 'y' values are kept, dates inside the gaps get NaN

plt.plot(df.index, df['y'])
#creating plot - matplotlib breaks the line at each NaN
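To apply the reindex approach to several dataframes, the steps can be wrapped in a function. This is a minimal sketch assuming each frame has a datetime column and a value column with unique dates; `plot_with_gaps` and its parameters are hypothetical names. It works on the new DataFrame returned by `set_index`, so the caller's `df` is never modified.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_with_gaps(ax, df, date_col, value_col, freq='D'):
    """Hypothetical helper: plot value_col against date_col on ax, leaving a
    break wherever a date at the given frequency is missing. df is not modified."""
    s = df.set_index(date_col)[value_col].sort_index()
    full = pd.date_range(s.index.min(), s.index.max(), freq=freq)
    s = s.reindex(full)  # missing dates become NaN (dates must be unique)
    return ax.plot(s.index, s.to_numpy())[0]


period0 = pd.date_range('2011-01-01', periods=50, freq='D')
period1 = pd.date_range('2012-05-18', periods=50, freq='D')
period2 = pd.date_range('2014-07-11', periods=50, freq='D')
df = pd.DataFrame({0: period0.append([period1, period2])})
df['y'] = np.random.rand(len(df))

fig, ax = plt.subplots()
line = plot_with_gaps(ax, df, 0, 'y')
# For several dataframes, call plot_with_gaps once per frame on the same ax
```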

4 Comments

Actually I'm not able to write a function for it. It's a bit hard for me, since I have multiple dataframes like this. Could you please add some pseudo code?
The code works. Thanks for this valuable code. I really appreciate it. But when I try to run the code multiple times, I get this error: ValueError: cannot reindex from a duplicate axis
@ImportanceOfBeingErnest reindex(idx) adds the new index labels and produces NaN in df['y'] for the rows that have no data.
My previous comment was meant to help improving the answer.
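One plausible cause of the `ValueError: cannot reindex from a duplicate axis` reported above, assuming the steps were re-run on a `df` that had already been reindexed: the inserted rows have NaT in column 0, so using that column as the index a second time yields many duplicate NaT labels, and `reindex` refuses an axis with duplicates. The sketch below reproduces this with `df.index = df[0]`; the exact error text differs between pandas versions.

```python
import numpy as np
import pandas as pd

period0 = pd.date_range('2011-01-01', periods=50, freq='D')
period1 = pd.date_range('2012-05-18', periods=50, freq='D')
df = pd.DataFrame({0: period0.append(period1)})
df['y'] = np.random.rand(len(df))

# First run: column 0 becomes the index, and the column also stays in df
df.index = df[0]
idx = pd.date_range(start=df.index.min(), end=df.index.max(), freq='D')
df = df.reindex(idx)  # gap dates get NaT in column 0 and NaN in 'y'

# Second run on the already-reindexed df: column 0 now holds many NaT values,
# so the new index has duplicate labels
df.index = df[0]
print(df.index.has_duplicates)  # True

raised = False
try:
    df.reindex(idx)
except ValueError as err:  # reindex refuses an axis with duplicate labels
    raised = True
    print('ValueError:', err)
```

Rebuilding `df` from the original data before re-running the steps avoids the error.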
