I have a dataset from an experiment that records 10 readings per second, i.e. 600 readings per minute. The data covers one month, but some dates are missing; I assume the recording was turned off on those days. When I plot this reading against time using matplotlib, a line is drawn connecting the last available date to the next available date.

However, instead of this line, I want a gap to be shown so that it is clear to the viewer that data is unavailable for those days.

I am using Matplotlib and Python 3 to plot.

Here is what the data looks like:

timestamp,x
2019-09-03 18:33:38,17.546
2019-09-03 18:33:38,17.546
2019-09-03 18:33:39,17.546
2019-09-03 18:33:39,17.555999999999997
2019-09-03 18:33:39,17.589000000000002
2019-09-03 18:33:39,17.589000000000002
2019-09-03 18:33:39,17.589000000000002
2019-09-03 18:33:39,17.589000000000002
2019-09-03 18:33:39,17.593
2019-09-03 18:33:39,17.595
2019-09-03 18:33:40,17.594
  • Have you tried to filter out null timestamp values? Commented Nov 27, 2019 at 14:53
  • Hi @FBruzzesi, the dataset does not contain those timestamp values at all. For example, row i has a timestamp dated 2019-09-12 and row i+1 jumps directly to a timestamp dated 2019-09-18. Commented Nov 27, 2019 at 14:55
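
A minimal sketch of how such jumps between consecutive rows could be located before plotting, assuming the CSV layout shown in the question. The sample rows, the 5-second `max_gap` threshold, and the `gaps` frame are illustrative assumptions, not part of the original data.

```python
import io

import pandas as pd

# Hypothetical sample in the same layout as the question's CSV:
# second-resolution timestamps, several readings per second, one multi-day jump.
csv_text = """timestamp,x
2019-09-12 23:59:58,17.546
2019-09-12 23:59:58,17.550
2019-09-12 23:59:59,17.555
2019-09-18 00:00:00,17.589
2019-09-18 00:00:00,17.593
2019-09-18 00:00:01,17.595
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# Assumed threshold: any step longer than this counts as missing data.
max_gap = pd.Timedelta(seconds=5)

# Time elapsed since the previous row; the first row has no predecessor (NaT),
# and NaT > max_gap evaluates to False, so it is never flagged.
is_gap = df["timestamp"].diff() > max_gap

# One row per gap: last timestamp before it and first timestamp after it.
gaps = pd.DataFrame({
    "gap_start": df["timestamp"].shift(1)[is_gap],
    "gap_end": df["timestamp"][is_gap],
})
print(gaps)
```

Because the comparison uses differences between neighbouring rows, the repeated timestamps within one second do not matter here.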

2 Answers

Another option, based on @Diziet Asahi's answer, is to plot the data as a scatter rather than a line. This should work even with multiple data points for a single x-value. Since your data is very heavily sampled, it may have a similar visual effect to a line anyway.

import matplotlib.pyplot as plt
import pandas as pd
# In a Jupyter notebook, also run: %matplotlib inline

# first bit of code copied from @Diziet's answer:
# two 60-second blocks sampled at 10 Hz, separated by a 10-second gap
df = pd.concat([pd.DataFrame([10]*600, index=pd.date_range(start='2018-01-01 00:00:00', periods=600, freq='100ms')),
                pd.DataFrame([20]*600, index=pd.date_range(start='2018-01-01 00:01:10', periods=600, freq='100ms'))])

# put the data on a regular 10 Hz grid; the missing timestamps become NaN rows
df2 = df.resample('100ms').asfreq()

# plot three times: the original data twice with different settings,
# then the resampled data
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(8, 4))
df.plot(ax=ax1)                            # line bridges the gap
df.plot(ax=ax2, marker='.', linestyle='')  # markers only, so the gap stays empty
df2.plot(ax=ax3)                           # NaN rows break the line
plt.show()

1 Comment

I used a square marker and adjusted its size a bit: ax1.plot(x, y1, color=color1, label=y1_label, marker='s', linestyle='', markersize=1.5). It looks similar to the line chart.

I think in this case you should add the missing timestamps to your dataframe by resampling it at 10 Hz. The added rows contain NaN, and matplotlib leaves a gap in the line wherever it meets NaN.

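import matplotlib.pyplot as plt
import pandas as pd

# two 60-second blocks sampled at 10 Hz, separated by a 10-second gap
df = pd.concat([pd.DataFrame([10]*600, index=pd.date_range(start='2018-01-01 00:00:00', periods=600, freq='100ms')),
                pd.DataFrame([20]*600, index=pd.date_range(start='2018-01-01 00:01:10', periods=600, freq='100ms'))])

# put the data on a regular 10 Hz grid; the missing timestamps become NaN rows
df2 = df.resample('100ms').asfreq()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
df.plot(ax=ax1)   # original data: line bridges the gap
df2.plot(ax=ax2)  # resampled data: NaN rows break the line
plt.show()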

1 Comment

I reckon the resample() method requires the timestamp to be a DatetimeIndex. In this case, since the timestamps have no sub-second part (".f"), at least 10 rows share the same '%Y-%m-%d_%H-%M-%S' value. Setting this column as the index and resampling as above will result in a duplicate-axis error.
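
One possible way around the duplicate timestamps raised in this comment is to skip resample() entirely and insert a single NaN row wherever two consecutive rows are further apart than some threshold. This is only a sketch, not the answerer's method: the sample rows, the 5-second `max_gap` threshold, and the names `breaks` and `plot_df` are assumptions made for illustration.

```python
import io

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample: duplicate second-resolution timestamps with one long gap.
csv_text = """timestamp,x
2019-09-12 23:59:58,17.546
2019-09-12 23:59:58,17.550
2019-09-12 23:59:59,17.555
2019-09-18 00:00:00,17.589
2019-09-18 00:00:00,17.593
2019-09-18 00:00:01,17.595
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# Assumed threshold: a step longer than this is treated as missing data.
max_gap = pd.Timedelta(seconds=5)

# True on the first row that follows each gap.
after_gap = df["timestamp"].diff() > max_gap

# One NaN reading per gap. The timestamp is irrelevant for drawing, because
# matplotlib does not connect a line through a NaN y-value.
breaks = pd.DataFrame({
    "timestamp": df.loc[after_gap, "timestamp"],
    "x": np.nan,
})

# Shift the labels by half a step so each NaN row sorts just before
# the first row after its gap, then restore a clean integer index.
breaks.index = breaks.index - 0.5
plot_df = pd.concat([df, breaks]).sort_index().reset_index(drop=True)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(plot_df["timestamp"], plot_df["x"])
ax.set_xlabel("timestamp")
ax.set_ylabel("x")
# Call plt.show() or fig.savefig(...) to view the result.
```

Since the row order is never changed and no index has to be unique, repeated timestamps within the same second are not a problem with this approach.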
