
I plotted the minimum point of df['Data'].

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# sample data: a cosine wave sampled once per second
Timestamp = pd.date_range('2020-02-06 08:23:04', periods=1000, freq='s')
df = pd.DataFrame({'Timestamp': Timestamp,
                   'Data': 30 + 15*np.cos(np.linspace(0, 10, Timestamp.size))})

# forward differences: time step, data step, and their ratio (the gradient)
df['timediff'] = (df['Timestamp'].shift(-1) - df['Timestamp']).dt.total_seconds()
df['datadiff'] = df['Data'].shift(-1) - df['Data']
df['gradient'] = df['datadiff'] / df['timediff']

min_pt = df['Data'].min()
# filter_pt = df.loc[df['gradient'] >= -0.1]  # & df.loc[i, 'gradient'] <= 0.1

# mark the minimum value in blue, everything else in yellow
mask = df['Data'].to_numpy() == min_pt
color = np.where(mask, 'blue', 'yellow')

fig, ax = plt.subplots(figsize=(20, 10))
# plt.plot_date(df['Timestamp'], df['Data'], '-')
ax.scatter(df['Timestamp'], df['Data'], color=color, s=10)
# plt.ticklabel_format  # no-op: missing the call parentheses
plt.show()

The plot looks like this: [scatter of Data vs. Timestamp with the minimum point highlighted in blue]

I want to extend the condition using the df['gradient'] column:

  1. What if, instead of marking only the minimum point, I want to mark the points where the gradient lies between -0.1 and 0.1 (inclusive)?
  2. Additional condition: take only the first data point in each such range (i.e. between -0.1 and 0.1 inclusive).
  3. How do I apply this across the whole dataset, rather than just marking the first data point that satisfies these conditions (which is what my current plot does)?

Tried to add:


df1 = df[df.gradient <= 0.1 & df.gradient >= -0.1]
plt.plot(df1.Timestamp,df1.Data, label="filter")

before the mask step, based on this answer, which returned the error:

TypeError: Cannot perform 'rand_' with a dtyped [float64] array and scalar of type [bool]

I think what I did wasn't very efficient. How can I do it more efficiently?


Update:

With the code

Timestamp = pd.date_range('2020-02-06 08:23:04', periods=1000, freq='s')
df = pd.DataFrame({'Timestamp': Timestamp,
                   'Data': 30+15*np.cos(np.linspace(0,10,Timestamp.size))})

df['timediff'] = (df['Timestamp'].shift(-1) - df['Timestamp']).dt.total_seconds()    
df['datadiff'] = df['Data'].shift(-1) - df['Data']
df['gradient'] = df['datadiff'] / df['timediff']

fig,ax = plt.subplots(figsize=(20,10))
df1 = df[(df.gradient <= 0.1) & (df.gradient >= -0.1)]
plt.plot(df1.Timestamp,df1.Data, label="filter")
plt.show()

it returned this plot: [image]

After changing the range to

df1 = df[(df.gradient <= 0.01) & (df.gradient >= -0.01)]

it returned a different plot: [image]

Why?

1 Answer

Add parentheses around each condition; that way the logical AND is applied row by row:

df1 = df[(df.gradient <= 0.1) & (df.gradient >= -0.1)]
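
The original version fails because in Python the & operator binds more tightly than the comparison operators, so df.gradient <= 0.1 & df.gradient >= -0.1 is not grouped the way it reads, which is what triggers the TypeError above. As a sketch of an arguably more readable alternative, pandas' Series.between expresses the same filter (both bounds are inclusive by default):

# equivalent filter using Series.between; -0.1 and 0.1 are both included
df1 = df[df.gradient.between(-0.1, 0.1)]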

Also consider using a scatter plot; otherwise plt.plot draws straight lines between consecutive filtered points, so the points on either side of a gap (where the absolute value of the gradient is greater than 0.1) get connected.

plt.scatter(df1.Timestamp,df1.Data, label="filter")

This would be the final image:

[scatter plot of the filtered points]
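
For context, a minimal end-to-end sketch that overlays the filtered points on the full series (assuming the same df as in the question; the light-gray line, colors, and label names are just illustrative choices):

df1 = df[(df.gradient >= -0.1) & (df.gradient <= 0.1)]

fig, ax = plt.subplots(figsize=(20, 10))
ax.plot(df['Timestamp'], df['Data'], color='lightgray', label='Data')  # full series for context
ax.scatter(df1['Timestamp'], df1['Data'], s=10, color='blue', label='filter')  # flat regions only
ax.legend()
plt.show()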

EDIT

If you need only the first point where the gradient is in the range, create groups and then use groupby:

# each out-of-range row bumps the counter, so consecutive in-range rows share a group id
df['groups'] = ((df.gradient > 0.1) | (df.gradient < -0.1)).cumsum()

df2 = (df[(df.gradient <= 0.1) & (df.gradient >= -0.1)]
       .groupby('groups').agg({'Timestamp': 'first', 'Data': 'first'}))

#        Timestamp              Data
# groups        
# 0      2020-02-06 08:23:04    45.000000
# 168    2020-02-06 08:27:05    18.814188
# 336    2020-02-06 08:32:19    41.201294
# 504    2020-02-06 08:37:33    18.783251
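
The trick is that cumsum only increases on out-of-range rows, so every in-range run keeps a constant group id and 'first' picks its first row. If you then want to mark those points on the plot, a minimal sketch (assuming df and df2 from above; colors and sizes are arbitrary):

fig, ax = plt.subplots(figsize=(20, 10))
ax.plot(df['Timestamp'], df['Data'], color='lightgray')  # full series
ax.scatter(df2['Timestamp'], df2['Data'], color='red', s=40,
           label='first point of each flat region')
ax.legend()
plt.show()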

1 Comment

Thanks jcaliz! Is there a way to mark only the first data point in each range?
