I am trying to identify which timestamps in my index are duplicated. I want to build a list of those timestamps as strings, with each duplicated timestamp appearing only once in the list.
#required packages
import os
import pandas as pd
import numpy as np
import datetime
# create sample time series
header = ['A','B','C','D','E']
period = 5
cols = len(header)
dates = pd.date_range('1/1/2000', periods=period, freq='10min')
dates2 = pd.date_range('1/1/2022', periods=period, freq='10min')
df = pd.DataFrame(np.random.randn(period,cols),index=dates,columns=header)
df0 = pd.DataFrame(np.random.randn(period,cols),index=dates2,columns=header)
df1 = pd.concat([df]*3) #creates duplicate entries by copying the dataframe
df1 = pd.concat([df1, df0])
df2 = df1.sample(frac=1) #shuffles the dataframe
df3 = df1.sort_index() #sorts the dataframe by index
print(df2)
#print(df3)
# Identifying duplicated entries
# DataFrame.duplicated() compares the column values of each row, not the index labels
df4 = df2.duplicated()
print(df4)
I would then like to use that list to pull out all the duplicate entries for each timestamp. From the code above, is there a good way to get the index labels that correspond to a given boolean value (for example, False)?
Edit: added an extra dataframe to create some unique values, and tripled the first dataframe so that each duplicated timestamp repeats more than once. Also added more detail to the question.
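For reference, here is a minimal sketch of the result I am after, assuming the duplicate check should run on the index rather than on the row values. It uses `Index.duplicated(keep=False)` so that every occurrence of a repeated timestamp is flagged. The names `is_dup`, `dup_strings`, `dup_rows` and `unique_stamps` are only illustrative, and the string format is an assumption.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the sample above (values are random placeholders)
header = ['A', 'B', 'C', 'D', 'E']
dates = pd.date_range('1/1/2000', periods=5, freq='10min')
dates2 = pd.date_range('1/1/2022', periods=5, freq='10min')
df = pd.DataFrame(np.random.randn(5, 5), index=dates, columns=header)
df0 = pd.DataFrame(np.random.randn(5, 5), index=dates2, columns=header)
df2 = pd.concat([df] * 3 + [df0]).sample(frac=1)  # shuffled, with repeated index labels

# keep=False marks every occurrence of a repeated index label as True
is_dup = df2.index.duplicated(keep=False)

# One entry per duplicated timestamp, sorted and formatted as strings
dup_stamps = df2.index[is_dup].unique().sort_values()
dup_strings = dup_stamps.strftime('%Y-%m-%d %H:%M:%S').tolist()

# All rows that belong to a duplicated timestamp, grouped together by sorting
dup_rows = df2[is_dup].sort_index()

# Index labels where the mask is False, i.e. timestamps that occur exactly once
unique_stamps = df2.index[~is_dup]

print(dup_strings)
print(dup_rows)
```

The same boolean mask answers both parts: indexing `df2.index` with the mask gives the labels where it is True, and negating it with `~` gives the labels where it is False.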