0

I have 2 dataframe i.e df1 and df2 as follows

df1=pd.read_csv("abc.csv")
print (df1.head(10))
df2=pd.read_csv("xyz.csv")
print (df2.head(10))

                     A              B
          0  2019-01-01 03:56:29  197.199997
          1  2019-01-01 04:02:29  197.186142
          2  2019-01-02 06:24:29  196.857986
          3  2019-01-02 06:42:29  196.816376
          4  2019-01-03 11:52:29  196.100006
          5  2019-01-03 12:00:30  196.015961
          6  2019-01-04 14:18:30  194.566376
          7  2019-01-04 14:38:30  194.356293
          8  2019-01-04 19:48:30  191.100006
          9  2019-01-05 19:56:30  191.081512

                 C                  D
          0  2019-01-1  18:00:00  1333
          1  2019-01-2  19:00:00  1.18
          2  2019-01-3  20:00:00  1666667
          3  2019-01-4  21:00:00  0
          4  2019-01-5  22:00:00  1
          5  2019-01-6  23:00:00  1.5
          6  2019-01-7  00:00:00  109
          7  2019-01-8  01:00:00  200
          8  2019-01-9  02:00:00  192
          9  2019-01-10 03:00:00  1.700000

df2 has hourly wise average data ,Now how to select values for only date in df1 where df2 column "D" has value more than 2 i.e output will look like ,

                     A           B
      0  2019-01-01 03:56:29  197.199997
      1  2019-01-01 04:02:29  197.186142
      2  2019-01-03 11:52:29  196.100006
      4  2019-01-03 12:00:30  196.015961

i have tried like

,`final_data=pd.concat([df1.reset_index(drop=True),df2.reset_index(drop=True)],axis=1)
  final_data=final_data[final_data["D"] > 2]

but i didnt get the proper output , can any one please help me with the solution

3 Answers 3

1

You could try this :

import pandas as pd

df1 = pd.read_csv("file.csv")
df2 = pd.read_csv("file2.csv")

df2['C'] = pd.to_datetime(df2['C'], format='%Y-%m-%d')
dates = []

for ind in df2.index:
    if(df2['D'][ind]>2):
         date_tup = (df2['C'][ind].year,df2['C'][ind].month,df2['C'][ind].day)
         dates.append(date_tup)

df1['A'] = pd.to_datetime(df1['A'], format='%Y-%m-%d', errors='ignore')

for ind in df1.index:
    date_tup = (df1['A'][ind].year,df1['A'][ind].month,df1['A'][ind].day)
    if(date_tup not in dates):
         df1 = df1.drop([ind])

print(df1)

file1.csv :

A,B
2019-01-01 03:56:29,197.199997
2019-01-01 04:02:29,197.186142
2019-01-02 06:24:29,196.857986
2019-01-02 06:42:29,196.816376
2019-01-03 11:52:29,196.100006
2019-01-03 12:00:30,196.015961
2019-01-04 14:18:30,194.566376
2019-01-04 14:38:30,194.356293
2019-01-04 19:48:30,191.100006
2019-01-05 19:56:30,191.081512

file2.csv :

C,D
2019-01-01 18:00:00,1333
2019-01-02 19:00:00,1.18
2019-01-03 20:00:00,1666667
2019-01-04 21:00:00,0
2019-01-05 22:00:00,1
2019-01-06 23:00:00,1.5
2019-01-07 00:00:00,109
2019-01-08 01:00:00,200
2019-01-09 02:00:00,192
2019-01-10 03:00:00,1.700000
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks buddy you are champ ,just u need to add df1['A'][ind].hour also
0

Try this out:

final_data=pd.concat([df1.reset_index(drop=True),df2.reset_index(drop=True)],axis=1)
final_data=final_data[final_data["D"] > 2,['A','B']]

1 Comment

Its not working , Its giving me invalid key Error for "A","B"
0

Assuming that the indices of the two matrices match and that you only want to keep the information from df1 then:

df1[df2['D'] > 2] 

should do the trick.

1 Comment

Yeah, but the hours,minutes and seconds don't match, so can't use this.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.