I have developed a Python script that reads a CSV file produced by a SQL query (just a SELECT * FROM table) and then performs some transformations and calculations on the resulting DataFrame.
I obtain the deduplicated DataFrame with the following pandas commands:
result = csv_df.sort_values(by=['column1', 'column2', 'column3'], ascending=True)
result = result.drop_duplicates(['column1', 'column2'])
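To make the behaviour of those two lines concrete, here is a small sketch on hypothetical sample data (the values below are invented for illustration). It relies on two pandas defaults: `sort_values` places NaN last (`na_position='last'`), and `drop_duplicates` keeps the first occurrence (`keep='first'`). Together they keep, for each (column1, column2) pair, the row with the smallest non-null column3, falling back to a NaN row only when every row in the pair is NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the CSV export.
csv_df = pd.DataFrame({
    "column1": [1, 1, 1, 2, 2],
    "column2": ["a", "a", "b", "c", "c"],
    "column3": [np.nan, 5.0, 7.0, np.nan, np.nan],
})

# Same two steps as in the question: sort (NaN goes last by default),
# then keep the first row of each (column1, column2) pair.
result = csv_df.sort_values(by=["column1", "column2", "column3"], ascending=True)
result = result.drop_duplicates(["column1", "column2"])
print(result)
```

For the pair (1, "a") the row with column3 = 5.0 survives and the NaN row is dropped; for (2, "c") both rows are NaN, so a NaN row is kept.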
Now I need to get the same table using a SQL query. I have tried the following in T-SQL, but without success:
select * from data
where column1 IN
(select distinct column1,column2 from data)
and
where column2 IN
(select distinct column1,column2 from data)
order by column1,column2;
I am new to SQL syntax; can someone help me with the query?
What I am trying to do is remove the duplicate rows, i.e. keep only one row for each combination of column1 and column2.
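In Python, I include column3 in the sort because it contains NULL values that I need to discard: sort_values puts NaN last by default and drop_duplicates keeps the first row of each pair, so a non-NULL column3 is kept whenever one exists.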
After this, should I create a view so I can keep performing calculations on the result?
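One common way to express "keep one row per (column1, column2)" in SQL is `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)`, keeping the rows numbered 1. The sketch below is an assumption-laden illustration rather than a tested T-SQL answer: it runs the query through Python's built-in `sqlite3` module (SQLite 3.25+ supports window functions) against the question's `data` table filled with hypothetical rows, and wraps it in a view named `deduped_data` (a name invented here). Both SQLite and SQL Server sort NULLs first in ascending order, so a `CASE` expression pushes NULL column3 values last to mirror the pandas behaviour.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server table "data".
# The rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (column1 INTEGER, column2 TEXT, column3 REAL)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [(1, "a", None), (1, "a", 5.0), (1, "b", 7.0), (2, "c", None), (2, "c", None)],
)

# Number the rows inside each (column1, column2) group: non-NULL column3
# first (smallest value first), NULLs last. Keep only row number 1.
conn.execute("""
    CREATE VIEW deduped_data AS
    SELECT column1, column2, column3
    FROM (
        SELECT column1, column2, column3,
               ROW_NUMBER() OVER (
                   PARTITION BY column1, column2
                   ORDER BY CASE WHEN column3 IS NULL THEN 1 ELSE 0 END, column3
               ) AS rn
        FROM data
    ) AS numbered
    WHERE rn = 1
""")

# Further calculations can query the view like a table; the ORDER BY
# lives here because a view itself does not guarantee row order.
rows = conn.execute(
    "SELECT column1, column2, column3 FROM deduped_data ORDER BY column1, column2"
).fetchall()
print(rows)
```

SQL Server also supports `ROW_NUMBER()` and `CASE`, so the inner `SELECT` should carry over largely unchanged, but note that there `CREATE VIEW` must be the only statement in its batch and a view cannot contain `ORDER BY` (without `TOP`), so sorting belongs in the query that reads the view. Whether a view is preferable to simply reusing the query depends on how often the deduplicated set is needed.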