
I sorted my CSV file so that I can run some calculations on it. I'm using Python 2.7.

import pandas as pd
df = pd.read_csv('Cliente_x_Pais_Sitio.csv', sep=',')
df1 = df.sort_values(by=['Cliente','Auth_domain','Sitio',"Country"])
df1.to_csv('test.csv')

CSV data (test.csv):

Cliente,Fecha,Auth_domain,Sitio,Country,ECPM_medio
FF,15/12/2017,@ff,ff_Color,Afganistán,0.53
FF,15/01/2018,@ff,ff_Color,Afganistán,0.5
FF,15/01/2017,@ff,ff_Color,Alemania,0.34
FF,15/12/2017,@ff,ff_Color,Alemania,0.38
FF,15/01/2018,@ff,ff_Color,Alemania,0.37

What I need:

if (15/12/2017 ECPM) ≤ (15/01/2018 ECPM):
    if ((15/12/2017 ECPM)*0.8) ≥ (15/01/2017 ECPM):
        r = (15/01/2017 ECPM)
    else:
        r = ((15/12/2017 ECPM)*0.8)
else:
    if (15/01/2018 ECPM) ≥ (15/01/2017 ECPM):
        r = (15/01/2017 ECPM)
    else:
        r = (15/01/2018 ECPM)

Filling in the real data, the first two lines would be:

if 0.53 ≤ 0.5:
    if 0.5 ≥ 0: # if the cell value is missing, I would like to use 0, so this is True
        r = 0.5

Remember that I have more than 10,000 rows, so I need a solution that applies to all of them at once.

The new CSV should show me this:

Cliente,Auth_domain,Sitio,Country,Recomendation_ECPM
FF,@ff,ff_Color,Afganistán,0.5
FF,@ff,ff_Color,Alemania,0.34
  • It's very difficult for me to explain what I need because I'm not American, so my apologies. ECPM is the ECPM_medio column of my original CSV. As for the if/else: if the ECPM_medio value of 15/12/2017 is smaller than the ECPM_medio value of 15/01/2018... @pault Commented Jan 17, 2018 at 13:46

1 Answer

I'm not sure I have the following right:

  1. the date selection in setval, or
  2. the return-value logic in compare_val.

Regardless of those, the pipeline uses sort_values, groupby, and transform. Within each group, shift(-1) yields NaN on the last row and shift(1) yields NaN on the first row, so the edge rows are compared against NaN, and rows whose result is NaN are filtered out at the end.
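As a small illustration of how shift introduces NaN at the edges of a series, here is a minimal sketch using toy values (not the question's data); the variable names are made up for the example. It also shows that a comparison against NaN evaluates to False, which matters for the if/else logic.

```python
import pandas as pd

# Toy series standing in for one group's ECPM_medio values.
v = pd.Series([0.34, 0.38, 0.37])

next_val = v.shift(-1)  # each row sees the following row; the last row gets NaN
prev_val = v.shift(1)   # each row sees the preceding row; the first row gets NaN

# Comparisons against NaN are False, so edge rows fall into the else branches.
edge_cmp = v <= prev_val

print(pd.DataFrame({'v': v, 'next_val': next_val,
                    'prev_val': prev_val, 'v<=prev_val': edge_cmp}))
```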

# build data
from StringIO import StringIO  # Python 2.7; on Python 3 use: from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO("""Cliente,Fecha,Auth_domain,Sitio,Country,ECPM_medio
FF,15/12/2017,@ff,ff_Color,Afganistán,0.53
FF,15/01/2018,@ff,ff_Color,Afganistán,0.5
FF,15/01/2017,@ff,ff_Color,Alemania,0.34
FF,15/12/2017,@ff,ff_Color,Alemania,0.38
FF,15/01/2018,@ff,ff_Color,Alemania,0.37
""")).sort_values(by='Fecha')  # note: Fecha is sorted as a string here, not as a date

# functions to parse
def compare_val(cur, past, future):
    if cur <= past:
        cur_adj = cur * .8
        if cur_adj >= past:
            return past
        else:
            return cur_adj
    else:
        if future >= past:
            return past
        else:
            return future

def setval(v):
    cur, past, future = v, v.shift(-1), v.shift(1)
    return [compare_val(*x) for x in zip(cur, past, future)]

# do the work
df['Recomendation_ECPM'] = df.\
      groupby(['Cliente','Auth_domain','Sitio',"Country"])['ECPM_medio'].\
      transform(setval)

df[ pd.notna(df['Recomendation_ECPM']) ]
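
An alternative sketch that follows the question's pseudocode literally: reshape to one row per Cliente/Auth_domain/Sitio/Country with one column per Fecha, then apply the rule with np.where. The names KEYS, DATES, wide, and result are made up for this example, and the three dates are hard-coded as an assumption. Missing values are deliberately left as NaN rather than filled with 0: a comparison with NaN is False, so the else branch is taken, which reproduces the 0.5 expected for Afganistán (filling with 0 would return 0 instead). I have only checked it against the sample rows.

```python
import io
import numpy as np
import pandas as pd

CSV_TEXT = u"""Cliente,Fecha,Auth_domain,Sitio,Country,ECPM_medio
FF,15/12/2017,@ff,ff_Color,Afganistán,0.53
FF,15/01/2018,@ff,ff_Color,Afganistán,0.5
FF,15/01/2017,@ff,ff_Color,Alemania,0.34
FF,15/12/2017,@ff,ff_Color,Alemania,0.38
FF,15/01/2018,@ff,ff_Color,Alemania,0.37
"""

KEYS = ['Cliente', 'Auth_domain', 'Sitio', 'Country']
DATES = ['15/01/2017', '15/12/2017', '15/01/2018']  # assumed fixed dates

df = pd.read_csv(io.StringIO(CSV_TEXT))

# One row per key combination, one column per Fecha (NaN where a date is missing).
wide = df.pivot_table(index=KEYS, columns='Fecha',
                      values='ECPM_medio', aggfunc='first')
wide = wide.reindex(columns=DATES)  # guarantee all three date columns exist

jan17 = wide['15/01/2017']
dec17 = wide['15/12/2017']
jan18 = wide['15/01/2018']
dec17_adj = dec17 * 0.8

# Vectorised version of the nested if/else from the question.
recommendation = np.where(
    dec17 <= jan18,
    np.where(dec17_adj >= jan17, jan17, dec17_adj),
    np.where(jan18 >= jan17, jan17, jan18),
)

result = pd.DataFrame({'Recomendation_ECPM': recommendation},
                      index=wide.index).reset_index()

# With no path, to_csv returns the CSV as a string.
csv_text = result.to_csv(index=False)
print(csv_text)
```

To write a file instead of a string, pass a path, for example result.to_csv('recomendation.csv', index=False) (a hypothetical filename). The method is to_csv; there is no to_save.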

4 Comments

Thanks for trying, but this is not exactly what I'm looking for. If you look at the results I need in the question: first, I need a new CSV, and second, I need to compare the Fecha values within the same Cliente, Auth_domain, Sitio, Country.
The .groupby(...) should limit the comparison to within those columns (Cliente, Auth_domain, etc.). Add .to_csv on the last line to save the output to a file.
If I try to save it, I get an error: 'DataFrame' object has no attribute 'to_save'. Also, please print your results and you will see that they are not the answer I'm looking for.
Please see the file: first your answer, and second what I need: drive.google.com/open?id=1rIhY8GdM2SRebiYcT8iX7iiPVj_IO0l6
