
I'm trying to generate a new column based on multiple conditions across several columns. My code runs without raising any errors. Below is a snippet of the dataframe and the code.

This is the starting data

import pandas as pd
import numpy as np

dfc = pd.read_csv(r'C:\\Users\\...01.csv', header='infer')

condition = [dfc['N']==0, dfc['count']==dfc['N'], (dfc['count'] > dfc['N']) & (dfc['N'] != 0)]
rng_result = [str(dfc['i']) + '-' + str(dfc['a']),'None','None to Many'] 
dfc['rng'] = np.select(condition, rng_result, np.nan)

dfc.to_csv(r'C:\\Users\\...R_01.csv', index=False)

It might be that I don't understand numpy. The middle and last conditions come out fine, but the first condition returns an array, which is not what I want. I want a string built from that row's 'i' and 'a' values, as I typed it in the image below.

[Image: dataframe with the expected 'rng' column]
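For reference, a minimal sketch of what goes wrong in the first choice: str(dfc['i']) turns the entire column into one string (its printed form), whereas .astype(str) converts each row separately. The sample rows are borrowed from the answers below because the original CSV is not shown, and the 'unknown' default is an assumption that replaces np.nan so that every possible result is a string.

```python
import numpy as np
import pandas as pd

# Sample rows (assumed; taken from the data posted in the answers).
dfc = pd.DataFrame({
    "count": [1, 1, 2, 3, 50],
    "i": [1.4, 0.0, 110.0, 0.0, 0.0],
    "a": [1.4, 0.0, 140.0, 0.0, 17.0],
    "N": [0, 1, 0, 3, 21],
})

# str() on a whole column gives ONE string: the printed Series.
whole_column = str(dfc["i"])

# astype(str) converts element by element, so this is one string per row.
per_row = dfc["i"].astype(str) + "-" + dfc["a"].astype(str)

conditions = [
    dfc["N"] == 0,
    dfc["count"] == dfc["N"],
    (dfc["count"] > dfc["N"]) & (dfc["N"] != 0),
]
choices = [per_row, "None", "None to Many"]

# The first matching condition wins; "unknown" is a hypothetical fallback.
dfc["rng"] = np.select(conditions, choices, default="unknown")
print(dfc)
```

The only change to the logic in the question is the first entry of the choice list; the conditions are the same.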

  • Could you please provide the input and the expected output as text in the question, so others can easily reproduce it? Also, if I'm reading the data correctly, the only columns that impact 'rng' are i and a - right? Commented Jul 9, 2020 at 6:48

2 Answers


The conditions aren't entirely clear from the question, but is this what you're after? I'm using np.where() to return None where N is 1 or 3 and the combined 'i-a' string everywhere else.

import pandas as pd
import numpy as np
import io

data = '''
count,i,a,N
1,1.4,1.4,0
1,0,0,1
2,110,140,0
3,0,0,3
4,3.5,5.1,0
4,19,22,0
'''

df = pd.read_csv(io.StringIO(data), sep=',')
df['rng'] = np.where((df['N'] == 1)|(df['N'] == 3), None, df['i'].astype(str)+'-'+df['a'].astype(str))
df
   count      i      a  N          rng
0      1    1.4    1.4  0      1.4-1.4
1      1    0.0    0.0  1         None
2      2  110.0  140.0  0  110.0-140.0
3      3    0.0    0.0  3         None
4      4    3.5    5.1  0      3.5-5.1
5      4   19.0   22.0  0    19.0-22.0

4 Comments

Thanks for the answer. Can this be modified for more conditions and choices? Like the condition df['count'] > df['N'] would result in "many".
Conditions can be combined with & (and) or | (or).
This doesn't work with more than two conditions (Cons): with three Cons I get the error "TypeError: where() takes from 1 to 3 positional arguments but 4 were given." My mental picture is np.where(Con_1 | Con_2, choice_1, choice_2); a Con_3 with its own choice_3 is not allowed, but I need three outcomes.
np.where has only two outcomes: one value where the condition is True ('A') and one where it is False ('B'). A single call can't do more than that. The condition itself can combine several tests, like (A==1)|((B==3)&(C==2)).
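If three outcomes are needed, one option is to nest a second np.where in the False branch of the first; np.select, as used in the question, is another. A small sketch under assumed sample rows (the data from this answer plus one extra row), with "many" as the label suggested in the comment above:

```python
import numpy as np
import pandas as pd

# Illustrative rows only; the asker's real CSV is not shown.
df = pd.DataFrame({
    "count": [1, 1, 2, 3, 50],
    "i": [1.4, 0.0, 110.0, 0.0, 0.0],
    "a": [1.4, 0.0, 140.0, 0.0, 17.0],
    "N": [0, 1, 0, 3, 21],
})

combined = df["i"].astype(str) + "-" + df["a"].astype(str)

# Outer call: N == 0 gets the combined string.
# Inner call: splits the remaining rows into "many" and "None".
df["rng"] = np.where(
    df["N"] == 0,
    combined,
    np.where(df["count"] > df["N"], "many", "None"),
)
print(df)
```

Each comparison joined with & or | needs its own parentheses, because those operators bind more tightly than == and >.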

I find using apply more readable and maintainable:

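import pandas as pd

data = [["count", "i", "a", "N"],
        [1, 1.4, 1.4, 0],
        [1, 0, 0, 1],
        [2, 110, 140, 0],
        [3, 0, 0, 3],
        [4, 3.5, 5.1, 0],
        [4, 19, 22, 0],
        [50, 0, 17, 21],
        [25, 0, 0, 25]]

def cond(r):
    # r["count"] needs brackets because r.count is the Series.count() method
    val = "tbd"  # fallback when no condition matches
    # elif: the first matching condition wins, as in np.select
    if r.N==0: val = str(r.i)+"-"+str(r.a)
    elif r["count"]==r.N: val = "None"
    elif (r['count'] > r['N']) and (r['N'] != 0): val = 'None to Many'
    return val

df = pd.DataFrame(data[1:], columns=data[0])
df["rng"] = df.apply(cond, axis=1)  # axis=1 passes one row at a time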

2 Comments

I'm a little confused by the if statements. "i", "a" and "N" are all columns. In the line: if r.N==0: val = str(r.i)+"-"+str(r.a) should this be corrected to: if r['N']==0: val = str(r['i'])+"-"+str(r['a']) ?
I've used mixed conventions for accessing columns. r.N is synonymous with r["N"]. Some of the code I typed, the rest I copied and pasted.
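A short sketch of the two access styles on a single row, assuming a hypothetical row shaped like the data above. Attribute access works for most labels, but a label that matches an existing Series method, such as count, has to be read with brackets:

```python
import pandas as pd

# Hypothetical row; df.apply(..., axis=1) passes each row as a Series like this.
row = pd.Series({"count": 4, "i": 3.5, "a": 5.1, "N": 0})

# Both spellings read the same value.
same_value = row.N == row["N"]

# row.count is the built-in Series.count() method, not the "count" label.
count_is_method = callable(row.count)

# Brackets always return the stored value.
count_value = row["count"]

print(same_value, count_is_method, count_value)
```

This is why the answer's code writes r["count"] while using r.N elsewhere.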
