Efficient way to loop with if statement

Question

I have sample data that looks like this (the real dataset has more columns):

import pandas as pd

data = {'stringID':['AB CD Efdadasfd','RFDS EDSfdsadf dsa','FDSADFDSADFFDSA'],'IDct':[1,3,4]}
data = pd.DataFrame(data)
data['Index1'] = [[3,6],[7,9],[5,6]]
data['Index2'] = [[4,8],[10,13],[8,9]]

What I want to achieve: slice the stringID column using the second element of Index1 and the second element of Index2 (both are lists) as the start and end positions, but only if the IDct value is greater than 1; otherwise return NaN.

I tried the following, and it produces the Output1 column I expect, but there must be a better way to do it (I mean faster when applied to a large dataset). Please advise, thanks!

data['pos'] = data.Index1.map(lambda x: x[1])
data['pos1'] = data.Index2.map(lambda x: x[1])

def cal(m):
    if m['IDct'] > 1:
        return m['stringID'][m['pos']:m['pos1']]
    else:
        return 'NaN'

data['Output1'] = data.apply(cal,axis=1)

You say there "must be a better way to do it". In your case, what would define a "better" way? What is the problem you have with the current method? Memory efficiency, time efficiency, etc.?
– G. Anderson, Commented Sep 24, 2020 at 19:39
I'm thinking of a clearer or faster way, if that makes sense, such as a shorter calculation time when applied to a very large dataset.
– April, Commented Sep 24, 2020 at 19:40
Here is a really, really good overview of some times when native pandas methods are best, when loops or apply are just as good, and when to drop back to regular old python
– G. Anderson, Commented Sep 24, 2020 at 21:20

Yvan Aquino · Accepted Answer · 2020-09-24 20:56:06Z

I love pandas - but realistically speaking it's just one of many tools that belong in your tool belt.

pandas and numpy really shine for computation and analysis. It's fine to use pandas to visualize and analyze your data - but that doesn't mean it's the right tool for every job.

This kind of problem is better suited to regular Python. Assuming we can, let's move StringID and IDct out of the dict and back into lists. If we assume the data is regular in shape (all lists are of equal length), we can zip the lists together and loop over them row by row:

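StringID = ['AB CD Efdadasfd','RFDS EDSfdsadf dsa','FDSADFDSADFFDSA']
IDct = [1,3,4]
Index1 = [[3,6],[7,9],[5,6]]
Index2 = [[4,8],[10,13],[8,9]]

# Create the result list once, before the loop, so it is not reset on every row.
result = []
# Use distinct loop variable names so the original lists are not overwritten.
for string_id, id_ct, idx1, idx2 in zip(StringID, IDct, Index1, Index2):
    if id_ct > 1:
        # Slice from the second element of Index1 up to the second element of Index2.
        result.append(string_id[idx1[1]:idx2[1]])
    else:
        result.append(None)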

You can then blend the result data back in as you see fit.

import pandas as pd

data = {
    'StringID': StringID,
    'IDct': IDct,
    'Index1': Index1,
    'Index2': Index2,
    'Result': result
}

pd.DataFrame(data)
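
If the data already lives in a DataFrame, the same idea can be applied without unpacking everything by hand. The following is only a sketch, assuming the column names from the question (stringID, IDct, Index1, Index2) and that every list has at least two elements; the helper name slice_by_second_index is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'stringID': ['AB CD Efdadasfd', 'RFDS EDSfdsadf dsa', 'FDSADFDSADFFDSA'],
    'IDct': [1, 3, 4],
    'Index1': [[3, 6], [7, 9], [5, 6]],
    'Index2': [[4, 8], [10, 13], [8, 9]],
})

def slice_by_second_index(df):
    """Hypothetical helper: slice stringID between Index1[1] and Index2[1] when IDct > 1."""
    return [
        # Rows that fail the IDct check get a real NaN rather than the string 'NaN'.
        s[i1[1]:i2[1]] if n > 1 else np.nan
        for s, n, i1, i2 in zip(df['stringID'], df['IDct'], df['Index1'], df['Index2'])
    ]

data['Output1'] = slice_by_second_index(data)
print(data[['stringID', 'IDct', 'Output1']])
```

Zipping the columns iterates over plain Python objects, which avoids building a Series for every row as data.apply(..., axis=1) does. Whether it is actually faster on a given dataset is worth measuring rather than assuming.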

3 Comments

April Over a year ago

Thank you! I do have a follow-up question about lists of variable length: for example, I want to pick out the second element of each list, but some lists contain only one value. I tried np.where(data['IDct']>1, data.Index1.map(lambda x: x[1]),0) and np.where(data['IDct']>1, [x[1] for x in data['Index1']],0), but both raised a "list index out of range" error...

Yvan Aquino Over a year ago

Use regular Python logic - simple is better. If Index1 and Index2 are of variable length, then use their lengths to decide what to do, i.e. if len(Index1) < 1: None/NaN, elif len(Index1) == 1: Index1[0], else: Index1[1].
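
The length check described in this comment could be sketched roughly as follows. The function name pick_position and the sample lists are made up for illustration; the fallback to the first element for one-item lists follows the comment, and may not be what every dataset needs.

```python
def pick_position(index_list):
    """Hypothetical helper: choose a position from a list of variable length."""
    if len(index_list) < 1:
        return None             # empty list: nothing to pick
    elif len(index_list) == 1:
        return index_list[0]    # only one value: fall back to it
    else:
        return index_list[1]    # two or more values: use the second

# Made-up example lists of length 2, 1 and 0.
positions = [pick_position(x) for x in [[3, 6], [7], []]]
print(positions)
```

With a DataFrame, the same function could be passed to data['Index1'].map(pick_position).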

April Over a year ago

Thanks! I tried data.loc[data['IDct']>1]['Index1'].apply(lambda x:x[1]) and it worked as well!
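
A likely reason the np.where attempts failed while this one works: Python evaluates every argument in full before np.where is called, so x[1] runs on all rows, including the short lists. Filtering first means the lambda only sees rows that pass the condition. The sketch below assumes a small made-up frame in which only the rows with IDct > 1 have two elements, and it uses a single .loc[mask, 'Index1'] call in place of the chained indexing from the comment, which is a substitution on my part.

```python
import pandas as pd

data = pd.DataFrame({
    'IDct': [1, 3, 4],
    'Index1': [[3], [7, 9], [5, 6]],   # first list is too short for x[1]
})

# Select only the rows where IDct > 1, then take the second element.
mask = data['IDct'] > 1
second = data.loc[mask, 'Index1'].apply(lambda x: x[1])

# Assigning back aligns on the index; rows that were filtered out become NaN.
data['pos'] = second
print(data)
```

This still raises an IndexError if a row with IDct > 1 holds a one-element list, so a length check remains the safer option when that can happen.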
