1

I would like to speed up this operation:

df = pd.DataFrame(columns = ['eventId','total'])
for event in df_events:
    df1 = data[data['eventId'] == event]
    df = pd.concat([df,df1])

df_events is an object containing elements that look like this: '2015-11-23#54#'. This works for my purpose, but I wondered whether there is a quicker way of doing it without a for loop.

2 Answers

3

Try this:

df = data[data["eventId"].isin(df_events)]
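For illustration, here is a minimal sketch of the same filter on a small frame. The sample values are borrowed from the other answer in this thread, and the intermediate name `mask` is only there for readability; it is not part of the question's code.

```python
import pandas as pd

# Sample data borrowed from the other answer in this thread (an assumption
# about what the real `data` looks like).
data = pd.DataFrame({
    "eventId": ["2015-11-23#54#", "2015-11-23#55#", "2015-11-23#56#",
                "2015-11-23#54#", "2015-11-23#55#", "2015-11-23#56#"],
    "total": [2, 8, 9, 4, 3, 5],
})
df_events = ["2015-11-23#54#", "2015-11-23#56#"]

# isin() builds a boolean Series: True where eventId is one of df_events.
mask = data["eventId"].isin(df_events)

# Boolean indexing keeps the matching rows in their original order.
df = data[mask]
print(df)
```

The membership test runs once over the whole column, so no Python-level loop or repeated pd.concat is needed.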

1 Comment

I completely forgot about the isin() method, but this is probably even faster, and certainly more readable, than my answer.
2

A one-liner without a loop does what you want:

df = data[data["eventId"].apply(lambda x: x in df_events)]

This is indeed notably faster than your current solution (I tested it on a very small dataset):

data = pd.DataFrame({'eventId': {0: '2015-11-23#54#',
    1: '2015-11-23#55#',
    2: '2015-11-23#56#',
    3: '2015-11-23#54#',
    4: '2015-11-23#55#',
    5: '2015-11-23#56#'},
    'total': {0: 2, 1: 8, 2: 9, 3: 4, 4: 3, 5: 5}})

df_events = ['2015-11-23#54#', '2015-11-23#56#']

In [14]: %timeit df = data[data["eventId"].apply(lambda x: x in df_events)]
1000 loops, best of 3: 737 µs per loop

In [15]: %%timeit df = pd.DataFrame(columns = ['eventId','total'])
   ....: for event in df_events:
   ....:     df1 = data[data['eventId'] == event]
   ....:     df = pd.concat([df,df1])
   ....: 
100 loops, best of 3: 8.18 ms per loop
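One thing worth checking before swapping the loop for a one-liner is that both select the same rows. Below is a rough sketch using the sample data above. Note that the loop here is a simplified variant of the question's code (it collects the pieces in a list and concatenates once, instead of starting from an empty DataFrame), and the names `parts`, `looped`, `filtered` and `same_rows` are made up for this example.

```python
import pandas as pd

data = pd.DataFrame({
    "eventId": ["2015-11-23#54#", "2015-11-23#55#", "2015-11-23#56#",
                "2015-11-23#54#", "2015-11-23#55#", "2015-11-23#56#"],
    "total": [2, 8, 9, 4, 3, 5],
})
df_events = ["2015-11-23#54#", "2015-11-23#56#"]

# Loop-style result: one filtered frame per event, concatenated once.
parts = [data[data["eventId"] == event] for event in df_events]
looped = pd.concat(parts)

# One-liner result using apply with a membership test.
filtered = data[data["eventId"].apply(lambda x: x in df_events)]

# The loop groups rows by event, while the one-liner keeps the original
# row order, so sort the loop result by index before comparing.
same_rows = looped.sort_index().equals(filtered)
print(same_rows)
```

If downstream code depends on the rows being grouped by event, as the loop produces them, the filtered frame would need an explicit sort afterwards.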
