
I am using pandas in a Python notebook for some data analysis. I am trying to use a simple nested loop, but it performs very badly.

The problem: I have two tables, each with two columns, the first containing timestamps (hh:mm:ss) and the second containing integer values.

The first table (big_table) contains 86400 rows, one for each possible timestamp in a day, with every integer value initially set to 0. The second table (small_table) contains fewer rows, one for each timestamp at which an actual integer value was registered. The goal is to copy the small_table integers into the big_table rows that have the same timestamp, and to carry the last written integer forward in the big_table rows whose timestamp does not appear in small_table.

I am currently "forcing" a Java/C way of doing it, iterating over every element and accessing them as the [i][j] elements of a matrix.

Is there any better way of doing this using pandas/numpy?

Code:

rel_time_pointer = small_table.INTEGER.iloc[0]

for i in range(small_table.shape[0]):
    for j in range(big_table.shape[0]):
        if small_table.time.iloc[i] == big_table.time.iloc[j]:
            rel_time_pointer = small_table.INTEGER.iloc[i]
            # .loc avoids the chained-assignment of big_table.INTEGER.iloc[j] = ...,
            # which may silently fail to write
            big_table.loc[big_table.index[j], 'INTEGER'] = rel_time_pointer
            break
        else:
            big_table.loc[big_table.index[j], 'INTEGER'] = rel_time_pointer
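For reference, the nested loop above can be replaced by a single left merge plus a cumulative fill. This is a minimal sketch on toy data, not the asker's actual tables; interpreting the desired output as a running total (cumsum) is my assumption, based on the 100/200 pattern in the example below:

```python
import pandas as pd

# Toy versions of the two tables (hypothetical sample data).
big_table = pd.DataFrame({
    "time": ["00:00:00", "00:00:01", "00:00:02", "00:00:03",
             "00:00:04", "00:00:05", "00:00:06"],
    "INTEGER": 0,
})
small_table = pd.DataFrame({
    "time": ["00:00:03", "00:00:05"],
    "INTEGER": [100, 100],
})

# Left-join on time, then fillna(0) turns unmatched rows into 0 and
# cumsum() accumulates each registered value down the column.
merged = big_table.merge(small_table, on="time", how="left", suffixes=("_x", "_y"))
big_table["INTEGER"] = merged["INTEGER_y"].fillna(0).cumsum().astype(int)
```

On this toy input the result is 0, 0, 0, 100, 100, 200, 200, matching the example below; if the intent is instead to repeat the last registered value (100, 100 rather than 100, 200), a forward fill would replace the cumsum.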

example:

big_table:
time        INTEGER
00:00:00    0
00:00:01    0
00:00:02    0
00:00:03    0
00:00:04    0
00:00:05    0
00:00:06    0
    .
    .
    .
23:59:59    0

small_table:
time        INTEGER
00:00:03    100
00:00:05    100

big_table_after_execution:
time        INTEGER
00:00:00    0
00:00:01    0
00:00:02    0
00:00:03    100
00:00:04    100
00:00:05    200
00:00:06    200

Using @gtomer's merge command:

    big_table = big_table.merge(small_table, on='time', how='left')

and adding .fillna(0) at the end of the command, I get:

    time    INTEGER__x  INTEGER__y
    00:00:00    0   0.0
    00:00:01    0   0.0
    ...         ... ...

with the INTEGER values of small_table in the right places of big_table_after_execution. Now I'm trying to replace each remaining 0 with the closest non-zero value above it, i.e. to get:

    time    INTEGER__x  INTEGER__y
    00:00:00    0   0.0
    00:00:01    0   0.0
    00:00:02    0   0.0
    00:00:03    0   1.0
    00:00:04    0   1.0
    00:00:05    0   2.0
    00:00:06    0   2.0

instead of:

    00:00:00    0   0.0
    00:00:01    0   0.0
    00:00:02    0   0.0
    00:00:03    0   1.0
    00:00:04    0   0.0
    00:00:05    0   2.0
    00:00:06    0   0.0
  • Loops are a last resort in Python, as there are many more efficient techniques. Post a sample of your data and the expected output and we may find a better and quicker solution. Commented Mar 3, 2022 at 14:51
  • Thanks @gtomer, I have updated my question with sample data Commented Mar 3, 2022 at 15:33
  • Isn't it a simple merge?? Commented Mar 3, 2022 at 19:11
  • Almost. I tried big_table.merge(small_table, how='outer', on='time') and obtained a big_table containing the small_table values in the correct places, but NaN everywhere else, where I wanted the last non-NaN value (reading top to bottom) instead. Commented Mar 3, 2022 at 19:49
  • Sorry. I am confused. The result of 00:00:04 should be 100 or 0? Commented Mar 4, 2022 at 13:53

2 Answers


Please try the following:

big_table_after_execution = big_table.merge(small_table, on='time', how='left')

Please post the output you get and we'll continue from there


2 Comments

I updated the main question with the output because code in comments is not well formatted
Adding:

    big_table_after_execution = big_table.merge(small_table, on='time', how='left').fillna(0)

    not0 = 0.0
    for i in range(big_table_after_execution.shape[0]):
        if big_table_after_execution.INTEGER__y.iloc[i] != 0.0:
            not0 = big_table_after_execution.INTEGER__y.iloc[i]
        big_table_after_execution.INTEGER__y.iloc[i] = not0

solved the issue

Numpy iteration and enumeration options:

If you have a 2-D np.ndarray object, iteration can be achieved in one line as follows:

    for (i, j), value in np.ndenumerate(ndarray_object): ...

This works like regular enumerate, but lets you unpack the higher-dimensional index into a tuple of the appropriate dimensions.
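A quick illustration on a toy array (names are hypothetical):

```python
import numpy as np

# np.ndenumerate yields ((i, j), value) pairs for a 2-D array,
# walking the array in row-major order.
a = np.array([[10, 20], [30, 40]])
pairs = [((i, j), v) for (i, j), v in np.ndenumerate(a)]
```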

You could place your values into a 2-D NumPy array and iterate through them like that.

The easiest way to modify what you already have so that it looks less C-like is probably plain enumerate:

    for small_index, small_value in enumerate(small_table):
        for big_index, big_value in enumerate(big_table): ...

zip

Another option for grouping your iteration together is the zip() function, which combines iterables 1 and 2, but only produces a result whose length equals that of the shorter iterable.
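For example (toy lists, hypothetical values), the extra element of the longer iterable is simply dropped:

```python
# zip stops at the shorter iterable, so pairing a short table with a
# long one only produces as many pairs as the short table has rows.
times = ["00:00:03", "00:00:05"]
values = [100, 100, 999]          # the extra 999 is ignored
paired = list(zip(times, values))
```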

1 Comment

Thanks @Friskod, I was hoping to avoid explicit iteration. The code above has been running in my notebook for 4 hours now :-(
