
I am using pandas in a Python notebook for some data analysis. I am trying to use a simple nested loop, but it performs very badly.

The problem: I have two tables, each with two columns, the first containing timestamps (hh:mm:ss) and the second containing integer values.

The first table (big_table) contains 86400 rows, one for each possible timestamp in a day, with every integer value initially set to 0. The second table (small_table) contains fewer rows, one for each timestamp at which an actual integer value was registered. The goal is to copy the small_table integers into the big_table rows that have the same timestamp, and to carry the last written integer forward in the big_table rows whose timestamp does not appear in small_table.

I am currently "forcing" a Java/C way of doing it, iterating over every element and accessing them as the [i][j] elements of a matrix.

Is there any better way of doing this using pandas/numpy?

Code:

rel_time_pointer = small_table.INTEGER.iloc[0]

for i in range(small_table.shape[0]):
    for j in range(big_table.shape[0]):
        if small_table.time.iloc[i] == big_table.time.iloc[j]:
            rel_time_pointer = small_table.INTEGER.iloc[i]
            # .loc avoids the chained-assignment of big_table.INTEGER.iloc[j] = ...,
            # which may silently fail to write
            big_table.loc[big_table.index[j], 'INTEGER'] = rel_time_pointer
            break
        else:
            big_table.loc[big_table.index[j], 'INTEGER'] = rel_time_pointer
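For reference, the nested loop above can be replaced by a single left merge plus a cumulative fill. This is a minimal sketch on toy data, not the asker's actual tables; interpreting the desired output as a running total (cumsum) is my assumption, based on the 100/200 pattern in the example below:

```python
import pandas as pd

# Toy versions of the two tables (hypothetical sample data).
big_table = pd.DataFrame({
    "time": ["00:00:00", "00:00:01", "00:00:02", "00:00:03",
             "00:00:04", "00:00:05", "00:00:06"],
    "INTEGER": 0,
})
small_table = pd.DataFrame({
    "time": ["00:00:03", "00:00:05"],
    "INTEGER": [100, 100],
})

# Left-join on time, then fillna(0) turns unmatched rows into 0 and
# cumsum() accumulates each registered value down the column.
merged = big_table.merge(small_table, on="time", how="left", suffixes=("_x", "_y"))
big_table["INTEGER"] = merged["INTEGER_y"].fillna(0).cumsum().astype(int)
```

On this toy input the result is 0, 0, 0, 100, 100, 200, 200, matching the example below; if the intent is instead to repeat the last registered value (100, 100 rather than 100, 200), a forward fill would replace the cumsum.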

example:

big_table:
time        INTEGER
00:00:00    0
00:00:01    0
00:00:02    0
00:00:03    0
00:00:04    0
00:00:05    0
00:00:06    0
    .
    .
    .
23:59:59    0

small_table:
time        INTEGER
00:00:03    100
00:00:05    100

big_table_after_execution:
time        INTEGER
00:00:00    0
00:00:01    0
00:00:02    0
00:00:03    100
00:00:04    100
00:00:05    200
00:00:06    200

Using @gtomer's merge command:

    big_table = big_table.merge(small_table, on='time', how='left')

and adding .fillna(0) at the end of the command, I get:

    time    INTEGER__x  INTEGER__y
    00:00:00    0   0.0
    00:00:01    0   0.0
    ...         ... ...

with the INTEGER values of small_table in the right places of big_table_after_execution. Now I'm trying to replace each remaining 0 with the closest non-zero value above it, i.e. to get:

    time    INTEGER__x  INTEGER__y
    00:00:00    0   0.0
    00:00:01    0   0.0
    00:00:02    0   0.0
    00:00:03    0   1.0
    00:00:04    0   1.0
    00:00:05    0   2.0
    00:00:06    0   2.0

instead of:

    00:00:00    0   0.0
    00:00:01    0   0.0
    00:00:02    0   0.0
    00:00:03    0   1.0
    00:00:04    0   0.0
    00:00:05    0   2.0
    00:00:06    0   0.0
  • Loops are a last resort in Python, as there are many more efficient techniques. Post a sample of your data and the expected output and we may find a better and quicker solution. Commented Mar 3, 2022 at 14:51
  • Thanks @gtomer, I have updated my question with sample data Commented Mar 3, 2022 at 15:33
  • Isn't it a simple merge?? Commented Mar 3, 2022 at 19:11
  • Almost. I tried big_table.merge(small_table, how='outer', on='time') and obtained a big_table containing the small_table values in the correct places, but NaN everywhere else, where I wanted the last non-NaN value (reading top to bottom) instead. Commented Mar 3, 2022 at 19:49
  • Sorry. I am confused. The result of 00:00:04 should be 100 or 0? Commented Mar 4, 2022 at 13:53

2 Answers


Please try the following:

big_table_after_execution = big_table.merge(small_table, on='time', how='left')

Please post the output you get and we'll continue from there


2 Comments

I updated the main question with the output because code in comments is not well formatted
Adding:

    big_table_after_execution = big_table.merge(small_table, on='time', how='left').fillna(0)

    not0 = 0.0
    for i in range(big_table_after_execution.shape[0]):
        if big_table_after_execution.INTEGER__y.iloc[i] != 0.0:
            not0 = big_table_after_execution.INTEGER__y.iloc[i]
        big_table_after_execution.INTEGER__y.iloc[i] = not0

solved the issue

Numpy iteration and enumeration options:

If you have a 2-D np.ndarray object, iteration can be achieved in one line as follows:

    for (i, j), value in np.ndenumerate(ndarray_object): ...

This works like regular enumerate, but lets you unpack the higher-dimensional index into a tuple of the appropriate dimensions.
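A quick illustration on a toy array (names are hypothetical):

```python
import numpy as np

# np.ndenumerate yields ((i, j), value) pairs for a 2-D array,
# walking the array in row-major order.
a = np.array([[10, 20], [30, 40]])
pairs = [((i, j), v) for (i, j), v in np.ndenumerate(a)]
```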

You could place your values into a 2-D NumPy array and iterate through them like that.

The easiest way to modify what you already have so that it looks less C-like is probably plain enumerate:

    for small_index, small_value in enumerate(small_table):
        for big_index, big_value in enumerate(big_table): ...

zip

Another option for grouping your iteration together is the zip() function, which combines iterables 1 and 2, but only produces a result whose length equals that of the shorter iterable.
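For example (toy lists, hypothetical values), the extra element of the longer iterable is simply dropped:

```python
# zip stops at the shorter iterable, so pairing a short table with a
# long one only produces as many pairs as the short table has rows.
times = ["00:00:03", "00:00:05"]
values = [100, 100, 999]          # the extra 999 is ignored
paired = list(zip(times, values))
```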

1 Comment

Thanks @Friskod, I was hoping to avoid explicit iteration. The code above has been running in my notebook for 4 hours now :-(
