Sample code

import numpy as np
import time


class A:
    def __init__(self, n):
        self.n = n

    def str_n(self):
        return str(self.n)


idx = np.arange(30000)
l_a = [A(i) for i in range(400000)]

l_a_arr = np.asarray(l_a)
l_a_str_arr = np.asarray([i.str_n() for i in l_a])


s_time = time.time()
l_a_idx_str_arr = l_a_str_arr[idx].tolist()
cost_time = time.time() - s_time
print("String array cost time is ", cost_time)

s_time = time.time()
l_a_idx_arr = l_a_arr[idx].tolist()
cost_time = time.time() - s_time
print("Class array cost time is ", cost_time)

The logs:

String array cost time is 0.0014674663543701172
Class array cost time is 0.0003917217254638672

UPDATE
I repeated each measurement 1000 times and added variants without tolist():

import numpy as np
import time


class A:
    def __init__(self, n):
        self.inner_n = n + 111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111

    def str_n(self):
        return str(self.inner_n)


idx = np.arange(30000)
l_a = [A(i) for i in range(400000)]

l_a_arr = np.asarray(l_a)
l_a_str_arr = np.asarray([i.str_n() for i in l_a])

avg_time = []
for i in range(1000):
    s_time = time.time()
    l_a_idx_str_arr = l_a_str_arr[idx].tolist()
    cost_time = time.time() - s_time
    avg_time.append(cost_time)
print("String array 1000 average cost time with tolist is", np.average(avg_time))

avg_time1 = []
for i in range(1000):
    s_time = time.time()
    l_a_idx_arr = l_a_arr[idx].tolist()
    cost_time = time.time() - s_time
    avg_time1.append(cost_time)
print("Class array 1000 average cost time with tolist is", np.average(avg_time1))

avg_time2 = []
for i in range(1000):
    s_time = time.time()
    l_a_idx_str_arr = l_a_str_arr[idx]
    cost_time = time.time() - s_time
    avg_time2.append(cost_time)
print("String array 1000 average cost time is", np.average(avg_time2))

avg_time3 = []
for i in range(1000):
    s_time = time.time()
    l_a_idx_arr = l_a_arr[idx]
    cost_time = time.time() - s_time
    avg_time3.append(cost_time)
print("Class array 1000 average cost time is", np.average(avg_time3))

The logs:

String array 1000 average cost time with tolist is 0.0037294850349426267
Class array 1000 average cost time with tolist is 0.00030662870407104493
String array 1000 average cost time is 0.0014972503185272216
Class array 1000 average cost time is 0.0001489844322204589

Each string is only one part of the corresponding object, so why does indexing the array of strings take more time than indexing the array of objects?

  • Remove the .tolist() calls and try the benchmark again please. Commented Aug 10, 2021 at 14:12
  • You need to repeat the statement(s) you're trying to time many times to get an accurate estimate of execution time. Commented Aug 10, 2021 at 14:13
  • @CaptainTrojan I've removed the tolist(), still the same. Commented Aug 10, 2021 at 15:03
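
One way to follow the suggestion in the comments is the standard-library timeit module, which repeats a statement many times and uses time.perf_counter as its default timer. The following is only a sketch: the arrays are smaller, hypothetical stand-ins for the ones in the question, and the helper name bench is made up for illustration.

```python
import timeit

import numpy as np

# Hypothetical, smaller stand-ins for the arrays in the question.
values = [str(i) for i in range(40_000)]
str_arr = np.asarray(values)                # fixed-width string dtype
obj_arr = np.asarray(values, dtype=object)  # array of references
idx = np.arange(3_000)

NUMBER = 200  # executions per timing run


def bench(stmt):
    """Return the best per-call time in seconds over several runs."""
    # repeat() returns the total time of each run of NUMBER executions.
    runs = timeit.repeat(stmt, number=NUMBER, repeat=5)
    return min(runs) / NUMBER


results = {
    "string array": bench(lambda: str_arr[idx]),
    "object array": bench(lambda: obj_arr[idx]),
    "string array + tolist": bench(lambda: str_arr[idx].tolist()),
    "object array + tolist": bench(lambda: obj_arr[idx].tolist()),
}

for name, seconds in results.items():
    print(f"{name}: {seconds:.2e} s per call")
```

Taking the minimum over several runs, rather than the average, reduces the influence of unrelated activity on the machine.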

1 Answer

Object dtype arrays are like lists, storing references to objects. Indexing is nearly as fast as with lists.

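String dtype arrays store the characters themselves in the array's data buffer, in fixed-width slots (4 bytes per character, padded to the longest string), just as they do with numbers. Indexing individual elements is slower since it requires a conversion from the NumPy buffer to a Python string ('unboxing'), and tolist() does that conversion for every selected element. Indexing with an index array copies the selected slots, so the cost grows with the string width, while an object array only copies references.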
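
To make the storage difference concrete, here is a small sketch. The sample data is made up for illustration and is much smaller than the arrays in the question. It relies on NumPy's fixed-width unicode dtype using 4 bytes per character.

```python
import numpy as np

# Hypothetical sample data, much smaller than in the question.
strings = [str(i) for i in range(1000)]

str_arr = np.asarray(strings)                # fixed-width unicode dtype
obj_arr = np.asarray(strings, dtype=object)  # array of references

# Each string element occupies 4 bytes per character of the longest string.
longest = max(len(s) for s in strings)
print(str_arr.dtype, str_arr.itemsize, 4 * longest)

# An object element is a single pointer, whatever it points to.
print(obj_arr.dtype, obj_arr.itemsize)

idx = np.arange(100)
picked_str = str_arr[idx]  # copies idx.size * str_arr.itemsize bytes of characters
picked_obj = obj_arr[idx]  # copies idx.size references

# Scalar indexing builds a new NumPy string scalar from the buffer,
# while the object array hands back the very object it stores.
first_str = str_arr[0]
first_obj_is_same = obj_arr[0] is strings[0]
print(type(first_str).__name__, first_obj_is_same)
```

With the long numbers from the updated question, each string slot is far wider than a pointer, so the string array has much more data to move per selected element.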

Arrays are best used 'whole' rather than iteratively.
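
As a rough illustration of 'whole' versus iterative use, the sketch below counts matching elements both ways. The data and the value "42" are hypothetical and not taken from the question.

```python
import numpy as np

# Hypothetical sample data, not taken from the question.
str_arr = np.asarray([str(i % 50) for i in range(10_000)])

# Whole-array: one comparison evaluated inside NumPy over the raw buffer.
count_whole = int((str_arr == "42").sum())

# Iterative: every element is first converted to a NumPy string scalar.
count_loop = sum(1 for s in str_arr if s == "42")

print(count_whole, count_loop)
```

Both approaches give the same count; the loop version pays the per-element conversion cost for every item it visits.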
