Looks like someone posted a similar solution while I was writing this, but I'll post it anyway because it includes measured timings and a few words of explanation.
Using the code suggested below, with test inputs a million elements long (including a single NaN), I see it run in under 30% of the time taken by the code in the question.
Time: 0.3486933708190918
{'id': 'id0', 'name': 'name0', 'text': 'text0', 'value': 0.0} {'id': 'id999999', 'name': 'name999999', 'text': 'text999999', 'value': 999999.0} 999999
Time: 1.2175893783569336
{'id': 'id0', 'name': 'name0', 'text': 'text0', 'value': 0.0} {'id': 'id999999', 'name': 'name999999', 'text': 'text999999', 'value': 999999.0} 999999
I think the difference here is partly from not having to index the list of tuples, but I suspect a large part of it is from not having to instantiate a zip object for every element. You are dealing with a small number of dictionary keys with the same names each time, so you do not need the flexibility that zip affords here, and it is more straightforward to create each dictionary from a simple explicit expression.
(The zip(list_of_tuples, value_array) call creates only one zip object for the whole operation, so its cost is not significant.)
I have also suggested from math import isnan here, rather than doing an attribute lookup for math.isnan on every element, although that difference turns out to be relatively small.
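If you want to check that part yourself, a quick timeit sketch (hypothetical variable names, not from the question) compares the two spellings:

```python
import math
import timeit
from math import isnan

values = [float(i) for i in range(1000)]

# attribute lookup on the module for every call
t_attr = timeit.timeit(
    "[math.isnan(v) for v in values]",
    globals={"math": math, "values": values},
    number=1000)

# name bound directly, no attribute lookup per call
t_local = timeit.timeit(
    "[isnan(v) for v in values]",
    globals={"isnan": isnan, "values": values},
    number=1000)

print(t_attr, t_local)
```

On my understanding the directly-imported name is usually marginally faster, but as noted above the effect is small compared to the zip-object cost.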
from math import isnan
import numpy as np
import time
# construct some test data
n = 1000000
value_array = np.arange(n, dtype=float)  # np.float was removed in NumPy 1.24
value_array[n // 2] = np.nan
list_of_tuples = [(f"id{i}", f"name{i}", f"text{i}")
                  for i in range(len(value_array))]
# timings for suggested alternative method
t0 = time.time()
l = [{"id": t[0],
      "name": t[1],
      "text": t[2],
      "value": v}
     for t, v in zip(list_of_tuples, value_array) if not isnan(v)]
t1 = time.time()
print("Time:", t1 - t0)
print(l[0], l[-1], len(l))
# timings for the method in the question
t0 = time.time()
l = \
    [
        dict(
            dict(zip(["id", "name", "text"], list_of_tuples[index])),
            **{"value": value},
        )
        for index, value in enumerate(value_array)
        if not (isnan(value))
    ]
t1 = time.time()
print("Time:", t1 - t0)
print(l[0], l[-1], len(l))
Also tried and rejected: creating a boolean array of not-isnan values, using
not_isnan_array = np.logical_not(np.isnan(value_array))
and then in the list comprehension you can do:
... for t, v, not_isnan in zip(list_of_tuples, value_array, not_isnan_array) if not_isnan
but it has almost no effect on the timing so does not justify the extra memory usage.
UPDATE
Further experimentation with hybrid versions (between the original in the question and the proposed alternative) shows that, as I suspected, most of the difference comes from the avoidance of creating a zip object on every iteration. The avoidance of explicitly indexing the list of tuples accounts for only a small part of the speedup.
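As an illustration, my guess at one such hybrid (not necessarily the exact code tested) keeps the question's explicit indexing but builds each dictionary from a literal, eliminating the per-element zip; per the finding above, this should time close to the fast version:

```python
from math import isnan
import numpy as np

# small test data in the same shape as the benchmark above
n = 10
value_array = np.arange(n, dtype=float)
value_array[n // 2] = np.nan
list_of_tuples = [(f"id{i}", f"name{i}", f"text{i}") for i in range(n)]

# hybrid: dictionary literal (no per-element zip object), but still
# indexing list_of_tuples rather than iterating it in lockstep
l = [{"id": list_of_tuples[index][0],
      "name": list_of_tuples[index][1],
      "text": list_of_tuples[index][2],
      "value": value}
     for index, value in enumerate(value_array)
     if not isnan(value)]
```

Comparing this against both versions in the benchmark isolates the two effects: the gap between this and the question's code is the zip-object cost, and the (smaller) gap between this and the fast version is the indexing cost.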