Fastest way to convert 3D coordinates from string to float in a numpy array

Question

I have 3D coordinates (strings) in a list that I would like to convert to arrays of floats.

# current list
iPoints = ['-50.0651394154927,-5.3133315588409,0', '-48.7824404616692,3.1894817418136,0', '-46.2317402190515,11.3986203175639,0']

# ideal output
array([[-50.0651394154927,-5.3133315588409,0], [-48.7824404616692,3.1894817418136,0], [-46.2317402190515,11.3986203175639,0]])

A naive implementation:

import numpy as np

iPoints = np.array([[float(c) for c in v.split(',')] for v in iPoints])

What would be the fastest way to convert this list of strings to a 2D NumPy array of floats?

Forget speed for now. Start with list and string operations. – hpaulj, Oct 22, 2019 at 14:32

@hpaulj The question has been edited. Speed is an issue when converting large numbers of coordinates. – solub, Oct 22, 2019 at 15:21

Show us how you'd do it without a focus on speed, and then we can talk about improving it. – hpaulj, Oct 22, 2019 at 15:28

A good start. np.array([v.split(',') ....], dtype=float) can convert the strings to floats, so you don't need the inner comprehension. – hpaulj, Oct 22, 2019 at 15:42
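hpaulj's suggestion can be sketched as follows, using the sample data from the question. This is an illustration written for this page, not code posted in the thread: with `dtype=float`, `np.array` converts the coordinate strings itself, so only the outer comprehension over the points remains.

```python
import numpy as np

iPoints = ['-50.0651394154927,-5.3133315588409,0',
           '-48.7824404616692,3.1894817418136,0',
           '-46.2317402190515,11.3986203175639,0']

# Split each point into its coordinate strings; np.array turns them into
# floats because dtype=float is requested explicitly.
points = np.array([v.split(',') for v in iPoints], dtype=float)

print(points.shape)  # (3, 3)
```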

tstanisl · Accepted Answer · 2019-10-22 19:16:57Z


The original solution is surprisingly fast, but it can be made faster: join the strings into one large buffer and parse it with a single call to np.fromstring.

Try the following code:

import numpy as np

# put everything into one buffer: a single comma-separated string of all coordinates
buf = ','.join(iPoints)
# parse the buffer into a flat 1D float array
iPoints = np.fromstring(buf, sep=',', dtype=float, count=3*len(iPoints))
# reshape into an (N, 3) array, one row per 3D point
iPoints = iPoints.reshape(-1, 3)
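To make the intermediate step visible, here is a small self-contained sketch on the sample data from the question that prints the shape of the flat array and checks the reshaped result against the naive comprehension. It relies on the text mode of `np.fromstring` (a non-empty `sep`); `np.allclose` is used for the comparison as a precaution rather than exact equality.

```python
import numpy as np

iPoints = ['-50.0651394154927,-5.3133315588409,0',
           '-48.7824404616692,3.1894817418136,0',
           '-46.2317402190515,11.3986203175639,0']

# One long comma-separated string holding all 9 coordinates
buf = ','.join(iPoints)
# Text-mode parse: a flat array of 3 * len(iPoints) floats
flat = np.fromstring(buf, sep=',', dtype=float, count=3 * len(iPoints))
# One row per point
fast = flat.reshape(-1, 3)

naive = np.array([[float(c) for c in v.split(',')] for v in iPoints])
print(flat.shape, fast.shape)    # (9,) (3, 3)
print(np.allclose(fast, naive))  # True
```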

I ran a benchmark:

import numpy as np
from timeit import timeit

iPoints = ['-50.0651394154927,-5.3133315588409,0', '-48.7824404616692,3.1894817418136,0', '-46.2317402190515,11.3986203175639,0']
# let's make it a little larger
iMorePoints = iPoints * 10000

method1 = lambda: np.array([[float(c) for c in v.split(',')] for v in iMorePoints])
method2 = lambda: np.fromstring(','.join(iMorePoints), sep=',', dtype=float, count=3*len(iMorePoints)).reshape(-1, 3)

Results on my machine are:

>>> timeit(method1, number=100)
3.6391940720000093
>>> timeit(method2, number=100)
1.0472392480000963
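solub's follow-up comment mentions timing three variants. A sketch of such a comparison could look like the following; `method3` is hpaulj's `dtype=float` suggestion, and the names and `number=10` are choices made here for illustration. Timings depend on the machine and NumPy version, so no numbers are claimed; the sanity check confirms all three methods agree before anything is timed.

```python
import numpy as np
from timeit import timeit

iPoints = ['-50.0651394154927,-5.3133315588409,0',
           '-48.7824404616692,3.1894817418136,0',
           '-46.2317402190515,11.3986203175639,0']
iMorePoints = iPoints * 10000


def method1():
    # Naive: split each string and convert every coordinate with float()
    return np.array([[float(c) for c in v.split(',')] for v in iMorePoints])


def method2():
    # Join everything into one buffer and parse it with np.fromstring
    return np.fromstring(','.join(iMorePoints), sep=',', dtype=float,
                         count=3 * len(iMorePoints)).reshape(-1, 3)


def method3():
    # Let np.array convert the split strings via dtype=float
    return np.array([v.split(',') for v in iMorePoints], dtype=float)


# All three methods must produce the same array before timing them.
reference = method1()
assert np.allclose(reference, method2())
assert np.allclose(reference, method3())

for fn in (method1, method2, method3):
    print(fn.__name__, timeit(fn, number=10))
```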

So the proposed solution is about 3.5 times faster. A minor drawback is that the dimension of the vectors must be known in advance, but it can be determined with iPoints[0].count(',') + 1.
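Combining the buffer approach with that dimension check gives a helper that is not limited to 3D. `parse_points` is a hypothetical name used only for this sketch; it assumes every string has the same number of comma-separated values (only the first string is inspected), and returning an empty (0, 0) array for an empty list is a choice made here.

```python
import numpy as np


def parse_points(points):
    """Hypothetical helper: parse comma-separated coordinate strings into an (N, dim) array.

    Assumes all strings have the same number of values; dim is inferred
    from the first string only.
    """
    if not points:
        return np.empty((0, 0))
    dim = points[0].count(',') + 1
    flat = np.fromstring(','.join(points), sep=',', dtype=float,
                         count=dim * len(points))
    return flat.reshape(-1, dim)


print(parse_points(['1,2', '3,4', '5,6']).shape)  # (3, 2)
```

If the strings do not all have the same length, the rows may be silently misaligned, so validating the input first may be worthwhile when the data source is untrusted.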


1 Comment

solub Over a year ago

Thank you for the suggestion and the benchmark. I timed the three solutions (mine, hpaulj's and yours) and got different timings from yours: overall, the results are quite similar on my side.
