
I have 3D coordinates (strings) in a list that I would like to convert to arrays of floats.

# current list
iPoints = ['-50.0651394154927,-5.3133315588409,0', '-48.7824404616692,3.1894817418136,0', '-46.2317402190515,11.3986203175639,0']

# ideal output
array([[-50.0651394154927,-5.3133315588409,0], [-48.7824404616692,3.1894817418136,0], [-46.2317402190515,11.3986203175639,0]])

A naive implementation:

iPoints = np.array([[float(c) for c in v.split(',')] for v in iPoints])

What would be the fastest way to convert this list of strings to a 2D NumPy array of floats?

  • Forget speed for now. Start with list and string operations. Commented Oct 22, 2019 at 14:32
  • @hpaulj Question has been edited. Speed is an issue when converting large amounts of coordinates. Commented Oct 22, 2019 at 15:21
  • Show us how you'd do it without a focus on speed, and then we can talk about improving it. Commented Oct 22, 2019 at 15:28
  • @hpaulj Sure, question edited. Commented Oct 22, 2019 at 15:41
  • A good start. np.array([v.split(',') ....], dtype=float) can convert the strings to floats, so you don't need the inner comprehension. Commented Oct 22, 2019 at 15:42
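The suggestion in the last comment can be sketched as follows. This is a minimal example that assumes every string holds the same number of comma-separated values; the name `points` is only illustrative.

```python
import numpy as np

iPoints = [
    '-50.0651394154927,-5.3133315588409,0',
    '-48.7824404616692,3.1894817418136,0',
    '-46.2317402190515,11.3986203175639,0',
]

# Split each string into its components and let NumPy convert
# the resulting strings to floats via dtype=float.
points = np.array([v.split(',') for v in iPoints], dtype=float)
print(points.shape)
```

The inner `float(c)` comprehension is no longer needed because the conversion happens inside `np.array`.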

1 Answer


The original solution is surprisingly fast, but it can be made faster. You can join the strings into one large buffer and parse it with a single call to np.fromstring.

Try the following code:

import numpy as np

# put everything into one buffer: a single comma-separated string of all values
buf = ','.join(iPoints)
# parse the buffer
iPoints = np.fromstring(buf, sep=',', dtype=float, count=3*len(iPoints))
# reshape back to one row per point (N x 3)
iPoints = iPoints.reshape(-1,3)
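As a sanity check, the joined-buffer approach can be compared against the naive conversion. The sketch below assumes well-formed input with three values per string; the names `naive` and `fast` are introduced here for illustration only.

```python
import numpy as np

iPoints = [
    '-50.0651394154927,-5.3133315588409,0',
    '-48.7824404616692,3.1894817418136,0',
    '-46.2317402190515,11.3986203175639,0',
]

# Reference result: parse every component with Python's float().
naive = np.array([[float(c) for c in v.split(',')] for v in iPoints])

# Joined-buffer result: one text-mode np.fromstring call over all values.
buf = ','.join(iPoints)
fast = np.fromstring(buf, sep=',', dtype=float, count=3 * len(iPoints))
fast = fast.reshape(-1, 3)

# Both approaches should produce the same N x 3 array.
print(np.allclose(naive, fast))
```

Keeping the result in a separate variable, rather than overwriting `iPoints`, leaves the original list of strings available for further checks.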

I ran a small benchmark.

import numpy as np
from timeit import timeit

iPoints = ['-50.0651394154927,-5.3133315588409,0', '-48.7824404616692,3.1894817418136,0', '-46.2317402190515,11.3986203175639,0']
# let's make the input a bit larger
iMorePoints = iPoints * 10000

method1 = lambda: np.array([[float(c) for c in v.split(',')] for v in iMorePoints])
method2 = lambda: np.fromstring(','.join(iMorePoints), sep=',', dtype=float, count=3*len(iMorePoints)).reshape(-1,3)

Results on my machine are:

>>> timeit(method1, number=100)
3.6391940720000093
>>> timeit(method2, number=100)
1.0472392480000963

So the proposed solution is about 3.5 times faster on this input. A small disadvantage is that you must know in advance that the vectors are 3-dimensional, but the dimension can be derived from the first entry with iPoints[0].count(',') + 1.
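The dimension check can be folded into a small helper. The sketch below is only one possible arrangement: `parse_points` is a hypothetical name, and the consistency check over all strings is an added assumption that costs extra time on large inputs, so it is not part of the timings above.

```python
import numpy as np

def parse_points(points):
    """Parse strings such as 'x,y,z' into an (N, dim) float array.

    Hypothetical helper: dim is inferred from the first string.
    """
    if not points:
        return np.empty((0, 0), dtype=float)
    # Number of components per point, derived from the first entry.
    dim = points[0].count(',') + 1
    # Optional safety check: every string must have the same dimension.
    if any(p.count(',') + 1 != dim for p in points):
        raise ValueError("all points must have the same number of components")
    flat = np.fromstring(','.join(points), sep=',', dtype=float)
    return flat.reshape(-1, dim)

arr = parse_points(['1.5,2.5,0', '3.0,4.0,1'])
print(arr.shape)
```

If the input is known to be uniform, the `any(...)` check can be dropped to keep the conversion as fast as possible.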


1 Comment

Thank you for the suggestion and the benchmark. I have timed the 3 different solutions (mine, hpaulj's and yours) and get different timings. Overall the results are quite similar on my side.
