
I would like to build a list of strings from two columns of a NumPy array. However, I can't get this to work without looping over every row:

import numpy as np
test_list = np.tile(np.array([[1,2],[3,4],[5,6]]),(100000,1))
print(test_list[:,0])
print(test_list[:,1])

def dumbstring(points):
    # Loop through and append a list
    string_pnts = []
    for x in points:
        string_pnts.append("X co-ordinate is %g and y is %g" % (x[0], x[1]))
    return string_pnts

def dumbstring2(points):
    # Prefill a list
    string_pnts = [""] * len(points)
    i = 0
    for x in points:
        string_pnts[i] = ("X co-ordinate is %g and y is %g" % (x[0], x[1]))
        i += 1
    return string_pnts

def numpystring(points):
    return ("X co-ordinate is %g and y is %g" % (points[:,0], points[:,1]))    

def numpystring2(point_x, point_y):
    return ("X co-ordinate is %g and y is %g" % (point_x, point_y))

The first two work (I expected pre-filling the list to be faster than appending, but the timings are the same):

%timeit tdumbstring = dumbstring(test_list) # 239ms
%timeit tdumbstring2 = dumbstring2(test_list) # 239ms
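
Outside IPython, where %timeit is not available, the same comparison can be sketched with the standard-library timeit module. The names append_loop and prefill_loop are illustrative stand-ins for dumbstring and dumbstring2, and the array is deliberately smaller than the one in the question so the sketch runs quickly; absolute numbers will vary by machine.

```python
import timeit

import numpy as np

# Smaller than the question's array so the sketch finishes quickly.
points = np.tile(np.array([[1, 2], [3, 4], [5, 6]]), (1000, 1))

def append_loop(points):
    # Grow the list one element at a time.
    out = []
    for x in points:
        out.append("X co-ordinate is %g and y is %g" % (x[0], x[1]))
    return out

def prefill_loop(points):
    # Allocate the list up front and fill it by index.
    out = [""] * len(points)
    for i, x in enumerate(points):
        out[i] = "X co-ordinate is %g and y is %g" % (x[0], x[1])
    return out

for func in (append_loop, prefill_loop):
    # Best of 3 runs, one call per run, similar in spirit to "best of 3".
    best = min(timeit.repeat(lambda: func(points), number=1, repeat=3))
    print("%s: %.1f ms" % (func.__name__, best * 1e3))
```

Similar timings for the two would be unsurprising: list.append is amortised constant time, so the per-row indexing and string formatting likely dominate in both versions.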

However, the last two do not. Is there no way to vectorise this function?

tnumpystring = numpystring(test_list) # Error
tnumpystring2 = numpystring2(test_list[:,0],test_list[:,1]) # Error
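
Both calls fail because the % operator fills each placeholder with a single Python value, and %g needs one number; it does not broadcast over an array. A minimal sketch of the failure, using a small array and a hypothetical helper name try_vectorised_format (the exact wording of the error message depends on the NumPy version):

```python
import numpy as np

points = np.array([[1, 2], [3, 4], [5, 6]])

def try_vectorised_format(points):
    """Attempt the whole-array formatting from the question and report the outcome."""
    try:
        # %g asks each argument for a single float; a 3-element array cannot provide one.
        return "X co-ordinate is %g and y is %g" % (points[:, 0], points[:, 1])
    except TypeError as exc:
        return "TypeError: %s" % exc

outcome = try_vectorised_format(points)
print(outcome)
```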

Edit:

I tried pandas, since I don't actually need NumPy, but it was a bit slower:

import pandas as pd
df = pd.DataFrame(test_list)
df.columns = ['x','y']
%time pdtest = ("X co-ordinate is " + df.x.map(str) + " and y is " + df.y.map(str)).tolist()
print(pdtest[:5])

I also tried map, but that was also slower than looping over the NumPy array:

def mappy(pt_x,pt_y):
    return("X co-ordinate is %g and y is %g" % (pt_x, pt_y))
%time mtest1 = list(map(lambda x: mappy(x[0],x[1]),test_list))
print(mtest1[:5])

Timings:

[Screenshot of timing results; image not preserved]

  • I tried calling map instead of using a for loop, but that didn't help much. From what I can see, the string formatting of the two points takes most of the time. I also experimented with numpy.savetxt and an in-memory StringIO "file", but that only slowed everything down. Take a look here for a related discussion: stackoverflow.com/questions/2721521/… Commented Feb 25, 2016 at 10:09
  • Thanks Greg, I also tried map and found it a bit slower. What was weird is that I tried pandas and that was slower too. Commented Feb 25, 2016 at 10:24

1 Answer

Here's a solution using numpy.core.defchararray.add. First convert the array to type str:

from numpy.core.defchararray import add    
test_list = np.tile(np.array([[1,2],[3,4],[5,6]]),(100000,1)).astype(str)

def stringy_arr(points):
    return add(add('X coordinate is ', points[:,0]),add(' and y coordinate is ', points[:,1]))

The timing is slightly faster:

%timeit stringy_arr(test_list)
1 loops, best of 3: 216 ms per loop

array(['X coordinate is 1 and y coordinate is 2',
       'X coordinate is 3 and y coordinate is 4',
       'X coordinate is 5 and y coordinate is 6', ...,
       'X coordinate is 1 and y coordinate is 2',
       'X coordinate is 3 and y coordinate is 4',
       'X coordinate is 5 and y coordinate is 6'], 
      dtype='|S85')

# Previously tried functions
%timeit dumbstring(test_list)
1 loops, best of 3: 340 ms per loop

%timeit tdumbstring2 = dumbstring2(test_list)
1 loops, best of 3: 320 ms per loop

%timeit mtest1 = list(map(lambda x: mappy(x[0],x[1]),test_list))
1 loops, best of 3: 340 ms per loop
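
The add function imported from numpy.core.defchararray is also reachable through the public np.char namespace as np.char.add, which avoids importing from numpy.core (a path that recent NumPy releases treat as private). The sketch below assumes a numeric input array and converts each column to str inside the function; the name stringy_arr_public is hypothetical. It is still element-wise string work under the hood, so no large speed-up should be expected.

```python
import numpy as np

def stringy_arr_public(points):
    # Convert the numeric columns to string arrays first.
    xs = points[:, 0].astype(str)
    ys = points[:, 1].astype(str)
    # np.char.add concatenates element-wise, broadcasting the scalar prefix.
    left = np.char.add("X coordinate is ", xs)
    right = np.char.add(" and y coordinate is ", ys)
    return np.char.add(left, right)

sample = np.array([[1, 2], [3, 4], [5, 6]])
result = stringy_arr_public(sample)
print(result.tolist())
```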

EDIT

You could also use pure Python with a list comprehension, which is much faster than my first proposed solution:

test_list = np.tile(np.array([[1,2],[3,4],[5,6]]),(10000000,1)).astype(str)  #10M
test_list = test_list.tolist()

def comp(points):
    return ['X coordinate is %s Y coordinate is %s' % (x,y) for x,y in points]

%timeit comp(test_list)
1 loops, best of 3: 6.53 s per loop

['X coordinate is 1 Y coordinate is 2',
 'X coordinate is 3 Y coordinate is 4',
 'X coordinate is 5 Y coordinate is 6',
 'X coordinate is 1 Y coordinate is 2',
 'X coordinate is 3 Y coordinate is 4',
 'X coordinate is 5 Y coordinate is 6',...

%timeit dumbstring(test_list)
1 loops, best of 3: 30.7 s per loop
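
A variant of the same idea that keeps the array numeric and reproduces the question's %g format might look like the sketch below. The name comp_numeric is hypothetical. The key assumption is that calling tolist() once, so the comprehension iterates over plain Python numbers rather than NumPy rows, accounts for much of the saving; that has not been timed here.

```python
import numpy as np

def comp_numeric(points):
    # tolist() converts the whole array to nested Python lists in one call,
    # so the loop below unpacks plain Python ints instead of NumPy rows.
    return ["X co-ordinate is %g and y is %g" % (x, y) for x, y in points.tolist()]

sample = np.tile(np.array([[1, 2], [3, 4], [5, 6]]), (2, 1))
strings = comp_numeric(sample)
print(strings[:3])
```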

4 Comments

Thanks! Strangely, when I checked with 10,000,000 rows I got: loop-append-list: 25.1s, prefill_list: 24.7s, map-lambda: 28s, pandas_df: 72s, stringy-including-time-to-str: 71s, stringy-already_array_string: 77s
Just ran it at 10,000,000: %timeit dumbstring(test_list) was 1 loops, best of 3: 31.3 s per loop and %timeit stringy_arr(test_list) was 1 loops, best of 3: 21.5 s per loop. I don't know if either is really ideal, which is not surprising because the solution I gave is still 'element-wise'...
Kevin, apologies but I added a screenshot to my original post as I feel I'm going crazy. The basic for-loop appears to be the fastest for me ...
In your image, you missed a step for the list comprehension function: convert the array to a list first, then test it. test_list = test_list.tolist(). See if that helps.
