0

what would be the pythonic way to transform multiple arrays of strings into a matrix, where each input string gets its position in the new matrix based on a lexicographical order (or is there even a better criterion?).

In the end, I would like to be able to query the final matrix strings based on a normalized, common criterion and also be able to find out from which inputarray each particular string originally came from.

So for example if I iterate over a bunch of arrays like such (pseudocode!):

array1 = {'01abc','aabc','cba','xyz','999','zz','ZZ'}
array2 = {'0c','aabc','cc','xz','aZZ'}
array3+n = {'...','...','...','....

I'd like to transform that it into something like this:

name        0        9        a        c        x        z        Z 
array1      01abc    999      aabc     cba      xyz      zz       ZZ
array2      0c                aabc     cc       xz
array2                        aZZ
array3...

I already tried googling 2 hours to find my way, but I just don't have the right terminology to describe my problem properly enough... any ideas that can point me into the right direction will be greatly appreciated.

6
  • I'm having trouble understanding what you want - could you maybe explain a bit more about the background of the problem? Commented Sep 25, 2013 at 17:29
  • I am trying to find a way to cluster arrays with similar contents. In the above example array1 and array2 would be similar, because the both contain the string "aabc". I'd thought about using a matrix to speed up the clustering because there will be around 50.000 arrays to be processed. I am aware of software like Mahout etc. for tasks like that... Since my problem is a bit off standard, I'd prefer a direct python implementation Commented Sep 25, 2013 at 17:36
  • It sounds like you want to put the strings into "bins" based on their first letter while also remembering which array each of the strings originally came from. Does the data structure have to be a matrix? It seems to me that another structure would be better... Commented Sep 25, 2013 at 17:37
  • Will google about bins right now. I am not in specific love with a matrix (except for the movie maybe :-)) Commented Sep 25, 2013 at 17:41
  • 1
    So you want to put the details in a table? Maybe try a database - there is an SQL module in the standard library. Or try NumPy. Commented Sep 25, 2013 at 17:41

1 Answer 1

1

You might want to try numpy:

Link

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.