How to transform array of strings into matrix with python

Question

What would be the Pythonic way to transform multiple arrays of strings into a matrix, where each input string is placed in the new matrix according to lexicographical order (or is there a better criterion)?

In the end, I would like to be able to query the strings in the final matrix by a normalized, common criterion and also find out which input array each string originally came from.

So, for example, if I iterate over a bunch of arrays like these (pseudocode!):

array1 = {'01abc','aabc','cba','xyz','999','zz','ZZ'}
array2 = {'0c','aabc','cc','xz','aZZ'}
array3+n = {'...','...','...','....

I'd like to transform them into something like this:

name        0        9        a        c        x        z        Z 
array1      01abc    999      aabc     cba      xyz      zz       ZZ
array2      0c                aabc     cc       xz
array2                        aZZ
array3...

I have already spent two hours googling, but I don't have the right terminology to describe my problem properly. Any ideas that point me in the right direction would be greatly appreciated.

I'm having trouble understanding what you want. Could you explain a bit more about the background of the problem?
– ali_m, Commented Sep 25, 2013 at 17:29
I am trying to find a way to cluster arrays with similar contents. In the above example, array1 and array2 would be similar because they both contain the string "aabc". I thought about using a matrix to speed up the clustering because there will be around 50,000 arrays to process. I am aware of software like Mahout for tasks like that, but since my problem is a bit off standard, I'd prefer a direct Python implementation.
– Jabb, Commented Sep 25, 2013 at 17:36
It sounds like you want to put the strings into "bins" based on their first letter while also remembering which array each of the strings originally came from. Does the data structure have to be a matrix? It seems to me that another structure would be better...
– senderle, Commented Sep 25, 2013 at 17:37
Will google bins right now. I am not particularly in love with a matrix (except for the movie maybe :-))
– Jabb, Commented Sep 25, 2013 at 17:41
So you want to put the details in a table? Maybe try a database: the standard library includes the sqlite3 module. Or try NumPy.
– rlms, Commented Sep 25, 2013 at 17:41
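
One way to read the "bins" suggestion from the comments above is a nested dictionary keyed first by a string's first character and then by the name of the array it came from. The following is a minimal sketch under that assumption, not an approach confirmed by any commenter. The `arrays` dictionary and the `bin_by_first_char` helper are hypothetical names introduced for illustration, and the data mirrors the example in the question.

```python
from collections import defaultdict

# Hypothetical input: names and contents mirror the question's example.
arrays = {
    "array1": ["01abc", "aabc", "cba", "xyz", "999", "zz", "ZZ"],
    "array2": ["0c", "aabc", "cc", "xz", "aZZ"],
}


def bin_by_first_char(arrays):
    """Map each first character to {array name: [strings starting with it]}."""
    bins = defaultdict(lambda: defaultdict(list))
    for name, strings in arrays.items():
        for s in strings:
            if s:  # skip empty strings, which have no first character
                bins[s[0]][name].append(s)
    return bins


bins = bin_by_first_char(arrays)

# Query: every string filed under "a", together with its source array.
for name, strings in sorted(bins["a"].items()):
    print(name, strings)
# array1 ['aabc']
# array2 ['aabc', 'aZZ']
```

Unlike a fixed-width matrix, this structure lets a bin hold any number of strings per array, so the second "array2" row from the question's table is not needed.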

Glorfindel · Accepted Answer · 2023-01-06 21:05:55Z

You might want to try NumPy.
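
The answer does not say how NumPy would be applied. As one possible interpretation, given the asker's stated goal of clustering arrays that share strings, the sketch below builds a binary membership matrix (one row per array, one column per distinct string) and multiplies it by its transpose to count shared strings per pair of arrays. The `arrays` dictionary and the variable names `names`, `vocab`, `col`, `membership`, and `shared` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical input: names and contents mirror the question's example.
arrays = {
    "array1": ["01abc", "aabc", "cba", "xyz", "999", "zz", "ZZ"],
    "array2": ["0c", "aabc", "cc", "xz", "aZZ"],
}

names = sorted(arrays)  # row labels
vocab = sorted({s for strings in arrays.values() for s in strings})  # column labels
col = {s: j for j, s in enumerate(vocab)}  # string -> column index

# membership[i, j] is 1 if array names[i] contains string vocab[j], else 0.
membership = np.zeros((len(names), len(vocab)), dtype=np.int64)
for i, name in enumerate(names):
    for s in arrays[name]:
        membership[i, col[s]] = 1

# shared[i, k] counts the distinct strings that arrays i and k have in common.
shared = membership @ membership.T
print(shared[0, 1])  # 1, because only "aabc" appears in both arrays

# Provenance query: which arrays contain "aabc"?
owners = [names[i] for i in np.flatnonzero(membership[:, col["aabc"]])]
print(owners)  # ['array1', 'array2']
```

With around 50,000 arrays and a large vocabulary, a dense matrix like this may not fit in memory, so a sparse representation would probably be needed in practice.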

– Mingyu, answered Sep 25, 2013 at 17:38
– Glorfindel, edited Jan 6, 2023 at 21:05
