
I need to reorder the columns of array A according to one column of array B. The order is given by strings: I want the rs columns of A to appear in the same order as the names in the second column of B (B[:,1]), with the family, id, mum and dad columns kept first.

array A

```
family  id  mum dad     rs1     rs2     rs3     rs4     rs5     rs6     rs7      rs8     rs9     rs10   rs11    rs12
     1   1   4    6     A T     A A     T T     C C     G G     A T     A G      A A     G A     T A     G G     C C 
     2   2   7    9     T A     G A     C T     C T     G A     T T     A A      A C     G G     T A     C C     C T
     3   3   2    8     T T     G G     C T     C T     G G     A T     A G      A C     G G     T T     C C     C C
     4   4   5    1     A A     A A     T T     C C     G A     T T     A A      A A     G A     T A     G C     C T
```

array B

```
1   rs1    2345
1   rs5    2346
2   rs6    2348
4   rs8    2351
4   rs12   2360
3   rs2    2456
2   rs3    2453
3   rs10   5672
1   rs9    78923
5   rs7    5738
2   rs4    3546
6   rs11   6354
```

Desired output:

```
family  id  mum dad  rs1     rs5     rs6     rs8     rs12    rs2     rs3     rs10    rs9     rs7     rs4     rs11
   1    1   4     6  A T     G G     A T     A A     C C     A A     T T     T A     G A     A G     C C     G G
   2    2   7     9  T A     G A     T T     A C     C T     G A     C T     T A     G G     A A     C T     C C
   3    3   2     8  T T     G G     A T     A C     C C     G G     C T     T T     G G     A G     C T     C C
   4    4   5     1  A A     G A     T T     A A     C T     A A     T T     T A     G A     A A     C C     G C
```

I hope this is clear enough! Thank you!


1 Answer


As has been pointed out in the comments, this is essentially a duplicate of this question: Combine two columns under one header in Numpy array

So this answer is essentially copied from there, except that I'm using tabs as delimiters, since I understand that is what your data uses.

First we build the A array, using StringIO to feed the sample data to numpy.genfromtxt as if it were a file.

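```python
import numpy
from io import StringIO  # Python 3; Python 2 used "from StringIO import StringIO"

a = StringIO("""family\tid\tmum\tdad\trs1\trs2\trs3\trs4\trs5\trs6\trs7\trs8\trs9\trs10\trs11\trs12
1\t1\t4\t6\tA T\tA A\tT T\tC C\tG G\tA T\tA G\tA A\tG A\tT A\tG G\tC C 
2\t2\t7\t9\tT A\tG A\tC T\tC T\tG A\tT T\tA A\tA C\tG G\tT A\tC C\tC T 
3\t3\t2\t8\tT T\tG G\tC T\tC T\tG G\tA T\tA G\tA C\tG G\tT T\tC C\tC C 
4\t4\t5\t1\tA A\tA A\tT T\tC C\tG A\tT T\tA A\tA A\tG A\tT A\tG C\tC T""")

# 4 integer columns, then 12 genotype columns of up to 3 characters ("A T").
# 'U3' (text) rather than 'S3' (bytes) so the genotypes come back as str in Python 3.
dt = ','.join(['int'] * 4 + ['U3'] * 12)
# names=True takes the field names from the header line;
# autostrip=True drops the stray trailing spaces some lines have.
A = numpy.genfromtxt(a, delimiter='\t', names=True, dtype=dt,
                     encoding=None, autostrip=True)
```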
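To check that genfromtxt really turned the header into field names, here is a minimal sketch on a trimmed copy of the sample data (only the first six columns, so it runs on its own):

```python
import numpy
from io import StringIO

# Trimmed copy of the sample data: first six columns, two rows
a = StringIO("family\tid\tmum\tdad\trs1\trs2\n"
             "1\t1\t4\t6\tA T\tA A\n"
             "2\t2\t7\t9\tT A\tG A\n")

A = numpy.genfromtxt(a, delimiter='\t', names=True,
                     dtype='int,int,int,int,U3,U3', encoding=None)

# Field names come from the header line
print(A.dtype.names)   # ('family', 'id', 'mum', 'dad', 'rs1', 'rs2')
# Each rs field is a 1-D array of genotype strings
print(A['rs1'])        # ['A T' 'T A']
```

Each name in A.dtype.names can then be used as an index, which is what the reordering below relies on.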

Then we read column 1 (the rs names) of the B array, the same way as in the previous question:

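```python
b = StringIO("""1\trs1\t2345
1\trs5\t2346
2\trs6\t2348
4\trs8\t2351
4\trs12\t2360
3\trs2\t2456
2\trs3\t2453
3\trs10\t5672
1\trs9\t78923
5\trs7\t5738
2\trs4\t3546
6\trs11\t6354""")

# 'U10' gives str values; bytes (from 'S10') would not match A's field names in Python 3
B = numpy.genfromtxt(b, usecols=[1], dtype='U10', encoding=None)
```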
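Indexing A with a list of names fails if a name from B is not a field of A, or appears twice, so a quick check before indexing can give a clearer error. A minimal sketch: check_order and a_fields are hypothetical names used only for illustration, with a_fields standing in for A.dtype.names:

```python
import numpy
from io import StringIO

def check_order(field_names, order):
    """Hypothetical helper: build the full column list (fixed columns + order)
    and check every name is a field of A and appears only once."""
    fixed = ['family', 'id', 'mum', 'dad']
    cols = fixed + [str(name) for name in order]
    missing = [name for name in cols if name not in field_names]
    if missing:
        raise ValueError("not fields of A: %s" % ", ".join(missing))
    if len(set(cols)) != len(cols):
        raise ValueError("duplicate column names in B")
    return cols

B = numpy.genfromtxt(StringIO("1\trs1\t2345\n1\trs5\t2346\n2\trs6\t2348\n"),
                     usecols=[1], dtype='U10', encoding=None)
# Stand-in for A.dtype.names
a_fields = ('family', 'id', 'mum', 'dad', 'rs1', 'rs5', 'rs6')
print(check_order(a_fields, B))  # ['family', 'id', 'mum', 'dad', 'rs1', 'rs5', 'rs6']
```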

At this point, as explained in the previous question, you can get just the rs columns in B's order with A[list(B)]. If you want all the columns:

```python
A[['family','id','mum','dad']+list(B)]
```
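This works because indexing a structured array with a list of field names returns the fields in the order of the list, not in the order they have in the file. A small self-contained illustration, using a few values from the sample data:

```python
import numpy

# Tiny structured array with fields in "file order": id, rs1, rs5
A = numpy.array([(1, 'A T', 'G G'), (2, 'T A', 'G A')],
                dtype=[('id', int), ('rs1', 'U3'), ('rs5', 'U3')])

# The result's fields follow the order of the index list
picked = A[['id', 'rs5', 'rs1']]
print(picked.dtype.names)  # ('id', 'rs5', 'rs1')
print(picked[0].item())    # (1, 'G G', 'A T')
```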

If you want that printed in a form closer to your example output (tab-separated columns), you could do something like this:

```python
cols = ['family','id','mum','dad']+list(B)
result = A[cols]

# Header line first, then one tab-separated line per record
for line in [cols]+list(result):
    print('\t'.join(str(col) for col in line))
```
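If a value could ever contain a tab or a quote character, Python's csv module may be safer than joining strings by hand. A sketch of the same output written with csv.writer to an in-memory buffer (sys.stdout or an open file would work the same way); the tiny array here is only illustrative:

```python
import csv
import io
import numpy

# Small illustrative structured array (values from the sample data)
A = numpy.array([(1, 1, 'A T', 'G G'), (2, 2, 'T A', 'G A')],
                dtype=[('family', int), ('id', int), ('rs1', 'U3'), ('rs5', 'U3')])
cols = ['family', 'id', 'rs5', 'rs1']

out = io.StringIO()
writer = csv.writer(out, delimiter='\t', lineterminator='\n')
writer.writerow(cols)               # header line
for record in A[cols]:
    writer.writerow(record.item())  # .item() turns a record into a plain tuple
print(out.getvalue(), end='')
```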

I don't have much experience with numpy, so there may be easier ways to format your output with numpy directly, but that's at least one possible solution.
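One NumPy-only option is numpy.savetxt, which accepts a structured array and applies a single fmt to every field. A minimal sketch on a small illustrative array; comments='' stops savetxt from prefixing the header line with '# ':

```python
import io
import numpy

# Small illustrative structured array (values from the sample data)
A = numpy.array([(1, 1, 'A T', 'G G'), (2, 2, 'T A', 'G A')],
                dtype=[('family', int), ('id', int), ('rs1', 'U3'), ('rs5', 'U3')])
cols = ['family', 'id', 'rs5', 'rs1']

out = io.StringIO()
# fmt='%s' is used for every column; the header is written as given
numpy.savetxt(out, A[cols], fmt='%s', delimiter='\t',
              header='\t'.join(cols), comments='')
print(out.getvalue(), end='')
```

Passing a file name instead of the StringIO buffer would write the same text straight to disk.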


1 Comment

My huge problem is that I have to extract the "list(B)" from a 2 GB file, and it's taking DAYS. I can't split the file because I need to search the entire file to extract the "B" values, and all the solutions I've found tell me to split the file or the array. I couldn't find a way to do it with PyTables either, which seems to be faster. Are there any solutions for this? Thank you.
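One possible direction, assuming the large file is the tab-separated genotype table (like A) with a header line, and that the order from B is small enough to hold in memory: skip NumPy and stream the file one line at a time with the csv module, so memory use stays flat regardless of file size. reorder_columns is a hypothetical helper, and this is a sketch, not something tested on a 2 GB file:

```python
import csv
import io

def reorder_columns(src, dst, order, keep=('family', 'id', 'mum', 'dad')):
    """Hypothetical helper: copy a tab-separated table from src to dst with
    its columns in the order keep + order, one row at a time."""
    reader = csv.reader(src, delimiter='\t')
    writer = csv.writer(dst, delimiter='\t', lineterminator='\n')
    header = next(reader)
    # Look names up in a dict once, instead of calling header.index per name
    where = {name: i for i, name in enumerate(header)}
    positions = [where[name] for name in list(keep) + list(order)]
    writer.writerow([header[i] for i in positions])
    for row in reader:
        writer.writerow([row[i] for i in positions])

# Usage on a tiny in-memory table
src = io.StringIO("family\tid\tmum\tdad\trs1\trs2\n"
                  "1\t1\t4\t6\tA T\tA A\n")
dst = io.StringIO()
reorder_columns(src, dst, ['rs2', 'rs1'])
print(dst.getvalue(), end='')
```

For real files, both would be opened with newline='' (for example open(src_path, newline='') and open(dst_path, 'w', newline=''), where the paths are placeholders).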
