I'm very new to Python. I've searched extensively for a solution to my problem, but I'm hitting dead ends left and right.
I've produced a series of arrays using the following code:
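fh = open(short_seq, 'r')
line_counter = 0
array = [0.0 for x in range(101)]
for line in fh:
    line_counter += 1
    pos = 0  # an integer index, restarted at the first column for every line
    for i in line:
        score = ord(i) - 33.0  # ASCII code of the character minus 33
        array[pos] += score
        pos += 1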
When I print inside the loop, I get a long series of arrays:
[1,2,3,4.....]
[2,3,4,5,6.....]
[3,4,5,6,7,8.....100]
...
I'd like to use NumPy to run stats on each column, keeping the alignment in which the arrays are printed, but once I'm outside the loop I can only get the sum over the entire loop. I tried np.concatenate, but that still left me with the sum of the arrays. If I use NumPy inside the loop, I can only run stats on each column one iteration at a time, rather than on the whole series. My next idea was to add each iteration to a two-dimensional matrix, but I couldn't figure out how to keep the alignment.
Any help would be greatly appreciated.
EDIT: Here is a sample of my data (in a text editor, each of the four strings sits directly beneath the previous one). I'm trying to convert a few thousand lines of ASCII to numerical values. Each line has to go into an array 100 characters long, and then I need to run stats on each column.
CCCFFFFFHHHHHIJJJJJJIJJJJJJJJIJJJIJJJJJJJIJJIJJGIIIHIIIFGIGFHFGIIIHIHHGEHHFDFFFFFDDDDDBDDDDDDDDEDEEDD
CCCFFFFFHHHHHJJJJJJJJJJIIIJJIGJJJJJJJJJJIJJJJJIJJJJJJIJIJJIJJIJJIJJHGHHHHFFCEFFFEEDAEEEFEEDDDB:ADDDD:
CCCFFFFFHHHHHJIJJJIJJJIJJIJJIIJIIJJJJJJJJJJJJJIIJJJJJJJJJGHHHHFFFFFFEEEEEEEDDDDDEDDDDDDDDDDDDDDDDD>9<
BCCFFFDFHHHHHJJJJJJJJJJJIIJJJI@HGIIIJJJJJIJJIJIIJJJJJJJJJHHHHHHFFFDDDDDDDDDDDDDDDD?BDDDD@CDDDDDBDDDDD
numpy.sum(array, axis=0).
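One possible way to keep the alignment is to build one row per line and stack the rows into a two-dimensional NumPy array, so that column c always holds position c of every line. The sketch below is only an illustration under a few assumptions: the helper name lines_to_scores and its offset parameter are made up here, every line is assumed to have the same length once the trailing newline is stripped, and the two short strings are just the first five characters of the first and last sample lines above.

```python
import numpy as np


def lines_to_scores(lines, offset=33):
    """Turn equal-length ASCII lines into a 2-D array of scores.

    Hypothetical helper: row r is line r, column c is character position c.
    Assumes every line has the same length after the newline is removed.
    """
    rows = [[ord(ch) - offset for ch in line.rstrip("\n")] for line in lines]
    return np.array(rows, dtype=float)


# Shortened stand-ins for the real data: first five characters
# of the first and fourth sample lines.
sample = ["CCCFF\n", "BCCFF\n"]
scores = lines_to_scores(sample)

# axis=0 collapses the rows, so each result has one value per column.
col_sum = scores.sum(axis=0)
col_mean = scores.mean(axis=0)
col_std = scores.std(axis=0)

print(scores.shape)  # (number of lines, characters per line)
print(col_sum)
print(col_mean)
```

With the real file, the same helper could probably be fed the open file handle directly, e.g. scores = lines_to_scores(fh), since iterating over a file yields one line at a time. Any other per-column statistic (np.median, np.percentile, and so on) accepts the same axis=0 argument.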