I have a data file (a trajectory file) whose rows are not sorted numerically. The file consists of repeated blocks of text and numbers, as shown below. In each block the first four rows are header information, and the numbers to be sorted start at the fifth row. The next block again has four header rows followed by the numbers, and so on for a hundred blocks or so. I would like to sort the rows of each block numerically by the first column.

ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
ITEM: ATOMS id type x y z
4959 8 10.1 20.1 41.1
5029 8 13.1 43.1 5.3
....
ITEM: TIMESTEP
100
ITEM: NUMBER OF ATOMS
ITEM: ATOMS id type x y z
1259 8 10.1 20.1 41.1
6169 8 13.1 43.1 5.3
....
ITEM: TIMESTEP
200
ITEM: NUMBER OF ATOMS
ITEM: ATOMS id type x y z
3523 8 10.1 20.1 41.1
9119 8 13.1 43.1 5.3
....

I tried to write a Python script. My idea is to put each block of numbers, i.e. the rows between 'ITEM: ATOMS id type x y z' and the following 'ITEM: NUMBER OF ATOMS', into a list, then sort the list and print it. I have managed to put the rows into a list, but each element (e.g., 4959 8 10.1 20.1 41.1) is just a single string. How can I sort the list by the first column of each string?

Here is what I tried. Could you give me some advice?

f_in=open('aa', 'r')

def SORT(List):

        print 'ITEM: TIMESTEP'
        print 'Num of Trajectory'
        print 'ITEM: NUMBER OF ATOMS'
        print 'ATOMS'
        print 'ITEM: BOX BOUNDS pp pp pp'
        print '\n\n'
        print 'ITEM: ATOMS id type x y z'

        for p in List:
                print p

LIST=[]

a = 1

for line in f_in:

        sp = line.split()

        if(len(sp) != 5):
                continue
        else:
                if(a < 5085):
                        LIST.append(line)
                        a = a + 1
                elif(a == 5085):
                        LIST.append(line)
                        LIST = map(lambda s: s.strip(), LIST)
                        SORT(LIST)
                        a = 1
  • Can you re-format your code & examples -- they are hard to read as they are. Commented Feb 10, 2016 at 16:47
  • After you split the line into sp and check its size, you never use sp again; why? Commented Feb 10, 2016 at 16:48
  • The reason is that sp is a list. If I appended sp to the empty list (LIST), I would end up with a list of lists. sp is only used to decide which lines go into LIST: in this example only the coordinate rows have exactly five columns, so len(sp) == 5 is a good test for picking out the rows to be sorted. Each block of coordinates ends at row 5085, counting from the first coordinate row I read. Commented Feb 10, 2016 at 16:57
  • What is your expected output? Do you want the file to be updated with just the rows inside each block sorted? Commented Feb 10, 2016 at 17:42
  • Yes, that is how I want to sort this file. The first four rows of each block should just be printed as they are, but all rows from the fifth row up to the start of the next block (ITEM: TIMESTEP) need to be sorted numerically by the first column. Commented Feb 10, 2016 at 17:44

3 Answers

The following script will read in your file and sort the rows within each block:

from itertools import groupby

with open('input.txt') as f_input, open('output.txt', 'w') as f_output:
    for k, g in groupby(f_input, lambda x: x != 'ITEM: TIMESTEP\n'):
        if k:
            entries = [line.strip() for line in g]
            block_header = ['ITEM: TIMESTEP'] + entries[:3]
            entries = sorted([line.split() for line in entries[3:]], key=lambda x: int(x[0]))
            f_output.write('\n'.join(block_header) + '\n')

            for row in entries:
                f_output.write(' '.join(row) + '\n')

It uses Python's itertools.groupby function to read the file in blocks delimited by ITEM: TIMESTEP. For each block it strips the newline from every line, keeps the first three lines as the block header, splits each remaining row on whitespace, and sorts the rows by the first field converted to an integer.

It then writes each block's own header, followed by its sorted rows, to the output file.
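If the real file has more header lines than the posted example (for instance an ITEM: BOX BOUNDS section, as in the question's script), a fixed entries[:3] slice no longer fits. Below is a minimal sketch of one alternative, assuming every block starts with ITEM: TIMESTEP and its data rows begin right after a line starting with ITEM: ATOMS. The function name sort_dump_blocks and the sample values are made up for illustration; this is not the code above, only a variation on the same idea.

```python
def sort_dump_blocks(lines):
    """Return the lines of a dump with atom rows sorted by id in each block.

    Assumption: every block starts with 'ITEM: TIMESTEP' and its data rows
    follow a header line starting with 'ITEM: ATOMS'.
    """
    out = []
    header, rows = [], []
    in_atoms = False

    for raw in lines:
        line = raw.rstrip('\n')
        if line.startswith('ITEM: TIMESTEP'):
            # A new block begins: emit the previous one first.
            out.extend(header)
            out.extend(sorted(rows, key=lambda r: int(r.split()[0])))
            header, rows, in_atoms = [line], [], False
        elif in_atoms:
            if line.strip():  # ignore blank lines among the data rows
                rows.append(line)
        else:
            header.append(line)  # header lines are kept verbatim, in order
            if line.startswith('ITEM: ATOMS'):
                in_atoms = True

    # Emit the final block.
    out.extend(header)
    out.extend(sorted(rows, key=lambda r: int(r.split()[0])))
    return out


# Made-up sample: two blocks with a variable-length header.
sample = [
    'ITEM: TIMESTEP\n', '0\n',
    'ITEM: NUMBER OF ATOMS\n', '2\n',
    'ITEM: BOX BOUNDS pp pp pp\n', '0 50\n', '0 50\n', '0 50\n',
    'ITEM: ATOMS id type x y z\n',
    '5029 8 13.1 43.1 5.3\n',
    '4959 8 10.1 20.1 41.1\n',
    'ITEM: TIMESTEP\n', '100\n',
    'ITEM: NUMBER OF ATOMS\n', '2\n',
    'ITEM: BOX BOUNDS pp pp pp\n', '0 50\n', '0 50\n', '0 50\n',
    'ITEM: ATOMS id type x y z\n',
    '6169 8 13.1 43.1 5.3\n',
    '1259 8 10.1 20.1 41.1\n',
]
sorted_lines = sort_dump_blocks(sample)
```

Because a file object is itself an iterable of lines, the same function could be fed an open file and the result written back with '\n'.join(...). The header is never parsed, so the timestep value and any extra header lines pass through unchanged.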

3 Comments

Thank you. I have a question about the block_header list. First, I am very sorry for posting a wrong example. The second element of block_header actually changes in every block: it is 0 in the first block, 100 in the next, 200 in the one after that, and so on. Can I still use the block_header list while keeping the second element at 0? Or can block_header no longer be used if the second element keeps changing?
My second question is that there are actually three more lines between "ITEM: NUMBER OF ATOMS" and "ITEM: ATOMS id type x y z". Those lines also change in every block. How can I use the block_header list more flexibly? The example I posted works fine with your code. Thank you.
Try it now; I have updated it to use the existing block headers.

Once you have your list, you can sort it using the key parameter of sort.

numberList.sort(key=lambda line: int(line.split()[0]))

This tells sort to use the first item in the line converted to an integer as the sort key.

However, this wouldn't work if any of your lines that start with text are within the list. The conversion to int would fail. You will have to filter those out first.
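A minimal sketch of that filtering step, assuming (as in the question's len(sp) != 5 check) that data rows are exactly the lines with five whitespace-separated fields whose first field is an integer. The helper name is_data_row and the sample lines are hypothetical:

```python
def is_data_row(line):
    """True for lines with five fields whose first field is an integer."""
    fields = line.split()
    return len(fields) == 5 and fields[0].isdigit()


# Made-up sample mixing header text and data rows.
lines = [
    'ITEM: TIMESTEP',
    '0',
    'ITEM: NUMBER OF ATOMS',
    'ITEM: ATOMS id type x y z',
    '5029 8 13.1 43.1 5.3',
    '4959 8 10.1 20.1 41.1',
]

# Drop the text lines first, then sort by the integer in the first column.
numberList = [line for line in lines if is_data_row(line)]
numberList.sort(key=lambda line: int(line.split()[0]))
```

Requiring five fields also keeps out purely numeric header lines such as the timestep value, which would otherwise pass the integer test.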

You could also try:

import re

def SORT(List):
    # print a placeholder block header, as in the question's script
    print('ITEM: TIMESTEP')
    print('Num of Trajectory')
    print('ITEM: NUMBER OF ATOMS')
    print('ATOMS')
    print('ITEM: BOX BOUNDS pp pp pp')
    print('\n\n')
    print('ITEM: ATOMS id type x y z')

    # print the rows sorted numerically by the first column
    for p in sorted(List, key=lambda s: int(s.split()[0])):
        print(p)

result = []  # data rows only

# read the whole file into a list of lines
with open('aa', 'r') as f_in:
    lines = f_in.readlines()

# keep only the rows made of an integer id followed by four more fields,
# so that numeric header lines (e.g. the timestep) are not picked up
for line in lines:
    m = re.match(r'\s*\d+(\s+\S+){4}\s*$', line)
    if m:
        result.append(line.strip())

# split the result list into chunks of 5085 rows, one per block
for i in range(0, len(result), 5085):
    LIST = result[i:i+5085]
    SORT(LIST)
