I'm very new to programming and Python and I'm trying to convert a DLPOLY HISTORY file to an arc file. What I need to do is extract the lattice vectors (the 3x3 array under the word timestep), the x, y and z coordinates (the three entries on the line underneath each element) and the charge (the fourth entry on the line with the element).

Ideally I'd like to eventually be able to convert files of arbitrary size and frame length.

The two heading lines and first two frames of the DLPOLY HISTORY file look like this:

File Title
         0         3         5                  136                 1906
timestep         0         5 0 3            0.000500            0.000000
        3.5853000000        0.0000000000        0.0000000000
       -1.7926500000        3.1049600000        0.0000000000
        0.0000000000        0.0000000000        4.8950000000
Ca               1   40.078000    1.050000    0.000000
     0.000000000         0.000000000         0.000000000
O                2   15.999400   -0.950000    0.000000
     1.792650000        -1.034986100         1.140535000
H                3    1.007940    0.425000    0.000000
     1.792650000        -1.034986100         1.933525000
O                4   15.999400   -0.950000    0.000000
    -1.792650000         1.034987000        -1.140535000
H                5    1.007940    0.425000    0.000000
    -1.792650000         1.034987000        -1.933525000
timestep        10         5 0 3            0.000500            0.005000
         3.5853063513        0.0000000000        0.0000000000
        -1.7926531756        3.1049655004        0.0000000000
         0.0000000000        0.0000000000        4.8950086714
Ca               1   40.078000    1.050000    0.020485
    -0.1758475885E-01    0.1947928245E-04   -0.1192033544E-01
O                2   15.999400   -0.950000    0.051020
     1.841369991        -1.037431082         1.120698646 
H                3    1.007940    0.425000    0.416965
     1.719029690        -1.029327936         2.355541077
O                4   15.999400   -0.950000    0.045979
    -1.795057186         1.034993005        -1.093028694
H                5    1.007940    0.425000    0.373772 
    -1.754959531         1.067269072        -2.320776528

So far the code I have is:

fileList = history_file.readlines()
number_of_frames = int(fileList[1].split()[3])
number_of_lines = int(fileList[1].split()[4])
frame_length = (number_of_lines - 2) / number_of_frames
number_of_atoms = int(fileList[1].split()[2])
lines_per_atom = frame_length / number_of_atoms

for i in range(3, number_of_lines+1, frame_length):

#maths for converting lattice vectors
#print statement to write out converted lattice vectors

    for j in range(i+3, frame_length+1, lines_per_atom):
        atom_type = fileList[j].split()[0]
        atom_x = fileList[j+1].split()[0]
        atom_y = fileList[j+1].split()[1]
        atom_z = fileList[j+1].split()[2]
        charge = fileList[j].split()[3]
        print atom_type, atom_x, atom_y, atom_z, charge

I can extract and convert the lattice vectors, so that's not a problem. However, the second for loop only executes once. I think that my range ending statement

frame_length+1 

is incorrect, but if I change it to

 i+3+frame_length+1

I get the following error:

charge = fileList[j].split()[3]
IndexError: list index out of range

Which I think means that I'm going over the end of an array.

I'm sure that I've overlooked something very simple but any help would be greatly appreciated.

I'm also wondering if there is a more efficient way of reading the file because as I understand it readlines reads the entire file into memory and HISTORY files can easily reach several GB in size.

1 Answer

Ok, we can find the issue doing a fairly simple check using the sample values you provided. If we enter the following code

for i in range(3,1907,136):
    for j in range(i+3,137,2):
        print i,j

we get this:

3 6
3 8
3 10
...
3 132
3 134
3 136

This is the error you're having. The loop only seems to iterate once. However, if we change the code slightly, we see the source of the issue. If we run

for i in range(3,1907,136):
    print "i:", i,
    for j in range(i+3,137,2):
        print "j:", j

We get this:

i: 3 j: 6
j: 8
j: 10
j: 12
...
j: 134
j: 136
i: 139 i: 275 i: 411 i: 547 i: 683 i: 819 i: 955 i: 1091 i: 1227 i: 1363 i: 1499
 i: 1635 i: 1771

So you can see the inner loop (j loop) runs the first time, and once it's done, the outer loop (i loop) runs all the way through without letting the inner loop have a go. This is because of the way you have range set on the inner loop. On the first run it evaluates to range(6,137,2), but on the second run it comes out to range(142,137,2) because i is 139 on the second run. The start is already past the stop, so the loop terminates before it begins. (I plugged in 136 for the step here; with the actual frame_length of 14 the same thing happens, just sooner: the second inner range is range(20,15,2), which is also empty.)
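
The empty second range is easy to confirm directly. A minimal sketch, using the same numbers as the test loops above; list(...) is only there so the ranges show their contents on Python 3 as well:

```python
# Inner-loop ranges for the first two values of i from the test above.
first = list(range(3 + 3, 137, 2))     # i = 3
second = list(range(139 + 3, 137, 2))  # i = 139

print(len(first))   # 66 values: 6, 8, ..., 136
print(second)       # [] because the start (142) is already past the stop (137)
```

Because the stop value never moves while the start grows with i, every frame after the first gets an empty inner loop.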

To get what you want (or what I think you want), use this for the inner loop:

for j in range(4,frame_length,lines_per_atom):
    atom_type = fileList[j+i].split()[0]

This makes j an offset into each frame, starting past the 4th line, so the same range works for every value of i.
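
As a rough sketch of the relative-offset idea, here is the loop run against the header and first frame from the question. The exact starting offset depends on which line i points at: this sketch assumes i is the index of the first lattice-vector line (i = 3 for the first frame, as in the question's outer loop), which puts the first atom line at offset 3. The names file_list and atoms_in_frame are made up for illustration, and two lines per atom is an assumption taken from the sample.

```python
# Header plus the first frame from the question.
sample = """File Title
         0         3         5                  136                 1906
timestep         0         5 0 3            0.000500            0.000000
        3.5853000000        0.0000000000        0.0000000000
       -1.7926500000        3.1049600000        0.0000000000
        0.0000000000        0.0000000000        4.8950000000
Ca               1   40.078000    1.050000    0.000000
     0.000000000         0.000000000         0.000000000
O                2   15.999400   -0.950000    0.000000
     1.792650000        -1.034986100         1.140535000
H                3    1.007940    0.425000    0.000000
     1.792650000        -1.034986100         1.933525000
O                4   15.999400   -0.950000    0.000000
    -1.792650000         1.034987000        -1.140535000
H                5    1.007940    0.425000    0.000000
    -1.792650000         1.034987000        -1.933525000
"""
file_list = sample.splitlines()

number_of_atoms = int(file_list[1].split()[2])
lines_per_atom = 2  # assumed: one header line plus one coordinate line


def atoms_in_frame(lines, i):
    """Collect (type, x, y, z, charge) for the frame whose first lattice line is lines[i]."""
    atoms = []
    # j is an offset from i, so the range is identical for every frame.
    for j in range(3, 3 + number_of_atoms * lines_per_atom, lines_per_atom):
        header = lines[i + j].split()
        coords = lines[i + j + 1].split()
        atoms.append((header[0], coords[0], coords[1], coords[2], header[3]))
    return atoms


atoms = atoms_in_frame(file_list, 3)
for atom in atoms:
    print(" ".join(atom))
```

If an offset lands on a coordinate line instead of an atom header line, split() returns only three fields and indexing [3] raises an IndexError, so that error is a useful hint that the start offset is off by one.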

But the thing I haven't figured out is how your code worked at all. I hand calculated the values in your example just as a check.

frame_length = (1906 - 2) / 136 = 14
lines_per_atom = 14 / 5 = 2.8 

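A lines_per_atom of 2.8 is not a valid step for range, which requires an integer, so I'm not sure why you aren't getting a TypeError. The calculation should be lines_per_atom = (frame_length - 4) / number_of_atoms, because four lines of each frame (the timestep line and the three lattice vectors) are not atom lines.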
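
One likely explanation for the missing TypeError, assuming the script runs on Python 2 (the print statements suggest so): there, / between two integers floors the result, so 14 / 5 gives 2 rather than 2.8. The sketch below uses //, which floors on both Python 2 and 3, to show that the two formulas happen to agree for this particular file.

```python
number_of_atoms = 5
frame_length = 14  # (1906 - 2) // 136

# Original formula: 2.8 is floored to 2.
original = frame_length // number_of_atoms

# Corrected formula: drop the timestep line and 3 lattice lines first.
corrected = (frame_length - 4) // number_of_atoms

print(original, corrected)
```

The agreement is a coincidence of these numbers, so the corrected formula is still the one to keep. On Python 3, / would return 2.8 and range would reject it, which makes // the safer spelling either way.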

Anyways, hope this works!

(Also, try using camel case for variable names in the future instead of underscores. So lines_per_atom becomes linesPerAtom, much easier to type in my opinion)

4 Comments

Thanks, I didn't catch the mistake in the lines_per_atom calculation. The strange thing is that with or without the correction it still gives a result of 2, which is what I was expecting. I wonder why this is?
I also tried your suggested correction to the second loop but I get the following error: charge = fileList[j+i].split()[3] IndexError: list index out of range.
I changed the second loop to for j in range(i+3, frame_length+i-2, lines_per_atom): and it now appears to do what I want it to do. I'll have to play around with different sized input files to see if it still works as I expect it to.
You might also have better results using a while loop for the inner loop. This lets you set a stopping condition based on what it finds, rather than a fixed number of iterations (which can give you an IndexError if you're not careful). Just be sure to advance the index yourself if you use a while loop.
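
A while loop keyed on what the next line contains also fits the memory concern from the question, because the file can then be read one line at a time instead of with readlines. The following is only a sketch written for Python 3, under assumptions taken from the sample: every frame starts with a line beginning "timestep", the third field on that line is the atom count, and each atom takes exactly two lines. read_frames is a made-up name, and io.StringIO stands in for a real file handle.

```python
import io

# Header plus the two frames shown in the question.
SAMPLE = """File Title
         0         3         5                  136                 1906
timestep         0         5 0 3            0.000500            0.000000
        3.5853000000        0.0000000000        0.0000000000
       -1.7926500000        3.1049600000        0.0000000000
        0.0000000000        0.0000000000        4.8950000000
Ca               1   40.078000    1.050000    0.000000
     0.000000000         0.000000000         0.000000000
O                2   15.999400   -0.950000    0.000000
     1.792650000        -1.034986100         1.140535000
H                3    1.007940    0.425000    0.000000
     1.792650000        -1.034986100         1.933525000
O                4   15.999400   -0.950000    0.000000
    -1.792650000         1.034987000        -1.140535000
H                5    1.007940    0.425000    0.000000
    -1.792650000         1.034987000        -1.933525000
timestep        10         5 0 3            0.000500            0.005000
         3.5853063513        0.0000000000        0.0000000000
        -1.7926531756        3.1049655004        0.0000000000
         0.0000000000        0.0000000000        4.8950086714
Ca               1   40.078000    1.050000    0.020485
    -0.1758475885E-01    0.1947928245E-04   -0.1192033544E-01
O                2   15.999400   -0.950000    0.051020
     1.841369991        -1.037431082         1.120698646
H                3    1.007940    0.425000    0.416965
     1.719029690        -1.029327936         2.355541077
O                4   15.999400   -0.950000    0.045979
    -1.795057186         1.034993005        -1.093028694
H                5    1.007940    0.425000    0.373772
    -1.754959531         1.067269072        -2.320776528
"""


def read_frames(handle):
    """Yield (lattice, atoms) per frame, holding only one frame in memory."""
    handle.readline()  # title line
    handle.readline()  # header line
    line = handle.readline()
    # Keep going for as long as the next record starts a new frame.
    while line.startswith("timestep"):
        # Assumption: the third field of the timestep line is the atom count.
        number_of_atoms = int(line.split()[2])
        lattice = [[float(v) for v in handle.readline().split()]
                   for _ in range(3)]
        atoms = []
        for _ in range(number_of_atoms):
            header = handle.readline().split()  # element, index, mass, charge, ...
            x, y, z = [float(v) for v in handle.readline().split()]
            atoms.append((header[0], x, y, z, float(header[3])))
        yield lattice, atoms
        line = handle.readline()


# list() is only for this demo; a real script would loop over the generator.
frames = list(read_frames(io.StringIO(SAMPLE)))
print(len(frames))
```

With a real file you would pass an open file handle and iterate over read_frames(...) directly, converting and writing each frame before moving on to the next, so memory use stays at roughly one frame regardless of file size.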
