I have a file like this:
```
system
1000
1VEA C 1 9.294 11.244 11.083
1VEA C1 2 9.324 11.375 11.161
1VEA H 3 9.243 11.396 11.232
...
1203VEA H2092601 20.738 16.293 7.837
1203VEA H2192602 20.900 16.225 7.869
1203VEA H2292603 20.822 16.330 7.989
```
I want to generate a DataFrame with 6 columns. I used the following command to generate it:

```python
import pandas as pd

df = pd.read_csv('system.gro', skiprows=[0,1], delim_whitespace=True, header=None)
```

However, in the rows that start with 1203, there is no whitespace between the atom name (H20) and the global index (92601), so the command above cannot split them. Previously, I split each line at fixed character positions, like this:
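```python
# Slice every line at fixed character positions
with open(fileName, 'r') as f1:
    for line in f1.readlines()[2:]:  # skip the title line and the atom-count line
        atomName = line[8:15].strip()
        globalIdx = int(line[15:20])
```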
But this takes a really long time on the whole file. Does anyone have an idea how to do this with a DataFrame?
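A possible sketch that keeps the fixed-width slicing but collects every field into lists and builds the DataFrame in a single call. Growing a DataFrame row by row is a common cause of slowness, although the question does not show how the frame was built. The column boundaries assume the standard GROMACS `.gro` layout (residue number and name in characters 0-10, atom name 10-15, atom number 15-20, then x, y, z in 8-character fields). The sample text, the helper `parse_gro_lines`, and the column names `residue`, `atom`, `index`, `x`, `y`, `z` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample in the assumed .gro layout "%5d%-5s%5s%5d%8.3f%8.3f%8.3f";
# the second atom reproduces the fused "H2092601" field from the question.
rows = [(1, "VEA", "C", 1, 9.294, 11.244, 11.083),
        (1203, "VEA", "H20", 92601, 20.738, 16.293, 7.837)]
text = "system\n1000\n" + "".join("%5d%-5s%5s%5d%8.3f%8.3f%8.3f\n" % r for r in rows)


def parse_gro_lines(lines):
    """Slice fixed-width atom lines into column lists, then build the DataFrame once."""
    columns = {"residue": [], "atom": [], "index": [], "x": [], "y": [], "z": []}
    for line in lines:
        columns["residue"].append(line[0:10].strip())
        columns["atom"].append(line[10:15].strip())
        columns["index"].append(int(line[15:20]))   # int() tolerates surrounding spaces
        columns["x"].append(float(line[20:28]))
        columns["y"].append(float(line[28:36]))
        columns["z"].append(float(line[36:44]))
    return pd.DataFrame(columns)


# Skip the title line and the atom-count line (for a real file: open(...).readlines()[2:]).
df = parse_gro_lines(io.StringIO(text).read().splitlines()[2:])
print(df)
```

Because slicing works on character positions, the fused `H2092601` field splits into `H20` and `92601` without relying on whitespace.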
Comment: Instead of `pd.read_csv`, use `pd.read_fwf`. I am not sure how the `.strip()` would work, though.
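Following the `pd.read_fwf` suggestion, here is a minimal sketch that gives `read_fwf` explicit character boundaries through `colspecs`, so the fused `H2092601` field is split by position instead of by whitespace. `read_fwf` strips the space padding around each field itself, so no separate `.strip()` is needed. The `colspecs` values assume the standard GROMACS `.gro` layout, and the sample text and column names are hypothetical. Adjust the boundaries to your file if it differs.

```python
import io

import pandas as pd

# Hypothetical sample in the assumed .gro layout "%5d%-5s%5s%5d%8.3f%8.3f%8.3f".
rows = [(1, "VEA", "C", 1, 9.294, 11.244, 11.083),
        (1203, "VEA", "H20", 92601, 20.738, 16.293, 7.837)]
text = "system\n1000\n" + "".join("%5d%-5s%5s%5d%8.3f%8.3f%8.3f\n" % r for r in rows)

# Half-open [start, end) character ranges for each field (assumed .gro layout).
colspecs = [(0, 10), (10, 15), (15, 20), (20, 28), (28, 36), (36, 44)]
names = ["residue", "atom", "index", "x", "y", "z"]

# skiprows=2 drops the title line and the atom-count line.
# With the real file, pass 'system.gro' instead of io.StringIO(text).
df = pd.read_fwf(io.StringIO(text), colspecs=colspecs, names=names, skiprows=2)
print(df)
```

If the real file ends with the usual `.gro` box-vector line, that line would also be parsed as a row. You could drop it afterwards, for example with `df = df.iloc[:-1]`.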