
I am learning Python and trying to use regular expressions. I am used to doing this in shell scripts (awk, grep and sed), but I need to do it in Python.

In my file, I have lines like:

species,subl,cmp=    1    7    1    s1,torque=-0.65079E-11-0.59320E-15
species,subl,cmp=    1    6    1    s1,torque= 0.30782E-10 0.65641E-14

In a shell script, I can do this with:

var_s1=`grep "species,subl,cmp=    $3    $4    $5" $tfile |sed -r 's/.*(.{11}).{12}/\1/'`

But when I try to do this in Python:

#!/usr/bin/python
import sys,math,re

infile=sys.argv[1]; oufile=sys.argv[2]
ifile=open(infile, 'r'); ofile=open(oufile, 'w')
pattern=r'species,subl,cmp=\s{4}(.*)\s{4}(.*)\s{4}(.*)\s{3}s1,torque=(.*)\s{1}(.*)'

ssc1=[];ssc2=[];ssc3=[]; s1=[]; t=[]
for line in ifile:
  match = re.search(pattern, line)
  if match:
    ssc1.   append(int(match.group(1)))
    ssc2.   append(int(match.group(1)))
    ssc3.   append(int(match.group(1)))
    s1.     append(float(match.group(1)))
    t.      append(float(match.group(1)))
#    ofile.write('%g %g %g' %(ssc1, s1,t))
#print('%5.3e %5.3e' s1,t)
for i in range(len(t)):
  print('%g %g %g' % (ssc1[i], s1[i], t[i]))

ifile.close(); ofile.close()

it gives 1 for every value:

$ python triel2.py out-Dy-eos2 tres
1 1 1
1 1 1

Please show me where I am going wrong. I am following this book, but as a beginner I would also welcome suggestions for a better approach.

1 Answer

Change this:

ssc1.   append(int(match.group(1)))
ssc2.   append(int(match.group(1)))
ssc3.   append(int(match.group(1)))
s1.     append(float(match.group(1)))
t.      append(float(match.group(1)))

to this:

ssc1.   append(int(match.group(1)))
ssc2.   append(int(match.group(2)))
ssc3.   append(int(match.group(3)))
s1.     append(float(match.group(4)))
t.      append(float(match.group(5)))
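For reference, here is a minimal sketch of the loop with the corrected group numbers, still using your original pattern. To keep it self-contained it parses the two sample lines from the question, held in a list, instead of reading sys.argv[1]; SAMPLE and parse are illustrative names. It collects one tuple per line instead of five parallel lists, which keeps each line's values together. Note that with this pattern only the second sample line matches: the first has no whitespace after "torque=", so it is skipped.

```python
import re

# Pattern from the question, unchanged
pattern = r'species,subl,cmp=\s{4}(.*)\s{4}(.*)\s{4}(.*)\s{3}s1,torque=(.*)\s{1}(.*)'

# The two sample lines from the question (stand-ins for the input file)
SAMPLE = [
    "species,subl,cmp=    1    7    1    s1,torque=-0.65079E-11-0.59320E-15\n",
    "species,subl,cmp=    1    6    1    s1,torque= 0.30782E-10 0.65641E-14\n",
]


def parse(lines):
    """Return one (ssc1, ssc2, ssc3, s1, t) tuple per matching line."""
    records = []
    for line in lines:
        # Strip the newline first: otherwise \s can match it and group(5) is ''
        match = re.search(pattern, line.rstrip("\n"))
        if match:
            records.append((
                int(match.group(1)),    # each field comes from its own group
                int(match.group(2)),
                int(match.group(3)),
                float(match.group(4)),
                float(match.group(5)),
            ))
    return records


for ssc1, ssc2, ssc3, s1, t in parse(SAMPLE):
    print('%g %g %g %g %g' % (ssc1, ssc2, ssc3, s1, t))
```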

It looks like you may also have a problem with the text after "torque". In the first line of your sample, there is no space between the two numbers, so the \s{1} in your pattern has nothing to split on. You could split those two numbers by field width rather than by a separator. One way to do this is to replace this part of the regular expression:

torque=(.*)\s{1}(.*)

with this:

torque=(.{12})(.{12})

This assumes that each number after "torque" occupies a fixed field width of 12 characters, sign position included, as in your sample lines.
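A minimal sketch of the same loop with only the torque part of the pattern replaced by the two 12-character fields, again run on the two sample lines (SAMPLE and parse are illustrative names). Both lines now match; float() ignores the leading space that stands in for a plus sign.

```python
import re

# Question's pattern with the torque part replaced by two fixed-width fields
pattern = r'species,subl,cmp=\s{4}(.*)\s{4}(.*)\s{4}(.*)\s{3}s1,torque=(.{12})(.{12})'

# The two sample lines from the question (stand-ins for the input file)
SAMPLE = [
    "species,subl,cmp=    1    7    1    s1,torque=-0.65079E-11-0.59320E-15\n",
    "species,subl,cmp=    1    6    1    s1,torque= 0.30782E-10 0.65641E-14\n",
]


def parse(lines):
    """Return one (ssc1, ssc2, ssc3, s1, t) tuple per matching line."""
    records = []
    for line in lines:
        match = re.search(pattern, line)
        if match:
            ints = [int(match.group(i)) for i in (1, 2, 3)]
            floats = [float(match.group(i)) for i in (4, 5)]  # 12 chars each
            records.append(tuple(ints + floats))
    return records


for rec in parse(SAMPLE):
    print('%g %g %g %g %g' % rec)
```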

An alternative would be to match everything after "torque" with "(.*)" and then use Python string slicing to pull the matched text apart.
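The slicing alternative could look like the sketch below. It also swaps in a tighter pattern that is not from the question (\s+ and \d+ instead of fixed counts of spaces and .*), so the integer fields no longer depend on exact spacing, and it opens the files in with blocks so they are closed automatically. The 12-character width is an assumption taken from the sample lines; parse_line and main are hypothetical helper names.

```python
import re
import sys

# Tighter pattern (an assumption, not from the question): runs of whitespace
# between fields and digits-only integers; everything after "torque=" is kept
PATTERN = re.compile(r'cmp=\s+(\d+)\s+(\d+)\s+(\d+)\s+s1,torque=(.*)')
WIDTH = 12  # assumed field width of each torque value


def parse_line(line):
    """Return (ssc1, ssc2, ssc3, s1, t) for a matching line, else None."""
    match = PATTERN.search(line)
    if not match:
        return None
    torque = match.group(4)
    # Slice the two fixed-width fields out of the matched text
    s1 = float(torque[:WIDTH])
    t = float(torque[WIDTH:2 * WIDTH])
    return (int(match.group(1)), int(match.group(2)), int(match.group(3)), s1, t)


def main(infile, outfile):
    # "with" closes both files even if an error occurs
    with open(infile) as ifile, open(outfile, 'w') as ofile:
        for line in ifile:
            rec = parse_line(line)
            if rec is not None:
                ofile.write('%g %g %g %g %g\n' % rec)


if __name__ == '__main__' and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Run it the same way as before, e.g. python triel2.py out-Dy-eos2 tres, and it writes one line per matching input line to tres.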

1 Comment

Hi, can you please show me how I can split those fields using field width?