
I have a file with data like:

   1xxy
   (1gmh)

[white line]
ahdkfkbbmhkkkkkyllllkkjdttyshhaggdtdyrrrutituy
[white line]  
   __________________________________________________
   Intra Chain:
   A 32
   __________________________________________________
   PAIR 1MNS HE 10 NM A ARG 33 OX1 3.22 32
   PAIR 1MNS UR 11 NM A ARG 33 OX2 3.21 12
   PAIR IMNS UK 32 NH A ASN 43 OZ1 5.21 22
   ...
   __________________________________________________

Now I want to make it like:

   PAIR 1MNS HE 10 NM A ARG 33 OX1 3.22 32
   PAIR 1MNS UR 11 NM A ARG 33 OX2 3.21 12
   PAIR IMNS UK 32 NH A ASN 43 OZ1 5.21 22
   ...

i.e. remove all other characters. I tried using:

inp = open('c:/users/rox/desktop/1UMG.out','r')
for line in inp:
    if not line.strip():      # to remove excess white lines
       continue
    else:
       z = line.strip().replace('\t',' ')
       if z.startswith('PAIR'):
          print z
inp.close()

But this code also gives me no output. I can't figure out why z.startswith('PAIR') is not working; everything up to the previous line works fine.

  • Regular expressions: check out the re module. Commented Jun 9, 2012 at 3:48
  • (r' filename.txt').read() actually works? Commented Jun 9, 2012 at 3:52
  • @joel It works fine for me. It is (r'filename.txt').read(). Commented Jun 10, 2012 at 17:49
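
For reference, the regular-expression route suggested in the first comment could look roughly like the sketch below. The names PAIR_RE and match_pair are made up for illustration; the pattern assumes a wanted line consists of optional leading whitespace followed by the word PAIR.

```python
import re

# Optional leading whitespace, then the whole word PAIR and the rest of
# the line, with trailing whitespace (including the newline) left out.
PAIR_RE = re.compile(r"^\s*(PAIR\b.*?)\s*$")

def match_pair(line):
    """Return the cleaned PAIR line, or None if the line is something else."""
    m = PAIR_RE.match(line)
    return m.group(1) if m else None

for raw in ["   Intra Chain:\n",
            "   PAIR 1MNS HE 10 NM A ARG 33 OX1 3.22 32\n"]:
    cleaned = match_pair(raw)
    if cleaned is not None:
        print(cleaned)
```

For a fixed prefix like this, str.startswith after strip() does the same job with less machinery; a regex mainly pays off if the individual fields need to be validated or captured.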

2 Answers


Looks like you are looking only at lines that start with PAIR, so why not something simple like this:

with open('data.txt') as infp:
   for line in infp:
      line = line.strip()
      if line.startswith('PAIR'):
         print(line)

will give:

PAIR 1MNS HE 10 NM A ARG 33 OX1 3.22 32
PAIR 1MNS UR 11 NM A ARG 33 OX2 3.21 12
PAIR IMNS UK 32 NH A ASN 43 OZ1 5.21 22

This output removes the leading 3 spaces; it would be trivial to add them back in if needed.

Note: using with will automatically close the file for you when you are done, or when an exception is encountered.
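
If the leading spaces do matter, one way to put them back is a small helper like the sketch below. The name pair_lines and its indent parameter are hypothetical, and the io.StringIO object only stands in for the real file so the example runs on its own.

```python
import io

def pair_lines(lines, indent=""):
    """Yield the lines that start with 'PAIR' once surrounding whitespace
    is stripped, each prefixed with `indent`."""
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("PAIR"):
            yield indent + stripped

# Stand-in for an open file; any iterable of lines works, including the
# object returned by open('data.txt').
sample = io.StringIO(
    "   1xxy\n"
    "\n"
    "   Intra Chain:\n"
    "   PAIR 1MNS HE 10 NM A ARG 33 OX1 3.22 32\n"
    "   PAIR 1MNS UR 11 NM A ARG 33 OX2 3.21 12\n"
)

for out in pair_lines(sample, indent="   "):
    print(out)
```

With a real file, the same helper would be called as pair_lines(infp, indent="   ") inside the with block.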


12 Comments

"not working" isn't enough information to work with. What happens? What is the resulting output? Is an exception raised? Etc.
What's the size of the file? If it's big, it may take a while to process and print the output.
@Ovisek: do you see how @Levon opens the file using the with line? That's like infp = open('data.txt') but it automatically closes infp when the block ends. In your code, however, you never open any file, you simply iterate over the filename inp = ('c:/users/rox/desktop/1UMG.out') itself. You're iterating over the characters in the string, not the lines in the file. (You also changed the code so you don't strip z, so if there are spaces before PAIR it will fail, but that might not be a problem in practice.)
@Ovisek: you need to open a file. inp = ('c:/users/rox/desktop/1UMG.out','r') makes inp into a tuple of two strings, 'c:/users/rox/desktop/1UMG.out' and 'r'. So for line in inp then iterates over those two strings, neither of which starts with PAIR. If for some reason you don't want to use the with line, then write inp = open('c:/users/rox/desktop/1UMG.out','r') and add an inp.close() at the end.
As I said, you changed Levon's code: he strips the line before he uses .startswith() but you don't, so if there are any leading spaces, his code will work and yours won't. Add a 'print line' in your second branch -- i.e. after the else -- to make sure that the lines are actually being read, and you'll probably see spaces at the start. [Incidentally, did you ever try Levon's code as he wrote it? Because it should work.]
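
The two pitfalls described in these comments can be reproduced with a short sketch. It assumes a throwaway temporary file in place of the real 1UMG.out, and the variable names are illustrative only.

```python
import os
import tempfile

# Write a tiny sample file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".out", delete=False) as tmp:
    tmp.write("   Intra Chain:\n   PAIR 1MNS HE 10 NM A ARG 33 OX1 3.22 32\n")
    path = tmp.name

# Pitfall 1: parentheses without open() build a tuple of two strings,
# so iterating yields the path and the mode, never a line of the file.
not_a_file = (path, "r")
tuple_items = [item for item in not_a_file]

# Correct: open() returns a file object whose iteration yields lines.
with open(path, "r") as inp:
    file_lines = [line for line in inp]

# Pitfall 2: leading spaces defeat startswith() unless the line is stripped.
# repr() makes the leading spaces and the trailing newline visible.
for line in file_lines:
    print(repr(line), line.startswith("PAIR"), line.strip().startswith("PAIR"))

os.remove(path)
```

Printing repr(line) rather than the bare line is a convenient way to spot invisible whitespace when a startswith() test unexpectedly fails.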

In addition to @Levon's explanation, since the file object supports the iterator protocol, and depending on the size of the file, a list comprehension can be used:

[l for l in open('test.txt') if l.startswith('PAIR')]

4 Comments

@Ovisek: Wrong, it works fine. Learn at least the basics of Python before making such statements.
But it's not working for me. If you think it's a problem with my basics, then please give me a hint.
You need to handle the leading blanks for this to catch the lines starting with 'PAIR', i.e. l.strip().startswith('PAIR')
.. and you'd have to also take care of stripping the trailing \n (I'm a big fan of list comprehension too by the way)
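
Putting the two comments above together, a list comprehension that handles both the leading blanks and the trailing newline might look like the following sketch. The function name pair_list is hypothetical, and io.StringIO stands in for the file opened with open('test.txt').

```python
import io

def pair_list(lines):
    """Collect the stripped lines that start with 'PAIR'."""
    # strip() removes the leading blanks and the trailing newline in one go.
    return [l.strip() for l in lines if l.strip().startswith("PAIR")]

# Stand-in for a real file object.
sample = io.StringIO(
    "   Intra Chain:\n"
    "   A 32\n"
    "   PAIR 1MNS HE 10 NM A ARG 33 OX1 3.22 32\n"
    "   PAIR IMNS UK 32 NH A ASN 43 OZ1 5.21 22\n"
)

result = pair_list(sample)
print(result)
```

With a real file it is safer to wrap the call in a with block, e.g. with open('test.txt') as f: result = pair_list(f), so the file is closed promptly. The whole result is held in memory, which is why the file size matters here.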
