1

I have a text file from which I want to extract a substring based on a variable site, which holds the index of a particular character. I want to extract the 20 characters before and the 20 characters after site. My code works perfectly well if the value of site is at least 20. But if there are fewer than 20 characters before site, it doesn't return any of them.

For example, here is a string where site=5, which in this case is the first K (counting from 0).

MSGRGKGGKGLGKGGAKRHRKVLRDXYZX

Now I am trying to extract 20 characters before and after the character K. Below is my code:

data = myfile.read()
str1 = data[site:site+1+20]
temp = data[site-20:site]
final_sequence = temp + str1
print(final_sequence)

This gives me an output of KGGKGLGKGGAKRHRKVLRDX. Since it couldn't find 20 characters before K, it didn't print the characters before K.

The correct output should have been MSGRGKGGKGLGKGGAKRHRKVLRDX.

Which brings me to my question. How can I modify my code to print all characters before K if there are fewer than 20 characters before K? Thank you.

3 Answers

2

The problem is that site-20 is negative, and Python treats a negative index as relative to the end of the sequence. With site=5 the slice starts at index 14 and ends at index 5, so it is empty (the start lies after the end). Just make sure the start index never goes below 0.

data = myfile.read()
str1 = data[site:site+1+20]
# clamp the start at 0 so it never becomes a negative index
temp = data[max(site-20, 0):site]
final_sequence = temp + str1
print(final_sequence)

Or, shorter, as a single slice:

data = myfile.read()
final_sequence = data[max(site-20, 0):site+1+20]
print(final_sequence)

Note that you do not need to use min(site+1+20, len(data)) for the upper bound because Python automatically clips slice indices beyond the end of the sequence to the sequence length.
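To see both behaviours side by side, here is a small sketch using the sequence from the question. The names flank, broken, fixed and past_end are only illustrative and are not part of the original code.

```python
data = "MSGRGKGGKGLGKGGAKRHRKVLRDXYZX"
site = 5      # 0-based index of the K of interest
flank = 20    # characters wanted on each side (illustrative name)

# site - flank is -15, which Python reads as len(data) - 15 = 14,
# so the slice runs from index 14 to index 5 and is empty.
broken = data[site - flank:site]

# Clamping the start at 0 keeps everything before the site.
fixed = data[max(site - flank, 0):site + 1 + flank]

# An end index beyond the end of the string is clipped silently.
past_end = data[site:site + 1000]

print(repr(broken))
print(fixed)
print(past_end)
```

Only the start index needs the explicit max, because a start that is too small wraps around to the end of the string, whereas an end that is too large is simply clipped.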


1 Comment

Noted. Thanks very much.
0

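You have to check both the lower and the upper bound against the length of your data. The start index (for the characters before the location) must not become negative, and the same kind of check applies to the end index (for the characters after the location), which should not run past the end.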
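A minimal sketch of checking both bounds explicitly could look like this. The function name extract_window and its flank parameter are made up for illustration; they do not come from the question. Python slicing would clip the upper bound on its own, so the min is there mainly to make the intent visible.

```python
def extract_window(data, site, flank=20):
    """Return data[site] plus up to `flank` characters on each side."""
    # Reject positions that are not valid indices into data at all.
    if not 0 <= site < len(data):
        raise IndexError("site is outside the sequence")
    start = max(site - flank, 0)              # lower bound: never below 0
    end = min(site + flank + 1, len(data))    # upper bound: never past the end
    return data[start:end]


sequence = "MSGRGKGGKGLGKGGAKRHRKVLRDXYZX"
print(extract_window(sequence, 5))
```

Raising IndexError for an invalid site is a design choice of this sketch; returning an empty string instead would also be defensible.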


0

The first answer, which uses max, is also right. The following example is a less Pythonic way that uses a condition instead.

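data = "MSGRGKGGKGLGKGGAKRHRKVLRDXYZX"
site = 5  # index of the K from the question

str1 = data[site:site+1+20]
if site <= 20:
    # fewer than 20 characters before site: take everything from the start
    temp = data[0:site]
else:
    # the window starts 20 characters before site (site-20, not site%20)
    temp = data[site-20:site]

# the part before site comes first, then site and what follows it
print(temp + str1)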

Write proper unit tests with different data to verify your logic.
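As a rough sketch of what such tests might look like, the conditional logic can be wrapped in a function and exercised with unittest. The function name flanking_sequence, the flank parameter and the test names are hypothetical and chosen only for this example.

```python
import unittest


def flanking_sequence(data, site, flank=20):
    # Conditional form of the fix: start at 0 when fewer than
    # `flank` characters precede the site.
    if site <= flank:
        temp = data[0:site]
    else:
        temp = data[site - flank:site]
    return temp + data[site:site + 1 + flank]


class FlankingSequenceTest(unittest.TestCase):
    DATA = "MSGRGKGGKGLGKGGAKRHRKVLRDXYZX"

    def test_site_near_start(self):
        # Only 5 characters precede the K at index 5.
        self.assertEqual(flanking_sequence(self.DATA, 5),
                         "MSGRGKGGKGLGKGGAKRHRKVLRDX")

    def test_site_near_end(self):
        # Index 27 has 20 characters before it and 1 after it.
        self.assertEqual(flanking_sequence(self.DATA, 27), self.DATA[7:])

    def test_site_with_full_flanks(self):
        # Synthetic data with more than 20 characters on both sides.
        data = "A" * 30 + "K" + "C" * 30
        self.assertEqual(flanking_sequence(data, 30),
                         "A" * 20 + "K" + "C" * 20)
```

Saved to a file, these can be run with python -m unittest. The three cases cover a site close to the start, a site close to the end, and a site with full flanks on both sides.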
