I have a dictionary as follows:

s_dict = {'s' : 'ATGCGTGACGTGA'}

I want to replace the characters at positions 4, 6, 7 and 10 (0-based) of the string stored under key 's' with h, k, p and r, respectively.

pos_change = {'s' : ['4_h', '6_k', '7_p', '10_r']}

The only way I can think of is a loop:

for key in s_dict:
    for position in pos_change[key]:
        pos = int(position.split("_")[0])
        char = position.split("_")[1]
        l = list(s_dict[key])
        l[pos]= char
        s_dict[key] = "".join(l)

Output:

s_dict = {'s': 'ATGChTkpCGrGA'}

This works fine, but my actual s_dict file is about 1.5 Gb. Is there a faster way to replace characters at a list of specific indices in a string or list?

Thanks!

5 Comments
  • What are you doing to that DNA? Methylation? Commented Aug 21, 2018 at 13:40
  • pos_change would be better as a dict of dicts (pos_change = {'s' : {4: 'h', 6: 'k', 7: 'p', 10: 'r'}}) Commented Aug 21, 2018 at 13:41
  • @Chris_Rands Oh, no, I just want to replace SNPs with IUPAC characters. I just made up an example for the sake of the Python question. Commented Aug 21, 2018 at 13:43
  • Use s_dict['s'] = '%s%s%s' % (s_dict['s'][:pos], char, s_dict['s'][pos+1:]) instead of converting to a list and joining. Commented Aug 21, 2018 at 13:44
  • I'd use bytearray, as it is mutable. Commented Aug 21, 2018 at 13:58
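
The bytearray suggestion in the last comment could look roughly like the sketch below. It is only an illustration: the helper name apply_changes_bytearray is made up here, and it assumes the sequence and the replacement characters are plain ASCII, so that one character is exactly one byte.

```python
def apply_changes_bytearray(seq, changes):
    """Return a copy of seq with 'index_char' edits applied (hypothetical helper)."""
    # Assumption: seq is ASCII-only, so each character maps to exactly one byte.
    buf = bytearray(seq, "ascii")      # one mutable copy of the whole sequence
    for item in changes:
        pos, char = item.split("_")    # '10_r' -> ('10', 'r')
        buf[int(pos)] = ord(char)      # in-place assignment, no new string is built
    return buf.decode("ascii")         # one final conversion back to str


s_dict = {'s': 'ATGCGTGACGTGA'}
pos_change = {'s': ['4_h', '6_k', '7_p', '10_r']}

for key, changes in pos_change.items():
    s_dict[key] = apply_changes_bytearray(s_dict[key], changes)

print(s_dict)  # {'s': 'ATGChTkpCGrGA'}
```

The sequence is copied once on the way in and once on the way out, regardless of how many positions are changed.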

2 Answers

Here is my take on your interesting problem:

s_dict = {'s' : 'ATGCGTGACGTGA'}    
pos_change = {'s' : ['4_h', '6_k', '7_p', '10_r']}

# First, convert `pos_change` into something more easily usable
pos_change = {k: dict(x.split('_') for x in v) for k, v in pos_change.items()}
print(pos_change)  # {'s': {'4': 'h', '6': 'k', '7': 'p', '10': 'r'}}

# and then... 
for k, v in pos_change.items():
    temp = set(map(int, v))  # positions to replace, as integers
    s_dict[k] = ''.join([v[str(i)] if i in temp else x for i, x in enumerate(s_dict[k])])

print(s_dict)  # {'s': 'ATGChTkpCGrGA'}
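
A possible variant, sketched here under the assumption that pos_change is stored as a dict of dicts with integer keys (as suggested in the comments on the question): convert the string to a list once, assign every replacement by index, and join once. The function name apply_changes_once is hypothetical.

```python
def apply_changes_once(seq, changes):
    """Apply {index: char} replacements with one list conversion and one join."""
    chars = list(seq)                  # single conversion to a mutable list
    for pos, char in changes.items():
        chars[pos] = char              # O(1) assignment per replacement
    return "".join(chars)              # single join at the end


s_dict = {'s': 'ATGCGTGACGTGA'}
# Assumed layout: integer positions mapped directly to replacement characters.
pos_change = {'s': {4: 'h', 6: 'k', 7: 'p', 10: 'r'}}

for key, changes in pos_change.items():
    s_dict[key] = apply_changes_once(s_dict[key], changes)

print(s_dict)  # {'s': 'ATGChTkpCGrGA'}
```

Unlike the comprehension above, this only touches the positions that actually change, instead of testing every index of the string.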

As an alternative, you can use s_dict['s'] = '%s%s%s' % (s_dict['s'][:pos], char, s_dict['s'][pos+1:]) instead of converting to a list and joining:

In [1]: s_dict = {'s' : 'ATGCGTGACGTGA' * 10}
   ...: pos_change = {'s' : ['4_h', '6_k', '7_p', '10_r']}
   ...: 
   ...: def list_join():
   ...:     for key in s_dict:
   ...:         for position in pos_change[key]:
   ...:             pos = int(position.split("_")[0])
   ...:             char = position.split("_")[1]
   ...:             l = list(s_dict[key])
   ...:             l[pos]= char
   ...:             s_dict[key] = "".join(l)
   ...: 
   ...: def by_str():
   ...:     for key in s_dict:
   ...:         for position in pos_change[key]:
   ...:             pos = int(position.split("_")[0])
   ...:             char = position.split("_")[1]
   ...:             values = s_dict[key][:pos], char, s_dict[key][pos+1:]
   ...:             s_dict[key] = '%s%s%s' % values
   ...:             

In [2]: %timeit list_join()
11.7 µs ± 191 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [3]: %timeit by_str()
4.29 µs ± 46.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
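
Note that both functions above rebuild the whole string once per replacement, so on a very long sequence with many positions the cost grows with the number of changes times the sequence length. One way to avoid that, shown here only as a rough sketch, is to sort the positions and stitch the unchanged slices together in a single pass. The name apply_changes_slices is made up, and the sketch assumes the positions are distinct, within bounds, and given as a {index: char} dict.

```python
def apply_changes_slices(seq, changes):
    """Build the edited string in one pass from unchanged slices (hypothetical helper)."""
    parts = []
    prev = 0                                  # start of the next unchanged slice
    for pos, char in sorted(changes.items()):
        parts.append(seq[prev:pos])           # unchanged text before this position
        parts.append(char)                    # the replacement character
        prev = pos + 1                        # skip the character being replaced
    parts.append(seq[prev:])                  # unchanged tail
    return "".join(parts)                     # a single join for the whole string


s_dict = {'s': 'ATGCGTGACGTGA'}
pos_change = {'s': ['4_h', '6_k', '7_p', '10_r']}

for key, items in pos_change.items():
    # Parse '4_h' style entries into {4: 'h', ...} before applying them.
    changes = {int(p): c for p, c in (item.split("_") for item in items)}
    s_dict[key] = apply_changes_slices(s_dict[key], changes)

print(s_dict)  # {'s': 'ATGChTkpCGrGA'}
```

No timing is claimed for this version; it would need to be measured with %timeit on data of realistic size.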
