python encoder and decoder

Question

I would like to build an encoder and decoder for a simple text coding scheme.

Given the string "AAABBBBCDDDDDDDDDDEEDDDD" as input, the encoder should return the string "A3B4C1D10E2D4", where each symbol is followed by the number of times it repeats consecutively. The decoder reverses the process.

I would like help getting started in Python.

So take a stab at it, maybe with a for loop. You're much more likely to get useful answers that way.
– Kyle Maxwell, Commented Jan 26, 2013 at 17:19
@JohnWard What do you mean by that? Fire up Notepad or some IDE, that's a good start. We won't (or at least shouldn't) give you solutions. Try something and then come back to us with the piece of code you have. Then we will analyze it and help you (or not). Don't be lazy. You might also realize that you don't even need help.
– freakish, Commented Jan 26, 2013 at 17:32

Community · Accepted Answer · 2017-05-23 11:50:29Z

1

Check this question. It's not exactly what you want, but it may help you get started:

Determining Letter Frequency Of Cipher Text

edited May 23, 2017 at 11:50

Community Bot

1 · 1 silver badge

answered Jan 26, 2013 at 17:18

seleucia

1,066 · 4 gold badges · 17 silver badges · 30 bronze badges


Abhijit · Accepted Answer · 2013-01-26 17:52:18Z

1

The problem can be approached in different ways. A loop-based solution is pretty easy and is left as an exercise for you.

To give you a taste of the power of Python's batteries, here is a solution using groupby:

>>> from itertools import groupby
>>> ''.join("{}{}".format(k, sum(1 for e in v))
...         for k, v in groupby("AAABBBBCDDDDDDDDDDEEDDDD"))
'A3B4C1D10E2D4'

Salient features of this solution

itertools.groupby groups consecutive equal items into (key, group) pairs, where the key is the repeated element and the group is an iterator over that run of repetitions
As the group is an iterator rather than a sequence, len does not work on it directly; one way to count the items of any finite non-sequence iterable is sum(1 for e in v)
str.join joins an iterable of strings into one string using the supplied separator, which in this case is an empty string
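To make the (key, group) pairing concrete, here is a minimal sketch that returns the pairs before they are joined into a string. The helper name run_lengths is made up for this illustration and is not part of the answer above.

```python
from itertools import groupby

def run_lengths(s):
    """Return a list of (symbol, run length) pairs for consecutive runs in s."""
    # Each group is a one-shot iterator, so its items are counted with sum()
    # instead of len().
    return [(key, sum(1 for _ in group)) for key, group in groupby(s)]

pairs = run_lengths("AAABBBBCDDDDDDDDDDEEDDDD")
print(pairs)
```

Joining the pairs with "{}{}".format(key, count) gives the same encoded string as the one-liner.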

edited Jan 26, 2013 at 17:52

answered Jan 26, 2013 at 17:46

Abhijit

64k · 20 gold badges · 143 silver badges · 209 bronze badges

1 Comment

jfs Over a year ago

len(list(v)) might be slightly faster in some cases though sum is suitable if v might be infinite.

pyrrrat · Accepted Answer · 2013-01-26 17:36:42Z

0

One possible solution for the encoder would be to simply iterate over the string and count consecutive character occurrences. It's not very fancy, but it is O(n).

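def encode(s):
    # Yield one "<symbol><count>" chunk per run of identical characters.
    if not s:
        return  # empty input: nothing to encode (s[0] would raise IndexError)
    last = s[0]
    count = 0
    for c in s:
        if last != c:
            # The run of `last` just ended: emit it and start counting `c`.
            yield '%s%i' % (last, count)
            last = c
            count = 0
        count += 1
    # Emit the final run, which the loop itself never flushes.
    yield '%s%i' % (last, count)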

For the decoder you could use a regular expression which splits the string up nicely for you, no need to write your own parser.

import re

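def decode(s):
    # Each match is a (symbol, count) pair such as ('D', '10').
    # This assumes the symbols are word characters and never digits.
    for c, n in re.findall(r'(\w)(\d+)', s):
        yield c * int(n)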

Given your test input,

s = 'AAABBBBCDDDDDDDDDDEEDDDD'

encoded = ''.join(encode(s))
print(encoded)

decoded = ''.join(decode(encoded))
print(decoded)

results in

A3B4C1D10E2D4
AAABBBBCDDDDDDDDDDEEDDDD

One more note, there's no real reason to use yield here, you could of course also build the strings in the en-/decode functions first, then return.
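As a rough sketch of that variant, the two functions below build and return complete strings instead of yielding chunks. The names encode_str and decode_str are invented here for illustration, and the same assumption applies as before: the input contains no digits.

```python
import re

def encode_str(s):
    """Run-length encode s and return the result as a single string."""
    parts = []
    i = 0
    while i < len(s):
        # Advance j to the end of the run that starts at index i.
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        parts.append('%s%d' % (s[i], j - i))
        i = j
    return ''.join(parts)

def decode_str(encoded):
    """Expand each (symbol, count) pair and return the decoded string."""
    return ''.join(c * int(n) for c, n in re.findall(r'(\w)(\d+)', encoded))

s = 'AAABBBBCDDDDDDDDDDEEDDDD'
encoded = encode_str(s)
decoded = decode_str(encoded)
```

With these, the ''.join(...) calls at the call site are no longer needed.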

edited Jan 26, 2013 at 17:36

answered Jan 26, 2013 at 17:22

pyrrrat

359 · 1 silver badge · 4 bronze badges

7 Comments

freakish Over a year ago

-1: First of all it assumes that input does not contain digits. Secondly: regular expressions? Seriously? And finally: giving solutions to such questions is an antistackoverflow behaviour.

pyrrrat Over a year ago

First of all the input could hardly contain digits to get that kind of output, otherwise you would need some sort of separator between the letter and the letter count. Secondly, yes, regular expressions, they get the job done – use your tools. And finally: thanks for clearing that up, I'll try to do better in the future.

freakish Over a year ago

Yeah, I guess you're right about digits. As for regular expressions: I just find it a bit overkill in this scenario, especially if we exclude digits in input.

pyrrrat Over a year ago

What's the alternative? Going through the string character by character, essentially building your own little parser? I would agree with you if the count could only be single digit, but clearly it can be of arbitrary length so you would have to parse it somehow.

DSM Over a year ago

@pyrrrat: there's a one-line groupby solution (two lines going the other way, although I guess I could pack it into one if I had to).
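DSM's code is not shown in the thread. As one guess at what a groupby-based decoder could look like (not necessarily what DSM had in mind), the sketch below splits the encoded string into alternating runs of non-digits and digits. The name decode_groupby is hypothetical, and the approach assumes the original text contains no digits.

```python
from itertools import groupby

def decode_groupby(encoded):
    """Decode a string such as 'A3B4' by pairing symbols with their counts."""
    # Group consecutive characters by whether they are digits, giving
    # alternating chunks like ['A', '3', 'B', '4', ...].
    chunks = [''.join(group) for _, group in groupby(encoded, key=str.isdigit)]
    # Even positions hold symbols, odd positions hold their counts.
    return ''.join(sym * int(n) for sym, n in zip(chunks[0::2], chunks[1::2]))

result = decode_groupby('A3B4C1D10E2D4')
print(result)
```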


DanielCardin · Accepted Answer · 2013-01-26 17:14:04Z

0

I would start by looking at the Python string documentation, specifically the find and count methods, and work from there. Though I'm not sure you could really decode anything that you encode if the order of the content inside the string matters.
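A quick illustration of what those two methods return on the question's input (this sketch is an addition, not part of the answer): str.count gives a total over the whole string and str.find gives only the first index, so neither one alone captures the separate runs.

```python
s = "AAABBBBCDDDDDDDDDDEEDDDD"

# str.count returns the total across the whole string, so the two
# separate runs of "D" (10 and 4) are merged into one number.
total_d = s.count("D")
print(total_d)  # 14

# str.find only reports the index of the first occurrence.
first_d = s.find("D")
print(first_d)  # 8
```

Because the encoding in the question records each run separately ("D10" and later "D4"), the order is preserved and the string can be decoded, as long as the input itself contains no digits.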

answered Jan 26, 2013 at 17:14

DanielCardin

565 · 2 gold badges · 8 silver badges · 17 bronze badges
