1

I would like to build an an encoder and decoder using text coding.

A string "AAABBBBCDDDDDDDDDDEEDDDD" as input, returning a string "A3B4C1D10E2D4", where each alphabet symbol is followed by its frequency in the string. The decoder reverses the process.

Would like help getting started in python.

4
  • have you tried anything..any single line of code?? Commented Jan 26, 2013 at 17:09
  • 3
    What have you tried? Commented Jan 26, 2013 at 17:09
  • So take a stab at it, maybe with a for loop. You're much more likely to get useful answers that way. Commented Jan 26, 2013 at 17:19
  • @JohnWard What do you mean by that? Fire a notepad or some other IDE - that's a good start. We won't ( or at least shouldn't ) give you solutions. Try something and then come back to us with that piece of code you'll have. Then we will analyze it and help you ( or not ). Don't be lazy. You might also realize that you don't even need help. Commented Jan 26, 2013 at 17:32

4 Answers 4

1

Check this questions not exactly what you want but it can help you try to do that

Determining Letter Frequency Of Cipher Text

Sign up to request clarification or add additional context in comments.

Comments

1

The solution can be approached in different ways, and its pretty easy as a loop based solution, and is left as an exercise for you

As to give you a taste of the power of Python's batteries, I am proposing a solution using groupby

>>> ''.join("{}{}".format(k, sum(1 for e in v))
        for k,v in groupby("AAABBBBCDDDDDDDDDDEEDDDD"))
'A3B4C1D10E2D4'

Salient features of this solution

  1. itertools.groupby groups similar consecutive data as a key, valued pair where the key is the duplicate element and the value is the group of repetition
  2. As the group is a generator, len may not work here but a possible way of calculating length of any non sequence iterable is to use sum
  3. str.join joins an iterable to generate a string with any supplied separator, in this case its an empty string

1 Comment

len(list(v)) might be slightly faster in some cases though sum is suitable if v might be infinite.
0

One possible solution for the cnoder would be to simply iterate over the string and count the character occurences, not very fancy but O(n).

def encode(s):
    last  = s[0]
    count = 0
    for c in s:
        if last != c:
            yield '%s%i' % (last, count)
            last = c
            count = 0
        count += 1
    yield '%s%i' % (last, count)

For the decoder you could use a regular expression which splits the string up nicely for you, no need to write your own parser.

import re

def decode(s):
    for c, n in re.findall(r'(\w)(\d+)', s):
        yield c * int(n)

given your test input

s = 'AAABBBBCDDDDDDDDDDEEDDDD'

encoded = ''.join(encode(s))
print encoded

decoded = ''.join(decode(encoded))
print decoded

results in

A3B4C1D10E2D4
AAABBBBCDDDDDDDDDDEEDDDD

One more note, there's no real reason to use yield here, you could of course also build the strings in the en-/decode functions first, then return.

7 Comments

-1: First of all it assumes that input does not contain digits. Secondly: regular expressions? Seriously? And finally: giving solutions to such questions is an antistackoverflow behaviour.
First of all the input could hardly contain digits to get that kind of output, otherwise you would need some sort of separator between the letter and the letter count. Secondly, yes, regular expressions, they get the job done – use your tools. And finally: thanks for clearing that up, I'll try to do better in the future.
Yeah, I guess you're right about digits. As for regular expressions: I just find it a bit overkill in this scenario, especially if we exclude digits in input.
What's the alternative? Going through the string character by character, essentially building your own little parser? I would agree with you if the count could only be single digit, but clearly it can be of arbitrary length so you would have to parse it somehow.
@pyrrrat: there's a one-line groupby solution (two lines going the other way, although I guess I could pack it into one if I had to).
|
0

I would start by looking at the python string documentation, specifically find or count and work from there. Though I'm not sure you could really decode anything that you encode if the actual content inside the string matters in that manner.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.