Arithmetic on C++ strings

Question

This code really confuses me. It uses some Stanford libraries for the Vector (array) class. Can anyone tell me the purpose of int index = line [j] - 'a';? Why - 'a'?

void countLetters(string filename)
{
Vector<int> result;

ifstream in;  // named "in" to match its uses below
in.open(filename.c_str());
if (in.fail()) Error("Couldn't read '" + filename + "'");

for (int i = 0; i < ALPHABETH_SIZE; i++)
{
    result.add(0);  // Must initialize contents of array
}

string line;
while (true)
{
    getLine(in, line);
    // Check that we got a line
    if (in.fail()) break;

    line = ConvertToLowerCase(line);
    for (int j = 0; j < line.length(); j++)
    {
        int index = line [j] - 'a';
        if (index >= 0 && index < ALPHABETH_SIZE)
        {
            int prevTotal = result[index];
            result[index] = prevTotal +1;
        }
    }
}
}

The purpose of the code:

Takes a filename and prints the number of times each letter of the alphabet appears in that file. Because there are 26 numbers to be printed, CountLetters needs to create a Vector. For example, if the file is:

Presumably it would find how far into the alphabet a letter is, but that doesn't always hold true.
– Qaz, Commented Nov 9, 2012 at 5:12
The code as a whole is calculating letter frequencies. result['c' - 'a'] would be the number of times the character 'c' appears in the file.
– irrelephant, Commented Nov 9, 2012 at 5:14

Tony Delroy · Accepted Answer · 2012-11-09 10:29:19Z

2

Characters in a string are encoded using a character set... typically ASCII on hardware common in English language systems. You can see the ASCII table at http://en.wikipedia.org/wiki/ASCII

In ASCII (and most other character sets), the numbers representing letters are contiguous. So, this is the natural way to test whether the character at index j in character-array line is a letter:

line[j] >= 'a' && line[j] <= 'z'

Your program is equivalent to that. In an algebraic sense, it subtracts 'a' from both sides of each comparison (knowing that 'a' is the first letter of the alphabet in the character set):

line[j] - 'a' >= 'a' - 'a' && line[j] - 'a' <= 'z' - 'a'

line[j] - 'a' >= 0 && line[j] - 'a' <= 'z' - 'a'

Replacing "<= 'z' - 'a'" with an equivalent:

line[j] - 'a' >= 0 && line[j] - 'a' < ALPHABET_SIZE

where ALPHABET_SIZE is 26. This trades a dependency on knowing z is the last character of your character set for knowing how many characters are in your character set - both are a little fragile, but fine if you know you're dealing with a well-known, stable character set encoding.
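The subtraction and range check can be tried in isolation. The following is a minimal sketch, not the original Stanford-library code: it uses the standard std::array in place of Vector, and the function name countLowercase is a hypothetical one chosen for illustration. It assumes an encoding such as ASCII where 'a' through 'z' are contiguous.

```cpp
#include <array>
#include <string>

// Assumed value: 26 letters in the basic Latin alphabet.
constexpr int ALPHABET_SIZE = 26;

// Hypothetical helper: counts lowercase letters in text.
// counts[0] holds the total for 'a', counts[25] the total for 'z'.
// Assumes 'a'..'z' have contiguous codes (true for ASCII, not for EBCDIC).
std::array<int, ALPHABET_SIZE> countLowercase(const std::string& text)
{
    std::array<int, ALPHABET_SIZE> counts{};  // all elements start at zero
    for (char ch : text)
    {
        int index = ch - 'a';  // distance from 'a': 'a' -> 0, 'c' -> 2
        if (index >= 0 && index < ALPHABET_SIZE)
        {
            ++counts[index];
        }
    }
    return counts;
}
```

With this sketch, countLowercase("abcab")[0] is 2, because 'a' appears twice and 'a' - 'a' is 0. Anything that is not a lowercase letter yields an index outside 0..25 and is skipped.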

A better way to check for a letter is to use the isalpha() predicate: http://www.cplusplus.com/reference/clibrary/cctype/isalpha/
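As a rough illustration of the isalpha() approach, the sketch below counts letters without relying on their numeric codes being contiguous. The function name countLettersPortable and the choice of std::map are assumptions made for this example, not part of the original program.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: counts letters, keyed by their lowercase form.
// Makes no assumption about how letters are laid out in the character set.
std::map<char, int> countLettersPortable(const std::string& text)
{
    std::map<char, int> counts;
    for (char ch : text)
    {
        // Convert to unsigned char first: passing a negative value
        // (other than EOF) to std::isalpha is undefined behavior.
        unsigned char uc = static_cast<unsigned char>(ch);
        if (std::isalpha(uc))
        {
            ++counts[static_cast<char>(std::tolower(uc))];
        }
    }
    return counts;
}
```

Which characters count as letters depends on the current C locale; in the default "C" locale only A-Z and a-z qualify.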

edited Nov 9, 2012 at 10:29

answered Nov 9, 2012 at 5:30

Tony Delroy


2 Comments

MSalters Over a year ago

ALPHABET_SIZE is actually a bad idea because it introduces a second assumption: that the alphabet is contiguous. It's that broken assumption which causes the code above to fail on EBCDIC, where 'j'-'i' != 1. In French/ISO-8859-1, similar errors crop up between c and ç

Tony Delroy Over a year ago

@MSalters: the idea that a pair of >= and <= comparisons can identify the alphabet is similarly flawed for non-contiguous alphabets. That issue is not specific to ALPHABET_SIZE.
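One way to sidestep the contiguity concern raised in these comments is to look each character up in an explicit alphabet string instead of doing arithmetic on its code. This is only a sketch under that idea; the helper name letterIndex is hypothetical and does not come from the original code or the Stanford libraries.

```cpp
#include <string>

// Hypothetical helper: returns 0..25 for 'a'..'z', or -1 for anything else.
// The index comes from the position in an explicit lookup string, so it
// does not depend on the letters having contiguous codes.
int letterIndex(char ch)
{
    static const std::string alphabet = "abcdefghijklmnopqrstuvwxyz";
    std::string::size_type pos = alphabet.find(ch);
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}
```

The lookup is a linear search over 26 characters, which costs a little more than a subtraction but gives the same index on any character set.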

Matt · Accepted Answer · 2012-11-09 05:16:46Z

2

'a' is the first lowercase letter in ASCII.

int index = line [j] - 'a';
if (index >= 0 && index < ALPHABETH_SIZE)

These two lines of code just check whether line[j] is a lowercase letter.

answered Nov 9, 2012 at 5:16

Matt


1 Comment

Qaz Over a year ago

But note that ASCII isn't guaranteed.
