added 2116 characters in body
Source Link
user1430
user1430

You don't need your hash to be reversible.

At runtime, all your lookups to recover an actual string should be done using an integer ID value, so you might have a function called getLocalizedString to perform such a lookup. Given storage for all the strings for the active language that looks something like

std::unordered_map<int, std::string> localizedStrings;

that function can be simply

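std::string getLocalizedString(int id) {
  // find() avoids operator[]'s side effect of inserting an empty entry for an unknown ID
  auto entry = localizedStrings.find(id);
  return entry != localizedStrings.end() ? entry->second : std::string();
}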

Naturally, you don't actually want to write code like getLocalizedString(892); you want to use a human-readable token like AURA_POISON_CLOUD in place of 892. The simple approach is to just define all such tokens as constants

constexpr int AURA_POISON_CLOUD = 892;

but that's also a maintenance headache, so a common approach is to hash the token itself into an integer value using a macro and/or constexpr function. For example:

#define LOCSTR(token) hash(#token)

constexpr int hash(const char* string) {
  // toy placeholder so the example compiles; replace with a real hash function though
  return *string ? (*string + 31 * hash(string + 1)) % 0xFFFF : 0;
}

The Fowler-Noll-Vo hash function is simple to implement as a constexpr function in C++ and works well for this purpose.
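As a rough illustration of that suggestion, here is a minimal sketch of the 32-bit FNV-1a variant written as a constexpr function. It assumes a C++14 compiler (for the loop inside constexpr); the name fnv1a and the switch from int to std::uint32_t are my own choices for the sketch, so the key type of localizedStrings would need to change to match.

```cpp
#include <cstdint>

// 32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
// Unsigned arithmetic wraps around, which is what the algorithm expects.
constexpr std::uint32_t fnv1a(const char* text) {
  std::uint32_t hash = 2166136261u;  // FNV offset basis
  while (*text != '\0') {
    hash ^= static_cast<std::uint8_t>(*text++);
    hash *= 16777619u;               // FNV prime
  }
  return hash;
}

// Same shape as the LOCSTR macro above, but backed by the real hash.
#define LOCSTR(token) fnv1a(#token)

// Published FNV-1a test vectors, checked at compile time.
static_assert(fnv1a("") == 2166136261u, "empty string hashes to the offset basis");
static_assert(fnv1a("a") == 0xE40C292Cu, "FNV-1a test vector for \"a\"");

// Initializing a constexpr variable forces compile-time evaluation.
constexpr std::uint32_t kPoisonCloudId = LOCSTR(AURA_POISON_CLOUD);
```

A 32-bit hash can still collide, so it is probably worth checking for duplicate IDs when the string table is built.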

With this, you can call getLocalizedString(LOCSTR(AURA_POISON_CLOUD)), and use LOCSTR(AURA_POISON_CLOUD) all over your code to refer to the string. The hash value can be computed at compile time (this is guaranteed when the result is used in a constant expression, such as the initializer of a constexpr variable), so at runtime the program just sees a call like getLocalizedString(892).

Just like you don't want to write 892 all over the place in code, you don't want the translation team to see 892 anywhere. You want them to see AURA_POISON_CLOUD. The file you give them should be something that associates the original English strings with those identifiers, as a .csv or spreadsheet or text file. The latter might look like this:

AURA_POISON_CLOUD Cloud of Poison
AURA_FLAME_CLOUD Cloud of Flame
UI_CHOOSE_CHAR Choose Your Character
UI_PLAYER_DIED You have died!
...

You give them the source file, and they return it to you having replaced the English text with the translations. Your game then has to load that file, and when it does, it iterates through every pair of (token, translation) and adds them to localizedStrings:

// pseudo-code
foreach(pair in the file) {
  localizedStrings.insert(hash(pair.token), pair.translation)
}
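In real C++, the loading loop above might look something like the following sketch. It assumes the plain-text layout shown earlier (a token, one space, then the translation on the rest of the line) and uses FNV-1a as the hash; loadStringTable, loadStringTableFromText, StringTable and fnv1a are hypothetical names, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// 32-bit FNV-1a; must be the same function the LOCSTR macro uses.
constexpr std::uint32_t fnv1a(const char* text) {
  std::uint32_t hash = 2166136261u;
  while (*text != '\0') {
    hash ^= static_cast<std::uint8_t>(*text++);
    hash *= 16777619u;
  }
  return hash;
}

using StringTable = std::unordered_map<std::uint32_t, std::string>;

// Reads lines of the form "TOKEN translated text" and keys each
// translation by the hash of its token.
StringTable loadStringTable(std::istream& input) {
  StringTable table;
  std::string line;
  while (std::getline(input, line)) {
    const std::size_t split = line.find(' ');
    if (split == std::string::npos) continue;  // skip blank or malformed lines
    const std::string token = line.substr(0, split);
    table[fnv1a(token.c_str())] = line.substr(split + 1);
  }
  return table;
}

// Convenience wrapper for text that is already in memory.
StringTable loadStringTableFromText(const std::string& text) {
  std::istringstream input(text);
  return loadStringTable(input);
}
```

The important detail is that the loader and the macro share one hash function; if they ever diverge, every lookup silently misses.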

Having populated that database of translated text, getLocalizedString will now work fine. At no point will you need to expose the actual hashed ID to a translator or person writing code. They're all able to use the human-readable token.

(Note this approach still works if you don't have constexpr functions; you just pay the hash cost at runtime. You might write your code a little differently, but it should generally be fine.)

(Also note that while you never need to perform the reverse lookup, keeping the information for it around in debug builds can be helpful. While stepping through the code in the debugger you will see the raw hash results, and being able to reverse the lookup in the watch window or something can be useful. You don't need this for shipping builds at all, though.)
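One way the debug-only reverse lookup could be arranged is sketched below. It assumes the standard NDEBUG macro distinguishes debug from shipping builds and that FNV-1a is the hash in use; rememberToken, describeId and debugTokenNames are made-up names for illustration.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// 32-bit FNV-1a, standing in for whatever hash LOCSTR uses.
constexpr std::uint32_t fnv1a(const char* text) {
  std::uint32_t hash = 2166136261u;
  while (*text != '\0') {
    hash ^= static_cast<std::uint8_t>(*text++);
    hash *= 16777619u;
  }
  return hash;
}

#ifndef NDEBUG
// Debug builds only: maps each hashed ID back to its readable token.
std::unordered_map<std::uint32_t, std::string>& debugTokenNames() {
  static std::unordered_map<std::uint32_t, std::string> names;
  return names;
}
#endif

// Call once per token while loading the translation file.
void rememberToken(const std::string& token) {
#ifndef NDEBUG
  debugTokenNames()[fnv1a(token.c_str())] = token;
#else
  (void)token;  // shipping builds keep no reverse table
#endif
}

// Returns the token for an ID when known, otherwise a placeholder.
std::string describeId(std::uint32_t id) {
#ifndef NDEBUG
  const auto entry = debugTokenNames().find(id);
  if (entry != debugTokenNames().end()) return entry->second;
#endif
  return "<unknown id " + std::to_string(id) + ">";
}
```

A function like describeId can be called from a debugger's watch or immediate window, or from log statements, and it costs nothing in builds where NDEBUG is defined.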

deleted 464 characters in body
Source Link
user1430
user1430

The hash shouldn't generally need to be reversible. It sounds like you are taking the approach where you use a macro or function to take a localization identifier (AURA_POISON_CLOUD for example) and produce an integer (8921 for example) which refers to an entry in the text table for the active language ("Cloud of Poison" for example).

Your code and content should always refer to the localization identifier, usually via the hashing macro; the only lookup you need is when you have to display the string. You should not ever need to map from localized string back to localization identifier or hash.

You shouldn't even need to go from hash to localization identifier. The closest you might get is when you have to actually enter your localized strings in some table, but your table doesn't have to be a table of (hash, localized string) until runtime. 

The table you write and maintain should be a table of (localization identifier, localization string). You can transform this to the actual (hash, localized string) table once, when you build the game's content. All you have to do is iterate the table and replace the localization identifier with its hash, computed in the same fashion you'd compute it in the LOCSTR function.


You may also be interested in reading an approach to localization described on the Our Machinery blog.

Source Link
user1430
user1430

The hash shouldn't generally need to be reversible. It sounds like you are taking the approach where you use a macro or function (call it LOCSTR()) to take a localization identifier (AURA_POISON_CLOUD for example) and produce an integer (8921 for example) which refers to an entry in the text table for the active language ("Cloud of Poison" for example).

Your code and content should always refer to the localization identifier, usually via the hashing macro:

SpecialEffect effect;
effect.name = LOCSTR(AURA_POISON_CLOUD);
...

The only lookup you need is when you have to display the string. You should not ever need to map from localized string back to localization identifier or hash.

You shouldn't even need to go from hash to localization identifier. The closest you might get is when you have to actually enter your localized strings in some table, but your table doesn't have to be a table of (hash, localized string) until runtime. The table you write and maintain should be a table of (localization identifier, localization string). You can transform this to the actual (hash, localized string) table once, when you build the game's content. All you have to do is iterate the table and replace the localization identifier with its hash, computed in the same fashion you'd compute it in the LOCSTR function.


Note that you can also handle localization identifiers by preprocessing your content and source, doing a simple linear assignment of ID to underlying index in the table. This means you have a compact array of strings instead of a hash map, or whatever, and faster lookup. But it's harder to maintain over time because the indices can shift between builds.
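To make the linear-assignment idea concrete, here is a small sketch of what such a preprocessing step could compute. Sorting the identifiers before numbering them is my own assumption (any stable ordering would do), and IndexedTable, buildIndexedTable and idForToken are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

struct IndexedTable {
  std::vector<std::string> tokens;   // sorted; position is the assigned ID
  std::vector<std::string> strings;  // strings[id] is the localized text
};

// Build step: order the (identifier, string) pairs and number them 0..N-1.
IndexedTable buildIndexedTable(
    std::vector<std::pair<std::string, std::string>> entries) {
  std::sort(entries.begin(), entries.end());
  IndexedTable table;
  for (const auto& entry : entries) {
    table.tokens.push_back(entry.first);
    table.strings.push_back(entry.second);
  }
  return table;
}

// Preprocessing lookup: the index assigned to a token, or -1 if absent.
int idForToken(const IndexedTable& table, const std::string& token) {
  const auto it =
      std::lower_bound(table.tokens.begin(), table.tokens.end(), token);
  if (it == table.tokens.end() || *it != token) return -1;
  return static_cast<int>(it - table.tokens.begin());
}
```

At runtime only the strings vector is needed, so a lookup is a plain array index. The downside mentioned above shows up as soon as a new identifier sorts before an existing one: every later ID shifts, so code and content have to be reprocessed together.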

You may also be interested in reading an approach to localization described on the Our Machinery blog.