Lokalized facilitates natural-sounding software translations on the JVM.
It is both a file format…
…and a library that operates on it.
String translation = strings.;;
Design Goals
- Complex translation rules can be expressed in a configuration file, not code
- First-class support for gender and plural (cardinal, ordinal, range) language forms per latest CLDR specifications
- Provide a simple expression language to handle traditionally difficult edge cases
- Support multiple platforms natively
- Immutability/thread-safety
- No dependencies
Design Non-Goals
- Support for date/time, number, percentage, and currency formatting/parsing
- Support for collation
- Support for Java 7 and below
Roadmap
- Static analysis tool to autogenerate/sync localized strings files
- Additional Ports (JavaScript, Python, Android, Go, …)
- Webapp for translators
License
Maven Installation
<dependency>
  <groupId>com.lokalized</groupId>
  <artifactId>lokalized-java</artifactId>
  <version>1.0.3</version>
</dependency>
Direct Download
If you don’t use Maven, you can drop lokalized-java-1.0.3.jar directly into your project. No other dependencies are required.
Why Lokalized?
- As a developer, it is unrealistic to embed per-locale translation rules in code for every text string
- As a translator, sufficient context and the power of an expression language are required to provide the best translations possible
- As a manager, it is preferable to have a single translation specification that works on the backend, web frontend, and native mobile apps
Perhaps most importantly, the Lokalized placeholder system and expression language allow you to support edge cases that are critical to natural-sounding translations - this can be difficult to achieve using traditional solutions.
Getting Started
We’ll start with hands-on examples to illustrate key features.
1. Create Localized Strings Files
Filenames must conform to the IETF BCP 47 language tag format.
Here is a generic English (en) localized strings file which handles two localizations:
Here is a British English (en-GB) localized strings file:
Lokalized performs locale matching and falls back to less-specific locales as appropriate, so there is no need to duplicate all the en translations in en-GB - it is sufficient to specify only the dialect-specific differences.
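The fallback behavior described above can be pictured with a small, self-contained sketch. This is not Lokalized's implementation; the class name, the `lookup` helper, and the sample keys are hypothetical. It assumes that fallback simply drops trailing subtags from the language tag until a file containing the key is found.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration of per-key locale fallback; not part of Lokalized. */
final class LocaleFallbackSketch {
  /** Looks up a key in the most specific strings file first, then in less specific ones. */
  static Optional<String> lookup(String key, String languageTag,
                                 Map<String, Map<String, String>> filesByTag) {
    String candidate = languageTag;
    while (!candidate.isEmpty()) {
      Map<String, String> translations = filesByTag.get(candidate);
      if (translations != null && translations.containsKey(key))
        return Optional.of(translations.get(key));
      // Drop the trailing subtag ("en-GB" becomes "en"); stop once nothing is left
      int lastSeparator = candidate.lastIndexOf('-');
      candidate = lastSeparator < 0 ? "" : candidate.substring(0, lastSeparator);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Map<String, String> en = new HashMap<>();
    en.put("greeting", "Hello");
    en.put("color", "color");
    Map<String, String> enGb = new HashMap<>();
    enGb.put("color", "colour"); // only the dialect-specific difference

    Map<String, Map<String, String>> files = new HashMap<>();
    files.put("en", en);
    files.put("en-GB", enGb);

    System.out.println(lookup("color", "en-GB", files).orElse("(missing)"));    // colour
    System.out.println(lookup("greeting", "en-GB", files).orElse("(missing)")); // Hello
  }
}
```

The en-GB map only carries the entry that differs; everything else is resolved from en.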
2. Create a Strings Instance
// Your "native" fallback strings file, used in case no specific locale match is found.
// ISO 639 alpha-2 or alpha-3 language code
final String FALLBACK_LANGUAGE_CODE = "en";
// Creates a Strings instance which loads localized strings files from the given directory.
// Normally you'll only need a single shared instance to support your entire application,
// even for multitenant/concurrent usage, e.g. a Servlet container
Strings strings = new DefaultStrings.Builder(FALLBACK_LANGUAGE_CODE, () -> LocalizedStringLoader.).;
You may also provide the builder with a locale-supplying lambda, which is useful for environments like webapps where each request can have a different locale.
// "Smart" locale selection which queries the current web request for locale data.
// MyWebContext is a class you might write yourself, perhaps using a ThreadLocal internally
Strings webappStrings = new DefaultStrings.Builder(FALLBACK_LANGUAGE_CODE, () -> LocalizedStringLoader.)..;
3. Ask Strings Instance For Translations
// Lokalized knows how to map numbers to plural cardinalities per locale.
// That is, it understands that 3 means CARDINALITY_OTHER ("books") in English
String translation = strings.;;
// 1 means CARDINALITY_ONE ("book") in English
translation = strings.;;
// A special alternative rule is applied when bookCount == 0
translation = strings.;;
// Here we force British English.
// Note that providing an explicit locale is an uncommon use case -
// standard practice is to specify a localeSupplier when constructing your
// Strings instance and Lokalized will use it to pick the appropriate locale, e.g.
// the locale specified by the current web request's Accept-Language header
translation = strings.;
// We have an exact match for this key in the en-GB file, so that translation is applied.
// If none were found, we would fall back to "en" and try there instead
;
A More Complex Example
Lokalized’s strength is handling phrases that must be rewritten in different ways according to language rules. Suppose we introduce gender alongside plural forms. In English, a noun’s gender usually does not alter other components of a phrase. But in Spanish it does.
This English statement has 4 variants:
He was one of the X best baseball players.
She was one of the X best baseball players.
He was the best baseball player.
She was the best baseball player.
In Spanish, we have the same number of variants (in a language like Russian or Arabic there would be more!)
But notice how the statements must change to match gender - uno becomes una, jugadores becomes jugadoras, etc.
Fue uno de los X mejores jugadores de béisbol.
Fue una de las X mejores jugadoras de béisbol.
Él era el mejor jugador de béisbol.
Ella era la mejor jugadora de béisbol.
English Translation File
English is a little simpler than Spanish because gender only affects the He or She component of the sentence.
Spanish Translation File
Note that we define our own placeholders in translation and drive them off of the heOrShe value to support gender-based word changes.
The Rules, Exercised
Notice that we keep the gender and plural logic out of our code entirely and leave rule processing to the translation configuration.
// "Normal" translation
translation = strings.;;
// Alternative expression triggered
translation = strings.;;
// Let's try Spanish
translation = strings.;
// Note that the correct feminine forms were applied
;
Recursive Alternatives
You can exploit the recursive nature of alternative expressions to reduce logic duplication. Here, we define a toplevel alternative for groupSize <= 1 which itself has alternatives for MASCULINE and FEMININE cases. This is equivalent to the alternative rules defined above but might be a more “comfortable” way to express behavior for some.
Note that this is just a snippet to illustrate functionality - the other portion of this localized string has been elided for brevity.
Cardinality Ranges
When expressing a range of values (1-3 meters, 2.5-3.5 hours), the cardinality of the range is determined by applying per-language rules to its start and end cardinalities.
In English we don’t think about this - all ranges are of the form CARDINALITY_OTHER - but many other languages have range-specific forms.
French Translation File
French ranges can be either CARDINALITY_ONE or CARDINALITY_OTHER.
English Translation File
All English range forms evaluate to CARDINALITY_OTHER so the file can be kept simple.
Cardinality Ranges, Exercised
// French CARDINALITY_OTHER case
String translation = strings.;;
// French CARDINALITY_ONE case
translation = strings.;;
Ordinal Forms
Many languages have special forms called ordinals to express a “ranking” in a sequence of numbers. For example, in English we might say
Take the 1st left after the intersection
She is my 2nd cousin
I finished the race in 3rd place
Let’s look at an example related to birthdays.
English Translation File
English has 4 ordinals.
Spanish Translation File
Spanish doesn’t have ordinals, so we can disregard them. But we do have a few special cases - a first birthday and a quinceañera for girls.
Ordinals, Exercised
translation = strings.;
// The ORDINALITY_OTHER rule is applied for 18 in English
;
translation = strings.;
// The ORDINALITY_ONE rule is applied to any of the "one" numbers (1, 21, 31, ...) in English
;
translation = strings.;
// Normal case
;
translation = strings.;
// Special case for first birthday
;
translation = strings.;
// Special case for a girl's 15th birthday
;
Language Forms
Gender
Gender rules vary across languages, but the general meaning is the same.
Lokalized supports these values:
Lokalized provides a Gender type which enumerates supported genders.
Plural Cardinality
For example: 1 book, 2 books, ...
Plural rules vary widely across languages.
Lokalized supports these values according to CLDR rules:
Values do not necessarily map exactly to the named number, e.g. in some languages CARDINALITY_ONE might mean any number ending in 1, not just 1. Most languages only support a few plural forms, some have none at all (represented by CARDINALITY_OTHER in those cases).
Japanese
- CARDINALITY_OTHER: Matches everything (this language has no plural form)
English
- CARDINALITY_ONE: Matches 1 (e.g. 1 dollar)
- CARDINALITY_OTHER: Everything else (e.g. 256 dollars)
Russian
- CARDINALITY_ONE: Matches 1, 21, 31, … (e.g. 1 рубль or 51 рубль)
- CARDINALITY_FEW: Matches 2-4, 22-24, 32-34, … (e.g. 2 рубля or 53 рубля)
- CARDINALITY_MANY: Matches 0, 5-20, 25-30, 45-50, … (e.g. 5 рублей or 17 рублей)
- CARDINALITY_OTHER: Everything else (e.g. 0,3 руб, 1,5 руб)
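To make the Russian rules concrete, here is a minimal sketch for whole numbers only. It is not Lokalized code; the class and method names are hypothetical and the result is returned as a plain string. It follows the CLDR integer rules summarized above, and for integers the CARDINALITY_OTHER form never applies because that form is reserved for fractional values.

```java
/** Hypothetical illustration of Russian plural cardinality for integers; not part of Lokalized. */
final class RussianCardinalitySketch {
  static String forInteger(long number) {
    long absolute = Math.abs(number);
    long mod10 = absolute % 10;
    long mod100 = absolute % 100;

    // 1, 21, 31, ... but not 11, 111, ...
    if (mod10 == 1 && mod100 != 11)
      return "CARDINALITY_ONE";

    // 2-4, 22-24, ... but not 12-14, 112-114, ...
    if (mod10 >= 2 && mod10 <= 4 && !(mod100 >= 12 && mod100 <= 14))
      return "CARDINALITY_FEW";

    // Every remaining integer: 0, 5-20, 25-30, ...
    return "CARDINALITY_MANY";
  }

  public static void main(String[] args) {
    for (long n : new long[] {0, 1, 2, 5, 11, 12, 21, 22, 25})
      System.out.println(n + " -> " + forInteger(n));
  }
}
```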
Lokalized provides a Cardinality type which encapsulates cardinal functionality.
You may programmatically determine cardinality using Cardinality#forNumber(Number number, Locale locale) and Cardinality#forNumber(Number number, Integer visibleDecimalPlaces, Locale locale) as shown below.
Note that the number of visible decimal places can affect cardinality evaluation in some languages. For example, in English, 1 matches CARDINALITY_ONE but 1.0 matches CARDINALITY_OTHER. Even though the numbers’ true values are identical, you would say 1 inch and 1.0 inches, so visible decimals must be taken into account.
// Basic case - a primitive number, no decimals
Cardinality cardinality = Cardinality.;;
// In the absence of an explicit number of visible decimals,
// 1.0 evaluates to Cardinality.ONE since primitive 1 == primitive 1.0
cardinality = Cardinality.;;
// With 1 visible decimal specified ("1.0"), we evaluate to Cardinality.OTHER
cardinality = Cardinality.;;
// Let's try BigDecimal instead of a primitive...
cardinality = Cardinality.;;
// Using BigDecimal obviates the need to specify visible decimals
// since they can be encoded directly in the number.
// We evaluate to Cardinality.OTHER, as expected
cardinality = Cardinality.;;
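The role of visible decimals can be shown without the library. The sketch below is a hypothetical stand-in, not Lokalized's Cardinality type: it assumes the English rule is "integer value 1 with no visible decimals", and it reads the number of visible decimals from the BigDecimal scale.

```java
import java.math.BigDecimal;

/** Hypothetical illustration of English cardinality with visible decimals; not part of Lokalized. */
final class EnglishCardinalitySketch {
  static String forNumber(BigDecimal number) {
    // BigDecimal keeps trailing zeros, so "1.0" has scale 1 while "1" has scale 0
    int visibleDecimals = Math.max(number.scale(), 0);
    boolean valueIsOne = number.abs().compareTo(BigDecimal.ONE) == 0;
    return (valueIsOne && visibleDecimals == 0) ? "CARDINALITY_ONE" : "CARDINALITY_OTHER";
  }

  public static void main(String[] args) {
    System.out.println(forNumber(new BigDecimal("1")));   // "1 inch"
    System.out.println(forNumber(new BigDecimal("1.0"))); // "1.0 inches"
    System.out.println(forNumber(new BigDecimal("2")));   // "2 inches"
  }
}
```

A primitive double cannot carry this information, which is why a separate visible-decimals argument or a BigDecimal is needed.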
Plural Cardinality Ranges
For example: 0-1 hours, 1-2 hours, ...
The plural form of the range is determined by examining the cardinality of its start and end components.
English
- CARDINALITY_ONE - CARDINALITY_OTHER ⇒ CARDINALITY_OTHER (e.g. 1–2 days)
- CARDINALITY_OTHER - CARDINALITY_ONE ⇒ CARDINALITY_OTHER (e.g. 0–1 days)
- CARDINALITY_OTHER - CARDINALITY_OTHER ⇒ CARDINALITY_OTHER (e.g. 0–2 days)
French
- CARDINALITY_ONE - CARDINALITY_ONE ⇒ CARDINALITY_ONE (e.g. 0–1 jour)
- CARDINALITY_ONE - CARDINALITY_OTHER ⇒ CARDINALITY_OTHER (e.g. 0–2 jours)
- CARDINALITY_OTHER - CARDINALITY_OTHER ⇒ CARDINALITY_OTHER (e.g. 2–100 jours)
Latvian
- CARDINALITY_ZERO - CARDINALITY_ZERO ⇒ CARDINALITY_OTHER (e.g. 0–10 diennaktis)
- CARDINALITY_ZERO - CARDINALITY_ONE ⇒ CARDINALITY_ONE (e.g. 0–1 diennakts)
- CARDINALITY_ZERO - CARDINALITY_OTHER ⇒ CARDINALITY_OTHER (e.g. 0–2 diennaktis)
- CARDINALITY_ONE - CARDINALITY_ZERO ⇒ CARDINALITY_OTHER (e.g. 0,1–10 diennaktis)
- CARDINALITY_ONE - CARDINALITY_ONE ⇒ CARDINALITY_ONE (e.g. 0,1–1 diennakts)
- CARDINALITY_ONE - CARDINALITY_OTHER ⇒ CARDINALITY_OTHER (e.g. 0,1–2 diennaktis)
- CARDINALITY_OTHER - CARDINALITY_ZERO ⇒ CARDINALITY_OTHER (e.g. 0,2–10 diennaktis)
- CARDINALITY_OTHER - CARDINALITY_ONE ⇒ CARDINALITY_ONE (e.g. 0,2–1 diennakts)
- CARDINALITY_OTHER - CARDINALITY_OTHER ⇒ CARDINALITY_OTHER (e.g. 0,2–2 diennaktis)
You may programmatically determine a range’s cardinality using Cardinality#forRange(Cardinality start, Cardinality end, Locale locale) as shown below.
// Latvian has a number of interesting range rules.
// ZERO-ZERO -> OTHER
Cardinality cardinality = Cardinality.;;
// ZERO-ONE -> ONE
cardinality = Cardinality.;;
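Conceptually, a range rule is a lookup from a (start, end) pair to a resulting form. The sketch below encodes the Latvian table listed above as a plain map. It is a hypothetical illustration rather than Lokalized's implementation, and it uses strings instead of the library's Cardinality type.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of Latvian range cardinality as a lookup table; not part of Lokalized. */
final class LatvianRangeSketch {
  private static final Map<String, String> RULES = new HashMap<>();

  static {
    // start-end => cardinality of the whole range
    RULES.put("ZERO-ZERO", "OTHER");
    RULES.put("ZERO-ONE", "ONE");
    RULES.put("ZERO-OTHER", "OTHER");
    RULES.put("ONE-ZERO", "OTHER");
    RULES.put("ONE-ONE", "ONE");
    RULES.put("ONE-OTHER", "OTHER");
    RULES.put("OTHER-ZERO", "OTHER");
    RULES.put("OTHER-ONE", "ONE");
    RULES.put("OTHER-OTHER", "OTHER");
  }

  static String forRange(String start, String end) {
    String result = RULES.get(start + "-" + end);
    if (result == null)
      throw new IllegalArgumentException("No range rule for " + start + "-" + end);
    return result;
  }

  public static void main(String[] args) {
    System.out.println(forRange("ZERO", "ZERO")); // OTHER
    System.out.println(forRange("ZERO", "ONE"));  // ONE
  }
}
```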
Ordinals
For example: 1st, 2nd, 3rd, 4th, ...
Similar to plural cardinality, ordinal rules vary widely across languages.
Lokalized supports these values according to CLDR rules:
Again, like cardinal values, ordinals do not necessarily map to the named number. For example, ORDINALITY_ONE might apply to any number that ends in 1.
Spanish
- ORDINALITY_OTHER: Matches everything (this language has no ordinal form)
English
- ORDINALITY_ONE: Matches 1, 21, 31, … (e.g. 1st prize)
- ORDINALITY_TWO: Matches 2, 22, 32, … (e.g. 22nd prize)
- ORDINALITY_FEW: Matches 3, 23, 33, … (e.g. 33rd prize)
- ORDINALITY_OTHER: Everything else (e.g. 12th prize)
Italian
- ORDINALITY_MANY: Matches 8, 11, 80, 800 (e.g. Prendi l'8° a destra)
- ORDINALITY_OTHER: Everything else (e.g. Prendi la 7° a destra)
Lokalized provides an Ordinality type which encapsulates ordinal functionality.
You may programmatically determine ordinality using Ordinality#forNumber(Number number, Locale locale) as shown below.
// e.g. "1st"
Ordinality ordinality = Ordinality.;;
// e.g. "2nd"
ordinality = Ordinality.;;
// e.g. "3rd"
ordinality = Ordinality.;;
// e.g. "21st"
ordinality = Ordinality.;;
// e.g. "27th"
ordinality = Ordinality.;;
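For intuition, the English ordinal rules can be written out by hand. This sketch is hypothetical and independent of Lokalized's Ordinality type. It assumes the CLDR rules for English whole numbers, in which 11, 12, and 13 are exceptions that fall under ORDINALITY_OTHER.

```java
/** Hypothetical illustration of English ordinal forms for integers; not part of Lokalized. */
final class EnglishOrdinalitySketch {
  static String forInteger(long number) {
    long absolute = Math.abs(number);
    long mod10 = absolute % 10;
    long mod100 = absolute % 100;

    if (mod10 == 1 && mod100 != 11) return "ORDINALITY_ONE";   // 1st, 21st, ...
    if (mod10 == 2 && mod100 != 12) return "ORDINALITY_TWO";   // 2nd, 22nd, ...
    if (mod10 == 3 && mod100 != 13) return "ORDINALITY_FEW";   // 3rd, 23rd, ...
    return "ORDINALITY_OTHER";                                 // 4th, 11th, 12th, 13th, ...
  }

  /** Appends the suffix that matches the ordinal form. */
  static String withSuffix(long number) {
    switch (forInteger(number)) {
      case "ORDINALITY_ONE": return number + "st";
      case "ORDINALITY_TWO": return number + "nd";
      case "ORDINALITY_FEW": return number + "rd";
      default: return number + "th";
    }
  }

  public static void main(String[] args) {
    for (long n : new long[] {1, 2, 3, 11, 21, 27})
      System.out.println(withSuffix(n));
  }
}
```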
Localized Strings File Format
Structure
- Each strings file must be UTF-8 encoded and named according to the appropriate IETF BCP 47 language tag, such as en or zh-TW
- The file must contain a single toplevel JSON object
- The object’s keys are the translation keys, e.g. "I read {{bookCount}} books."
- The value for a translation key can be a string (simple cases) or an object (complex cases)
With formalities out of the way, let’s return to our example en-GB strings file, which contains a single translation. We can use the string form shorthand to concisely express our intent:
This is equivalent to the more verbose object form, which we don’t need in this situation.
In addition to translation, each object form supports 3 additional keys: commentary, placeholders, and alternatives.
All 4 are optional, with the stipulation that you must provide either a translation or at least one alternatives value.
Commentary
This free-form field is used to supply context for the translator, such as how and where the phrase is used in the application. It might also include documentation about the application-supplied placeholder values (names and types) so it’s clear what data is available to perform the translation.
Placeholders
A placeholder is any translation value enclosed in a pair of “mustaches” - {{PLACEHOLDER_NAME_HERE}}.
You are free to add as many as you like to support your translation.
Placeholder values are initially specified by application code - they are the context that is passed in at string evaluation time.
Your translation file may override passed-in placeholders if desired, but that is an uncommon use case.
In the below example of an en strings file, the application code provides the bookCount value and the translation file introduces a books value to aid final translation.
Each placeholders object key is the name of the placeholder - books, in this example - and the value is an object with value and translations.
- value is the placeholder value to examine. It may be a Number, Cardinality, Ordinality, or Gender type. Lokalized will convert Number instances to the appropriate Cardinality or Ordinality according to the language’s rules
- translations is a set of language rules against which to evaluate value and provide a translation
Here, the value of bookCount is evaluated against the specified cardinality rules and the result is placed into books. For example, if application code passes in 1 for bookCount, this matches CARDINALITY_ONE and book is the value of the books placeholder. If application code passes in a different value, CARDINALITY_OTHER is matched and books is used.
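The two-step process described here (derive the books placeholder from bookCount, then substitute both into the translation) can be sketched in plain Java. Everything below is hypothetical and independent of Lokalized: the class name, the `render` helper, and the translation text "I read {{bookCount}} {{books}}." are assumptions chosen for illustration, and leaving unknown placeholders untouched is a choice of this sketch rather than documented library behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of mustache placeholder substitution; not part of Lokalized. */
final class PlaceholderSketch {
  private static final Pattern MUSTACHE = Pattern.compile("\\{\\{(\\w+)\\}\\}");

  /** Replaces each {{name}} with its value; unknown placeholders are left as written. */
  static String render(String translation, Map<String, Object> placeholders) {
    Matcher matcher = MUSTACHE.matcher(translation);
    StringBuffer result = new StringBuffer();
    while (matcher.find()) {
      Object value = placeholders.get(matcher.group(1));
      String replacement = value == null ? matcher.group() : String.valueOf(value);
      matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
    }
    matcher.appendTail(result);
    return result.toString();
  }

  static String readBooks(int bookCount) {
    Map<String, Object> placeholders = new HashMap<>();
    // Supplied by application code
    placeholders.put("bookCount", bookCount);
    // Derived placeholder: CARDINALITY_ONE selects "book", CARDINALITY_OTHER selects "books"
    placeholders.put("books", bookCount == 1 ? "book" : "books");
    return render("I read {{bookCount}} {{books}}.", placeholders);
  }

  public static void main(String[] args) {
    System.out.println(readBooks(1)); // I read 1 book.
    System.out.println(readBooks(3)); // I read 3 books.
  }
}
```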
Supported values for translations are Cardinality, Ordinality, and Gender types.
You may not mix language forms in the same translations object. For example, it is illegal to specify both CARDINALITY_ONE and GENDER_MASCULINE.
The placeholder structure is slightly different for cardinality ranges. A range property is introduced and requires both a start and end value.
Here, the cardinalities of minHours and maxHours are evaluated to determine the overall cardinality of the range, which is used to select the appropriate value in translations.
You are prohibited from supplying both range and value fields - use range only for cardinality ranges and value otherwise.
Alternatives
You may specify parenthesized expressions of arbitrary complexity in alternatives to fine-tune your translations. It’s perfectly legal to have an alternative like this:
gender == MASCULINE && (bookCount > 10 || magazineCount > 20)
Lokalized will automatically evaluate cardinality and ordinality for numbers if required by the expression. For example, in English, if I were to supply bookCount of 50, this expression would evaluate to true:
bookCount == CARDINALITY_OTHER
…and so would this:
bookCount == 50
Note that the supported comparison operators for cardinality, ordinality, and gender forms are == and !=. You cannot say bookCount < CARDINALITY_FEW, for example.
Alternative expression recursion is supported. That is, each value for alternatives can itself have translation, placeholders, commentary, and alternatives. You can also use the simpler string-only form if no special translation functionality is needed.
Alternative evaluation follows these rules:
- Deepest level of recursion is evaluated first
- Expressions are evaluated according to their order in the list, halting at first matched expression
A somewhat contrived example of multiple levels of recursion follows. The first level of recursion uses a full object, the second uses the string shorthand.
Evaluation works as you might expect.
// Deepest recursion
String translation = strings.;;
// 1 level deep recursion
translation = strings.;;
// Normal case
translation = strings.;;
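One way to read the evaluation rules above is as a recursive first-match search: alternatives are tried in list order, and a matching alternative first gives its own nested alternatives a chance before its translation is used. The sketch below models that reading with plain Java predicates in place of the expression language. The class, its methods, and the sample translations are hypothetical, and this is not how Lokalized is implemented internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical model of a localized string with ordered, nested alternatives; not part of Lokalized. */
final class AlternativeSketch {
  private final Predicate<Map<String, Object>> condition;
  private final String translation;
  private final List<AlternativeSketch> alternatives = new ArrayList<>();

  AlternativeSketch(Predicate<Map<String, Object>> condition, String translation) {
    this.condition = condition;
    this.translation = translation;
  }

  AlternativeSketch withAlternative(AlternativeSketch alternative) {
    alternatives.add(alternative);
    return this;
  }

  String resolve(Map<String, Object> context) {
    // Try alternatives in order and stop at the first match;
    // the match resolves itself recursively, so deeper matches win
    for (AlternativeSketch alternative : alternatives) {
      if (alternative.condition.test(context))
        return alternative.resolve(context);
    }
    return translation;
  }

  private static int count(Map<String, Object> context) {
    return ((Number) context.get("bookCount")).intValue();
  }

  /** Two levels of nesting: bookCount < 3, and inside it bookCount == 0. */
  static AlternativeSketch example() {
    AlternativeSketch none = new AlternativeSketch(c -> count(c) == 0, "I read nothing.");
    AlternativeSketch few = new AlternativeSketch(c -> count(c) < 3, "I read a couple of books.")
        .withAlternative(none);
    return new AlternativeSketch(c -> true, "I read many books.").withAlternative(few);
  }

  public static void main(String[] args) {
    for (int bookCount : new int[] {0, 2, 5}) {
      Map<String, Object> context = new HashMap<>();
      context.put("bookCount", bookCount);
      System.out.println(bookCount + " -> " + example().resolve(context));
    }
  }
}
```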
A grammar for alternative expressions follows.
EXPRESSION = OPERAND COMPARISON_OPERATOR OPERAND | "(" EXPRESSION ")" | EXPRESSION BOOLEAN_OPERATOR EXPRESSION ;
OPERAND = VARIABLE | LANGUAGE_FORM | NUMBER ;
LANGUAGE_FORM = CARDINALITY | ORDINALITY | GENDER ;
CARDINALITY = "CARDINALITY_ZERO" | "CARDINALITY_ONE" | "CARDINALITY_TWO" | "CARDINALITY_FEW" | "CARDINALITY_MANY" | "CARDINALITY_OTHER" ;
ORDINALITY = "ORDINALITY_ZERO" | "ORDINALITY_ONE" | "ORDINALITY_TWO" | "ORDINALITY_FEW" | "ORDINALITY_MANY" | "ORDINALITY_OTHER" ;
GENDER = "MASCULINE" | "FEMININE" | "NEUTER" ;
VARIABLE = { alphabetic character | digit } ;
BOOLEAN_OPERATOR = "&&" | "||" ;
COMPARISON_OPERATOR = "<" | ">" | "<=" | ">=" | "==" | "!=" ;
What Expressions Currently Support
- Evaluation of “normal” infix expressions of arbitrary complexity (can be nested/parenthesized)
- Comparison of gender, plural, and literal numeric values against each other or user-supplied variables
What Expressions Do Not Currently Support
- The unary ! operator
- Explicit null operands (can be implicit, i.e. a VARIABLE value)
- A cardinality range construct (to be added in a future release)
Keying Strategy
Ultimately, it is up to you and your team how best to name your localization keys. Lokalized does not impose key naming constraints.
There are two common approaches - natural language and contextual. Some benefits and drawbacks of each are listed below to help you make the best decision for your situation.
Natural Language Keys
For example: "I read {{bookCount}} books."
Pros
- Any developer can create a key by writing a phrase in her native language - no need to coordinate with others or choose arbitrary names
- Placeholders are encoded directly in the key and serve as “automatic” documentation for translators
- There is always a sensible default fallback in the event that a translation is missing
Cons
- Context is lost; the same text on one screen might have a completely different meaning on another
- Not suited for large amounts of text, like a software licensing agreement
- Small changes to text require updating every strings file since keys are not “constant”
Contextual Keys
For example: "SCREEN-PROFILE-BOOKS_READ"
Pros
- It is possible to specifically target app components, which enforces translation context
- Perfect for big chunks of text like legal disclaimers
- “Constant” keys means translations can change without affecting code
Cons
- You must come up with names for every key and cross-reference in your localized strings files
- Placeholders are not encoded in the key and must be communicated to translators through some other mechanism
- Requires diligent recordkeeping and inter-team communication (“are our iOS and Android apps using the same keys or are we duplicating effort?”)
- There is no default language fallback if no translation is present; users will see your contextual key onscreen
Or - Mix Both!
It’s possible to cherrypick and create a hybrid solution. For example, you might use natural language keys in most cases but switch to contextual for legalese and other special cases.
java.util.logging
Lokalized uses java.util.logging internally. The usual way to hook into this is with SLF4J, which can funnel all the different logging mechanisms in your app through a single one, normally Logback. Your Maven configuration might look like this:
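<dependency>
  <groupId>ch.qos.logback</groupId>
  <artifactId>logback-classic</artifactId>
  <version>1.1.9</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>jul-to-slf4j</artifactId>
  <version>1.7.22</version>
</dependency>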
You might have code like this which runs at startup:
// Bridge all java.util.logging to SLF4J
java.util.logging.Logger rootLogger = java.util.logging.LogManager.getLogManager().getLogger("");
for (Handler handler : rootLogger.getHandlers())
  rootLogger.removeHandler(handler);
SLF4JBridgeHandler.install();
Don’t forget to uninstall the bridge at shutdown time:
// Sometime later
SLF4JBridgeHandler.uninstall();
Note: SLF4JBridgeHandler can impact performance. You can mitigate that with Logback’s LevelChangePropagator configuration option as described in the Logback manual.
About
Lokalized was created by Mark Allen and sponsored by Transmogrify LLC and Revetware LLC.