Software Internationalization

The main challenges of software Internationalization

There are many problems when designing software and Internationalization is one of the worst ~ Computerphile.

Internationalization is the design and development of a product that enables easy localization for target audiences that vary in culture, region or language. In this post, internationalization covers both globalization and localization.

i) Globalization

Globalization is the design and planning for a product to support global markets. This refers mostly to the strategy of expanding your business outside its national borders.

Examples of globalization
  • Netflix operates in more than 190 countries and customizes content offerings for individual markets with subtitles and programming in local languages
  • Online marketplaces like Amazon make it easy to buy products from businesses or individuals on the other side of the planet. Consumer electronics, for example, are commonly sourced from raw materials in India, made in China, then sold in America.
ii) Localization

Localization refers to the actual adaptation of the product for a specific market. Translation is only one of several elements of the localization process. The localization process may also include:

  • Naming conventions (e.g., people from certain cultures may not have last names or may have multiple last names)
  • Currency (symbol and amount)
  • System of measurement (metric or imperial)
  • Using proper local formats for dates, addresses, and phone numbers

The aim of localization is to give a product the look and feel of having been created specifically for a target market, no matter their language, culture, or location.

Let’s assume we are developing a global online shop like Amazon that enables people to buy and sell goods over the internet. Our application will initially support localization for English (en-GB), French (fr) and Arabic (ar). Below are some problems that we will have to solve in order to create an outstanding and properly localized online marketplace:

a) The translation problem.

On the client side, the user interface needs to be translated according to the user’s locale. On the server side, API responses such as business logic errors need to be localized. For example, say a client makes a request to purchase a product while there are insufficient funds in their account. The server returns a business logic error, and that error message needs to be localized according to the client’s locale/culture.
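
The server-side case can be sketched with a small message catalog. This is a minimal illustration, not a production design: the `MESSAGES` table, the `insufficient_funds` key and the `localize` helper are hypothetical names, and the French and Arabic strings are illustrative translations that a professional translator should review.

```python
# Hypothetical message catalog: message key -> locale -> translated text.
MESSAGES = {
    "insufficient_funds": {
        "en-GB": "Insufficient funds to complete this purchase.",
        "fr": "Fonds insuffisants pour effectuer cet achat.",
        "ar": "رصيدك غير كافٍ لإتمام عملية الشراء.",
    }
}

DEFAULT_LOCALE = "en-GB"


def localize(key: str, locale: str) -> str:
    """Return the message for `key` in `locale`, falling back gracefully."""
    translations = MESSAGES[key]
    # Try the exact tag first, then the bare language ("fr-CA" -> "fr").
    for candidate in (locale, locale.split("-")[0]):
        if candidate in translations:
            return translations[candidate]
    # Unknown locale: fall back to the default language.
    return translations[DEFAULT_LOCALE]
```

With this sketch, a client whose locale is `fr-CA` gets the French message, and a client whose locale is `de` gets the English fallback.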

There is a counterintuitive pattern in text localization: the people best placed to do the localization tend to be the least interested in using the result. This comes mainly from the spread of English combined with a shortage of decent translators. A user with a decent understanding of English will likely choose the English interface over the localized version. A user who doesn’t know English may have to memorize the English interface, work with a poorly translated interface, or not use the software at all.

Solution to the problem

Translate all the text and do it properly. Some organizations can afford to hire professional translators who have an exceptional understanding of the locale’s language. If the software sells in that locale, the company can recover the money spent on these services.

On the other hand, some decent translation services are cropping up, such as MyMemoryTranslate.

b) Dates and Date Formats

Different cultures have different date formats. Although each date basically displays the day, month, and year, their presentation order and separators vary greatly. The separators may be slashes, dashes or periods. Some locales print leading zeroes, others suppress them. To help illustrate this, let’s compare date formats for the 7th of August 2020 between English, French and Arabic.

i) English(GB)

dd/mm/yy - 07/08/20

ii) French

dd/mm/yyyy - 07/08/2020

iii) Arabic

dd/mm/yyyy - 07/08/2020

Clients from different locales may be confused by date formats, because the dates above can be interpreted in different ways. For example, 07/08/20 could mean July 8, 2020 or August 7, 2020. On an individual level this uncertainty can be very frustrating. In a business context it can be very expensive.

Solution to the problem
i) Use a locale-neutral format

ISO 8601, a standard from the International Organization for Standardization (ISO), specifies the date format YYYY-MM-DD. Date and time values are ordered from the largest to the smallest unit of time: year, month, day. For example, 2003-04-02 is clearer than 03/04/02.

Pros
  • Universal agreement. This date structure is an international standard, and it resolves the ambiguity in 07/08/2020.
  • Easy to compare and sort. Doing a standard character sort on a list of dates gives you a chronologically ordered list.
  • Cannot be confused with other common date representations.
  • Strings that contain a date followed by a time are also easy to compare and sort (e.g. “2019-09-07 20:15:00”).
Cons
  • People are comfortable with their “natural” date formats.
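
The sorting advantage of the ISO 8601 format can be shown with Python’s standard `datetime` module. This is a small sketch; the sample dates are made up for illustration.

```python
from datetime import date

# Produce and parse the locale-neutral YYYY-MM-DD form.
iso = date(2020, 8, 7).isoformat()        # "2020-08-07"
parsed = date.fromisoformat("2003-04-02")  # date(2003, 4, 2)

# A plain string sort of ISO dates is already chronological.
iso_dates = ["2020-08-07", "2003-04-02", "2019-09-07"]
sorted_iso = sorted(iso_dates)
# ["2003-04-02", "2019-09-07", "2020-08-07"]

# The same dates written as dd/mm/yy do not sort chronologically:
# the result below is in the order 2003, 2020, 2019.
short_dates = ["07/08/20", "02/04/03", "07/09/19"]
sorted_short = sorted(short_dates)
# ["02/04/03", "07/08/20", "07/09/19"]
```
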
ii) Make the month and year obvious

To do this use a name for the month (abbreviated or not) and use 4 digits for all Gregorian year numbers. For example, 2 April 2003.

Pros
  • This method is completely unambiguous.
Cons
  • It is less computer-friendly when it comes to sorting, etc.
  • It takes more space. In some locales, three letters are not enough to abbreviate a month name: in French, juin (June) and juillet (July) share their first three letters. Allowing extra space for this exacerbates the space problem.
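
A date with a spelled-out month and a 4-digit year could be produced as in the sketch below. The `MONTH_NAMES` table and `format_long_date` helper are hypothetical and cover only two locales. A real application would normally take month names from locale data rather than hard-code them.

```python
from datetime import date

# Hypothetical month-name table, limited to two locales for illustration.
MONTH_NAMES = {
    "en-GB": ["January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November",
              "December"],
    "fr": ["janvier", "février", "mars", "avril", "mai", "juin",
           "juillet", "août", "septembre", "octobre", "novembre",
           "décembre"],
}


def format_long_date(d: date, locale: str) -> str:
    """Format as day, month name, 4-digit year (e.g. "2 April 2003")."""
    month = MONTH_NAMES[locale][d.month - 1]
    # Note: this ignores locale details such as French "1er" for the 1st.
    return f"{d.day} {month} {d.year:04d}"
```

For example, `format_long_date(date(2003, 4, 2), "en-GB")` gives `2 April 2003`, and `format_long_date(date(2020, 8, 7), "fr")` gives `7 août 2020`.
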
iii) Use the Accept-Language HTTP header

The HTTP Accept-Language header only specifies the user’s language preferences, but is commonly used to determine locale preferences as well. This method works well for dynamically generated web documents when inserting a date from some back-end storage into a page, as long as the user’s expectations of date format are clear. Appropriateness is a function of the linguistic context rather than simply the user’s browser settings.
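
One way to read the header is sketched below. It is a simplified parser, assuming well-formed input such as `fr-CH, fr;q=0.9, en;q=0.8`. The function names and the `SUPPORTED` tuple are hypothetical, and matching is done on the primary language subtag only.

```python
SUPPORTED = ("en-GB", "fr", "ar")  # the locales of our example shop


def parse_accept_language(header: str) -> list[str]:
    """Return the language tags in the header, highest q-value first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without an explicit q-value has weight 1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            # Negative quality sorts best first; position keeps ties stable.
            weighted.append((-quality, position, tag.strip()))
    return [tag for _, _, tag in sorted(weighted)]


def pick_locale(header: str, supported=SUPPORTED, default="en-GB") -> str:
    """Pick the first supported locale whose language matches the header."""
    for tag in parse_accept_language(header):
        language = tag.split("-")[0].lower()
        for candidate in supported:
            if candidate.split("-")[0].lower() == language:
                return candidate
    return default
```

With these helpers, `pick_locale("fr-CH, fr;q=0.9, en;q=0.8")` returns `fr`, and a header that lists no supported language falls back to `en-GB`.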

c) Numbers and number formats

When dealing with numeric values, there are five major items to pay attention to:

i) The character used as the thousands separator.

In en-GB, this character is a comma (,). In French, the thousands separator is a space, and in Arabic it is the Arabic thousands separator (٬). Thus one thousand and twenty-five is displayed as 1,025 in en-GB, 1 025 in French and 1٬025 in Arabic.

ii) The character used as the decimal separator.

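In en-GB, this character is a period (.). In French, it is a comma (,). In Arabic, it is the Arabic decimal separator (٫) when the Arabic separators are used, although some Arabic locales use Western digits with a period. Thus one thousand and twenty-five and seven tenths is displayed as 1,025.7 in en-GB, 1 025,7 in French and 1٬025٫7 in Arabic.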
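
The two separators can be applied in code as in the sketch below. The `SEPARATORS` table and `format_number` helper are hypothetical and cover only en-GB and French. The French group separator is assumed here to be a no-break space (U+00A0) so the number never wraps across lines. Libraries based on locale data may use a different space character.

```python
# Hypothetical separator table; a real application would use locale data.
SEPARATORS = {
    "en-GB": {"group": ",", "decimal": "."},
    "fr": {"group": "\u00a0", "decimal": ","},  # no-break space (assumption)
}


def format_number(value: float, locale: str, decimals: int = 1) -> str:
    """Format `value` with the group and decimal separators of `locale`."""
    # Format in a neutral form first, e.g. "1,025.7".
    neutral = f"{value:,.{decimals}f}"
    integer_part, _, fraction = neutral.partition(".")
    seps = SEPARATORS[locale]
    # Swap in the locale's separators.
    result = integer_part.replace(",", seps["group"])
    if fraction:
        result += seps["decimal"] + fraction
    return result
```

Here `format_number(1025.7, "en-GB")` gives `1,025.7`, while the French result uses a no-break space and a comma.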

iii) The way negative numbers are displayed.

The negative sign can be used at the beginning of the number, but it can also be used at the end of the number. Alternatively, the number can be displayed with parentheses around it or even in a color such as red. Thus a negative five hundred and twenty-seven could be displayed as:

-527 or 527- or (527) or [527]
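
Keeping the negative pattern as data rather than hard-coding a leading minus sign could look like the sketch below. The style names and the `format_signed` helper are hypothetical.

```python
# Hypothetical negative-number patterns; {n} stands for the digits.
NEGATIVE_PATTERNS = {
    "leading": "-{n}",
    "trailing": "{n}-",
    "parentheses": "({n})",
}


def format_signed(value: int, style: str = "leading") -> str:
    """Format an integer, applying the chosen pattern when it is negative."""
    digits = str(abs(value))
    if value >= 0:
        return digits
    return NEGATIVE_PATTERNS[style].format(n=digits)
```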

iv) Digit grouping.

This refers to the number of digits between separators in the groups to the left of the decimal separator. Most cultures use 3-digit groups, as in English (United States): 123,456,789.00. Hindi, however, uses 2-digit groups except for the rightmost group, which has three digits: 12,34,56,789.00
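
The 2-digit grouping used in the Hindi example can be sketched for whole numbers as follows. The `group_indian` helper is a hypothetical name, and the comma separator is taken from the example above.

```python
def group_indian(n: int) -> str:
    """Group digits as 3 on the right, then 2s: 123456789 -> 12,34,56,789."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    # The rightmost group always has three digits.
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Split the remaining digits into groups of two, from the right.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return sign + ",".join(groups + [tail])
```

For example, `group_indian(123456789)` gives `12,34,56,789`.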

v) The placement of the percent sign (%).

It can be written several ways: 98%, 98 %, 98 pct, %98. Thus you should never assume that you can hard-code the percent sign.
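
A simple way to avoid hard-coding the percent sign is to keep one pattern per locale, as sketched below. The `PERCENT_PATTERNS` table is hypothetical. The Turkish entry (`tr`) is not one of this post’s target locales and is included only as an assumed example of a locale that places the sign first. The French entry assumes a no-break space before the sign.

```python
# Hypothetical percent patterns; {n} stands for the number.
PERCENT_PATTERNS = {
    "en-GB": "{n}%",
    "fr": "{n}\u00a0%",  # space before the sign (assumed no-break space)
    "tr": "%{n}",        # sign before the number (assumption)
}


def format_percent(value: int, locale: str) -> str:
    """Place the percent sign according to the locale's pattern."""
    return PERCENT_PATTERNS[locale].format(n=value)
```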

Final thoughts

Entering a new market can be very profitable for a company, and properly localizing the product is how that is achieved. If you are aware of the localization issues mentioned here, you will avoid the headache of reworking the whole application to support other cultures. Hopefully, this information will make your software localization process smooth and give you the best chance of reaping the rewards.

Thank You!

I’d love to keep in touch! Feel free to follow me on Twitter at @codewithfed. If you enjoyed this article please consider sponsoring me for more. Your feedback is welcome anytime. Thanks again!

This post is licensed under CC BY 4.0 by the author.
