An XML-based utility for handling localized strings

Properties files are used to store and maintain the localized strings at the heart of most internationalized Java applications. It’s too bad maintaining several properties files can be such a headache. In this article Melih Çetin introduces the Properties Pre-Processor, an XML-based utility that enables UTF-8 encoding and lets you maintain localized strings for all of your supported locales in a single XML file, as well as easily sharing them across multiple applications.

Like any other commodity these days, a good percentage of Java applications are marketed and sold internationally. Java Web applications especially must be written with the global marketplace in mind, regardless of whether they originate from the United States, Germany or China. You cannot easily internationalize your software after it has been built (although some tools facilitate this process), or decide midway through the development process to add localization support to your Java applications. The earlier you start thinking about the steps involved in internationalizing your Java applications, the better.

In this article I will briefly introduce the common concerns associated with internationalizing Java applications, as well as the classes that support internationalization on the Java platform. I’ll then introduce a simple tool that supports the internationalization of Java applications by making it easier to create and maintain localized strings for UI elements and add new ones as your application is extended for new languages. Note that this article is not intended as a comprehensive introduction to internationalizing Java applications. See the Resources section for a listing of articles and tutorials about internationalization.

Qualities of an internationalized application

The Java platform provides considerable support for internationalization.
When developing desktop applications you do not need to do anything special to identify the user’s language preference: the JRE sets your application’s locale based on the user’s settings in the underlying operating system. Of course, it is possible to make the language settings for your desktop application configurable, and you certainly will need to address language preferences when developing internationalized applications for the Web.

Regardless of whether it will be distributed in desktop environments or on the Web, an internationalized Java application should do the following:

- Display captions and messages in the user’s language.
- Use the correct collating sequence when sorting strings.
- Display the date in the format expected by the user.
- Display numbers such as integers, real numbers, percentages and currency in the format expected by the user.
- Adjust the directional flow of text as expected by the user.

To do these things your application must be language aware. java.util.Locale is the utility class used to define language, or more specifically locale, in Java applications.

java.util.Locale

Sidebar: Development without borders
Internationalization is an important aspect of open source development projects. Because open source projects often cross borders they must be accessible to developers speaking a variety of native languages. The resulting software must also be architected in such a way that any contributor can easily localize it for his or her native language.

A java.util.Locale is defined by three parameters: language, country and variant. Most of the time, the language parameter on its own is sufficient to identify the applicable locale. Some languages are spoken in more than one locale, however, and there might be enough variations that you need to recognize these differences in your application.
For example, Portuguese is spoken in both Portugal and Brazil, so you must use the country parameter in addition to the language parameter to differentiate between these two locales.

Similarly, in countries where more than one language is spoken (such as Belgium, Switzerland and Canada), country on its own is not sufficient to identify the user’s locale. You must consider both the country and the language parameter. You might also have to set the variant parameter for languages that split into various local dialects.

When you develop software, you cannot anticipate every language preference and variation that may be applied to your application. What you can do is programmatically account for the existence of language preferences and variations. Your application should be designed so that it can be extended to support a growing list of language preferences.

java.util.ResourceBundle

A localized string is a string of text that can be used to display information in a specific language, generally used for creating buttons, messages, menus, captions and other elements of a user interface. java.util.ResourceBundle is the Java class used to get localized strings.

As its name implies, ResourceBundle groups together the resources that you can use to get localized strings for different locales. You simply assign a mnemonic name (also known as a key) to your String and define the corresponding values for different locales in separate properties files. ResourceBundle then gets the appropriate translation for you at runtime.

It is possible to control ResourceBundle by providing locale information at runtime, but it is simpler to let ResourceBundle do the work for you. In this case you need only assume that the default locale set for the JVM is correct for the user. By default, the JVM will set the locale based on the user’s operating system settings. You can also allow users to change their language settings dynamically at runtime and reflect these changes via the Locale.setDefault method.
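The two lookup styles just described can be sketched as follows. This is a minimal illustration and not code from the article: the class name LocaleSketch, the lookup helper and the two nested bundle classes are hypothetical, and ListResourceBundle subclasses stand in for properties files only so that the example runs without any files on the classpath.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

public class LocaleSketch {

    /** Base bundle: strings shared by every locale (stand-in for a base properties file). */
    public static class Messages extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {{"productName", "Ultra Fast Editor"}, {"help", "Help"}};
        }
    }

    /** Spanish bundle: holds only the strings that differ from the base. */
    public static class Messages_es extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {{"help", "Ayuda"}};
        }
    }

    /** Looks up a key for an explicitly supplied locale. */
    public static String lookup(Locale locale, String key) {
        ResourceBundle bundle = ResourceBundle.getBundle(Messages.class.getName(), locale);
        return bundle.getString(key);
    }

    public static void main(String[] args) {
        // Language "es" plus country "MX"; no variant is needed here.
        Locale mexico = Locale.forLanguageTag("es-MX");
        System.out.println(mexico.getLanguage() + " / " + mexico.getCountry());

        // Explicit locale: there is no es_MX bundle, so the es bundle answers.
        System.out.println(lookup(mexico, "help"));
        // Keys missing from the es bundle are inherited from the base bundle.
        System.out.println(lookup(mexico, "productName"));

        // JVM-wide alternative: change the default, then omit the locale argument.
        Locale.setDefault(mexico);
        System.out.println(ResourceBundle.getBundle(Messages.class.getName()).getString("help"));
    }
}
```

The first two lookups pass the locale explicitly and leave the rest of the JVM untouched; only the last lookup relies on Locale.setDefault.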
Note that any change made using this call affects the default locale for the entire JVM, which may not be desirable in server environments that are serving several users with different locale preferences.

Using properties files for resource management

ResourceBundle is an abstract class. Typical usage is to have the ResourceBundle.getBundle method construct a PropertyResourceBundle object based on the locale settings and using the appropriate properties file or files.

The properties file hierarchy is similar to class inheritance. There is a base properties file and a separate one for every locale the application supports. Figure 1 shows the properties files for an application that supports Norwegian and various dialects of English and Spanish. The contents of these files might be as follows:

    myresource.properties
    productName=Ultra Fast Editor
    ...

    myresource_en.properties
    about=About
    file=File
    help=Help
    ...

    myresource_es.properties
    about=Sobre
    file=Archivo
    help=Ayuda
    ...

As you can see, a string that is the same in all languages (such as the name of your product) only needs to be held in your base file. Language-specific files hold localized strings for your UI components. Country-specific properties files hold the varying spellings for these strings, such as color in myresource_en_US.properties and colour in myresource_en_GB.properties.

java.text.MessageFormat

You can use placeholders in your messages and replace them with locale-appropriate values at runtime based on the semantics of your user’s language. For this you use the MessageFormat class. To best understand how MessageFormat works, consider an example. Assume that you have an application where you would like to display the number of tables and stored-procedure packages in a database schema. You would define a property for this in your properties file as shown in Listing 1.

Listing 1.
Message text with placeholders

    noTablesAndPackages = There are {0,number,integer} tables and {1,number,integer} packages in Schema {2}

As you have probably noted, MessageFormat allows you to provide extra information about the characteristics of your placeholders so that you can control their formatting for a variety of locales. For instance, you could change the order of the placeholders to suit the word order of a different language; you could also do this if you needed to change how the information is presented at a later time. A revised properties file is shown in Listing 2.

Listing 2. Properties file revised

    noTablesAndPackages = Schema {2} contains {1,number,integer} packages and {0,number,integer} tables

The change to your properties file does not require any changes to your Java code. The Java code in Listing 3 works with either of the properties files shown in Listing 1 and Listing 2.

Listing 3. Java code for a location-aware application

    import java.text.MessageFormat;
    import java.util.ResourceBundle;

    // Load the bundle, fetch the raw pattern, then fill in the placeholders.
    ResourceBundle resBundle = ResourceBundle.getBundle("com/mycomp/resources/i18n");
    String rawMessage = resBundle.getString("noTablesAndPackages");
    String messageTxt = MessageFormat.format(rawMessage, 95, 9, "myschema");

If you use the original properties file shown in Listing 1, the value of messageTxt will be:

    There are 95 tables and 9 packages in Schema myschema

If you use the properties file shown in Listing 2, the value of messageTxt will be:

    Schema myschema contains 9 packages and 95 tables

Note that if you had only one package in myschema your message would not read correctly. See the Resources section to learn about using the Java class java.text.ChoiceFormat to handle such cases.

Limitations of properties files

Based on these examples it is clear that the Java platform provides substantial support for internationalization, at least as far as the runtime environment is concerned.
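As one illustration of that runtime support, the patterns from Listing 1 and Listing 2 can be run side by side, together with a choice-based pattern for the single-package case. This is a sketch under stated assumptions: the class MessageSketch, the render helper and the WITH_CHOICE pattern are hypothetical and do not come from the article's download, and the patterns are inlined as constants rather than read from a properties file. Locale.ENGLISH is fixed so that the number formatting does not depend on the machine's default locale.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class MessageSketch {

    // Patterns copied from Listing 1 and Listing 2.
    public static final String LISTING_1 =
            "There are {0,number,integer} tables and {1,number,integer} packages in Schema {2}";
    public static final String LISTING_2 =
            "Schema {2} contains {1,number,integer} packages and {0,number,integer} tables";

    // Hypothetical pattern: a choice element picks the wording from the package count.
    public static final String WITH_CHOICE =
            "Schema {2} contains {1,choice,0#no packages|1#one package|1<{1,number,integer} packages}";

    /** Formats a pattern with the same three arguments used in Listing 3. */
    public static String render(String pattern, int tables, int packages, String schema) {
        MessageFormat format = new MessageFormat(pattern, Locale.ENGLISH);
        return format.format(new Object[] {tables, packages, schema});
    }

    public static void main(String[] args) {
        // Same arguments, different placeholder order in the two patterns.
        System.out.println(render(LISTING_1, 95, 9, "myschema"));
        System.out.println(render(LISTING_2, 95, 9, "myschema"));

        // The choice pattern changes its wording for zero, one and many packages.
        System.out.println(render(WITH_CHOICE, 95, 0, "myschema"));
        System.out.println(render(WITH_CHOICE, 95, 1, "myschema"));
        System.out.println(render(WITH_CHOICE, 95, 9, "myschema"));
    }
}
```

The calling code is identical for all three patterns, which is the point made about Listing 3: wording, placeholder order and plural handling all live in the message text.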
The properties file adds considerable flexibility and ease to the process of internationalizing Java applications, but it is lacking in a few important areas. Major problems commonly encountered by developers using properties files to manage localized strings are as follows:

- You have to maintain multiple properties files and keep them in sync. There is no single view that lets you see all the translations for a single mnemonic. This especially becomes an issue when you want to add a new language to your application: which properties file should you use as your starting point?

- Properties files use ISO 8859-1 (Latin-1) character encoding. Any character that does not exist in ISO 8859-1 has to be encoded using Unicode escapes. Taking the example of a Turkish properties file, you might have an entry like this one:

      help=Yard\u0131m

  The Turkish character ı is not part of the ISO 8859-1 character encoding so it must be encoded using a Unicode escape. As a result it can be difficult to read and maintain properties files, especially for Far Eastern languages. The same entry would appear as follows in a Japanese properties file, by the way:

      help=\u52A9\u3051

- Although there is a sort of inheritance among the properties files of the same bundle for different locales, you cannot share mnemonics across the properties files of different bundles. This becomes problematic if you have multiple internationalized products. For example, almost every online application has menus and menu items such as About, File, Help and Open. It would be ideal to define these entries only once and share them across your products. When you introduce a new language to your product set, it would be most efficient to add it in just one place.

The Properties Pre-Processor

Sidebar: Why XML?
I have used XML as the basis of the Properties Pre-Processor utility to leverage its universal syntax and widely available parsers. The origin of the Properties Pre-Processor goes back to VAX/VMS.
A pre-XML version of the utility (not for Java) was based on Lex and Yacc and defined a simple syntax, but that labor-intensive approach wouldn’t make sense in today’s context. One advantage of using XML is the many editors that you can use to edit XML-based properties files. You only need to be sure that the editor you choose recognizes UTF-8 encoding. I use the popular open source text editor jEdit.

I have created an XML-based utility that addresses all of these issues having to do with storing and manipulating localized strings in properties files. Simply put, the Properties Pre-Processor uses XML to maintain any number of localized strings in a single file. Thus, the Properties Pre-Processor lets you store and view all the translations of a mnemonic in one place.

The Properties Pre-Processor removes the problem of characters that fall outside ISO 8859-1 by having you use UTF-8 encoding for your XML files. It also resolves the issue of sharing mnemonics across several properties files with the help of XML inclusion (the xi:include element), the XML counterpart of #include in C. (From time to time in my Java programming practice I miss this C language feature; fortunately it is accessible using XML. Not all XML parsers support it but addressing the lack of support isn’t a big deal.)

The Properties Pre-Processor at work

The easiest way to learn about the Properties Pre-Processor utility is to see it at work. Assume that Listing 4 is the core XML file for your application, where you keep translations for strings such as About, File, Help, and Open.

Listing 4.
An XML-based properties file for storing localized strings

    <?xml version="1.0" encoding="utf-8"?>
    <propertyList>
      <property mnemonic="about">
        <text>About</text>
        <localizedText lang="es">Sobre</localizedText>
        <localizedText lang="tr">Hakkında</localizedText>
        <localizedText lang="ja">について</localizedText>
        <localizedText lang="temp"></localizedText>
      </property>
      <property mnemonic="file">
        <text>File</text>
        <localizedText lang="es">Archivo</localizedText>
        <localizedText lang="tr">Dosya</localizedText>
        <localizedText lang="ja">ファイル</localizedText>
        <localizedText lang="temp"></localizedText>
      </property>
      <property mnemonic="help">
        <text>Help</text>
        <localizedText lang="es">Ayuda</localizedText>
        <localizedText lang="tr">Yardım</localizedText>
        <localizedText lang="ja">助け</localizedText>
        <localizedText lang="temp"></localizedText>
      </property>
      <property mnemonic="open">
        <text>Open</text>
        <localizedText lang="es">Abierto</localizedText>
        <localizedText lang="tr">Aç</localizedText>
        <localizedText lang="ja">開いた</localizedText>
        <localizedText lang="temp"></localizedText>
      </property>
    </propertyList>

This file is named i18n.xml and is located in your src/com/mycomp/shared/resources directory. Note that temp is a placeholder for any new locale you want to add to the XML file. All you need to do to add a new language is replace “temp” with the corresponding language code (such as fr for French) and add the localized string for that language. You can write a simple utility or use the macro recording/playback feature of your text editor.

You can easily include this XML file in a product-specific XML file, as shown in Listing 5.

Listing 5.
The XML file as an include

    <?xml version="1.0" encoding="utf-8"?>
    <propertyList>
      <xi:include href="com/mycomp/shared/resources/i18n.xml" />
      <property mnemonic="action">
        <text>Action</text>
        <localizedText lang="es">Acción</localizedText>
        <localizedText lang="tr">İşlem</localizedText>
        <localizedText lang="ja">行為</localizedText>
        <localizedText lang="temp"></localizedText>
      </property>
      <property mnemonic="connection">
        <text>Connection</text>
        <localizedText lang="es">Conexión</localizedText>
        <localizedText lang="tr">Bağlantı</localizedText>
        <localizedText lang="ja">関係</localizedText>
        <localizedText lang="temp"></localizedText>
      </property>
    </propertyList>

The final issue is incorporating the generation of properties files from the XML files into your build process. If your build process is Ant-based you can use the following sample Ant target to add it to your build. In fact, this sample is from a NetBeans project. Like many other IDEs, NetBeans uses Ant as the underlying build engine and it gives you the flexibility to enhance the overall build process.

Listing 6. Incorporating the XML file into an Ant build

    <target name="-pre-jar">
      <java fork="true" classname="codegen.PropertiesGenerator">
        <arg value="-ip"/>
        <arg value=".;${build.classes.dir}"/> <!-- 1 -->
        <arg value="${build.classes.dir}/com/mycomp/myprod/resources/i18n"/> <!-- 2 -->
        <classpath>
          <pathelement path="${lib.dir}/PropertiesGenerator.jar"/> <!-- 3 -->
        </classpath>
      </java>
    </target>

Some notes about the Ant task definition

The numbered descriptions below correspond to the numbers in Listing 6.

1. A semicolon-separated (;) list of directories that will be searched for included XML files.
2. The path to the main XML file (please note that no XML extension is mentioned).
3. PropertiesGenerator.jar has to be located in the classpath provided to the Ant task.

Executing this -pre-jar Ant target will create four properties files that combine all the definitions from the two XML files.
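The generator's source is not reproduced in this article, so the following is only a rough sketch of the kind of transformation such a tool performs, not the utility's actual code. The class PropertiesSketch and its escape and generate methods are hypothetical names. The sketch also makes three assumptions of its own: empty entries such as the temp placeholder are skipped, every character outside printable ASCII is escaped, and xi:include elements are ignored rather than resolved.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PropertiesSketch {

    /** Replaces every character outside printable ASCII with a Unicode escape. */
    public static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x20 || c > 0x7E) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Groups entries by language; the empty key "" stands for the base properties file. */
    public static Map<String, Map<String, String>> generate(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse property list", e);
        }
        Map<String, Map<String, String>> byLang = new TreeMap<>();
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            String key = prop.getAttribute("mnemonic");
            NodeList children = prop.getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                if (!(children.item(j) instanceof Element)) {
                    continue; // skip whitespace text nodes between elements
                }
                Element child = (Element) children.item(j);
                String value = child.getTextContent().trim();
                if (value.isEmpty()) {
                    continue; // assumption: empty placeholders such as "temp" produce no output
                }
                String lang = child.getTagName().equals("text") ? "" : child.getAttribute("lang");
                byLang.computeIfAbsent(lang, k -> new TreeMap<>()).put(key, escape(value));
            }
        }
        return byLang;
    }

    public static void main(String[] args) {
        String xml = "<propertyList><property mnemonic=\"help\">"
                + "<text>Help</text>"
                + "<localizedText lang=\"tr\">Yard\u0131m</localizedText>"
                + "<localizedText lang=\"temp\"></localizedText>"
                + "</property></propertyList>";
        // One map per output file: the base file plus one per language that has text.
        generate(xml).forEach((lang, entries) ->
                System.out.println((lang.isEmpty() ? "base" : lang) + " " + entries));
    }
}
```

Each inner map corresponds to one generated properties file, with its values already in the escaped form that ISO 8859-1 properties files require. A real generator would also have to escape characters that are special in properties syntax, which this sketch leaves out.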
Conversion from UTF-8 to Unicode escapes is part of properties-file generation.

In conclusion

See the Resources section to download the Properties Pre-Processor utility, as well as the prebuilt jar file and the sample code for this article. Note that the utility was built using JDK 1.6. If you encounter any version problems simply rebuild the utility using the provided scripts.

Melih Çetin is a software architect for Diligence Software, specializing in software for medical imaging. Melih has been developing software for the international market using the Java platform since the release of JDK 1.0.2.