What’s so great about ICU4J?

Yesterday, I introduced International Components for Unicode for Java (ICU4J) by describing how you can get rid of it. I feel pretty bad about that, because ICU4J provides goodness that’s too important to simply dismiss. If you’re building applications for an international audience, you have to get this stuff right.

I spent a few minutes today browsing around for a definitive list of reasons why ICU4J is better than what is provided by the standard Java libraries. So far, I haven’t been able to find that list. What I did find were a few examples of where ICU4J shines.

Consider the following example:

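// ICU4J: import com.ibm.icu.text.NumberFormat; import com.ibm.icu.util.Currency;
// JDK:   import java.text.NumberFormat; import java.util.Currency;
NumberFormat format = NumberFormat.getCurrencyInstance();
format.setCurrency(Currency.getInstance("JPY"));
System.out.println(format.format(4.0));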

With ICU4J, the output of this is “¥4”; using the standard libraries that ship with Java 6, the output is “JPY4.00”. Very different results. I’m no expert on international currencies, but a cursory review of the web shows me that the former is more common and correct (the use of the proper symbol is a dead giveaway). With the standard libraries, you can coerce the number of decimal places using the setMinimumFractionDigits() and setMaximumFractionDigits() methods.
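One way to get closer to ICU4J’s output with the standard libraries alone is to ask the Currency object for its ISO 4217 default number of fraction digits and apply that to the formatter, since the JDK’s DecimalFormat.setCurrency() does not update the fraction digits by itself. The sketch below assumes an en_US locale (passed explicitly as Locale.US), and the class and method names are illustrative only. It fixes the decimal places but not the symbol: which symbol is printed still depends on the JDK’s locale data (Java 6 prints “JPY”, newer JDKs may print “¥”).

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

public class CurrencyDigits {
    // Formats an amount in the given currency using only the JDK, forcing the
    // fraction digits to the currency's ISO 4217 default (0 for JPY, 2 for USD).
    static String formatAmount(double amount, String currencyCode, Locale locale) {
        NumberFormat format = NumberFormat.getCurrencyInstance(locale);
        Currency currency = Currency.getInstance(currencyCode);
        format.setCurrency(currency);

        int digits = currency.getDefaultFractionDigits();
        // getDefaultFractionDigits() returns -1 for pseudo-currencies; in that
        // case keep the locale's defaults rather than guessing.
        if (digits >= 0) {
            format.setMinimumFractionDigits(digits);
            format.setMaximumFractionDigits(digits);
        }
        return format.format(amount);
    }

    public static void main(String[] args) {
        System.out.println(formatAmount(4.0, "JPY", Locale.US));
        System.out.println(formatAmount(4.0, "USD", Locale.US));
    }
}
```

Setting both the minimum and the maximum matters here: lowering only the minimum to 0 would still allow up to two decimal places, so an amount like 4.5 would not be rounded to whole yen.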


3 Responses to What’s so great about ICU4J?

  1. Zeb Olaf says:

    I looked for a description of what makes ICU4J better as well… with no luck. What I did find makes me suspect that perhaps it’s more frequently updated. I know that Java had a misspelling in one of its time zones (Load Howe instead of Lord Howe) for a long while. I suspect ICU4J either never had that problem, or fixed it more quickly than Sun did.
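If you want to see which spelling your own JDK uses, a small check like the following prints the English long display name for the Australia/Lord_Howe zone. It is a sketch that assumes the misspelling lived in the zone’s display name rather than its ID, and the class and method names are hypothetical. It only reports what the installed time zone data says today; it cannot tell you when a fix landed.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.TimeZone;

public class LordHoweCheck {
    // Returns the English long (non-daylight) display name that the running
    // JDK uses for the Lord Howe Island zone, so the spelling can be inspected.
    static String displayName() {
        TimeZone zone = TimeZone.getTimeZone("Australia/Lord_Howe");
        return zone.getDisplayName(false, TimeZone.LONG, Locale.US);
    }

    // True if the JDK's time zone data contains the zone ID at all.
    static boolean zoneKnown() {
        return Arrays.asList(TimeZone.getAvailableIDs()).contains("Australia/Lord_Howe");
    }

    public static void main(String[] args) {
        System.out.println("Known: " + zoneKnown() + ", name: " + displayName());
    }
}
```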

  2. Neil Bartlett says:

    One big problem with the Java libraries is that resource bundles must be saved in ISO 8859-1 encoded text files. That’s the basic European character set, so although you get various accented Latin characters, you’re out of luck if you need (say) Japanese or Russian or Arabic. You also don’t get the Euro currency symbol, or even some characters used in French!

    The only way to get characters from these languages into a standard Java resource bundle is to use \u escape sequences. Therefore a typical localization file for Japanese would look like this:

    errormessage=\u6291\u2820\u2037\u2201\u7970\u5758\u8908\u8743

    Of course no translator can work with a file that looks like that, so you usually have to perform a conversion at build time from something that contains real Japanese text.

    ICU4J, on the other hand, allows for various Unicode encodings such as UTF-8, and therefore the files produced by your translators can be used directly by ICU4J.

    Also, ICU4J allows for some internal structure (i.e., sections) within its localization files, whereas Java resource bundles are flat, with no internal structure except via naming conventions.

    NB: that Unicode string above is just random digits from my keypad. If you run them through a translator and find that I have insulted your mother, please accept my apologies.
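The build-time conversion described in this comment can be sketched in plain Java: every character outside ISO 8859-1 is replaced with a \u escape that the standard properties loader understands. This is a simplified illustration in the spirit of the native2ascii tool that shipped with older JDKs, not a replacement for it. The class and method names are hypothetical, and the sketch does not escape backslashes or other characters that have special meaning in .properties files.

```java
public class EscapeProperties {
    // Replaces every UTF-16 code unit above 0xFF (outside ISO 8859-1) with a
    // six-character escape: a backslash, the letter u, then four lowercase hex
    // digits. Characters outside the BMP are already stored as two surrogate
    // code units, so each half is escaped separately, which is the form the
    // Properties loader expects.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0xFF) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                // ISO 8859-1 characters (including accented Latin) pass through.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Error" in katakana, written with Java source escapes so this file
        // compiles regardless of the source file's encoding.
        System.out.println(escape("errormessage=\u30a8\u30e9\u30fc"));
    }
}
```

Running it prints errormessage=\u30a8\u30e9\u30fc. The converted file still has to be saved as ISO 8859-1 so that any unescaped accented Latin characters survive.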
