Trail: Internationalization
Lesson: Working with Text
Section: Comparing Strings
Customizing Collation Rules
Home Page > Internationalization > Working with Text

Customizing Collation Rules

The previous section discussed how to use the predefined rules for a locale to compare strings. These collation rules determine the sort order of strings. If the predefined collation rules do not meet your needs, you can design your own rules and assign them to a RuleBasedCollator object.

Customized collation rules are contained in a String object that is passed to the RuleBasedCollator constructor. Here's a simple example:

String simpleRule = "< a < b < c < d";
RuleBasedCollator simpleCollator =  new RuleBasedCollator(simpleRule);

For the simpleCollator object in the previous example, a is less than b, which is less that c, and so forth. The simpleCollator.compare method references these rules when comparing strings. The full syntax used to construct a collation rule is more flexible and complex than this simple example. For a full description of the syntax, refer to the API documentation for the RuleBasedCollator class.

The example that follows sorts a list of Spanish words with two collators. Full source code for this example is in RulesDemo.java.

The RulesDemo program starts by defining collation rules for English and Spanish. The program will sort the Spanish words in the traditional manner. When sorting by the traditional rules, the letters ch and ll and their uppercase equivalents each have their own positions in the sort order. These character pairs compare as if they were one character. For example, ch sorts as a single letter, following cz in the sort order. Note how the rules for the two collators differ:

String englishRules = (
    "< a,A < b,B < c,C < d,D < e,E < f,F " +
    "< g,G < h,H < i,I < j,J < k,K < l,L " +
    "< m,M < n,N < o,O < p,P < q,Q < r,R " +
    "< s,S < t,T < u,U < v,V < w,W < x,X " +
    "< y,Y < z,Z");

String smallnTilde = new String("\u00F1"); // 	ñ
String capitalNTilde = new String("\u00D1"); // 	Ñ

String traditionalSpanishRules = (
    "< a,A < b,B < c,C " +
    "< ch, cH, Ch, CH " +
    "< d,D < e,E < f,F " +
    "< g,G < h,H < i,I < j,J < k,K < l,L " +
    "< ll, lL, Ll, LL " +
    "< m,M < n,N " +
    "< " + smallnTilde + "," + capitalNTilde + " " +
    "< o,O < p,P < q,Q < r,R " +
    "< s,S < t,T < u,U < v,V < w,W < x,X " +
    "< y,Y < z,Z");

The following lines of code create the collators and invoke the sort routine:

try {
    RuleBasedCollator enCollator =
        new RuleBasedCollator(englishRules);
    RuleBasedCollator spCollator =
        new RuleBasedCollator(
            traditionalSpanishRules);

    sortStrings(enCollator, words);
    printStrings(words);

    System.out.println();

    sortStrings(spCollator, words);
    printStrings(words);
} catch (ParseException pe) {
    System.out.println(
        "Parse exception for rules");
}

The sort routine, called sortStrings, is generic. It will sort any array of words according to the rules of any Collator object:

public static void sortStrings(
    Collator collator, String[] words) {
    String tmp;
    for (int i = 0; i < words.length; i++) {
        for (int j = i + 1; j < words.length; j++) {
            if (collator.compare(words[i], words[j]) > 0) {
                tmp = words[i];
                words[i] = words[j];
                words[j] = tmp;
            }
        }
    }
}

When sorted with the English collation rules, the array of words is as follows:

chalina
curioso
llama
luz

Compare the preceding list with the following, which is sorted according to the traditional Spanish rules of collation:

curioso
chalina
luz
llama

Problems with the examples? Try Compiling and Running the Examples: FAQs.
Complaints? Compliments? Suggestions? Give us your feedback.

Previous page: Performing Locale-Independent Comparisons
Next page: Improving Collation Performance