Trail: Internationalization
Lesson: Internationalization of Network Resources
Internationalized Resource Identifier


An Internationalized Resource Identifier (IRI), like an IDN, may contain Unicode characters, whereas a Uniform Resource Identifier (URI) is limited to ASCII characters.

According to RFC 3987, IRIs are meant to replace URIs in identifying resources for protocols, formats, and software components that use a UCS-based character repertoire.

At first sight, you might expect this task to be solved by the same means as for IDNs. That is not quite the case. Consider the structure of a resource identifier:

[Figure: the structure of a resource identifier]

You may notice that it has several components.

The authority component of a URI is parsed according to the following syntax:

[user-info@]host[:port] 

where the characters @ and : stand for themselves. The host component can be an IP literal, an IPv4 address, or simply a name.
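
As a minimal sketch of the authority syntax above, the following example asks java.net.URI for each sub-component. The class name AuthorityParts, the helper describe, and the sample URIs are assumptions made for illustration only.

```java
import java.net.URI;

class AuthorityParts {
    /** Formats the authority sub-components of a URI as "userInfo|host|port". */
    static String describe(URI uri) {
        // getUserInfo() returns null and getPort() returns -1
        // when the corresponding part is absent.
        return uri.getUserInfo() + "|" + uri.getHost() + "|" + uri.getPort();
    }

    public static void main(String[] args) {
        // Hypothetical sample URI with all three authority parts present.
        URI uri = URI.create("ftp://alice@example.com:2121/pub/readme.txt");
        System.out.println(uri.getAuthority()); // alice@example.com:2121
        System.out.println(describe(uri));      // alice|example.com|2121
    }
}
```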

When the host is a domain name, the IDN approach (that is, the mapping between its Unicode and ASCII forms) can be applied.
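
The host-only case can be sketched with the java.net.IDN class. The class name HostMapping, the helper toAsciiHost, and the sample host name are assumptions chosen for illustration. The sketch converts only a host name, not a complete identifier.

```java
import java.net.IDN;

class HostMapping {
    /** Maps a Unicode host name to its ASCII-compatible form. */
    static String toAsciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    public static void main(String[] args) {
        // "b\u00fccher" is "bücher"; the escape keeps the source file ASCII-only.
        String ascii = toAsciiHost("b\u00fccher.example");
        System.out.println(ascii);                // xn--bcher-kva.example
        System.out.println(IDN.toUnicode(ascii)); // back to the Unicode form
    }
}
```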

In general, however, the URI structure is more complicated. Applications can use the URI-reference syntax to refer to a URI instead of always using the generic syntax above. A URI-reference is either a URI or a relative reference: if a URI-reference does not specify a scheme, it is a relative reference. A relative reference usually expresses a URI relative to the namespace of another URI.
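
To illustrate the difference between a URI and a relative reference, here is a small sketch that resolves a relative reference against a base URI with java.net.URI. The class name RelativeReference, the helper resolve, and both sample identifiers are hypothetical.

```java
import java.net.URI;

class RelativeReference {
    /** Resolves a URI-reference against a base URI and returns the result as a string. */
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(URI.create(reference)).toString();
    }

    public static void main(String[] args) {
        URI relative = URI.create("../images/logo.png");
        // A reference without a scheme is not absolute.
        System.out.println(relative.isAbsolute()); // false
        // Resolution interprets the reference relative to the base URI.
        System.out.println(resolve("http://example.com/docs/guide/index.html",
                                   "../images/logo.png"));
        // http://example.com/docs/images/logo.png
    }
}
```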

Nevertheless, instances of the java.net.URI class can represent IRIs, that is, identifiers that contain non-ASCII characters.
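
As a minimal sketch of this point, the example below builds a java.net.URI whose path contains non-ASCII characters and then asks for its ASCII-only form with toASCIIString(). The class name IriToUri, the helper toAscii, and the sample identifier are assumptions for illustration. The non-ASCII characters are kept in the path on purpose, because a Unicode host name is a separate concern handled by the IDN approach.

```java
import java.net.URI;
import java.net.URISyntaxException;

class IriToUri {
    /** Parses an identifier that may contain non-ASCII characters and returns its ASCII-only form. */
    static String toAscii(String iri) throws URISyntaxException {
        URI uri = new URI(iri);
        // toASCIIString() percent-encodes non-ASCII characters as UTF-8 octets.
        return uri.toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // "r\u00e9sum\u00e9" is "résumé"; the escapes keep the source file ASCII-only.
        String iri = "http://example.com/r\u00e9sum\u00e9";
        System.out.println(new URI(iri).getPath()); // path with the Unicode characters intact
        System.out.println(toAscii(iri));           // http://example.com/r%C3%A9sum%C3%A9
    }
}
```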

This class was enhanced by the following methods to perform the operations and conversions according to RFC 3987:

