Java Technology Home Page
A-Z Index

Java Developer Connection(SM)
Online Training

Downloads, APIs, Documentation
Java Developer Connection
Tutorials, Tech Articles, Training
Online Support
Community Discussion
News & Events from Everywhere
Products from Everywhere
How Java Technology is Used Worldwide
 
Training Index

Writing Advanced Applications
Chapter 8 Continued: Performance Analysis

[<<BACK] [CONTENTS] [NEXT>>]

Another way to improve performance is with performance analysis. Performance analysis is looking at program execution to pinpoint where bottlenecks or other performance problems such as memory leaks might occur. Once you know where potential touble spots are, you can change your code to remove or reduce their impact.


Profiling

The JavaTM Virtual Machines (VMs) have had the ability to provide simple profile reports since Java Development Kit (JDKTM) 1.0.2. However, the information they provided was limited to a sorted list of method calls a program had called.

The Java® 2 platform software provides much better profiling capabilities than previously available and analysis of this generated data has been made easier by the emergence of a Heap Analysis Tool (HAT). The heap analysis tool, as its name implies, lets you analyze profile reports of the heap. The heap is a block of memory the Java VM uses when it is running. The heap analysis tool lets you generate reports on objects that were used to run your application. Not only can you get a listing of the most frequently called methods and the memory used in calling those methods, but you can also track down memory leaks. Memory leaks can have a significant impact on performance.

Analyze a Program

To analyze the TableExample3 program included in the demo/jfc/Table directory in the Java 2 platform download, you need to generate a profile report. The simplest report to generate is a text profile. To generate a text profile, run the application with the -Xhprof parameter. In the final release of the Java 2 platform software, this option was renamed -Xrunhprof. To see a list of the currently available options run the command
java -Xrunhprof:help
Hprof usage: -Xrunhprof[:help]|[<option>=<value>, ...]
Option Name and Value Description Default
--------------------- ----------- -------
heap=dump|sites|all heap profiling all
cpu=samples|times|old CPU usage off
monitor=y|n monitor contention n
format=a|b ascii or binary output a
file=<file> write data to file java.hprof(.txt for ascii)
net=<host>:<port> send data over a socket write to file
depth=<size> stack trace depth 4
cutoff=<value> output cutoff point 0.0001
lineno=y|n line number in traces y
thread=y|n thread in traces? n
doe=y|n dump on exit? y
Example: java -Xrunhprof:cpu=samples,file=log.txt,
                               depth=3 FooClass

The following invocation creates a text output file that you can view without the heap analysis tool called java.hprof.txt when the program generates a stack trace or exits. A different invocation is used to create a binary file to use with the heap analysis tool.

  java -Xrunhprof TableExample3

  d:\jdk12\demo\jfc\Table> java -Xrunhprof TableExample3
  Dumping Java heap ... allocation sites ... done.
The profile option literally logs every object created on the heap, so even just starting and stopping the small TableExample3 program results in a four megabyte report file. Although the heap analysis tool uses a binary version of this file and provides a summary, there are some quick and easy things you can learn from the text file without using the heap analysis tool.

Note: To list all available options, use
java -Xrunhprof:help

View the Text File

Choose an editor that can handle large files and go to the end of this file. There could be hundreds of thousands of lines, so use a shortcut instead of scrolling, or search for the words SITES BEGIN. You should see a list of lines that start with an increasing rank number followed by two percentage numbers. The first entry in this list, should look similar to the example below:
SITES BEGIN (ordered by live bytes) 
                  Sun Dec 20 16:33:28 1998
percent live alloc'ed stack class
rank self accum bytes objs bytes objs trace name
1 55.86% 55.86% 826516 5 826516 5 3981 [S

The [S notation at the end of the last line above indicates the first entry is an array of the short, a primitive type. This is expected with Swing or Abstract Window Toolkit (AWT) applications. The five count under the objs header mean there are currently five of these arrays, there have only been five in the lifetime of this application, and they take up 826516 bytes. The reference key to this object is the value listed under stack trace. To find where this object was created in this example, search for TRACE 3981. You will see the following:

TRACE 3981:
java/awt/image/DataBufferUShort.<init>(
                           DataBufferUShort.java:50)
java/awt/image/Raster.createPackedRaster(
                                    Raster.java:400)
java/awt/image/DirectColorModel.
      createCompatibleWritableRaster(
		             DirectColorModel.java:641)
sun/awt/windows/WComponentPeer.createImage(
                             WComponentPeer.java:186)

The TableExample3 code sets a scrollpane that is 700 by 300. When you look at the source of Raster.java, which is in the src.jar file, you find these statements at line 400:

  case DataBuffer.TYPE_USHORT:
         d = new DataBufferUShort(w*h);
         break;

The values w and h are the width and height from the createImage call at the start of TRACE 3981. The DataBufferUShort constructor creates and array of shorts as follows:

  data = new short[size];
where size is w*h. So, in theory there should be an entry for an array of 210000 elements. You look for each instantiation of this class by searching for trace=3981. One of the five entries will look like this:
  OBJ 5ca1fc0 (sz=28, trace=3979, 
	class=java/awt/image/DataBufferUShort@9a2570)
   data 5ca1670
   bankdata 5ca1f90
   offsets 5ca1340
   ARR 5ca1340 (sz=4, trace=3980, nelems=1, 
                                elem type=int)
   ARR 5ca1670 (sz=420004, trace=3981, nelems=210000, 
                                elem type=short)
   ARR 5ca1f90 (sz=12, trace=3982, nelems=1, 
                                elem type=[S@9a2d90)
   [0] 5ca1670

You can see that the data value of this raster image references an array 5ca1670 which in turns lists 210000 elements of a short of size 2. This means 420004 bytes of memory are used in this array.

From this data you can conclude that the TableExample3 program uses nearly 0.5Mb to map each table. If the example application is running on a small memory machine, you should make sure you do not keep unnecessary references to large tables or images that are built by the createImage method.

The Heap Analysis Tool

The Heap Analysis tool can analyze the same data for you, but requires a binary report file as input. You can generate a binary report file as follows:
  java -Xrunhprof:file=TableExample3.hprof,format=b 
                                       TableExample3

To generate the binary report, close the TableExample3 window. The binary report file TableExample3.hprof is created when the program exits. The Heap Analysis tool starts an HTTP Server that analyzes the binary profile file and displays the results in HTML that you can view with a browser.

You can get a copy of the Heap Analysis Tool from the java.sun.com site. Once you install it, you run shell and batch scripts in the installed bin directory so you can start the Heap Analysis Tool server as follows:

  >hat TableExample3.hprof
  Started HCODEP server on port 7000
  Reading from /tmp/TableExample3.hprof...
  Dump file created Tue Jan 05 13:28:59 PST 1999
  Snapshot read, resolving...
  Resolving 17854 objects...
  Chasing references, 
            expect 35 dots.......................
  Eliminating duplicate 
                references.........................
  Snapshot resolved.
  Server is ready.

The above output tells you an HTTP server is started on port 7000 by default. To view this report enter the url http://localhost:7000 or http://your_machine_name:7000 in your web browser. If you have problems starting the server using the scripts, you can alternatively run the application by including the hat.zip classes file on your CLASSPATH and use the following command:

  java hat.Main TableExample3.hprof

The default report view contains a list of all the classes. At the bottom of this initial page are the following two key report options:

  Show all members of the rootset
  Show instance counts for all classes

If you select the Show all members of the rootset link, you see a list of the following references because these reference are likely targets for potential memory leaks.

 Java Static References
 Busy Monitor References
 JNI Global References
 JNI Local References
 System Class References

What youn look for here are instances in the application that have references to objects that have a risk of not being garbage collected. This can sometimes occur in the case of JNI if memory is allocated for an object, the memory is left to the garbage collector to free up, and the garbage collector does not have the information it needs to do it. In this list of references, you are mainly interested in a large number of references to objects or objects of a large size.

The other key report is the Show instance counts for all classes. This lists the number of calls to a particular method. The String and Character array objects, [S and [C, are always going to be high on this list, but some objects are a bit more intriguing. Why are there 323 instances of java.util.SimpleTimeZone for example?

  5109 instances of class java.lang.String
  5095 instances of class [C
  2210 instances of class java.util.Hashtable$Entry
  968 instances of class java.lang.Class
  407 instances of class [Ljava.lang.String;
  323 instances of class java.util.SimpleTimeZone
  305 instances of class 
        sun.java2d.loops.GraphicsPrimitiveProxy
  304 instances of class java.util.HashMap$Entry
  269 instances of class [I
  182 instances of class [Ljava.util.Hashtable$Entry;
  170 instances of class java.util.Hashtable
  138 instances of class java.util.jar.Attributes$Name
  131 instances of class java.util.HashMap
  131 instances of class [Ljava.util.HashMap$Entry;
  130 instances of class [Ljava.lang.Object;
  105 instances of class java.util.jar.Attributes
To get more information on the SimpleTimeZone instances, click on the link (the line beginning with 323). This will list all 323 references and calculate how much memory has been used. In this example, 21964 bytes have been used.
  Instances of java.util.SimpleTimeZone

  class java.util.SimpleTimeZone

  java.util.SimpleTimeZone@0x004f48c0 (68 bytes)
  java.util.SimpleTimeZone@0x003d5ad8 (68 bytes)
  java.util.SimpleTimeZone@0x004fae88 (68 bytes)
  .....
  Total of 323 instances occupying 21964 bytes.
If you click on one of these SimpleTimeZone instances, you see where this object was allocated.
  Object allocated from:

  java.util.TimeZoneData.<clinit>(()V) : 
        TimeZone.java line 1222
  java.util.TimeZone.getTimeZone((Ljava/lang/String;)
	Ljava/util/TimeZone;) : 
	TimeZone.java line (compiled method)
  java.util.TimeZone.getDefault(
        ()Ljava/util/TimeZone;) : 
	TimeZone.java line (compiled method)
  java.text.SimpleDateFormat.initialize(
        (Ljava/util/Locale;)V) : 
	SimpleDateFormat.java line (compiled method)
In this example the object was allocated from TimeZone.java. The source to this file is in the standard src.jar file, and on examining this file, you can see that indeed there are nearly 300 of these objects in memory.
  static SimpleTimeZone zones[] = {
   // The following data is current as of 1998.
   // Total Unix zones: 343
   // Total Java zones: 289
   // Not all Unix zones become Java zones due to 
   // duplication and overlap.
   //-------------------------------------------
   new SimpleTimeZone(-11*ONE_HOUR, 
                  "Pacific/Niue" /*NUT*/),

Unfortunately, you have no control over the memory used in this example because it is allocated when the program first requests a default timezone. However, this same technique can be applied to analyzing your own application where you may be able to make some improvements

Where the Application Spends its Time

Again, you can use the -Xrunhprof parameter to get information about the time the application spent processing a particular method.

You can use one of two CPU profiling options to achieve this. The first option is cpu=samples. This option reports the result of a sampling of the running threads of the Java VM to which a statistical count of the frequency of the occurrence of a particular method is used to find busy sections of the applications. The second option is cpu=times, which measures the time taken by individual methods and generates a sorted list ranked as a total percentage of the CPU time taken by the application.

By using the cpu=times option, you should see something similiar to this at the end of the output file

CPU TIME (ms) BEGIN (total = 11080) 
                       Fri Jan  8 16:40:59 1999
rank   self   accum   count  trace   method
 1   13.81%  13.81%       1   437   sun/
    awt/X11GraphicsEnvironment.initDisplay
 2    2.35%  16.16%       4   456   java/
    lang/ClassLoader$NativeLibrary.load
 3    0.99%  17.15%      46   401   java/
    lang/ClassLoader.findBootstrapClass
If you contrast this with the cpu=samples output, you see the difference between how often a method appears during the runtime of the application in the samples output compared to how long that method took in the times output.
CPU SAMPLES BEGIN (total = 14520) 
                   Sat Jan 09 17:14:47 1999
rank  self   accum   count  trace   method
 1    2.93%  2.93%   425    2532    sun/
    awt/windows/WGraphics.W32LockViewResources
 2    1.63%  4.56%   237     763    sun/
    awt/windows/WToolkit.eventLoop
 3    1.35%  5.91%   196    1347    java/
    text/DecimalFormat.<init>
The W32LockView method, which calls a native windows lock routine, is called 425 times. So when it is sampled it appears in the active runnings threads because it also takes time to complete. In contrast, the initDisplay method is only called once, but it is the method that takes the longest time to complete in real time.

Operating System Performance Tools

Sometimes the performance bottleneck occurs at the system or operating system level. This is because Java VM depends on many operating system libraries for functionality such as disk access or networking. However, what occurs in these libraries after the Java VM calls them is beyond the reach of most profiling tools for the Java platform.

Here is a list of tools you can use to analyze performance problems on some common operating systems.

Solaris Platform

System accounting reports, sar, reports the activity of the system in terms of disk IO, user program activity, and system level activity. If your application is using excessive amounts of memory, it may require disk swap space, which will show up as high percentages in the WIO column. User programs that get stuck in a busy loop show a high percentage in the user column:
developer$ sar 1 10

SunOS developer 5.6 Generic_105181-09 sun4u    
                                      02/05/99

11:20:29    %usr    %sys    %wio   %idle
11:20:30      30       6       9      55
11:20:31      27       0       3      70
11:20:32      25       1       1      73
11:20:33      25       1       0      74
11:20:34      27       0       1      72
The truss command traces and records the details of every system call called by the Java VM to the Solaris kernel. A common way to run truss is:
   truss -f -o /tmp/output -p <process id>
The -f parameter follows any child processes that are created, the -o parameter writes the output to the named file, and the -p parameter traces an already running program from its process ID. Alternately, you can replace -p <process id> with the Java VM, for example:
   truss -f -o /tmp/output java MyDaemon
The /tmp/output is used to store the truss output, which should look similiar to the following:
15573:  execve("/usr/local/java/jdk1.2/solaris/
                bin/java", 0xEFFFF2DC,
	        0xEFFFF2E8)  argc                   = 4
15573:  open("/dev/zero", O_RDONLY)                 = 3
15573:  mmap(0x00000000, 8192, 
             PROT_READ|PROT_WRITE|PROT_EXEC,
	     MAP_PRIVATE, 3, 0) = 0xEF7C0000
15573:  open("/home/calvin/java/native4/libsocket.so.1", 
              O_RDONLY) Err#2 ENOENT
15573:  open("/usr/lib/libsocket.so.1", 
              O_RDONLY)                             = 4
15573:  fstat(4, 0xEFFFEF6C)                        = 0
15573:  mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, 
              MAP_SHARED, 4, 0) = 0xEF7B00 00
15573:  mmap(0x00000000, 122880, PROT_READ|PROT_EXEC, 
              MAP_PRIVATE, 4, 0) = 0xEF7 80000
15573:  munmap(0xEF78E000, 57344)                   = 0
15573:  mmap(0xEF79C000, 5393, 
              PROT_READ|PROT_WRITE|PROT_EXEC,
              MAP_PRIVATE|MAP_FIXED, 4, 49152) 
              = 0xEF79C000
15573:  close(4)                                    = 0
In the truss output, look for files that fail when opened due to access problems, such as error ENOPERM, or a missing file error ENOENT. You can also track data read or written with the truss parameters -rall to log all data read or -wall to log all data written by the program. With these parameters, it is possible to anaylze data sent over a network or to a local disk.

Linux Platform

Linux has a trace command called strace. It traces systems calls to the underlying Linux kernel. This example traces the SpreadSheet example in the JDK demo directory.
$ strace -f -o /tmp/output 
                      java sun.applet.AppletViewer 
                      example1.html
$ cat /tmp/output

639   execve("/root/java/jdk117_v1at/java/
               jdk117_v1a/bin/java", ["java",
               "sun.applet.AppletViewer ", 
               "example1.html"], [/* 21 vars */]) = 0
639   brk(0)                              = 0x809355c
639   open("/etc/ld.so.preload", O_RDONLY)       = -1 
        ENOENT (No such file or directory)
639   open("/etc/ld.so.cache", O_RDONLY)          = 4
639   fstat(4, {st_mode=0, st_size=0, ...})       = 0
639   mmap(0, 14773, PROT_READ, MAP_PRIVATE, 
             4, 0) = 0x4000b000
639   close(4)                                    = 0
639   open("/lib/libtermcap.so.2", O_RDONLY)      = 4
639   mmap(0, 4096, PROT_READ, MAP_PRIVATE, 
             4, 0) = 0x4000f000
To obtain system information similar to the Solaris sar command, read the contents of the file /proc/stat. The format of this file is described in the proc man page. Look at the cpu line to get the user and system time.
   cpu  4827 4 1636 168329
In the above example, the cpu line indicates 48.27 seconds in user space, 0.04 at nice priority, 16.36 seconds processing system calls, and 168 seconds idle. This is a running total; individual entries for each process are available in /proc/<process_id>/stat.

Windows95/98/NT Platforms

There are no standard performance analysis tools included on this platform, but the following tools are available by way of freeware or shareware resources such as http://www.download.com .

Runtime memory analysis: Memory meter

Network analysis: Traceplus

[TOP]


[ This page was updated: 14-Oct-99 ]

Products & APIs | Developer Connection | Docs & Training | Online Support
Community Discussion | Industry News | Solutions Marketplace | Case Studies
Glossary - Applets - Tutorial - Employment - Business & Licensing - Java Store - Java in the Real World
FAQ | Feedback | Map | A-Z Index
For more information on Java technology
and other software from Sun Microsystems, call:
(800) 786-7638
Outside the U.S. and Canada, dial your country's AT&T Direct Access Number first.
Sun Microsystems, Inc.
Copyright © 1995-99 Sun Microsystems, Inc.
All Rights Reserved. Legal Terms. Privacy Policy.