Tuesday, August 30, 2005

Use PTDV to Optimize AS/400 Application Performance

by David Bridgewater


I want to share with you a recent experience: how the right performance tracing tools and techniques brought about a transformation. The tale involves:

- a JavaServer Pages (JSP) application, running under WebSphere Application Server (WAS) on the iSeries server
- IBM's Performance Trace Data Visualizer (PTDV), a freely available performance investigation tool for the iSeries
I subjected the application to several performance tests and found that with a moderate number of users, the iSeries CPU was saturated. I had hoped for three or four times the capacity. A quick scan of our application code suggested many possibilities for the problem's cause: hundreds of hefty classes, more strings than a piano factory, complex business routines by the score. Tackling any one of these — the string proliferation, for example — would require substantial intervention through most of the application.

I sought expert advice at this point from IBM UK, which provided Bruce Wassell, a man as wise in the ways of the iSeries as of WebSphere. He showed me how to use PTDV to identify the bottleneck in our code. It turned out that the string proliferation theory was wide of the mark, and a one-hour program fix restored the capacity I had hoped for. The trick was to find the part of the program that needed tuning, and in the three steps that follow, I show you how to use PTDV to do just that.

You can download PTDV from http://www.alphaworks.ibm.com/tech/ptdv. PTDV is currently freely available. IBM's policy for the alphaWorks site, however, doesn't guarantee that technologies such as this stay free forever. The alphaWorks Web site states that alphaWorks products are available for download "at the earliest stages of development — before they are licensed or integrated into products — allowing users to evaluate and influence IBM research and development." Some alphaWorks products become open-source products, and some become for-profit products. Although PTDV is an "alpha" product, it has been around a while, at least since the middle of 1999. I found the tool easy to install and fairly robust, probably because of this maturity.

You don't have to install PTDV until Step 3 of this article, though you might want to install it first, so that you can read PTDV's excellent HTML tutorial in conjunction with this article.

Step 1: Find the Hottest Spot
In this step, you're looking for the "hottest spot" in your application, that is, the transaction that consumes the most CPU. I found it best not to second-guess which transaction that is (and I explain why in Step 2), even though the biggest CPU hitter in my application was far from the slowest in user response. Clearly, many other factors account for user response time in a typical application, such as reading the database, which consumes time but not necessarily CPU. (The CPU time for SQL database access is mostly consumed by another job on the iSeries, outside the WebSphere subsystem and this article's scope.)

To begin this step, work as your application's sole user. Nobody else should use the application server under WebSphere in any way. Other activity on the iSeries is acceptable, even other application servers running. Perform a typical test script, and for each button-press that causes a page reload, measure the number of CPU milliseconds taken.

On the iSeries, each application server under WebSphere runs as a separate job (i.e., process) within a subsystem named QEJBSBS (note that I'm working with WAS 3.5.5; if you have WAS 4.x, look for QEJBADV4 or QEJBAES4). The WRKACTJOB (Work with Active Jobs) command lets you track down the jobs within QEJBSBS. Scroll to the job whose name matches that of the application server defined in WebSphere. You can set the WRKACTJOB display to show the CPU milliseconds the job takes (press F11 for the relevant view, which displays elapsed data, as Figure 1 shows). [Editor's Note: In various figures of this article, some text irrelevant to the figure's point has been deliberately obscured to protect the author's privacy.]

Here's the process for finding the biggest CPU hitter. Before you start measuring, run all the application parts that you intend to measure. "Second time in" is always better (unless you're specifically interested in a "first time in" performance problem) because it ensures that classes have loaded, JSPs have compiled, and singleton classes have been created; these one-time costs are thus excluded from the CPU calculations.

Next, for each transaction you want to measure:

1. Refresh the WRKACTJOB view.
2. Note the number of milliseconds consumed so far by the job (X).
3. Make your application cause activity on the server (e.g., press the button on the page); to drive the request repeatably instead of clicking by hand, see the sketch after this list.
4. When activity ceases (e.g., the page has completely reloaded in the browser), refresh the WRKACTJOB view.
5. Note the number of milliseconds consumed so far (Y).
6. Subtract X from Y to arrive at the milliseconds the job consumed for the transaction.
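
If you'd rather fire the transaction repeatably than press the button by hand, a small driver can issue the same request each time. Here's a hedged sketch: the URL is a placeholder for your own application's page, and your transaction might be a POST rather than the GET shown here.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class TransactionDriver {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: substitute your own application's page.
        URL url = new URL("http://myiseries:9080/myapp/MyPage");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        InputStream in = conn.getInputStream();
        byte[] buffer = new byte[4096];
        // Drain the response so the server performs the complete page build.
        while (in.read(buffer) != -1) {
        }
        in.close();
        System.out.println("Response code: " + conn.getResponseCode());
    }
}
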
Step 2: Trace the Transaction
Now that you've identified the hot spot, collect detailed performance trace information for this and nothing else. Why? You run the risk of being overwhelmed with data if you collect performance trace information for more than just the hot spot. In my experience, a traced second or two of CPU time gives as much data as PTDV can handle. Doubtless, PTDV's capacity varies with the size of the iSeries at your disposal, but in any case, as long as you've monitored the correct second, you'll find the knowledge you seek.

Note that to perform the trace, you must be running and loading optimized Java code (see the sidebar "Loading Optimized Java Classes on the iSeries"). On the iSeries, you use the Performance Explorer (PEX) toolset, delivered by default on any iSeries, to perform traces. PTDV is separate from PEX but relies on the PEX-built database. So you need to know just enough about PEX to make PTDV work.

The PTDV tutorial lists the following steps, which I've salted with my observations, for performing a trace:

1. Identify the job that's running the application server (we covered this in Step 1 of this article).
2. Display the job to note its defining characteristics (i.e., name, user, and job number).
3. Run the ADDPEXDFN (Add PEX Definition) command as follows:

ADDPEXDFN DFN(MY_DEFN) TYPE(*TRACE) JOB(123456/QEJB/MYAPPSVR)
TASK(*ALL) MAXSTG(500000) TRCTYPE(*SLTEVT) SLTEVT(*YES)
BASEVT(*PMCO) PGMEVT(*JVAENTRY *JVAEXIT) TEXT('PTDV Article')

I avoid the PTDV tutorial's recommendation to specify JVAEVT(*OBJCRT), because it swamps me with too much information. PTDV still reports on the numbers and sizes of objects created, even if you don't set this parameter.

4. Have two sessions lined up — one ready to start PEX, and the other to end it. The relevant commands are STRPEX (Start PEX) and ENDPEX (End PEX).

To start:

STRPEX SSNID(MYSESSION) DFN(MY_DEFN)

To end:

ENDPEX SSNID(MYSESSION) RPLDTA(*YES) DTALIB(MYLIB) TEXT('My
Performance Data Traced for PTDV')

5. Have your browser poised just before the point where you'll press the button that triggers your "big hitting" transaction. (Recall the "second time in" advice; you're likely better off running the transaction once before doing the trace.)
6. Then, in rapid succession:
   a. Execute the STRPEX command.
   b. Execute the transaction (press the button in your application).
   c. Execute the ENDPEX command. You can do this even before the transaction has completed; my notional threshold is no more than 2 CPU seconds on my small AS/400 (which, in this article's example, generated more than one million Java "events"). Ending the trace before the transaction completes is fine if the transaction processes many items in a loop and you're interested in analyzing only one iteration.

ENDPEX can take a while, but it displays a "percentage complete" message as the performance data is loaded.

Step 3: Visualize the Problem
Now the proper fun begins: using PTDV to visualize your problem. Here are the steps (the PTDV tutorial covers in more depth some of the areas I discuss next):

1. Install the PTDV server.
2. Install the PTDV client.
3. Run STRPTDVSVR (Start PTDV Server) to start the server; I prefer to start it in batch.
4. Start the client (I much prefer a thin client). You need Java 1.3 to run the client user interface. You also need the name of the session (MYSESSION) from Step 2 and the name of the iSeries database library where ENDPEX placed the performance data (MYLIB). Figure 2 shows the window where you enter this data. Note that I observed a big CPU impact on the iSeries while the data loaded, so I found myself working at off-peak times to avoid interfering with other activities.
5. In the Trace Information window, which Figure 3 shows, click the Job/Thread List tab. The correct application server job is easy to find because it's the only one you traced. Click the expand (i.e., circle-and-line) icon once to expand a list of threads under the job. The threads are identified only by hex numbers; unfortunately, no descriptive text explains in plain English what each thread does (most have to do with the internals of WebSphere). However, it's easy to identify the thread that burns the most CPU, and chances are that this is your application code. The background-color bars offer simple but effective graphical clues: look at the CPU Time (µs) column and find the widest bar. The bars give a proportional indication, as you can see in Figure 3, which shows that I had three significant threads to choose from. I had no problem spotting that one of them accounted for more than half the time spent.
6. Double-click the document icon to the left of the row representing the thread you identified in the preceding step. This launches the "Information for thread with task count ID xxxx" window, which Figure 4 shows. Despite the window's uninviting title, it's where the action begins; for me, that was at the doPost() method of my application's top-level servlet. Click the expand icons, and a real-life sequence diagram unfolds before your eyes: the actual passage of messages between instantiated objects. Head always for the widest bars. In the first stages of exploration, you'll find cumulative CPU concentrated in the highest-level functions, but sooner or later, as you expand further into the code, clues begin to appear. Figure 4 shows that the CPU consumed by the top-level function LineModellingDelegate consists mostly of three calls to the same function: createTableBean on the TableBeanFactory.
Figure 4 also reveals a classic performance issue: a Trace.log routine. Although I turned tracing off for the performance test, so Trace.log returns as soon as it's called, an unnecessary string is still created as its parameter. However, its significance is marginal compared with the createTableBean calls; trust the tool, not your instincts!
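The usual cure for the Trace.log pattern is to guard the call so the argument string is never built when tracing is off. Here's a minimal sketch; the tracing switch and log method are hypothetical stand-ins for whatever the application's Trace class provides:

public class TraceGuardExample {
    // Hypothetical stand-in for the application's tracing switch and logger.
    private static boolean tracingEnabled = false;

    private static void log(String message) {
        if (tracingEnabled) System.out.println(message);
    }

    public static void main(String[] args) {
        String tableType = "1";
        // Unguarded: the argument string is built even though log()
        // returns immediately when tracing is off.
        log("Created TableBean for type " + tableType);
        // Guarded: when tracing is off, no string is created at all.
        if (tracingEnabled) {
            log("Created TableBean for type " + tableType);
        }
    }
}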
Opening up the first of the three calls, I hit pay dirt: a call to the XMLParser.parseXMLFile routine. As I expand this, I find that the 291,952 CPU microseconds expended are distributed across many calls to the parseElements function, as Figure 5 shows.
Fixing the Problem
The function names give a strong hint that an XML file is being processed to create an object called a TableBean. Here, some inside knowledge of the application helps. I happen to know that the TableBean exists to render an HTML table. This bean relies on information held in a TableDefinitions.xml file, which tells the Java code how to apply cell formatting to any particular piece of data in any particular table type.

Within parseXMLFile I find the following code:

// Excerpt from the application; DOMParser is presumably
// org.apache.xerces.parsers.DOMParser (or IBM's XML4J build of it)
try {
    DOMParser myParser = new DOMParser();
    myParser.parse(xmlFile);        // reads the whole file and builds the DOM tree
    doc = myParser.getDocument();
    parseElements(doc.getDocumentElement(), new String(""));
    // More stuff that parses elements...
} catch (Exception e) {
    // error handling omitted from the original excerpt
}
XML parsing — especially when you use the Document Object Model (DOM) approach — is an expensive operation because it involves scanning an entire XML document into memory. And here, this operation sat directly on the critical path of a user-critical transaction! The code ran for every separate user and every Web page type.

The information in this XML file is static and changes only during new releases of our application software. The cleanest solution was simple: Parse the file only once and place the resulting data in Java objects, thus caching it in memory. In fact, for minimal code rewriting, I decided to parse the file when each new table type was first requested.

You could call this a problem occasioned by encapsulation. A developer (rightly) used the process of acquiring a TableBean without needing to investigate how the process worked. But because the inefficient routine was well encapsulated, it was easy to replace with something much more efficient.

The original code — as PTDV traced it — roughly followed the sequence diagram that Figure 6 shows. After the fix, the sequence diagram looked more like what you see in Figure 7. I wrote a simplified version of the code fix to show the principle involved — caching the XML data in an object. If you download and run the code, which Figure 8 shows, you should see the following output:

Getting table type 1...
Doing performance-expensive stuff to parse XML file...
Got table type 1!

Getting table type 2...
Doing performance-expensive stuff to parse XML file...
Got table type 2!

Getting table type 1 again...
Got table type 1 again!
The point is that the expensive parseXMLFile routine isn't reinvoked when getting table type 1 for the second time. Table type 1 is already available in the object cache.
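
Figure 8's downloadable code isn't reproduced here, so what follows is a minimal sketch of the same lazy-caching principle. The class and method names are illustrative only, not the article's actual code:

import java.util.HashMap;
import java.util.Map;

public class TableBeanCache {
    // One cached "bean" per table type; each is parsed at most once.
    private static final Map CACHE = new HashMap();

    public static synchronized Object getTableBean(String tableType) {
        Object bean = CACHE.get(tableType);
        if (bean == null) {
            // Stand-in for the expensive parseXMLFile work.
            System.out.println("Doing performance-expensive stuff to parse XML file...");
            bean = "TableBean for type " + tableType;
            CACHE.put(tableType, bean);
        }
        return bean;
    }

    public static void main(String[] args) {
        System.out.println("Getting table type 1...");
        getTableBean("1");
        System.out.println("Got table type 1!");

        System.out.println("Getting table type 1 again...");
        getTableBean("1"); // served from the cache; no parse this time
        System.out.println("Got table type 1 again!");
    }
}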

Some refinements I could make to the approach include:

- Create all necessary TableBeans with a single pass over the underlying XML file.
- Move all TableBean creation into the init() code of a servlet, so that the performance hit is taken entirely at application server startup rather than by the first user to hit each page type after a restart.
- Adopt the Simple API for XML (SAX) model of XML parsing, which doesn't require the whole document to be loaded into memory (see the sketch after this list).
- Sidestep the whole issue by acquiring a persistence framework, which can automate the issues associated with caching data in objects.
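
To illustrate the SAX refinement: here's a minimal sketch, assuming a JAXP-capable parser on the classpath. The TableDefinitions.xml file name comes from the article, but the handler logic is illustrative only:

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class TableDefinitionsSaxReader extends DefaultHandler {
    // SAX pushes events to this handler as it reads, so the whole
    // document is never held in memory at once (unlike DOM).
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        // React to each element as it streams past; for example,
        // collect the cell-formatting attributes for one table type.
        System.out.println("Element: " + qName);
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new java.io.File("TableDefinitions.xml"),
                     new TableDefinitionsSaxReader());
    }
}
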
Get to the Heart of It
Once you're aware of its one pitfall (data overwhelm), PTDV is easy to use and interpret. I touched on only one of the performance analysis methods PTDV offers. Others include the Object Information tab, which shows the number of each object type your application creates, along with total and average sizes. Simply click the column headings to sort this information in descending sequence and quickly reveal "space hogging" objects.

If you have a performance problem with your iSeries Java application, try PTDV. Although I can't promise the spectacular improvements it gave my company's application, PTDV will help you get to the heart of the problem.

David Bridgewater is a technology manager at the Arcadia Group, PLC, one of the UK's largest fashion retailers. You can reach him at David.Bridgewater@arcadiagroup.co.uk. You can reach Bruce Wassell, the IBM consultant who led David to the performance revelations described in this article, at Bruce_Wassell@uk.ibm.com.

Loading Optimized Java Classes on the iSeries
On the iSeries, loading optimized Java classes requires running CRTJVAPGM (Create Java Program) on the .jar files or .class files that constitute your application:

CRTJVAPGM CLSF('/myconvolutedpackagename/MyAppJarFile.jar')
OPTIMIZE(40) ENBPFRCOL(*ENTRYEXIT)
Within the WebSphere console (note that I'm running WAS 3.5.5), you must ensure that the optimized classes are loaded and that the just-in-time compiler doesn't intervene. You do this by specifying the classpath directly in the application server's command-line arguments (on the General tab), for example:

-Xms64m -classpath
/QIBM/UserData/…/MyAppJarFile.jar:/QIBM/ProdData/…/rt400.jar
I also ensure that the classpath is removed from the Web application (on the Advanced tab), in case the two classpath settings conflict.

Restart the application server after performing these steps.

— D.B.

More AS/400 performance tips to follow.
