Why Test? (and How)

This document shows the scalability and performance test methodology developed at PushToTest to identify and quantify the business optimization benefits. This is the same test method used at General Motors, BEA, Lockheed Martin, Sun Microsystems and the European Union. The methodology makes apparent the tradeoffs a software developer makes when choosing coding techniques, code libraries, and APIs.

The PushToTest Method to Identify Service Performance Metrics

It frequently surprises us how few enterprises, institutions, and organizations have a method to test services for scalability and performance. One Fortune 50 company asked a summer intern (whom they ended up hiring) to run a few performance tests between other assignments to check for scalability problems in their SOA application. That was their entire approach to scalability and performance testing.

The business value of running scalability and performance tests becomes clear after a business formalizes a test method that includes the following:
  1. Choose the right set of test cases. For instance, the test of a multiple-interface, high-volume service will be different from the test of a service that handles periodic requests with huge message sizes. The test needs to address the end-user goals in using the service and deliver actionable knowledge.

  2. Run the tests accurately. Understanding the scalability and performance of a service requires dozens to hundreds of test case runs. Ad-hoc recording of test results is unsatisfactory. Test automation tools are plentiful and often free.

  3. Draw the right conclusions when analyzing the results. Understanding the scalability and performance of a service requires understanding how the throughput (measured as Transactions Per Second, or TPS, at the service consumer) changes with increased message size and complexity, and with increased concurrent requests.
All of this requires much more than an ad-hoc approach to reach useful and actionable knowledge.

This section teaches the PushToTest methodology for understanding the scalability and performance of SOA in multiple environments and configurations. You will learn how to identify the use cases, test cases, and test scenarios to understand the scalability and performance of your SOA. And you will learn how to analyze the results data.

The PushToTest methodology is available to you in a set of developer scalability and performance kits. These kits are available either as free downloads under an open-source license or as commercially licensed products. Details on the kits appear later on this page.

An IT Industry Supporting the PushToTest Methodology

PushToTest is a software publishing and services company I founded in 2001. Enterprise information technology managers were in a bind. They had already used up their capital budgets through the 1990s buying huge volumes of equipment and building huge datacenter capacity. Now they needed to increase the productivity of their existing information systems without engaging in huge integration projects like those of the past. Against this backdrop I founded PushToTest as a test automation solutions and enterprise services business with three goals:
  1. Use open-source distribution and development techniques to build a test tool. I used on-line community development techniques to build an audience to sell services and product license up-sells.

  2. Conduct scalability and performance studies of information systems and development tools and libraries for software tools vendors and enterprises using the tools.

  3. Convince the software development tool vendors and enterprise users it is in their best interest to release performance and scalability testing results, and the software developed for the studies they commissioned, to the software development community as a "kit."

PushToTest developed a community of approximately 140,000 software developers, quality assurance technicians, and IT managers who use the open-source TestMaker framework and utility to build automated Web Service and SOA tests. The scalability and performance kits use TestMaker and garnered interest from the software development community and CIOs. The kits deliver immediately usable reference software code to developers, and best practices and a Total Cost of Ownership (TCO) analysis to business managers. PushToTest became the company that tool vendors like BEA, Sun, and IBM turn to for independent validation of their competitive standing, and that enterprises turn to for independent validation of the tool vendors' claims.

Many tool vendors released the resulting software to the software developer community. Use Google.com to search for "scalability and performance kit" to find these. Here are URLs to the publicly available kits:

The most recent kit implements the FastSOA pattern using a variety of native XML and relational tools. This kit is available under a commercial license in the Raining Data FastSOA Performance Kit.

The PushToTest methodology implemented in these kits follows a user goal oriented testing (UGOT) philosophy to determine the scalability, performance, and reliability of a service and application software. The following sections describe this in detail.

User Goal Oriented Testing (UGOT)

In my previous book, Java Testing and Design, I introduced the User Goal Oriented Testing (UGOT) method. UGOT contrasts user goals with what a service (or application) actually delivers. I developed the idea for UGOT testing after hearing Alan Cooper describe his techniques for user goal oriented software interface design. Alan argues that software developers should design user interfaces against the needs of a single archetypal user. I enjoy watching Alan spar with developers on this issue. Most developers argue with Alan that they should be designing their interfaces for all possible users. Alan counters by saying "If you design for every possible user, no individual user will have their goals met when they use your software application!"

The same controversy exists when applying Alan's techniques to testing. Most software developers I have met are predisposed to want maximum coverage of all features when testing new software. When this happens I point out that coverage tests are usually pointless. Users always take a path through the functions in a service. They never use every feature. Instead, they use a chain of features – one after another.

The agile development community approaches this problem by recommending a test-first strategy. Test first urges developers to write a unit test of a class before writing the class itself. At build time, the build environment compiles the class and then runs the unit test against the compiled code. The unit test passes example data to the class and validates the response. When the class returns an invalid response, the unit test throws an exception that the build and deploy environment handles.
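To make the test-first idea concrete, here is a minimal sketch in plain Java. It deliberately avoids any particular test framework, and the PartOrder class, its totalQuantity method, and the example data are hypothetical names invented for this illustration. The test is written first, feeds example data to the class, and throws when the response is invalid so that a build step could fail on it.

```java
/** Hypothetical class under test: totals the quantities of ordered parts. */
final class PartOrder {
    static int totalQuantity(int[] quantities) {
        int total = 0;
        for (int quantity : quantities) {
            if (quantity < 0) {
                throw new IllegalArgumentException("negative quantity: " + quantity);
            }
            total += quantity;
        }
        return total;
    }
}

/** Unit test written before PartOrder existed; a build step would run it. */
final class PartOrderTest {
    static void testTotalQuantity() {
        int[] example = {2, 3, 5};                 // example data sent to the class
        int actual = PartOrder.totalQuantity(example);
        if (actual != 10) {                        // validate the response
            // An invalid response surfaces as an exception for the build to handle.
            throw new AssertionError("expected 10 but got " + actual);
        }
    }
}
```

Calling PartOrderTest.testTotalQuantity() returns normally when the class behaves as expected and throws otherwise.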

Unit testing and agile development methods help but do not replace UGOT techniques. For instance, test first is usually only carried out at a unit level. SOA deploys applications as a collection of services, so testing individual units misses most of the big problems that occur during SOA integration and deployment. UGOT-modeled tests check a service as an individual user would: by picking one feature after the next in a chain of service requests.
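One way to picture a chain of service requests is the following sketch. It is an illustration only, not code from TestMaker or the kits: the UserGoalChain class and the step names used with it are hypothetical, and each step stands in for a real service call. The chain passes the result of one step to the next and reports the first step at which the user's goal breaks.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

final class UserGoalChain {
    /**
     * Runs each step with the previous step's output, in the map's iteration order
     * (use a LinkedHashMap to preserve the user's path).
     * Returns the name of the first failing step, or null when the whole chain succeeds.
     */
    static String firstFailure(Map<String, UnaryOperator<String>> steps, String input) {
        String state = input;
        for (Map.Entry<String, UnaryOperator<String>> step : steps.entrySet()) {
            try {
                state = step.getValue().apply(state);
            } catch (RuntimeException e) {
                return step.getKey();   // the service call itself failed
            }
            if (state == null) {
                return step.getKey();   // the service returned nothing usable
            }
        }
        return null;
    }
}
```

A test built this way fails at the point where a real user would be stopped, which is the information a feature-by-feature coverage test does not give you.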

UGOT is ideal for understanding SOA performance and scalability. UGOT treats ad-hoc testing of software as the slippery slope to madness. In all that we do to understand performance and scalability in SOA, every step must deliver real value to the decision makers who build and operate services.

Convincing your organization to use UGOT requires a little persistence and then a whole lot of explanation. What software architect, developer, or IT executive will refuse the results of a test that would immediately benefit the business or institution? First ask the question:

What do I need to learn from a performance test and how will it benefit my company, business, or institution?

This question gets to the heart of why we test SOA at all, and it may be more difficult to answer than it first appears. In some of the test projects I have run, the answer took longer to find than running the actual test and understanding the results.

The question challenges us to understand what we are actually testing. For instance, Figure 1 shows many of the building blocks found in Java development tools for building SOA.


Figure 1 - The possible architectural components of SOA designs that could be tested for performance and scalability.

The components build on each other in three tiers. At the bottom are the fundamental components for SOAP bindings, XML parsing, inter-application messaging with the Java Message Service (JMS), and clustering. Building on these are service bus components for services to interoperate at a message level. The top tier provides interoperability at an application level. Given these building blocks and tiers, where would you start testing to understand the scalability and performance of an SOA implementation?

For instance, does it make sense to test only the connectors and caching objects at the service bus level? If you do, you may miss key performance bottlenecks in the SOAP bindings and JMS service. Each of the components in Figure 1 impacts the scalability and performance of an SOA implementation.

Another way to understand your testing goal is to review the definition of service architecture as shown in Figure 2.

Figure 2 - SOA is a consumer, service, and broker architecture.

Which of these do you test first and when? Performance tests normally check at least two. For instance, one test may check a consumer and a service, and another checks a consumer and a broker. Table 1 lists options to understand what part of your system would benefit most from scalability and performance analyses.

Table 1 - Understanding Your Test
Test Name (what you want to test) | Test Benefit | Test Type | Parameters related to the scalability index
Service Interface | Decrease time for service request responses to lower network bandwidth and server hardware costs. | Stateless | Message size and concurrent requests levels
XML Parsing | Decrease time for routing service messages to lower network bandwidth and server hardware costs. | Stateless | Schema complexity (depth and element count), document size, concurrent requests level
Data Persistence | Decrease time for storing and retrieving messages to lower network bandwidth, server hardware, and disk costs. | Stateful | Schema complexity (depth and element count), document size, concurrent requests level
Data Transformation | Decrease time for transforming a message into a given XML schema to lower network bandwidth and server hardware costs. | Stateless | Source and destination schema complexity (depth and element count), request and output document size, concurrent requests level
Data Aggregation and Federation | Decrease time for responding to service requests requiring up-stream data to reduce network bandwidth and server hardware costs. | Stateful | Schema complexity (depth and element count) for up-stream services, data persistence quantities, message time-to-live (TTL) values
Data Mitigation | Reduce time when a service is unavailable at peak usage to improve service availability and user satisfaction. | Stateful | Schema complexity (depth and element count) for each request, document size, concurrent requests levels

This table covers the test goals we encounter most often. However, there are many more possibilities, and the pace of innovation within SOA and Web service building tools is fast.

Test Name and Test Benefit are self-explanatory, but Test Type and Parameters-related-to-the-scalability-index need an explanation:
We show the scalability index, stateful and stateless testing, and the impact of changes in the test parameters in more depth later in this chapter.

First, let us explain the steps implementing the PushToTest method.

Table 2 - The PushToTest Method
Phase | Goal | Checkpoint
Planning | Answer the question: How will this test benefit my organization? | Write Test Plan document.
Definition | Identify use cases, test cases and scenario, and test environment (hardware, software, network). | Add the use cases, test cases and test scenario to the Test Plan. Achieve management sign-off.
Calibration Test | Calibrate the test cases to the test environment. | Identify the use cases driving the test environment to its maximum throughput (as measured in TPS from the client).
Optimize | Modify the service and/or test environment to optimize for best performance based on what you learned in the Calibration Test. | Amend the test plan to add the optimization changes.
Full Test | Run the test scenario. | Successful run of test scenario.
Results Analysis | Identify test result metrics and trends against test scenario goals. | Present results and achieve adoption by management.

Here is a brief explanation of the terminology used in Table 2:

As with any test methodology, the devil is in the details, and the next section provides a detailed look at the method in practice. Before we dig in, however, we will cover an important distinction between this SOA test method and everyday software testing.

Method for Black Box and White Box (Profiling) Tests

Testing SOA for scalability and performance is different from testing software applications and code. SOA testing is focused on understanding how a service responds to increasing levels of concurrent requests, message sizes, and response handling techniques. The nature of SOA testing is black box testing; it doesn't matter what happens inside the box.

Code profilers have their place in testing software and software developers often rely on them to learn the location of performance problems. However, in our experience, black-box testing often yields more actionable knowledge and here is what we recommend:
  1. Create a baseline performance metric (a Scalability Index) using black-box performance tests showing Transactions Per Second (TPS) results, measured at the service consumer, with a variety of message sizes, message schema complexities, and concurrent requests levels.

  2. When comparing performance and scalability between multiple servers, consumers, or brokers it is important to identify each server's Performance Index and normalize the test parameters to avoid reporting false slow-performance results. This step is called a Calibration test because you are calibrating the test lab to run the tests properly.

  3. Once you determine the Performance Index of the service under test, use white-box techniques to profile the most time-expensive object operations involved in handling requests. Optimize the software based on the profile.

  4. Continue optimizing the service by repeating steps 2 and 3.

  5. Run the Performance Index and analyze the results.
The preceding sections have laid the groundwork for understanding our method of testing services for scalability and performance. Now we show the test method applied to a real-world SOA scenario.

Applying the Method to SOA and Web Services

Having covered the goals and means to test services, there is nothing quite like a good example. This section shows the effort involved in developing the Raining Data FastSOA Performance Kit.

The Raining Data FastSOA Performance Kit (referred to as "the kit") highlights the scalability, performance, and developer productivity differences between SOA services built with Java application server and database tools, and the same services built with native XML technology. The kit looks at SOA from two perspectives:
  1. SOAP binding acceleration. Implements the FastSOA architecture using Java objects and XQuery technology. The use cases contrast performance and developer productivity based on the typical developer choices of XML parsing techniques (XML binding compiler, Streaming XML parser, and DOM approaches for Java and XQuery parsing).

  2. Mid-tier caching for service acceleration. Implements a use case with native XML databases (XML DB) and relational databases (RDBMS) to contrast database performance across a variety of XML message sizes and database operations (insert, update, delete, and query).
The following sections explain the test's background and goals to illustrate using the PushToTest method in the kit.

Planning: Background and Goals

Software architects and developers choose XML parsing techniques, service libraries, encoding techniques, and protocols when building services using SOA techniques. Each choice has an impact on the scalability and performance of the finished service. This kit has three goals:
  1. Explain the changing landscape of APIs, libraries, encoding techniques, and protocols to software architects and developers. The current generation of technology choices changes approximately every 6 to 9 months. For instance, JAXB 1.0 is replaced by JAXB 2.0 and WebLogic Server 8.1 is replaced by WebLogic Server 9.

  2. Identify and use real-world, use-case scenarios showing software architects and developers how to choose technology based on their service goals. These scenarios result from testing Web Services for General Motors and learning how developers, at the Silicon Valley-based Software Development Forum, apply XML parsers and service interface toolkits.

  3. Deliver code compatible with the current techniques for building functional and scalability tests (black-box, unit, agile test-first). We have seen many vendor tutorials that were not compatible with what we used to build our software. They were useful, but often required adoption of proprietary vendor tools or use of internal vendor secret magic. To build the kit, we resolved to use the same techniques used to develop our open-source project, and only follow public information available on the Web.
The kit delivers a reusable method for evaluating SOA performance and system scalability, plus the results feed basic business needs of cost/benefit and feature/function analysis including:
The kit arms business managers and software developers with the evidence needed to recommend and adopt FastSOA solutions internally, and get their projects funded. The contents of the published kit are listed in Table 3.

Table 3 - Scalability and Performance Kit Contents
Content | Description
Source code | Complete source code for each use case and test scenario, including Ant build scripts to build the kit in your own environment.
Developer's Journal | A Developer's Journal describing in detail:
  • Detailed use cases and test scenarios
  • Design decisions and trade-offs
  • XML and Java binding implementation stories
  • Client-side software calling the implemented services
  • Server-side software implementing the services
  • Use case scenario specific findings
  • Installation and performance tuning
Prebuilt JARs (ready for you to press a Start button and watch the results) | Pre-built JAR and WAR files for immediate use in your environment.
TestMaker and TestScenario Scripts | Scripts to stage a scalability and performance test of each use case, and the test scenario.


Definitions: Use Case and Test Scenario

The kit measures the performance and scalability of SOAP bindings created and deployed using J2EE-based tools, and of bindings built with XQuery and a native XML database. Performance testing compares several methods to receive a SOAP-based Web Service request and respond to it. Scalability testing looks at the operation of a service as the number of concurrent requests increases. Performance and scalability tests measure throughput as TPS at the service consumer.

The use cases and test scenarios contrast the TPS differences between the most popular approaches to parse XML in a request. We made this choice based on the following personal experiences:
  1. Despite Web Services standards being more than five years old, a standard service test method has not emerged. For instance, the SPECjAppServer test implements a 4-tier Web browser based application where a browser connects to Web, application, and database servers in series. SOA, on the other hand, is truly a multi-tier architecture where each tier can make multiple SOAP requests to multiple services and data sources. SPECjAppServer and similar 4-tier tests do not provide the reliable SOA application information needed by capacity planners and software architects.

  2. Software architects and developers are specializing their talents by service type. For instance, one developer works with complicated XML schemas in order processing services while another concentrates on building content management and publication services in portals.

  3. The tools, technologies, and libraries available for software architects are changing rapidly. For instance, a survey we conducted in 2005 shows all Java-based XML parsing libraries will change significantly within the next year.
Responding to these issues, the kit has use cases common to many SOA environments. These use cases highlight different aspects of SOA creation and present different challenges to the software development tools examined.
  1. Compiled XML binding using BOD schemas. In this scenario, codenamed the TV Dinner, a developer needs to code a part ordering service. The service uses Software Technology in Automotive Retailing (STAR) Business Object Document (BOD) schemas.

    On the consumer side, the test code instantiates a previously serialized Get Purchase Order (GPO) request document and adds a predetermined number of part elements to the ordered part. On the service side, the service examines only specific elements within the GPO instead of looking through the entire document.

    The developer's code addresses compartments by their namespace so they add/put only the changing parts of the purchase order. The other compartments (company name, shipping information, etc.) don't change from one GPO request to another. To accomplish this, the TV Dinner uses JAXB created bindings allowing access to the individual compartments. This XML to object binding framework is used so only the required objects are instantiated.

    The TV Dinner scenario is named because in a TV dinner, the entire dinner is delivered at once and the food is in compartments.

  2. Streaming XML (StAX) Parser. In this scenario, codenamed the Sushi Boats, a developer builds a portal receiving a "blog" style news-stream. Each request includes a set of elements containing blog entries. The test code scenario parameters determine the number of blog entries included in each request. The developer needs to take action on the entries of interest and ignore the others. The test code for the Sushi Boats features the JSR 173 Streaming XML (StAX) parser.

    The Sushi Boats scenario is named from observations at a Japanese Sushi Bar where the food passes by in a stream and the diner selects the boat they take food from.

  3. DOM approach. In this scenario, codenamed the Buffet, a developer writes an order validation service receiving order requests and must read all the elements in a request to determine its response. The test code scenario parameters determine the number of elements inserted into each request. The test code for The Buffet scenario uses Xerces DOM APIs.

    The Buffet scenario is named from experience eating at a buffet restaurant and feeling compelled to visit all the stations.
In addition to these use cases, the kit contrasts the performance of native XML databases and relational databases when storing XML data with complex schemas and multiple message sizes in the mid-tier.

The kit implements these use cases using both Java and XQuery tools.
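The difference between the Sushi Boats and Buffet styles can be sketched with the XML APIs that ship with the JDK. This is an illustration rather than the kit's code: the kit uses Xerces DOM APIs, while this sketch uses the JDK's built-in parsers, and the ParsingStyles class, the entry element, and its category attribute are hypothetical names. The streaming method pulls events and acts only on entries of interest; the DOM method builds the whole tree and then visits every element.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class ParsingStyles {
    /** Sushi Boats style: stream through the request, counting only entries in one category. */
    static int countEntriesStax(String xml, String category) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            int matches = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "entry".equals(reader.getLocalName())
                        && category.equals(reader.getAttributeValue(null, "category"))) {
                    matches++;   // act on the entry of interest, ignore the rest
                }
            }
            reader.close();
            return matches;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("request is not well-formed XML", e);
        }
    }

    /** Buffet style: build the complete tree in memory, then count every element in it. */
    static int countElementsDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("*").getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("request could not be parsed", e);
        }
    }
}
```

The streaming method never holds more than the current event, while the DOM method pays for the full document up front. That trade-off is what the use cases are designed to expose as payload sizes grow.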

Additional Use Cases Considered but Not Implemented

During test case planning, it is very easy to add more and more use cases resulting in a test taking months to run! Limiting the number of use cases means some test cases are delayed. The following use cases are not in the current kit (due to time constraints), but will appear in a future version:
  1. Creating XML structures from relational data. This is a very common scenario because relational databases are widely used. Also, recombining data to create different structures is often very important.
  2. Additional ways to return XML data. For instance, returning entire XML documents and extracting portions of very large XML documents.
  3. Joins between nodes of large XML documents.
  4. Returning large or complex XML structures from direct extraction or data processing operations.
  5. Full-text queries for portions of XML structures.

Defining the Test Scenario

The Test Scenario is the aggregate of all use and test cases. For instance, the kit implements several use cases showing different approaches to XML parsing (DOM, XQuery, StAX, and Binding Compiler). So if we wanted to run four use cases with two message sizes, we would have eight test cases in our test scenario.

Table 4 lists the 4 use cases, 2 technology choices, 3 message payload sizes, and 4 concurrent requests levels.

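Table 4 - The Test Scenario
Use Case | Technology Choice | Request Payload Size | Concurrent Requests
Java XML Binding Compiler | XQuery Engine | 5,000 bytes | 5
Java Streaming XML Parser | Java Application Server | 100,000 bytes | 50
Java DOM | | 500,000 bytes | 100
XQuery Streaming XML Parser | | | 200

Each column is crossed with the next: use cases x technology choices x request payload sizes x concurrent requests levels.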

The test scenario is the aggregate of all the test cases. For instance, one test case uses the XML Binding Compiler running on TigerLogic at 100,000 bytes and 50 concurrent requests. With the given parameters (4 use cases x 2 technology choices x 3 payload sizes x 4 concurrent requests levels), this test scenario requires 96 test cases to run.

If each test case has a 5-minute warm-up period, takes 5 minutes to run, and has a 5-minute cool-down period, each test case occupies 15 minutes and the test scenario requires 96 x 15 = 1,440 minutes (24 hours) to run. Because run time grows quickly as the number of use cases increases, use caution when adding use cases to your test scenario; you should be confident the extra run time delivers actionable knowledge.
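The arithmetic above can be checked by enumerating the test cases. The following sketch assumes the Table 4 parameters; the TestScenarioPlan class and its method names are hypothetical and are not part of TestMaker.

```java
import java.util.ArrayList;
import java.util.List;

final class TestScenarioPlan {
    /** Builds one test case label for every combination of the four parameter lists. */
    static List<String> enumerate(String[] useCases, String[] technologies,
                                  int[] payloadBytes, int[] concurrentRequests) {
        List<String> testCases = new ArrayList<>();
        for (String useCase : useCases) {
            for (String technology : technologies) {
                for (int payload : payloadBytes) {
                    for (int requests : concurrentRequests) {
                        testCases.add(useCase + " / " + technology + " / "
                                + payload + " bytes / " + requests + " concurrent requests");
                    }
                }
            }
        }
        return testCases;
    }

    /** Total scenario run time when every test case has warm-up, run, and cool-down periods. */
    static int totalMinutes(int testCaseCount, int warmUp, int run, int coolDown) {
        return testCaseCount * (warmUp + run + coolDown);
    }
}
```

With 4 use cases, 2 technology choices, 3 payload sizes, and 4 concurrent requests levels, enumerate returns 96 entries, and totalMinutes(96, 5, 5, 5) returns 1440. Adding a fifth use case under the same parameters adds 24 test cases, or 6 more hours.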

Identify the Test Environment (Hardware, Software)

The last part of test definition concerns the test environment itself. The goal is to follow commonly used, well known, and published best-practices. Here are our choices for the test environment:
At this point, the test is well enough defined to begin coding the use cases. Now we can end the definition phase, build the test environment, install the test, and learn the levels where the server tops-out.

Using the XSTest-Pattern for Performance Tests

When testing SOA for scalability and performance, the sheer number of test cases in the test scenario makes test automation necessary. One approach to test automation is a pattern we call XSTest, which is a feature of TestMaker. XSTest takes a test sequence (such as the scenario in Table 4) as input, stages each test case in sequence, and records the transaction results to an XML-based log file. The XSTest implementation in TestMaker then tallies the results from the log file into a TPS report.

Figure 3 illustrates the XSTest pattern as a UML sequence diagram:

Figure 3 - XSTest sequence diagram showing how a test scenario is run.

One of the key advantages of the XSTest pattern is its use of JUnit TestCase objects. These are familiar to most developers and easily learned by the rest. The kit implements the tests as TestCase objects for use in the load test, and for reuse as functional tests.

Calibration Testing

When defining the test scenario, a certain amount of speculation is baked into the test plan. For instance, the Table 4 test scenario speculates that a test case can achieve satisfactory throughput (in TPS) with a message payload request size of 500,000 bytes and 200 concurrent requests.

A Calibration Test identifies a service agent's optimum throughput – measured in TPS at the consumer – against the given test hardware and software. For instance, consider the data in Table 5.

Table 5 - Payload Size, Concurrent Agents, and Transactions-Per-Second Results
Payload Size (Bytes) | Concurrent Agents | Transactions Per Second (TPS)
1000 | 10 | 10.376
2000 | 10 | 8.667
3000 | 10 | 6.174
4000 | 10 | 1.383
5000 | 10 | 0.731

Table 5 lists the two input values for the test: the message size sent to the service, and the number of concurrent test agents. Using these values, XSTest operates a test case by instantiating one thread for each concurrent request. Each thread dynamically generates the defined payload size data and sends it to the service as a request. Then the thread receives the server's response, validates the response, handles any exceptions, and logs the response as a completed transaction. The thread repeats the same steps until the test case period is finished.
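The thread-per-request loop described above can be sketched as follows. This is not the XSTest implementation in TestMaker; it is a simplified stand-in under stated assumptions: the LoadTestCase class is hypothetical, the service is modeled as an in-process function rather than a SOAP endpoint, and a response counts as valid when it is non-empty.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.UnaryOperator;

final class LoadTestCase {
    /** Outcome of one test case, measured at the consumer. */
    static final class Result {
        final long completed;
        final long failed;
        final double seconds;

        Result(long completed, long failed, double seconds) {
            this.completed = completed;
            this.failed = failed;
            this.seconds = seconds;
        }

        double tps() {
            return completed / seconds;
        }
    }

    static Result run(UnaryOperator<String> service, int payloadBytes,
                      int concurrentRequests, long durationMillis) {
        AtomicLong completed = new AtomicLong();
        AtomicLong failed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(concurrentRequests);
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        for (int i = 0; i < concurrentRequests; i++) {   // one thread per concurrent request
            pool.execute(() -> {
                while (deadline - System.nanoTime() > 0) {
                    char[] body = new char[payloadBytes];   // generate the payload dynamically
                    Arrays.fill(body, 'x');
                    try {
                        String response = service.apply(new String(body));
                        if (response != null && !response.isEmpty()) {
                            completed.incrementAndGet();    // log a completed transaction
                        } else {
                            failed.incrementAndGet();       // response failed validation
                        }
                    } catch (RuntimeException e) {
                        failed.incrementAndGet();           // handle the exception, keep going
                    }
                }
            });
        }
        pool.shutdown();
        try {
            // Allow in-flight requests a grace period; a hung service is not handled here.
            pool.awaitTermination(durationMillis + 5000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Result(completed.get(), failed.get(), (System.nanoTime() - start) / 1e9);
    }
}
```

A call such as LoadTestCase.run(service, 1000, 10, 300000) corresponds to the first row of Table 5 with a 5-minute run period. A real driver would also add the warm-up and cool-down periods and write each transaction to a log rather than keeping only counters.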

Figure 4 displays the results from Table 5 in a bar chart that clearly shows the maximum throughput values.



Figure 4 - A bar chart of the results from Table 5 showing maximum throughput values.

Table 5 and Figure 4 provide good information about the service under test, including:
  1. As payload increases, TPS decreases. The test is not saturating or underutilizing the server, network, or consumer. If TPS increased, the testing level was not high enough, or if it was flat or dropping sharply, the testing level was not low enough (this is explained in more detail later in this section).

  2. The reduction in TPS is not proportional to the increase in request size. When this happens while the network and consumer are not at high activity levels, it indicates the service has a poorly performing request processor. One reason this could happen is that the message parsing system is not allocating resources (memory, network socket connections, message queues) of the correct size for the demands of the test.

  3. TPS takes a significantly larger reduction for test cases above 3000 bytes of payload. In this case, we normally use a code profiler to check whether the service is experiencing a buffer overflow or an undersized object list.
While Figure 4 tells us a few things about the service, there is not enough information to make a conclusion. At this point, we have more questions than conclusions! What we need now are values for the test parameters, listed in Table 6, to help determine the problem.

Table 6 - Parameter Values Required to Calibrate a Test
Parameter | Description
Request Payload Size | Service request message-body size (in bytes)
Response Payload Size | Service response message-body size (in bytes)
Concurrent Requests | Total number of concurrent requests
Transactions Per Second (TPS) | Ratio of total completed responses to execution time (in seconds)
Network Utilization | % network bandwidth (measured from server)
Server CPU Utilization | % server processor bandwidth
Consumer CPU Utilization | % consumer/client processor bandwidth
Average Transaction Time | Average service response time (measured by consumer/client)
Minimum Transaction Time | Minimum service response time (measured by consumer/client)
Maximum Transaction Time | Maximum service response time (measured by consumer/client)
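The consumer-side values in Table 6 can be derived from the recorded response times. The sketch below assumes the response times are already available in milliseconds; the TransactionSummary class is a hypothetical name, and the utilization values are left out because they come from operating system or network monitoring tools rather than from the test log.

```java
final class TransactionSummary {
    final double tps;            // completed responses divided by execution time in seconds
    final double averageMillis;  // average transaction time
    final long minMillis;        // minimum transaction time
    final long maxMillis;        // maximum transaction time

    private TransactionSummary(double tps, double averageMillis, long minMillis, long maxMillis) {
        this.tps = tps;
        this.averageMillis = averageMillis;
        this.minMillis = minMillis;
        this.maxMillis = maxMillis;
    }

    /** Summarizes one test case from its completed response times and total execution time. */
    static TransactionSummary of(long[] responseMillis, double executionSeconds) {
        if (responseMillis.length == 0 || executionSeconds <= 0) {
            throw new IllegalArgumentException("need at least one response and a positive duration");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long sum = 0;
        for (long millis : responseMillis) {
            min = Math.min(min, millis);
            max = Math.max(max, millis);
            sum += millis;
        }
        return new TransactionSummary(responseMillis.length / executionSeconds,
                (double) sum / responseMillis.length, min, max);
    }
}
```

For example, three responses of 100, 200, and 300 milliseconds completed over 2 seconds give 1.5 TPS, an average of 200 milliseconds, a minimum of 100, and a maximum of 300.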
    
In a stateless system, for each request, a service allocates its own memory, CPU bandwidth, network bandwidth, and other resources needed to generate a response. For a stateless calibration test, we are looking for resource bottlenecks. Table 7 shows the test scenario results including network utilization, and server and consumer CPU utilization values.

Table 7 - Network and CPU Utilization
Payload Size (bytes) | Concurrent Requests | Transactions Per Second (TPS) | Network Utilization | Server CPU Utilization | Consumer CPU Utilization
1000 | 10 | 10.376 | 1.24% | 55% | 34%
2000 | 10 | 8.667 | 1.14% | 78% | 37%
3000 | 10 | 6.174 | 1.32% | 89% | 31%
4000 | 10 | 1.383 | 0.45% | 95% | 21%
5000 | 10 | 0.731 | 0.28% | 96% | 18%

Aha! The results in Table 7 give some idea of what is going on during the test scenario:
  1. The test is server-bound, which prevents greater throughput (TPS). When payload sizes are less than 4000 bytes, server CPU utilization is high but not saturated. At 4000 bytes and greater, the CPU is saturated.

    Stateless tests require resources to handle the concurrent requests load. Take away a resource (CPU bandwidth or free memory) to operate on larger payloads, and response times increase, lowering overall TPS.

  2. The scale of the drop indicates there is a significant problem in the server. The payload size increases by a factor of 5, from 1000 to 5000 bytes, but TPS values decrease by a factor of 14, from 10.376 to 0.731. In a stateless test, the change in TPS should be proportional to the change in the input.
By the way, since this is a stateless test, each request should be served from an independent group of resources (threads, memory, etc.). Watching CPU and memory utilization levels is an appropriate way to identify scalability and performance thresholds.
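
The two observations above can be checked directly against the Table 7 figures. The short Python sketch below recomputes the payload and TPS factors and lists the payload sizes at which the server CPU counts as saturated. The 95% cutoff is an assumption taken from the point at which the text calls the CPU saturated; the variable names are illustrative only.

```python
# Rows copied from Table 7: (payload bytes, TPS, server CPU utilization %).
TABLE_7 = [
    (1000, 10.376, 55),
    (2000, 8.667, 78),
    (3000, 6.174, 89),
    (4000, 1.383, 95),
    (5000, 0.731, 96),
]

# Assumed cutoff for calling the server CPU "saturated".
SATURATION_PCT = 95

first_payload, first_tps, _ = TABLE_7[0]
last_payload, last_tps, _ = TABLE_7[-1]

payload_factor = last_payload / first_payload   # how much the payload grew
tps_factor = first_tps / last_tps               # how much throughput shrank

saturated_payloads = [p for p, _, cpu in TABLE_7 if cpu >= SATURATION_PCT]

print(f"payload grew {payload_factor:.0f}x, TPS fell {tps_factor:.1f}x")
print("server CPU saturated at payload sizes:", saturated_payloads)
```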

However, this is not the case for stateful services such as database and workflow applications. Stateful services use data caches, server queues, and typically have session manager overhead. These items impact service CPU and memory utilization levels independent of the consumer request load.


For a scalability and performance test running in a defined software and hardware environment, a calibration test helps determine the appropriate service concurrent requests levels and message payload sizes. The results show a Scalability Index for the service.

Scalability Index

A Scalability Index is a function of service performance (in TPS) as concurrent requests levels and payload sizes change in a test scenario as shown in Figure 5.

Figure 5 - Scalability Index

There are three distinct parts to this Scalability Index: View B shows we are neither underdriving (too few requests) nor overdriving (too many requests) the service within its CRs levels range. If we consider View B as our Calibration Test results, we now have a range of CRs levels to use in the Full Test of our test scenario. At this point, we could also run additional scenarios that use alternative APIs and products, at the CRs levels used in View B, and compare the TPS results.

In this example we found the CPU, network, and memory utilization values to be helpful in determining where performance and scalability bottlenecks occur. As a note of reference, CPU and memory bandwidth are helpful in stateless tests.

In the next section we will show that CPU and memory bandwidth are usually meaningless for stateful tests.


Understanding TPS

Before moving on to running the full test and doing the results analysis, we're going to discuss Transactions Per Second (TPS) in detail so you'll have a thorough understanding of TPS measurements. We have seen TPS results confuse and mislead software architects, developers, and CEOs. Sometimes the results can be counterintuitive.

TestMaker shows a system's scalability in a Scalability Index chart, as shown in Figure 6.

Figure 6 - Test results of a system's throughput at 4 concurrent requests levels.
System throughput is measured as the number of transactions a system handles as more requests are received. A perfect information system handles requests at a constant rate regardless of the number of requests; it increases its transactions-per-second to maintain a constant response time. Charting a perfect system's scalability shows the TPS rate increases in equal proportion to the number of received requests; it is a linear relationship.

For instance, if a perfect system handles 100 concurrent requests in 10 seconds with a 2 second response time, then maintaining the 2 second response time, it should handle 200 concurrent requests in the same 10 second period. Figure 7 shows the scalability index of a perfect system - a system with linear scalability.


Figure 7 - Scalability Index for a perfect system.
At each concurrent requests level, the system handles the requests at a measured rate in transactions per second, yielding the system's response time. As concurrent requests levels increase, the system handles the requests at the same rate, so the number of transactions increases in equal proportion. TPS keeps going up-and-to-the-right in equal proportion to the number of requests.

For instance, a system that handles 1000 requests in a 10 second period at 100 concurrent requests has a handling rate of 100 transactions-per-second. The same system at 200 concurrent requests handles 2000 requests in the same 10 second period, increasing the handling rate to 200 transactions-per-second. That is perfect scalability, the Holy Grail of performance testing. Receiving more requests does not slow down system response time.
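
The arithmetic in this example can be written as a small Python sketch. The helper names handling_rate_tps and is_linear, and the 1% tolerance for measurement noise, are assumptions introduced for illustration; they are not TestMaker functions.

```python
def handling_rate_tps(completed_requests, period_s):
    """Transactions per second over a measurement period."""
    return completed_requests / period_s

def is_linear(levels, tolerance=0.01):
    """Return True when TPS grows in equal proportion to concurrent requests.

    levels: list of (concurrent_requests, tps) pairs.
    tolerance: assumed allowance for measurement noise (1% here).
    """
    base_cr, base_tps = levels[0]
    per_request_rate = base_tps / base_cr
    return all(
        abs(tps / cr - per_request_rate) <= tolerance * per_request_rate
        for cr, tps in levels
    )

# The two levels from the example in the text.
levels = [
    (100, handling_rate_tps(1000, 10)),  # 100 TPS
    (200, handling_rate_tps(2000, 10)),  # 200 TPS
]
print(levels, is_linear(levels))
```

A measured Scalability Index that fails this check at higher concurrent requests levels is departing from the perfect, linear case.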


A typical system hitting a bottleneck has the performance shown in Figure 8.

Figure 8 - A service exhibiting a performance bottleneck.
As the system receives larger numbers of concurrent requests, it slows down in responding to each request. If you kept increasing past 400 concurrent requests, the system would eventually reach zero transactions-per-second. Many systems we have checked for scalability have this problem. The Scalability Index helps systems managers plan system capacity to achieve the throughput needed to make users happy. It also helps developers understand how their design and coding decisions impact performance.

In our experience, the more common situation is shown in Figure 9.

Figure 9 - A service exhibiting a scalability problem.

As shown in the first three columns, the system is able to handle increasing levels of concurrent requests. However, the fourth column shows the system hits an upper limit in handling transactions. This is often caused by a database indexing problem, a full data cache, or a saturated network connection.

This TPS method may seem straightforward enough to you. Now we will cover a sample of questions received from an engineer, an engineering manager, and an executive during a scalability and performance study. We conducted a scalability test and obtained the results listed in Table 8.

Table 8 - The Scalability Index for a Given Service
Concurrent Requests  Transactions Per Second (TPS)  Completed Transactions  Average Response Time (milliseconds)
25                   0.38                           68                      65,344
15                   0.33                           61                      33,828
10                   0.31                           59                      12,234

From the results in Table 8, you may wonder why the TPS increases only slightly, considering the test is making 2.5 (25/10) times more concurrent requests. Why didn't the TPS value increase by 2.5 times to 0.775 (0.31 TPS at 10 concurrent requests times 2.5)?

Free-running threads generate concurrent requests with no sleep time between requests as illustrated in Figure 10. Their job is to keep making requests to the server during the test period. Yet, the average response time with 25 users is 5.34 times longer (65,344 milliseconds at 25 users divided by 12,234 milliseconds at 10 users). Consequently, there are fewer opportunities to log results and increase the TPS value.
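
The figures quoted above can be reproduced from Table 8 with a few lines of Python. This is only a worked check of the arithmetic; the variable names are illustrative.

```python
# Rows copied from Table 8: concurrent requests -> (TPS, avg response time in ms).
TABLE_8 = {
    10: (0.31, 12_234),
    15: (0.33, 33_828),
    25: (0.38, 65_344),
}

low_cr, high_cr = 10, 25
low_tps, low_ms = TABLE_8[low_cr]
high_tps, high_ms = TABLE_8[high_cr]

load_ratio = high_cr / low_cr        # 2.5 times more concurrent requests
naive_tps = low_tps * load_ratio     # what linear scaling would predict
response_ratio = high_ms / low_ms    # how much slower each response became

print(f"linear prediction: {naive_tps:.3f} TPS, measured: {high_tps} TPS")
print(f"average response time grew {response_ratio:.2f}x")
```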

Figure 10 - Throughput (TPS) decreases as service response time increases as measured from the consumer.
When a test increases the number of concurrent users, one of three things can happen:
  1. The server takes less time (on average) to respond than at lower CUs levels. In this condition each CU finishes sooner, logs a response (a transaction), and makes its next request much sooner. TPS increases from lower CUs levels.

  2. The server takes the same time (on average) to respond as at lower CUs levels. In this condition each CU takes the same amount of time but there are more CUs running concurrently. TPS goes up from the lower CUs levels proportionately to the CUs increase.

  3. The server takes more time (on average) to respond than at the lower CUs levels. In this condition each CU finishes later resulting in fewer opportunities for the server to handle more requests. TPS drops proportionately to the increased response time.
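
These three outcomes follow from a simple steady-state approximation. When each free-running CU issues its next request as soon as the previous one completes, it contributes about one transaction per average response time, so TPS is roughly the number of CUs divided by the average response time in seconds. The Python sketch below applies that approximation to made-up numbers. The function name approx_tps and all sample values are assumptions for illustration, and real measurements can deviate from the approximation, especially over short test runs.

```python
def approx_tps(concurrent_users, avg_response_s):
    """Steady-state throughput of free-running users with no sleep time."""
    return concurrent_users / avg_response_s

baseline = approx_tps(10, 2.0)   # 10 CUs, 2 s responses -> 5 TPS

# Double the CUs and look at the three possible outcomes.
faster = approx_tps(20, 1.0)     # case 1: responses got faster
same = approx_tps(20, 2.0)       # case 2: response time unchanged
slower = approx_tps(20, 8.0)     # case 3: responses got slower

for label, tps in [("faster", faster), ("same", same), ("slower", slower)]:
    print(f"{label:>6}: {tps:5.1f} TPS ({tps / baseline:.1f}x baseline)")
```

In the third case the response time quadrupled while the CUs only doubled, so TPS fell below the baseline. Under the same approximation, if the response time had grown by less than the CUs did, TPS would still have risen, only by less than the proportional amount.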

In the next section we present a list of test issues and solutions you can use based on your service's Scalability Index.

Calibration What-If Chart

It is beyond the scope of this book to explore everything that might happen during a Calibration test. However, Table 9 lists a few significant issues to be aware of.

Table 9 - Calibration What-If Chart
Test Experience Likely Problem What To Do Next
Increase CRs with a decrease in TPS Check average response time. Run a test case comparing response times as CRs increase. Identify the least acceptable response time and work back from there.
Increase CRs with an increase in TPS Test is not calibrated for high enough CRs and payload sizes. Run calibration test to determine optimal Scalability Index and set correct CRs and payload-size levels.
Increase CRs with little change in TPS CRs levels are set too high. Check CPU utilization if doing stateless test. Run another test with the CRs level reduced by 50%.
Server CPU at 95% utilization and increasing CRs levels increases TPS This is probably a stateful system test. Run a test case to determine the service-under-test Scalability Index.
Consumer CPU at 95% utilization and increasing CRs levels decreases TPS CRs levels are too high for the number of load generating consumers. Add more load generating consumers.
Consumer CPU utilization at 15%, Server CPU utilization at 30%, and increasing CRs levels barely changes TPS Network is probably saturated. Check network bandwidth utilization. Add network adaptors to server or consider a faster network.
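
The first three rows of Table 9 can be sketched as a small lookup in Python. The function name diagnose, the 5% cutoff for "little change", and the sample TPS values are assumptions made for this example; Table 9 itself does not define a numeric threshold.

```python
def diagnose(tps_before, tps_after, tolerance=0.05):
    """Map a TPS change after raising CRs to the first three rows of Table 9.

    tolerance is an assumed cutoff (5%) for "little change"; Table 9 does not
    define one.
    """
    change = (tps_after - tps_before) / tps_before
    if abs(change) <= tolerance:
        return "CRs levels are set too high; rerun with the CRs level reduced by 50%."
    if change < 0:
        return "Check average response time as CRs increase."
    return "Test is not calibrated for high enough CRs and payload sizes."

# Made-up TPS readings taken before and after raising the CRs level.
print(diagnose(10.0, 6.0))    # TPS decreased
print(diagnose(10.0, 15.0))   # TPS increased
print(diagnose(10.0, 10.2))   # TPS barely changed
```

The remaining rows depend on CPU and network utilization readings, which would need to be passed in alongside the TPS values.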

Calibration Testing shows a server's possible Transactions Per Second (TPS) levels given its equipment, software, and network configuration. The next step in the PushToTest method is to run the real test at the calibrated test levels. The <TBD> section shows the results from the FastSOA Performance Kit and how to evaluate the results.



Additional documentation, product downloads and updates are at www.PushToTest.com. While the PushToTest TestMaker software is distributed under an open-source license, the documentation remains (c) 2007 PushToTest. All rights reserved. PushToTest is a trademark of the PushToTest company.