Sun Java System Web Server 7.0 Performance Tuning, Sizing, and Scaling Guide

Chapter 6 Scalability Studies

This chapter describes the results of scalability studies. You can refer to these studies for a sample of how the server performs, and how you might configure your system to best take advantage of Web Server’s strengths.

Study Goals

The goal of the tests in the study was to show how well Sun Java System Web Server 7.0 scales. The tests also helped to determine the configuration and tuning requirements for different types of content.

The studies were conducted with the following content:

  • 100% static

  • 100% C CGI

  • 100% Perl CGI

  • 100% NSAPI

  • 100% Java servlets

  • 100% PHP/FastCGI

  • E-commerce web application with large inventory

Study Conclusion

When tuned, Sun Java System Web Server 7.0 scaled almost linearly in performance for dynamic and static content.

Hardware

The studies (except for the e-commerce study) were conducted using the following hardware. For hardware information for the e-commerce study, see Hardware for E-Commerce Test.

Web Server system configuration for static content:

  • Sun Microsystems Sun Fire T2000 (1200 MHz, 8 cores) (only six cores were used for this test)

  • 16256 Megabytes of memory

  • Solaris 10 operating system

  • Three Sun StorEdge 3510 arrays

Web Server system configuration:

  • Sun Microsystems Sun Fire T2000 (1000 MHz, 6 cores)

  • 16376 Megabytes of memory

  • Solaris 10 operating system

Driver system configuration:

  • Three Sun Microsystems Sun Fire X4100

  • Four Sun Microsystems Sun Fire V490 (2 x 1050 MHz US-IV)

  • Three Sun Fire T1000

  • Sun Fire 880 (990 MHz US-III+)

  • 8192 Megabytes of memory

  • Solaris 10 operating system

Network configuration:

The Web Server and the driver machines were connected with multiple gigabit Ethernet links.

Software

The load driver for these tests was an internally-developed Java application framework called the Faban driver.

Configuration and Tuning

The following tuning settings are common to all the tests in this study. Individual studies may also have additional configuration and tuning information.

/etc/system tuning:

set rlim_fd_max=500000
set rlim_fd_cur=500000


set sq_max_size=0
set consistent_coloring=2
set autoup=60
set ip:ip_squeue_bind=0
set ip:ip_soft_rings_cnt=0
set ip:ip_squeue_fanout=1
set ip:ip_squeue_enter=3
set ip:ip_squeue_worker_wait=0

set segmap_percent=6
set bufhwm=32768
set maxphys=1048576
set maxpgio=128
set ufs:smallfile=6000000

*For ipge driver
set ipge:ipge_tx_ring_size=2048
set ipge:ipge_tx_syncq=1
set ipge:ipge_srv_fifo_depth=16000
set ipge:ipge_reclaim_pending=32
set ipge:ipge_bcopy_thresh=512
set ipge:ipge_dvma_thresh=1
set pcie:pcie_aer_ce_mask=0x1

*For e1000g driver
set pcie:pcie_aer_ce_mask = 0x1

TCP/IP tuning:

ndd -set /dev/tcp tcp_conn_req_max_q 102400
ndd -set /dev/tcp tcp_conn_req_max_q0 102400
ndd -set /dev/tcp tcp_max_buf 4194304
ndd -set /dev/tcp tcp_cwnd_max 2097152
ndd -set /dev/tcp tcp_recv_hiwat 400000
ndd -set /dev/tcp tcp_xmit_hiwat 400000

Network Configuration

Because the tests used multiple network interfaces, it was important to ensure that the interrupts from all the network interfaces were not handled by the same core. Network interrupts were enabled on one strand and disabled on the remaining three strands of each core using the following script:


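# List the IDs of all processors (strands) that are not off-line.
allpsr=`/usr/sbin/psrinfo | grep -v off-line | awk '{ print $1 }'`
set $allpsr
numpsr=$#
# Walk the strands in groups of four (one core per group).
# The script assumes the number of strands is a multiple of four.
while [ $numpsr -gt 0 ]
do
    # Skip the first strand of the core so that it keeps handling interrupts.
    shift
    numpsr=`expr $numpsr - 1`
    tmp=1
    # Disable interrupts on the remaining three strands of the core.
    while [ $tmp -ne 4 ]
    do
        /usr/sbin/psradm -i $1
        shift
        numpsr=`expr $numpsr - 1`
        tmp=`expr $tmp + 1`
    done
done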
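
The selection logic of the script can be checked without touching a live system. The following Python sketch is only an illustration: the function name `strands_to_disable` is hypothetical, and it assumes, as the script does, that `psrinfo` lists strands in core order with four strands per core.

```python
def strands_to_disable(processor_ids, strands_per_core=4):
    """Return the processor IDs on which the script would disable interrupts.

    Assumption: IDs arrive in core order, strands_per_core strands per core.
    The first strand of each core keeps handling interrupts.
    """
    disabled = []
    for start in range(0, len(processor_ids), strands_per_core):
        core = processor_ids[start:start + strands_per_core]
        disabled.extend(core[1:])  # every strand of the core except the first
    return disabled


# Two cores (eight strands): strands 0 and 4 keep interrupts enabled.
print(strands_to_disable(list(range(8))))
```

For eight strands numbered 0 to 7, the sketch selects strands 1, 2, 3, 5, 6, and 7, which matches the `no-intr` entries in the psrinfo output shown after the script is run.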

The following example shows psrinfo output before running the script:


# psrinfo | more
0       on-line   since 12/06/2006 14:28:34
1       on-line   since 12/06/2006 14:28:35
2       on-line   since 12/06/2006 14:28:35
3       on-line   since 12/06/2006 14:28:35
4       on-line   since 12/06/2006 14:28:35
5       on-line   since 12/06/2006 14:28:35
.................

The following example shows psrinfo output after running the script:


0       on-line   since 12/06/2006 14:28:34
1       no-intr   since 12/07/2006 09:17:04
2       no-intr   since 12/07/2006 09:17:04
3       no-intr   since 12/07/2006 09:17:04
4       on-line   since 12/06/2006 14:28:35
5       no-intr   since 12/07/2006 09:17:04
          .................

Web Server Tuning

The following table shows the tuning settings used for the Web Server.

Table 6–1 Web Server Tuning Settings

Component         Default                           Tuned
Access logging    enabled=true                      enabled=false
Thread pool       min-threads=16                    min-threads=128
                  max-threads=128                   max-threads=200
                  stack-size=131072                 stack-size=262144
                  queue-size=1024                   queue-size=15000
HTTP listener     Non-secure listener on port 80    Non-secure listener on port 80
                  listen-queue-size=128             Secure listener on port 443
                                                    listen-queue-size=15000
Keep alive        enabled=true                      enabled=true
                  threads=1                         threads=2
                  max-connections=200               max-connections=15000
                  timeout=30 sec                    timeout=180 sec
default-web.xml   JSP compilation turned on         JSP compilation turned off

The following table shows the SSL session cache tuning settings used for the SSL tests.

Table 6–2 SSL Session Cache Tuning Settings

Component           Default
SSL session cache   enabled=true
                    max-entries=10000
                    max-ssl2-session-age=100
                    max-ssl3-tls-session-age=86400

Performance Tests and Results

This section contains the test-specific configuration, tuning, and results for each test.

The following metrics were used to characterize performance:

  • Operations per second (ops/sec) = successful transactions per second

  • Response time for single transaction (round-trip time) in milliseconds

The performance and scalability diagrams show throughput (ops/sec) against the number of cores enabled on the system.

Static Content Test

This test was performed with a static download of a randomly selected file from a pool of 10,000 directories, each containing 36 files ranging in size from 1 KB to 1000 KB. The goal of the static content test was to saturate the cores and measure the resulting throughput and response time.

This test used the following configuration:

  • Static files were created on striped disk array (Sun StorEdge 3510).

  • Multiple network interfaces were configured.

  • Web Server was configured to run in 64-bit mode.

  • File-cache was enabled with the tuning settings described in the following table.

Table 6–3 File Cache Configuration

Default                        Tuned
enabled=true                   enabled=true
max-age=30 sec                 max-age=3600
max-entries=1024               max-entries=1048576
sendfile=false                 sendfile=true
max-heap-file-size=524288      max-heap-file-size=1200000
max-heap-space=10485760        max-heap-space=8000000000
max-mmap-file-size=0           max-mmap-file-size=1048576
max-mmap-space=0               max-mmap-space= l
                               max-open-files=1048576
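
As a rough sanity check on the max-entries setting, the size of the file pool can be compared with the default and tuned values. The sketch below is only an illustration: it assumes that each cached file occupies one file cache entry, and the helper name `pool_fits` is hypothetical.

```python
# Figures taken from the test description and Table 6-3.
DIRECTORIES = 10_000
FILES_PER_DIRECTORY = 36
DEFAULT_MAX_ENTRIES = 1024
TUNED_MAX_ENTRIES = 1_048_576

total_files = DIRECTORIES * FILES_PER_DIRECTORY  # 360,000 files in the pool


def pool_fits(max_entries, files=total_files):
    """Assumption: one cache entry per file, so the pool fits if files <= max_entries."""
    return files <= max_entries


print("files in pool:", total_files)
print("fits with default max-entries:", pool_fits(DEFAULT_MAX_ENTRIES))
print("fits with tuned max-entries:", pool_fits(TUNED_MAX_ENTRIES))
```

Under that assumption, the 360,000 files in the pool exceed the default limit of 1024 entries but fit within the tuned limit of 1048576.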

The following table shows the static content scalability results.

Table 6–4 Static Content Scalability

Number of Cores   Average Throughput (ops/sec)   Average Response Time (ms)
2                 10365                          184
4                 19729                          199
6                 27649                          201

The following is a graphical representation of static content scalability results.

Figure: Static Content Scalability (throughput in ops/sec versus number of cores)
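
One way to quantify how close these results are to linear scaling is to compare the measured speedup with the ideal speedup relative to the two-core run. The following sketch is an illustration, not part of the study: the function name `scaling_efficiency` and the choice of the two-core row as the baseline are assumptions.

```python
def scaling_efficiency(results):
    """results: list of (cores, ops_per_sec); the first entry is the baseline.

    Returns (cores, speedup, efficiency) tuples, where efficiency is the
    measured speedup divided by the ideal (linear) speedup.
    """
    base_cores, base_throughput = results[0]
    rows = []
    for cores, throughput in results:
        speedup = throughput / base_throughput
        ideal = cores / base_cores
        rows.append((cores, speedup, speedup / ideal))
    return rows


# Throughput values from Table 6-4.
STATIC = [(2, 10365), (4, 19729), (6, 27649)]

for cores, speedup, efficiency in scaling_efficiency(STATIC):
    print(f"{cores} cores: speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
```

With the static content figures, the four-core run reaches about 95% of ideal linear scaling and the six-core run about 89%. The same function can be applied to the other result tables in this chapter.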

Dynamic Content Test: Servlet

This test was conducted using a servlet. The servlet prints out its initialization arguments, environment, request headers, connection and client information, URL information, and remote user information. JVM tuning settings were applied to the server. The goal was to saturate the cores on the server and measure the resulting throughput and response time.

The following table shows the JVM tuning settings used in the test.

Table 6–5 JVM Tuning Settings

Default: -Xmx128m -Xms256m

Tuned: -server -Xrs -Xmx2048m -Xms2048m -Xmn2024m -XX:+AggressiveHeap -XX:LargePageSizeInBytes=256m -XX:+UseParallelOldGC -XX:+UseParallelGC -XX:ParallelGCThreads=<number of cores> -XX:+DisableExplicitGC

The following table shows the results for the dynamic content servlet test.

Table 6–6 Dynamic Content Test: Servlet Scalability

Number of Cores   Average Throughput (ops/sec)   Average Response Time (ms)
2                 5287                           19
4                 10492                          19
6                 15579                          19

The following is a graphical representation of servlet scalability results.

Figure: Servlet Scalability (throughput in ops/sec versus number of cores)

Dynamic Content Test: C CGI

This test was performed by accessing a C executable called printenv. This executable outputs the environment variable information. CGI tuning settings were applied to the server. The goal was to saturate the cores on the server and find out the respective throughput and response time.

The following table describes the CGI tuning settings used in this test.

Table 6–7 CGI Tuning Settings

Default                     Tuned
idle-timeout=300            idle-timeout=300
cgistub-idle-timeout=30     cgistub-idle-timeout=1000
min-cgistubs=0              min-cgistubs=100
max-cgistubs=16             max-cgistubs=100

The following table shows the results of the dynamic content test for C CGI.

Table 6–8 Dynamic Content Test: C CGI Scalability

Number of Cores   Average Throughput (ops/sec)   Average Response Time (ms)
2                 892                            112
4                 1681                           119
6                 2320                           129

The following is a graphical representation of C CGI scalability results.

Figure: C CGI Scalability (throughput in ops/sec versus number of cores)

Dynamic Content Test: Perl CGI

This test was conducted with a Perl script called printenv.pl that prints the CGI environment. CGI tuning settings were applied to the server. The goal was to saturate the cores on the server and measure the resulting throughput and response time.

The following table shows the CGI tuning settings used in the dynamic content test for Perl CGI.

Table 6–9 CGI Tuning Settings

Default                     Tuned
idle-timeout=300            idle-timeout=300
cgistub-idle-timeout=30     cgistub-idle-timeout=1000
min-cgistubs=0              min-cgistubs=100
max-cgistubs=16             max-cgistubs=100

The following table shows the results for the dynamic content test of Perl CGI.

Table 6–10 Dynamic Content Test: Perl CGI Scalability

Number of Cores   Average Throughput (ops/sec)   Average Response Time (ms)
2                 322                            310
4                 611                            327
6                 873                            343

The following is a graphical representation of Perl CGI scalability results.

Figure: Perl CGI Scalability (throughput in ops/sec versus number of cores)

Dynamic Content Test: NSAPI

The NSAPI module used in this test was printenv2.so. It prints the NSAPI environment variables along with some text to make the entire response 2 KB. The goal was to saturate the cores on the server and find out the respective throughput and response time.

The only tuning for this test was optimizing the path checks in obj.conf by removing the unused path checks.

The following table shows the results of the dynamic content test for NSAPI.

Table 6–11 Dynamic Content Test: NSAPI Scalability

Number of Cores   Average Throughput (ops/sec)   Average Response Time (ms)
2                 6264                           14
4                 12520                          15
6                 18417                          16

The following is a graphical representation of NSAPI scalability results.

Figure: NSAPI Scalability (throughput in ops/sec versus number of cores)

PHP Scalability Tests

PHP is a widely-used scripting language uniquely suited to creating dynamic, Web-based content. It is the most rapidly expanding scripting language in use on the Internet due to its simplicity, accessibility, wide number of available modules, and large number of easily available applications.

The scalability of Web Server combined with the versatility of the PHP engine provides a high-performing and versatile web deployment platform for dynamic content. These tests used PHP version 5.1.6.

The tests were performed in two modes:

  • An out-of-process fastcgi-php application invoked using the FastCGI plug-in available for Sun Java System Web Server 7.0 (the download will be available from http://www.zend.com/sun/).

  • In-process PHP NSAPI plug-in.

The test executed the phpinfo() query. The goal was to saturate the cores on the server and find out the respective throughput and response time.

PHP Scalability with Fast CGI

The following table shows the Web Server tuning settings used for the FastCGI plug-in test.

Table 6–12 Tuning Settings for FastCGI Plug-in Test

magnus.conf:

Init fn="load-modules" shlib="path_to_web_server_plugin_dir/fastcgi/libfastcgi.so" funcs="responder_fastcgi" shlib_flags="(global|now)"

obj.conf:

NameTrans fn="assign-name" from="/fcgi/*" name="fcgi.config"
<Object name="fcgi.config">
Service type="magnus-internal/ws-php" fn="responder-fastcgi"
    app-path="path_to_php"
    bind-path="localhost:9000"
    app-env="PHP_FCGI_CHILDREN=128"
    app-env="PHP_FCGI_MAX_REQUESTS=20000"
    app-env="LD_LIBRARY_PATH=path_to_php_lib"
    listen-queue=8192
    req-retry=2
    reuse-connection=1
    connection-timeout=120
    resp-timeout=60
    restart-interval=0
</Object>

mime.types:

type=magnus-internal/ws-php exts=php,php3,php4

The following table shows the results of the PHP with FastCGI test.

Table 6–13 PHP Scalability with Fast CGI

Number of Cores   Average Throughput (ops/sec)   Average Response Time (ms)
2                 876                            114
4                 1706                           117
6                 2475                           121

The following is a graphical representation of PHP scalability with Fast CGI.

Figure: PHP Scalability with Fast CGI (throughput in ops/sec versus number of cores)

PHP Scalability with NSAPI

The following table shows the Web Server tuning settings for the PHP with NSAPI test.

Table 6–14 NSAPI Plug-in Configuration for PHP

magnus.conf:

Init fn="load-modules" shlib="libphp5.so" funcs="php5_init,php5_close,php5_execute"
Init fn="php5_init" errorString="PHP Totally Blew Up!"

obj.conf:

NameTrans fn="pfx2dir" from="/php-nsapi" dir="path_to_php_script_dir" name="php-nsapi"
<Object name="php-nsapi">
ObjectType fn="force-type" type="magnus-internal/x-httpd-php"
Service fn=php5_execute
</Object>

mime.types:

type=magnus-internal/ws-php exts=php,php3,php4

The following table shows the results of the PHP with NSAPI test.

Table 6–15 PHP Scalability with NSAPI

Number of Cores   Average Throughput (ops/sec)   Average Response Time (ms)
2                 950                            105
4                 1846                           108
6                 2600                           115

The following is a graphical representation of PHP scalability with NSAPI.

Figure: PHP Scalability with NSAPI (throughput in ops/sec versus number of cores)

SSL Performance Test: Static Content

This test was performed with a static download of a randomly selected file from a pool of 10,000 directories, each containing 36 files ranging in size from 1 KB to 1000 KB. The goal of the SSL static content tests was to saturate the cores and measure the resulting throughput and response time. Only four cores of the T2000 were used for this test.

This test used the following configuration:

  • Static files were created on striped disk array (Sun StorEdge 3510).

  • Multiple network interfaces were configured.

  • The file cache was enabled and tuned using the settings in Table 6–3.

  • The SSL session cache was tuned using the settings in Table 6–2.

  • Web Server was configured to run in 64-bit mode.

The following table shows the SSL static content test results.

Table 6–16 SSL Performance Test: Static Content Scalability

Number of Cores   Average Throughput (ops/sec)   Average Response Time (ms)
2                 2284                           379
4                 4538                           387
6                 6799                           387

The following is a graphical representation of static content scalability with SSL.

Figure: Static Content Scalability with SSL (throughput in ops/sec versus number of cores)

SSL Performance Test: Perl CGI

This test was conducted with a Perl script called printenv.pl that prints the CGI environment. The test was performed in SSL mode with the SSL session cache enabled. The goal was to saturate the cores on the server and measure the resulting throughput and response time.

The following table shows the SSL Perl CGI test results.

Table 6–17 SSL Performance Test: Perl CGI Scalability

Number of Cores   Average Throughput (ops/sec)   Average Response Time (ms)
2                 303                            329
4                 580                            344
6                 830                            361

The following is a graphical representation of Perl CGI scalability with SSL.

Figure: Perl CGI Scalability with SSL (throughput in ops/sec versus number of cores)

SSL Performance Test: C CGI

This test was performed by accessing a C executable called printenv in SSL mode. This executable outputs the environment variable information. The test was performed in SSL mode with the SSL session cache enabled. The goal was to saturate the cores on the server and find out the respective throughput and response time.

The following table shows the SSL CGI test results.

Table 6–18 SSL Performance Test: C CGI Scalability

Number of Cores   Average Throughput (ops/sec)   Average Response Time (ms)
2                 792                            126
4                 1499                           133
6                 2127                           141

The following is a graphical representation of C CGI scalability with SSL.

Figure: C CGI Scalability with SSL (throughput in ops/sec versus number of cores)

SSL Performance Test: NSAPI

The NSAPI module used in this test was printenv2.so. It prints the NSAPI environment variables along with some text to make the entire response 2 KB. The test was performed in SSL mode with the SSL session cache enabled. The goal was to saturate the cores on the server and find out the respective throughput and response time.

The following table shows the SSL NSAPI test results.

Table 6–19 SSL Performance Test: NSAPI Scalability

Number of Cores   Average Throughput (ops/sec)   Average Response Time (ms)
2                 2729                           29
4                 5508                           30
6                 7982                           32

The following is a graphical representation of NSAPI scalability with SSL.

Figure: NSAPI Scalability with SSL (throughput in ops/sec versus number of cores)
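
The SSL results can be put side by side with the corresponding non-SSL results to see how much throughput each content type retains under SSL. The sketch below is an illustration only: it takes the six-core rows of the tables in this chapter at face value, the helper name `ssl_retention` is hypothetical, and the comparison ignores any other differences between the test runs.

```python
# Six-core throughput (ops/sec) from Tables 6-4, 6-10, 6-8, and 6-11.
PLAIN = {"Static": 27649, "Perl CGI": 873, "C CGI": 2320, "NSAPI": 18417}
# Six-core throughput (ops/sec) from Tables 6-16, 6-17, 6-18, and 6-19.
SSL = {"Static": 6799, "Perl CGI": 830, "C CGI": 2127, "NSAPI": 7982}


def ssl_retention(plain, ssl):
    """Return the fraction of non-SSL throughput retained under SSL, per test."""
    return {name: ssl[name] / plain[name] for name in plain}


for name, fraction in ssl_retention(PLAIN, SSL).items():
    print(f"{name}: {fraction:.0%} of non-SSL throughput")
```

On these figures, the CGI tests retain more than 90% of their non-SSL throughput, while the NSAPI and static content tests, whose non-SSL throughput is far higher, retain roughly 43% and 25% respectively.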

E-Commerce Web Application Test

The e-commerce application is a more complicated application that utilizes a database to simulate online shopping.

Hardware for E-Commerce Test

The e-commerce studies were conducted using the following hardware.

Web Server system configuration:

  • Sun Microsystems Sun Fire 880 (900 MHz US-III+). Only four CPUs were used for this test.

  • 16384 Megabytes of memory.

  • Solaris 10 operating system.

Database system configuration:

  • Sun Microsystems Sun Fire 880 (900 MHz US-III+)

  • 16384 Megabytes of memory

  • Solaris 10 operating system

  • Oracle 10.1.0.2.0

Driver system configuration:

  • Sun Microsystems Sun Fire 880 (900 MHz US-III+)

  • Solaris 10 operating system

Network configuration:

The Web Server, database, and the driver machines were connected with a gigabit Ethernet link.

Configuration and Tuning for E-Commerce Test

The e-commerce test was run with the following tuning settings.

JDBC tuning:

<jdbc-resource>
    <jndi-name>jdbc/jwebapp</jndi-name>
    <datasource-class>oracle.jdbc.pool.OracleDataSource</datasource-class>
    <max-connections>200</max-connections>
    <idle-timeout>0</idle-timeout>
    <wait-timeout>5</wait-timeout>
    <connection-validation>auto-commit</connection-validation>
    <property>
      <name>username</name>
      <value>db_user</value>
    </property>
    <property>
      <name>password</name>
      <value>db_password</value>
    </property>
    <property>
      <name>url</name>
      <value>jdbc:oracle:thin:@db_host_name:1521:oracle_sid</value>
    </property>
    <property>
      <name>ImplicitCachingEnabled</name>
      <value>true</value>
    </property>
    <property>
      <name>MaxStatements</name>
      <value>200</value>
    </property>
</jdbc-resource>

JVM tuning:

-server -Xmx1500m -Xms1500m -Xss128k -XX:+DisableExplicitGC

E-commerce Application Description

The test models an e-commerce web site that sells items from a large inventory. It uses the standard web application model-view-controller design pattern for its implementation: the user interface (that is, the view) is handled by 16 different JSP pages which interface with a single master control servlet. The servlet maintains JDBC connections to the database, which serves as the model and handles 27 different queries. The JSP pages make extensive use of JSP tag libraries and comprise almost 2000 lines of logic.

Database Cardinality

The database contains 1000 orderable items (which have two related tables, each also with a cardinality of 1000), 72000 customers (with two related tables), and 1.9 million orders (with two related tables). Database access uses standard JDBC connections with prepared statements and follows standard JDBC design principles.

Workload

A randomly-selected user performs the online shopping. The following operations were used in the Matrix mix workload (operations were carried out with precedence of operations): Home, AdminConfirm, AdminRequest, BestSellers, BuyConfirm, BuyRequest, CustomerRegistration, NewProducts, OrderDisplay, OrderInquiry, ProductDetail, SearchRequest, SearchResults, and ShoppingCart.

The Faban driver was used to drive the load. Think time was chosen from a negative exponential distribution. The minimum think time was 7.5 seconds, the maximum was 75 seconds. The maximum number of concurrent users that the system can support was based on the following passing criteria.

Table 6–20 Performance Test Pass Criteria

Transaction            90th Percentile Response Time (Seconds)
HomeStart              3
AdminConfirm           20
AdminRequest           3
BestSellers            5
BuyConfirm             5
BuyRequest             3
CustomerRegistration   3
Home                   3
NewProducts            5
OrderDisplay           3
OrderInquiry           3
ProductDetail          3
SearchRequest          3
SearchResults          10
ShoppingCart           3
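
A pass criterion of this kind can be checked by computing the 90th percentile of the measured response times for a transaction and comparing it with the limit in the table. The sketch below is an illustration under stated assumptions: it uses the nearest-rank percentile definition, which may differ from the method the Faban driver uses, and the names `percentile_90` and `passes` are hypothetical.

```python
# A few limits from Table 6-20, in seconds.
LIMITS = {"Home": 3, "SearchResults": 10, "AdminConfirm": 20}


def percentile_90(samples):
    """Nearest-rank 90th percentile (an assumed definition) of response times."""
    ordered = sorted(samples)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n) using integer arithmetic
    return ordered[rank - 1]


def passes(transaction, samples, limits=LIMITS):
    """True if the 90th percentile response time is within the limit."""
    return percentile_90(samples) <= limits[transaction]


# Hypothetical measurements: nine fast responses and one slow outlier.
home_samples = [1.0] * 9 + [5.0]
print(percentile_90(home_samples), passes("Home", home_samples))
```

Because the criterion is based on the 90th percentile rather than the maximum, the single 5-second outlier in the hypothetical sample does not cause the Home transaction to fail its 3-second limit.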

The following table shows the e-commerce web application test results.

Table 6–21 E-Commerce Web Application Scalability

Number of CPUs   Users    Throughput (ops/sec)
2                7000     790
4                11200    1350

The following are graphical representations of e-commerce web application scalability.

Figure: E-commerce Web Application Scalability (users versus number of CPUs)

Figure: E-commerce Web Application Scalability (throughput in ops/sec versus number of CPUs)