

Advanced Technical Skills (ATS) North America

## **CPU MF – 2012 Update and WSC Experiences**

### **SHARE - Session 10886**

March 14, 2012

John Burg

jpburg@us.ibm.com

IBM Washington Systems Center





## Trademarks

The following are trademarks of the International Business Machines Corporation in the United States and/or other countries.

| AlphaBlox*                                 | GDPS*                                      | RACF*                                         | Tivoli*                |
|--------------------------------------------|--------------------------------------------|-----------------------------------------------|------------------------|
| APPN*                                      | HiperSockets                               | Redbooks*                                     | Tivoli Storage Manager |
| CICS*                                      | HyperSwap                                  | Resource Link                                 | TotalStorage*          |
| CICS/VSE*                                  | IBM*                                       | RETAIN*                                       | VSE/ESA                |
| Cool Blue                                  | IBM eServer                                | REXX                                          | VTAM*                  |
| DB2*                                       | IBM logo*                                  | RMF                                           | WebSphere*             |
| DFSMS                                      | IMS                                        | S/390*                                        | zEnterprise            |
| DFSMShsm                                   | Language Environment*                      | Scalable Architecture for Financial Reporting | xSeries*               |
| DFSMSrmm                                   | Lotus*                                     | Sysplex Timer*                                | z9*                    |
| DirMaint                                   | Large System Performance Reference™ (LSPR™) | Systems Director Active Energy Manager       | z10                    |
| DRDA*                                      | Multiprise*                                | System/370                                    | z10 BC                 |
| DS6000                                     | MVS                                        | System p*                                     | z10 EC                 |
| DS8000                                     | OMEGAMON*                                  | System Storage                                | z/Architecture*        |
| ECKD                                       | Parallel Sysplex*                          | System x*                                     | z/OS*                  |
| ESCON*                                     | Performance Toolkit for VM                 | System z                                      | z/VM*                  |
| FICON*                                     | PowerPC*                                   | System z9*                                    | z/VSE                  |
| FlashCopy*                                 | PR/SM                                      | System z10                                    | zSeries*               |
|                                            | Processor Resource/Systems Manager         |                                               |                        |

\* Registered trademarks of IBM Corporation

#### The following are trademarks or registered trademarks of other companies.

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.

IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.

\* All other products may be trademarks or registered trademarks of their respective companies.

#### Notes:

Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here. IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.

All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.

This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.

All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.




# Techdocs provides the latest ATS technical collateral: <u>http://www.ibm.com/support/techdocs</u>





## Topics

#### CPU MF Introduction

- What is it and why would you want to use it?

#### Workload Characterization Update

- RNI-based LSPR workload match completed for z/OS

#### Key Performance Metrics for z10s and z196s

- CPI, Problem State, Cache / Memory Hierarchy
- New metrics and formulas

#### 2012 / 2011 WSC Customer Experiences with SMF 113s

- New Uses
  - z196 Customer Initiated Power Savings Mode Detection
  - Utilization Effect in Capacity Planning
  - Crypto Function Positioning and Usage
- What's New
  - COUNTER Data Loss APAR OA36816
  - z/VM Support APAR VM64961
- Summary
- Back Up
  - CPU MF Enablement
  - Old WSC Experiences



## **CPU** Measurement Facility Introduction

## What is the CPU Measurement Facility

#### New hardware instrumentation facility "CPU Measurement Facility" (CPU MF)

- Available on System z10 GA2 (EC and BC), z196, and z114
- Supported by a new z/OS component, Hardware Instrumentation Services (HIS)

#### Potential Uses – for this new "cool" virtualization technology

- COUNTERS
  - Supplement current performance metrics to help explain why performance may have changed
  - Workload characterization
    - New LSPR workloads
- SAMPLING
  - ISV product improvement
  - Application Tuning



### What is the CPU Measurement Facility - Publications

- IBM Research article
  - "IBM System z10 performance improvements with software & hardware synergy"
  - <u>http://www.research.ibm.com/journal/rd/531/jackson.pdf</u>
  - Contact IBM team for copy of the article
- Feb 2011 Hot Topics A z/OS Newsletter GA22-7501
  - "A whole lot of benefits from HIS data" article page 24
    - COUNTERS and an update on SAMPLING HIS report tool and STG Lab Services
- Redpaper Setting Up and Using System z CPU Measurement Facility with z/OS
  - http://www.redbooks.ibm.com/redpieces/pdfs/redp4727.pdf

#### February 2012:

- mainframeZone.com zJournal: IT Management article
  - "How to Benefit From Hardware Instrumentation Services Data"
  - <u>http://www.mainframezone.com/it-management/how-to-benefit-from-hardware-instrumentation-services-data/P1</u>

## CPU MF COUNTERS – What and Why

#### What is CPU MF?

- A new z10 and later facility that provides cache and memory hierarchy COUNTERS
- Also capable of time-in-Csect type SAMPLES
- Data gathering controlled through z/OS HIS (HW Instrumentation Services)
  - Collected on an LPAR basis
  - Written to SMF 113 records
  - Minimal overhead

#### How can the COUNTERS be used today?

- To supplement current performance data from SMF, RMF, DB2, CICS, etc.
- To help understand why performance may have changed

#### How can the COUNTERS be used for future processor planning?

- They provide the basis for the new LSPR workload categories
- zPCR automatically processes CPU MF data to provide a workload match based on RNI

#### Recommend Capturing CPU MF SMF 113 Records on z10s, z196s, and z114s

z/OS LSPR Workload Match Complete



## Workload Characterization Update



## New LSPR Workload Categories

- Historically, LSPR workload capacity curves (primitives and mixes) have had application names or been identified by a "software" captured characteristic
  - For example, CICS, IMS, OLTP-T, CB-L, LoIO-mix, TI-mix, etc.
- However, capacity performance is more closely associated with how a workload is using and interacting with a processor "hardware" design
- With the availability of CPU MF (SMF 113) data on z10, the ability to gain insight into the interaction of workload and hardware has arrived
- The knowledge gained is still evolving, but the first step in the process is to produce LSPR workload capacity curves based on the underlying hardware sensitivities
- Thus, the LSPR for z196 will introduce three new workload categories which replace all prior primitives and mixes
  - Based on new hardware defined metric called Relative Nest Intensity
  - Low, Average, High (Relative Nest Intensity)
- To simplify the transition, an easy and automatic translation of old names to new categories will be supplied in zPCR
  - For example, if you have been using LoIO-mix in your studies, you'll simply use the new "Average"

### Fundamental Components of Workload Capacity Performance

### Instruction Complexity (Micro processor design)

- Many design alternatives
  - Cycle time (GHz), instruction architecture, pipeline, superscalar, Out-Of-Order, branch prediction and more
- Workload effect
  - May be different with each processor design
  - But once established for a workload on a processor, doesn't change very much

### Memory Hierarchy or "Nest"

- Many design alternatives
  - Cache (levels, size, private, shared, latency, MESI protocol), controller, data buses
- Workload effect
  - Quite variable
  - Sensitive to many factors: locality of reference, dispatch rate, IO rate, competition with other applications and/or LPARs, and more
    - Net effect of these factors represented in "Relative Nest Intensity"
- Relative Nest Intensity (RNI)
  - Activity beyond private-on-chip cache(s) is the most sensitive area
  - Reflects distribution and latency of sourcing from shared caches and memory
  - Level 1 cache miss per 100 instructions (L1MP) also important
  - Data for calculation available from CPU MF (SMF 113) starting with z10



## **z10** Memory / Cache Hierarchy





### CPU MF z10 Customer Workload Characterization Summary



2) Created new <u>RNI</u> metric



#### zPCR Workload Characterization for z/OS

"Scope of Work" Definition Change

New z/OS Workload Categories Defined





#### **RNI-based LSPR Workload Decision Table**

| L1MP     | RNI                         | LSPR Workload Match    |
|----------|-----------------------------|------------------------|
| < 3%     | >= 0.75                     | AVERAGE                |
| < 3%     | < 0.75                      | LOW                    |
| 3% to 6% | > 1.0                       | HIGH                   |
| 3% to 6% | 0.6 to 1.0                  | AVERAGE                |
| 3% to 6% | < 0.6                       | LOW                    |
| > 6%     | >= 0.75                     | HIGH                   |
| > 6%     | < 0.75                      | AVERAGE                |

Notes: Applies to z10, z196, and z114 CPU MF data. The table may change based on feedback.
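The decision table can be read as a small lookup. The Python sketch below is illustrative only: the function name `lspr_workload_match` is hypothetical, and the treatment of values that land exactly on a band edge (L1MP of exactly 3 or 6, RNI of exactly 0.6 or 1.0) is an assumption, because the table does not say which side they fall on. zPCR performs the match itself; this is not its implementation.

```python
def lspr_workload_match(l1mp: float, rni: float) -> str:
    """Return "LOW", "AVERAGE", or "HIGH" following the RNI-based table.

    l1mp: Level 1 misses per 100 instructions (the table's L1MP percentage).
    rni:  Relative Nest Intensity.
    """
    if l1mp < 3.0:
        return "AVERAGE" if rni >= 0.75 else "LOW"
    if l1mp <= 6.0:
        # Assumption: L1MP of exactly 3 or 6 is treated as "3% to 6%",
        # and RNI of exactly 0.6 or 1.0 is treated as AVERAGE.
        if rni > 1.0:
            return "HIGH"
        return "AVERAGE" if rni >= 0.6 else "LOW"
    return "HIGH" if rni >= 0.75 else "AVERAGE"


# Hypothetical inputs, one per L1MP band.
for l1mp, rni in [(2.0, 0.5), (4.0, 0.8), (7.0, 0.8)]:
    print(l1mp, rni, lspr_workload_match(l1mp, rni))
```

Note that the same RNI (0.8 here) yields AVERAGE in the middle band and HIGH once L1MP exceeds 6, which is why both inputs are needed for the match.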

### Workload Characterization - Complete

- Future vision to help identify workload characteristics and to provide better input for capacity planning and performance
  - Step 1: Created workload categories from SMF 113s (completed)
    - Over 150 z10 customers/partitions participated
    - Measured LSPR with these new categories
  - Step 2: Refined workload selection process (completed)
    - As you move to z196 from z10, looking for "Before" and "After" volunteers
    - Received over 90 z196 customers/partitions through March 1st
  - Step 3: Completed
    - Validated RNI-based LSPR workload match
    - CPU MF is no longer a "hint" but the preferred method for LSPR workload match
      - For z10s, z196s, and z114s
    - IBM tools updated to reflect CPU MF preference (zPCR, CP3k)
    - Thank you to the "Volunteers" who provided SMF 113s!



## Key Performance Metrics for z10s and z196s

### z196 versus z10 hardware comparison

- z10 EC
  - CPU
    - 4.4 GHz
  - Caches
    - L1 private 64k i, 128k d
    - L1.5 private 3 MB
    - L2 shared 48 MB / book
    - Book interconnect: star
- z196
  - CPU
    - 5.2 GHz
    - Out-Of-Order execution
  - Caches
    - L1 private 64k i, 128k d
    - L2 private 1.5 MB
    - L3 shared 24 MB / chip
    - L4 shared 192 MB / book
    - Book interconnect: star





## z10 CPU MF and HIS provide a z/OS logical view of resource usage and cache / memory hierarchy sourcing



LPAR / Logical CP view:

Memory Accesses

Cache

- L2 / (L4 z196) Accesses (local and remote)
- L3 Accesses on z196
- L1.5 / (L2 z196) Accesses
- L1 Sourced from Hierarchy

Instructions and Cycles

Crypto function

## Current CPU MF Key Performance Metrics:

| CPI | PRBSTATE | L1MP | L15P | L2LP | L2RP | MEMP | LPARCPU |
|-----|----------|------|------|------|-------|------|---------|

- CPI – Cycles per Instruction
- PRBSTATE – % Problem State
- L1MP – Level 1 Miss Per 100 instructions
- L15P – % sourced from L1.5 cache
- L2LP – % sourced from Level 2 Local cache (on same book)
- L2RP – % sourced from Level 2 Remote cache (on different book)
- MEMP – % sourced from Memory
- LPARCPU – APPL% (GCPs, zAAPs, zIIPs) captured and uncaptured

Workload Characterization L1 Sourcing from cache/memory hierarchy



### Introducing the new Relative Nest Intensity (RNI) metric

- Relative Nest Intensity
  - Reflects the distribution and latency of sourcing from shared caches and memory
  - For z10 Technology the Relative Nest Intensity = (L2LP \* 1 + L2RP \* 2.4 + MEMP \* 7.5) / 100
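As a worked example of the z10 formula, the short Python sketch below applies the three weights to a set of sourcing percentages. The function name `rni_z10` and the input values are hypothetical and chosen only to make the arithmetic easy to follow; they are not measured data.

```python
def rni_z10(l2lp: float, l2rp: float, memp: float) -> float:
    """z10 Relative Nest Intensity from sourcing percentages (0 to 100)."""
    return (1.0 * l2lp + 2.4 * l2rp + 7.5 * memp) / 100.0


# Hypothetical interval: 20% of L1 misses sourced from local L2,
# 2% from remote L2, and 8% from memory.
rni = rni_z10(l2lp=20.0, l2rp=2.0, memp=8.0)
print(f"{rni:.3f}")  # (20.0 + 4.8 + 60.0) / 100 = 0.848
```

In this example, memory supplies only 8% of the L1 misses but contributes 60 of the 84.8 weighted points, which shows how strongly the 7.5 weight makes RNI respond to memory sourcing.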

## **Relative Nest Intensity**



Microprocessor Design

Memory Hierarchy or Nest

Note these Formulas may change in the future



#### New Estimated metrics: Instruction Complexity CPI, Finite CPI, and SCPL1M





## z196 CPU MF and HIS provide a z/OS logical view of resource usage and cache / memory hierarchy sourcing




### Updated z10 CPU MF Workload Characterization Summary

| Customer       |         | CPI | Prb State | Est Instr Cmplx CPI | Est Finite CPI | Est SCPL1M | L1MP | L15P | L2LP | L2RP | MEMP | Rel Nest Intensity | LPARCPU | Eff GHz |
|----------------|---------|-----|-----------|---------------------|----------------|------------|------|------|------|------|------|--------------------|---------|---------|
| All Volunteers | Minimum | 3.1 | 1.1       |                     |                | 59.6       | 1.3  | 48.6 | 5.6  | 0.0  | 2.2  | 0.4                | 14.4    |         |
| All Volunteers | Average |     | 31.2      |                     |                | 101.4      | 3.9  | 68.9 | 21.2 | 1.6  | 8.3  | 0.9                | 76.3    |         |
| All Volunteers | Maximum |     | 67.1      |                     |                | 194.9      | 6.9  | 82.8 | 32.9 | 6.9  | 20.2 | 1.8                |         | 4.40    |

New z10 columns are:

1. Est Instr Cmplx CPI
2. Est Finite CPI
3. Est SCPL1M
4. Rel Nest Intensity
5. Eff GHz

- CPI – Cycles per Instruction
- Prb State – % Problem State
- Est Instr Cmplx CPI – Estimated Instruction Complexity CPI (infinite L1)
- Est Finite CPI – Estimated CPI from Finite cache/memory
- Est SCPL1M – Estimated Sourcing Cycles per Level 1 Miss
- L1MP – Level 1 Miss Per 100 instructions
- L15P – % sourced from Level 1.5 cache
- L2LP – % sourced from Level 2 Local cache (on same book)
- L2RP – % sourced from Level 2 Remote cache (on different book)
- MEMP – % sourced from Memory
- Rel Nest Intensity – Reflects distribution and latency of sourcing from shared caches and memory
- LPARCPU – APPL% (GCPs, zAAPs, zIIPs) captured and uncaptured
- Eff GHz – Effective gigahertz for GCPs, cycles per nanosecond

Workload Characterization: L1 Sourcing from cache/memory hierarchy

#### WSC z196 Sample CPU MF from July – 5 Minute Synched Intervals

| SYSID | Mon | Day SH | Hour | CPI | Prb State | Est Instr Cmplx CPI | Est Finite CPI | Est SCPL1M | L1MP | L2P | L3P | L4LP | L4RP | MEMP | Rel Nest Intensity | LPARCPU | Eff GHz |
|-------|-----|--------|------|-----|-----------|---------------------|----------------|------------|------|-----|-----|------|------|------|--------------------|---------|---------|
| SYSD        | JUL | 22 N   | 17.25 | 3.65 | 2.3       | 2.70 | 0.95       | 26     | 3.7  | 77.8 | 20.5 | 0.9  | 0.2  | 0.7         | 0.24      | 0.8     | 5.2        |
| SYSD        | JUL | 22 N   | 17.33 | 3.68 | 2.3       | 2.73 | 0.95       | 26     | 3.6  | 77.4 | 20.8 | 0.9  | 0.2  | 0.7         | 0.24      | 0.8     | 5.2        |
| SYSD        | JUL | 22 N   | 17.42 | 3.67 | 2.3       | 2.72 | 0.95       | 26     | 3.7  | 78.0 | 20.3 | 0.9  | 0.2  | 0.7         | 0.24      | 0.8     | 5.2        |
| SYSD        | JUL | 22 N   | 17.50 | 3.64 | 2.3       | 2.71 | 0.93       | 26     | 3.6  | 77.8 | 20.5 | 0.9  | 0.2  | 0.7         | 0.24      | 0.8     | 5.2        |
| SYSD        | JUL | 22 N   | 17.58 | 3.66 | 2.3       | 2.72 | 0.94       | 26     | 3.6  | 77.9 | 20.4 | 0.8  | 0.2  | 0.7         | 0.24      | 0.8     | 5.2        |
| SYSD        | JUL | 22 N   | 17.67 | 3.65 | 2.3       | 2.72 | 0.94       | 26     | 3.6  | 77.0 | 21.1 | 0.9  | 0.2  | 0.7         | 0.24      | 0.8     | 5.2        |
| SYSD        | JUL | 22 N   | 17.75 | 3.66 | 2.3       | 2.72 | 0.94       | 26     | 3.6  | 77.4 | 20.8 | 0.9  | 0.2  | 0.7         | 0.24      | 0.8     | 5.2        |
| SYSD        | JUL | 22 N   | 17.83 | 3.64 | 2.3       | 2.70 | 0.94       | 26     | 3.6  | 77.1 | 21.0 | 0.9  | 0.2  | 0.7         | 0.24      | 0.8     | 5.2        |
| SYSD        | JUL | 22 N   | 17.92 | 2.78 | 49.2      | 2.06 | 0.72       | 34     | 2.1  | 76.7 | 18.3 | 1.8  | 1.4  | 1.9         | 0.42      | 1.5     | 5.2        |
| SYSD        | JUL | 22 N   | 18.00 | 3.65 | 3.2       | 2.71 | 0.94       | 26     | 3.6  | 77.0 | 21.1 | 1.0  | 0.2  | 0.7         | 0.25      | 0.8     | 5.2        |
| SYSD        | JUL | 22 N   | 18.08 | 5.00 | 0.8       | 3.46 | 1.53       | 27     | 5.7  | 86.1 | 11.9 | 0.3  | 0.1  | 1.7         | 0.28      | 9.7     | 5.2        |
| SYSD        | JUL | 22 N   | 18.17 | 3.72 | 3.2       | 2.76 | 0.96       | 27     | 3.6  | 76.8 | 21.0 | 1.1  | 0.2  | 0.8         | 0.26      | 0.9     | 5.2        |
| SYSD        | JUL | 22 N   | 18.25 | 3.82 | 3.7       | 2.76 | 1.06       | 28     | 3.7  | 77.4 | 19.8 | 1.2  | 0.6  | 1.1         | 0.30      | 0.9     | 5.2        |

- CPI – Cycles per Instruction
- Prb State – % Problem State
- Est Instr Cmplx CPI – Estimated Instruction Complexity CPI (infinite L1)
- Est Finite CPI – Estimated CPI from Finite cache/memory
- Est SCPL1M – Estimated Sourcing Cycles per Level 1 Miss
- L1MP – Level 1 Miss Per 100 instructions
- L2P – % sourced from Level 2 cache
- L3P – % sourced from Level 3 cache (on same chip)
- L4LP – % sourced from Level 4 Local cache (on same book)
- L4RP – % sourced from Level 4 Remote cache (on different book)
- MEMP – % sourced from Memory
- Rel Nest Intensity – Reflects distribution and latency of sourcing from shared caches and memory
- LPARCPU – APPL% (GCPs, zAAPs, zIIPs) captured and uncaptured
- Eff GHz – Effective gigahertz for GCPs, cycles per nanosecond

CPU MF provides measurement of the z196 Level 3 shared cache.

These numbers come from a synthetic benchmark and do not represent a production workload.


## Formulas – z10

Workload Characterization L1 Sourcing from cache/memory hierarchy

| Metric   | Calculation – note all fields are deltas between intervals                            |
|----------|---------------------------------------------------------------------------------------|
| CPI      | B0 / B1                                                                               |
| PRBSTATE | (P33 / B1) * 100                                                                      |
| L1MP     | ((B2+B4) / B1) * 100                                                                  |
| L15P     | ((E128+E129) / (B2+B4)) * 100                                                         |
| L2LP     | ((E130+E131) / (B2+B4)) * 100                                                         |
| L2RP     | ((E132+E133) / (B2+B4)) * 100                                                         |
| MEMP     | (((E134+E135) + (B2+B4-E128-E129-E130-E131-E132-E133-E134-E135)) / (B2+B4)) * 100 |
| LPARCPU  | ( ((1/CPSP/1,000,000) * B0) / Interval in Seconds) * 100                              |
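The z10 formulas in the table can be applied directly to one interval of counter deltas. The Python sketch below is a minimal illustration under stated assumptions: the function name `z10_metrics`, the dictionary layout keyed by counter name, and the sample values are all hypothetical, and CPSP is assumed to be processor speed in cycles per microsecond. Extracting the counters from SMF 113 records is outside the scope of this sketch.

```python
def z10_metrics(c, cpsp, interval_seconds):
    """Compute the z10 key metrics from one interval of counter deltas.

    c: dict mapping counter names ("B0", "P33", "E128", ...) to deltas.
    cpsp: processor speed value used by the LPARCPU formula
          (assumed here to be cycles per microsecond).
    interval_seconds: length of the measurement interval.
    """
    l1_misses = c["B2"] + c["B4"]  # denominator of every sourcing percentage

    def pct(*names):
        return sum(c[n] for n in names) / l1_misses * 100.0

    # L1 misses not attributed to any extended counter are folded into
    # MEMP, as the table's MEMP expression does.
    attributed = sum(c[f"E{n}"] for n in range(128, 136))
    unattributed = l1_misses - attributed

    return {
        "CPI": c["B0"] / c["B1"],
        "PRBSTATE": c["P33"] / c["B1"] * 100.0,
        "L1MP": l1_misses / c["B1"] * 100.0,
        "L15P": pct("E128", "E129"),
        "L2LP": pct("E130", "E131"),
        "L2RP": pct("E132", "E133"),
        "MEMP": (c["E134"] + c["E135"] + unattributed) / l1_misses * 100.0,
        "LPARCPU": ((1 / cpsp / 1_000_000) * c["B0"]) / interval_seconds * 100.0,
    }


# Hypothetical deltas for one 300-second interval (not measured data).
sample = {
    "B0": 660_000_000_000, "B1": 165_000_000_000,
    "B2": 2_600_000_000, "B4": 4_000_000_000,
    "P33": 49_500_000_000,
    "E128": 1_620_000_000, "E129": 3_000_000_000,
    "E130": 520_000_000, "E131": 800_000_000,
    "E132": 52_000_000, "E133": 80_000_000,
    "E134": 200_000_000, "E135": 262_000_000,
}
metrics = z10_metrics(sample, cpsp=4400, interval_seconds=300)
for name, value in metrics.items():
    print(f"{name:8s} {value:8.2f}")
```

Because the MEMP expression adds every unattributed L1 miss to the memory count, it simplifies to 100 minus (L15P + L2LP + L2RP), so the four sourcing percentages always sum to 100.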

- CPI – Cycles per Instruction
- PRBSTATE – % Problem State
- L1MP – Level 1 Miss Per 100 instructions
- L15P – % sourced from L1.5 cache
- L2LP – % sourced from Level 2 Local cache (on same book)
- L2RP – % sourced from Level 2 Remote cache (on different book)
- MEMP – % sourced from Memory
- LPARCPU – APPL% (GCPs, zAAPs, zIIPs) captured and uncaptured

- B\* – Basic Counter Set, Counter Number
- P\* – Problem-State Counter Set, Counter Number
  - See "The Set-Program-Parameter and CPU-Measurement Facilities" SA23-2260-0 for full description
- E\* – Extended Counters, Counter Number
  - See "IBM The CPU-Measurement Facility Extended Counters Definition for z10" SA23-2261-0/1 for full description

## Formulas – z10 Additional

| Metric              | Calculation – note all fields are deltas between intervals |
|---------------------|------------------------------------------------------------|
| Est Instr Cmplx CPI | CPI - Est Finite CPI                                       |
| Est Finite CPI      | ((B3+B5) / B1) * 0.84                                      |
| Est SCPL1M          | ((B3+B5) / (B2+B4)) * 0.84                                 |
| Rel Nest Intensity  | (1.0*L2LP + 2.4*L2RP + 7.5*MEMP) / 100                     |
| Eff GHz             | CPSP / 1000                                                |
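The additional z10 formulas can be sketched the same way. In the Python example below, the function name `z10_estimated_metrics`, its argument layout, and the sample values are hypothetical; CPI and the sourcing percentages are assumed to have been computed already from the basic formulas, and the 0.84 factor is taken from the table as given.

```python
def z10_estimated_metrics(c, cpi, l2lp, l2rp, memp, cpsp):
    """Compute the additional z10 metrics for one interval.

    c: dict of counter deltas containing "B1" through "B5".
    cpi, l2lp, l2rp, memp: values from the basic z10 formulas.
    cpsp: processor speed value used by the Eff GHz formula.
    """
    l1_misses = c["B2"] + c["B4"]
    b3_b5 = c["B3"] + c["B5"]  # shared numerator of the two estimates
    est_finite_cpi = b3_b5 / c["B1"] * 0.84
    return {
        "Est Instr Cmplx CPI": cpi - est_finite_cpi,
        "Est Finite CPI": est_finite_cpi,
        "Est SCPL1M": b3_b5 / l1_misses * 0.84,
        "Rel Nest Intensity": (1.0 * l2lp + 2.4 * l2rp + 7.5 * memp) / 100.0,
        "Eff GHz": cpsp / 1000,
    }


# Hypothetical deltas and inputs (not measured data).
counters = {
    "B1": 165_000_000_000,
    "B2": 2_600_000_000, "B3": 130_000_000_000,
    "B4": 4_000_000_000, "B5": 200_000_000_000,
}
est = z10_estimated_metrics(counters, cpi=4.0, l2lp=20.0, l2rp=2.0,
                            memp=8.0, cpsp=4400)
for name, value in est.items():
    print(f"{name:20s} {value:8.3f}")
```

Because Est Finite CPI and Est SCPL1M share the numerator (B3+B5) and the 0.84 factor, Est Finite CPI equals (L1MP / 100) multiplied by Est SCPL1M. In the example, L1MP is 4 and Est SCPL1M is 42, giving an Est Finite CPI of 1.68.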

#### Note these Formulas may change in the future

Est Instr Cmplx CPI – Estimated Instruction Complexity CPI (infinite L1)

Est Finite CPI – Estimated CPI from Finite cache/memory

Est SCPL1M – Estimated Sourcing Cycles per Level 1 Miss

Rel Nest Intensity – Reflects distribution and latency of sourcing from shared caches and memory

Eff GHz - Effective gigahertz for GCPs, cycles per nanosecond

Workload Characterization L1 Sourcing from cache/memory hierarchy

- B\* Basic Counter Set Counter Number
- P\* Problem-State Counter Set Counter Number

See "The Set-Program-Parameter and CPU-Measurement Facilities" SA23-2260-0 for full description
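As a minimal sketch, the z10 "Additional" formulas above can be applied to one interval as follows. The function name, the dictionary layout, and the counter deltas are illustrative assumptions (the deltas are made up so the results land near the SYSA MAY 12 row shown later in this deck); only the formulas themselves come from the table.

```python
def z10_additional_metrics(cpi, b, l2lp, l2rp, memp, cpsp):
    """Apply the z10 'Additional' formulas to one interval.

    cpi   : cycles per instruction for the interval
    b     : Basic counter deltas keyed by counter number (1..5 used here)
    l2lp, l2rp, memp : sourcing percentages from the z10 formulas table
    cpsp  : speed value that the Eff GHz formula divides by 1000
    """
    l1_misses = b[2] + b[4]
    est_finite_cpi = ((b[3] + b[5]) / b[1]) * 0.84
    est_scpl1m = ((b[3] + b[5]) / l1_misses) * 0.84
    return {
        "est_instr_cmplx_cpi": cpi - est_finite_cpi,
        "est_finite_cpi": est_finite_cpi,
        "est_scpl1m": est_scpl1m,
        "rni": (1.0 * l2lp + 2.4 * l2rp + 7.5 * memp) / 100,
        "eff_ghz": cpsp / 1000,
    }


# Hypothetical counter deltas (not measured data).
sample_b = {1: 1_000_000, 2: 30_000, 3: 4_000_000, 4: 16_000, 5: 1_952_381}
z10 = z10_additional_metrics(
    cpi=8.35, b=sample_b, l2lp=18.0, l2rp=3.8, memp=8.5, cpsp=4400
)
for name, value in z10.items():
    print(f"{name:20s} {value:8.2f}")
```

Keeping the sourcing percentages as inputs lets the same function be reused with whatever job computes L2LP, L2RP and MEMP from the Extended counters.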

## Formulas – z196

Workload Characterization L1 Sourcing from cache/memory hierarchy

| Metric   | Calculation – note all fields are deltas between intervals | Notes |
|----------|-------------------------------------------------------------|-------|
| CPI      | B0 / B1 | |
| PRBSTATE | (P33 / B1) * 100 | |
| L1MP     | ((B2+B4) / B1) * 100 | |
| L2P      | ((E128+E129) / (B2+B4)) * 100 | |
| L3P      | ((E150+E153) / (B2+B4)) * 100 | |
| L4LP     | ((E135+E136+E152+E155) / (B2+B4)) * 100 | L3 off chip in local L4 |
| L4RP     | ((E138+E139+E134+E143) / (B2+B4)) * 100 | L3 off book in remote L4 |
| MEMP     | (((E141+E142) + (B2+B4-E128-E129-E150-E153-E135-E136-E152-E155-E138-E139-E134-E143-E141-E142)) / (B2+B4)) * 100 | |
| LPARCPU  | ( ((1/CPSP/1,000,000) * B0) / Interval in Seconds) * 100 | |

CPI – Cycles per Instruction

PRBSTATE – % Problem State

L1MP – Level 1 Miss Per 100 instructions

L2P – % sourced from Level 2 cache

L3P – % sourced from Level 3 cache on the same chip

L4LP – % sourced from Level 4 Local cache (on same book)

L4RP – % sourced from Level 4 Remote cache (on different book)

MEMP - % sourced from Memory

LPARCPU - APPL% (GCPs, zAAPs, zIIPs) captured and uncaptured

- B\* Basic Counter Set Counter Number
- P\* Problem-State Counter Set Counter Number

See "The Set-Program-Parameter and CPU-Measurement Facilities" SA23-2260-0 for full description

E\* - Extended Counters - Counter Number

See "The CPU-Measurement Facility Extended Counters Definition for z10 and z196" SA23-2261-01 for full description
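The z196 table above can be sketched in code as shown here. This is an illustration only: the function name, the dictionary layout, and every counter delta are hypothetical, and reading the counters out of SMF 113 records is out of scope. Note that, as written in the table, MEMP absorbs any L1 misses not attributed to a listed Extended counter, so the five sourcing percentages always add up to 100.

```python
def z196_metrics(b, p33, e, cpsp, interval_seconds):
    """Apply the z196 formulas table to one interval of counter deltas.

    b   : Basic counter deltas keyed by counter number (0, 1, 2, 4 used)
    p33 : Problem-State counter 33 delta
    e   : Extended counter deltas keyed by counter number
    """
    l1_misses = b[2] + b[4]

    def pct_of_misses(*counter_numbers):
        return sum(e[n] for n in counter_numbers) / l1_misses * 100

    l2 = (128, 129)
    l3 = (150, 153)
    l4_local = (135, 136, 152, 155)
    l4_remote = (138, 139, 134, 143)
    memory = (141, 142)

    attributed = sum(e[n] for n in l2 + l3 + l4_local + l4_remote + memory)
    memp = ((e[141] + e[142]) + (l1_misses - attributed)) / l1_misses * 100

    seconds_per_cycle = 1 / cpsp / 1_000_000
    return {
        "CPI": b[0] / b[1],
        "PRBSTATE": p33 / b[1] * 100,
        "L1MP": l1_misses / b[1] * 100,
        "L2P": pct_of_misses(*l2),
        "L3P": pct_of_misses(*l3),
        "L4LP": pct_of_misses(*l4_local),
        "L4RP": pct_of_misses(*l4_remote),
        "MEMP": memp,
        "LPARCPU": (seconds_per_cycle * b[0]) / interval_seconds * 100,
    }


# Hypothetical deltas for a 900-second interval (not measured data).
basic = {0: 2_342_700_000_000, 1: 500_000_000_000,
         2: 15_000_000_000, 4: 5_000_000_000}
extended = {128: 8_000_000_000, 129: 4_000_000_000,
            150: 3_000_000_000, 153: 1_000_000_000,
            135: 500_000_000, 136: 500_000_000,
            152: 500_000_000, 155: 500_000_000,
            138: 200_000_000, 139: 200_000_000,
            134: 300_000_000, 143: 300_000_000,
            141: 400_000_000, 142: 400_000_000}
z196 = z196_metrics(basic, p33=5_000_000_000, e=extended,
                    cpsp=5206, interval_seconds=900)
for name, value in z196.items():
    print(f"{name:9s} {value:8.2f}")
```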

## Formulas – z196 Additional

| Metric              | Calculation – note all fields are deltas between intervals |
|---------------------|------------------------------------------------------------|
| Est Instr Cmplx CPI | CPI – Estimated Finite CPI                                 |
| Est Finite CPI      | ((B3+B5) / B1) * (.57 + (0.1*RNI)) updated *               |
| Est SCPL1M          | ((B3+B5) / (B2+B4)) * (.57 + (0.1*RNI) ) updated *         |
| Rel Nest Intensity  | 1.6*(0.4*L3P + 1.0*L4LP + 2.4*L4RP + 7.5*MEMP) / 100       |
| Eff GHz             | CPSP / 1000                                                |

#### Note these Formulas may change in the future

#### \* Updated March 2011

- Est Instr Cmplx CPI – Estimated Instruction Complexity CPI (infinite L1)
- Est Finite CPI – Estimated CPI from Finite cache/memory
- Est SCPL1M – Estimated Sourcing Cycles per Level 1 Miss
- Rel Nest Intensity – Reflects distribution and latency of sourcing from shared caches and memory
- Eff GHz – Effective gigahertz for GCPs, cycles per nanosecond

Workload Characterization L1 Sourcing from cache/memory hierarchy

- B\* Basic Counter Set Counter Number
- P\* Problem-State Counter Set Counter Number

See "The Set-Program-Parameter and CPU-Measurement Facilities" SA23-2260-0 for full description
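A short sketch of the z196 "Additional" formulas follows, showing how RNI feeds the scaling factor used by Est Finite CPI and Est SCPL1M. The function names and all input values are hypothetical; as the slide notes, the formulas themselves may change in the future.

```python
def z196_rni(l3p, l4lp, l4rp, memp):
    """Relative Nest Intensity from the z196 sourcing percentages."""
    return 1.6 * (0.4 * l3p + 1.0 * l4lp + 2.4 * l4rp + 7.5 * memp) / 100


def z196_additional_metrics(cpi, b, rni, cpsp):
    """Apply the z196 'Additional' formulas (as updated March 2011).

    b : Basic counter deltas keyed by counter number (1..5 used here)
    """
    scale = 0.57 + 0.1 * rni          # replaces the fixed .84 used for z10
    est_finite_cpi = ((b[3] + b[5]) / b[1]) * scale
    est_scpl1m = ((b[3] + b[5]) / (b[2] + b[4])) * scale
    return {
        "est_instr_cmplx_cpi": cpi - est_finite_cpi,
        "est_finite_cpi": est_finite_cpi,
        "est_scpl1m": est_scpl1m,
        "rni": rni,
        "eff_ghz": cpsp / 1000,
    }


# Hypothetical inputs (not measured data).
rni_example = z196_rni(l3p=20.0, l4lp=10.0, l4rp=5.0, memp=5.0)
z196_extra = z196_additional_metrics(
    cpi=4.0, b={1: 500, 2: 15, 3: 600, 4: 5, 5: 400},
    rni=rni_example, cpsp=5206,
)
for name, value in z196_extra.items():
    print(f"{name:20s} {value:8.3f}")
```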



## 2012 / 2011 WSC Experiences

New Uses What's New?

## CPU MF – WSC Experiences since March 2011

- z/OS Workload Characterization complete: RNI-based LSPR workload match, March 2012
- Customers continue to successfully run CPU MF COUNTERS in production today
- z196
  - HD=YES is even more important on z196, ensure HD=YES, 0-11% for 1 Book z196
    - See "Planning Considerations for HiperDispatch Mode Version 2" WP101229 <u>http://www.ibm.com/support/techdocs</u>
  - CPU MF provides measurement of z196 L3 cache
  - L3 off chip and off book are sourced from the respective L4s (see slide 27)
    - Typically evenly distributed when HD=YES
- New Uses
  - Measurement of z196 Customer Initiated Power Savings Mode APAR OA29530
  - Measurement of "Utilization Effect"
  - Crypto Function Positioning and CPACF Usage
- What's New
  - Support for COUNTER Data Loss APAR OA36816
  - New z/VM CPU MF COUNTERS support APAR VM64961



## New Uses

## z196 Customer Initiated Power Savings Mode



## z196 - Power Save Mode - Customer Initiated

- Reduce the energy consumption of your system
- Can be done on a scheduled basis
- A zCPC can be placed in power saving mode only once per day
- In z/OS when a Power Save event occurs:
  - SMF interval is ended and new one started
  - MSU and SU/SEC values are changed
  - SMF records reflect the change (30, 70, 72, 89, 113.2, and new 90.34)
  - CPU times must be normalized; service units would be correct
- Support Provided by APAR OA29530

### WSC Test for z196 Power Save Mode with APAR OA29530

17:20:19.07 HIS032I STATE CHANGE DETECTED, ACTION=SAVE.

17:20:19.33 IWM063I WLM POLICY WAS REFRESHED DUE TO A PROCESSOR SPEED CHANGE

17:20:19.33 IWM064I THE SYSTEM IS RUNNING WITH REDUCED CAPACITY BECAUSE OF A MANUAL CONTROL SETTING

17:41:34.18 HIS032I STATE CHANGE DETECTED, ACTION=SAVE.

17:41:34.46 IWM063I WLM POLICY WAS REFRESHED DUE TO A PROCESSOR SPEED CHANGE

17:41:34.46 IWM064I THE SYSTEM IS RUNNING AT NOMINAL CAPACITY.

### WSC z196 Power Save SMF 113 Test Results – Jan 9 2011

|                       | SMF 113 - S |              |      |
|-----------------------|-------------|--------------|------|
| Time Stamp            | CPU #       | CpuProcClass | CPSP |
| 11 JAN 09 17:20:00.00 | 0           | 1            | 5206 |
| 11 JAN 09 17:20:00.00 | 1           | 1            | 5206 |
| 11 JAN 09 17:20:00.00 | 2           | 6            | 5206 |
| 11 JAN 09 17:20:00.00 | 3           | 6            | 5206 |
| 11 JAN 09 17:20:19.07 |             | 1            | 5206 |
| 11 JAN 09 17:20:19.07 |             | 1            |      |
| 11 JAN 09 17:20:19.07 |             |              |      |
| 11 JAN 09 17:20:19.07 | 3           | 6            | 5206 |
| 11 JAN 09 17:20:19.17 |             | 1            |      |
| 11 JAN 09 17:20:19.17 |             | 1            |      |
| 11 JAN 09 17:20:19.17 |             |              |      |
| 11 JAN 09 17:20:19.17 | 3           | 6            | 4452 |
|                       |             |              |      |
| 11 JAN 09 17:25:00.00 | 0           | 1            | 4452 |
| 11 JAN 09 17:25:00.00 | 1           | 1            | 4452 |
| 11 JAN 09 17:25:00.00 |             |              | 4452 |
| 11 JAN 09 17:25:00.00 | 3           | 6            | 4452 |
| 11 JAN 09 17:41:34.18 | 0           | 1            | 4452 |
| 11 JAN 09 17:41:34.18 | 1           | 1            | 4452 |
| 11 JAN 09 17:41:34.18 | 2           | 6            | 4452 |
| 11 JAN 09 17:41:34.18 | 3           | 6            | 4452 |
| 11 JAN 09 17:41:34.35 | 0           | 1            | 5206 |
| 11 JAN 09 17:41:34.35 |             | 1            |      |
| 11 JAN 09 17:41:34.35 |             |              |      |
| 11 JAN 09 17:41:34.35 |             |              |      |
|                       |             |              |      |



#### RMF CPU Activity Report - Nominal (NONE) and POWERSAVE

[Figure: RMF CPU Activity report excerpts for Nominal (NONE) and POWERSAVE]



#### **Comparisons of MSUs and Effective GHz Ratio in Power Save Mode**

|                         | MSUs  | Effective GHz |
|-------------------------|-------|---------------|
| Nominal (NONE)          | 6053  | 5.206         |
| POWERSAVE               | 5024  | 4.452         |
|                         |       |               |
| **POWERSAVE / NONE**    | 83.0% | 85.5%         |
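The two ratios in the table follow directly from the MSU values and the CPSP values recorded in the SMF 113 test results (Eff GHz = CPSP / 1000). A small sketch of the arithmetic, with variable names chosen only for illustration:

```python
# Values taken from the WSC test results above.
nominal = {"msu": 6053, "cpsp": 5206}
powersave = {"msu": 5024, "cpsp": 4452}

# Eff GHz = CPSP / 1000, per the formulas table.
eff_ghz_nominal = nominal["cpsp"] / 1000
eff_ghz_powersave = powersave["cpsp"] / 1000

msu_ratio = powersave["msu"] / nominal["msu"]
ghz_ratio = eff_ghz_powersave / eff_ghz_nominal

print(f"MSUs    POWERSAVE / NONE: {msu_ratio:.1%}")
print(f"Eff GHz POWERSAVE / NONE: {ghz_ratio:.1%}")
```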

## New Uses

## Utilization Effect



### CPU MF COUNTERS can be used to assess Utilization Effect

#### Common Capacity Planning knowledge

- Capacity Planning can be affected by machine utilization
- Growth in CPU time per transaction occurs as utilization grows
  - Transactions being processed on a system share the physical hardware resources (CPs, caches, memory buses) and software resources (z/OS, subsystems)
  - At 50% utilization there are 2x the fixed shared resources and ½ the software work units to manage, versus 100% utilization
  - See "Running IBM System z at High CPU Utilization" by Gary King WP101208 <u>http://www.ibm.com/support/techdocs</u>

New Insights and Guidelines

- Experience has shown a <u>3-5% change in CPU per 10% machine utilization</u>
- With CPU MF, you can analyze partition CPI versus machine utilization (SMF 70s) to empirically build a unique Utilization Effect with a regression best-fit formula
  - Or use the following guidelines: % change in CPU per 10% machine utilization
    - RNI match "Low" 3%
    - RNI match "Avg" 4%
    - RNI match "High" 5%

Note these guidelines may change in the future



### Partition CPI Vs Total Machine GCP Busy Methodology

### Regression Best Fit Methodology

- SMF 113 data provides CPI (Cycles Per Instruction) for each Partition
  - Proxy for CPU per transaction encompassing all work, assuming that within each LPAR there is:
    - Consistent mix of transactions and jobs
    - Consistent instructions per transaction and job
- SMF 70 data provides overall machine utilization (total GCP utilization)
- Plot Partition CPI versus total machine GCP utilization for each interval, 8AM-5PM
  - Select representative business cycle days
- Regression fit to plotted points
- Compare growth in best-fit CPI from 80% to 90% box utilization
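The methodology above can be sketched with an ordinary least-squares line fit. This is one plausible way to do the regression, not necessarily the tool used for the WSC analysis; the function names are illustrative and the data points are invented placeholders standing in for (total machine GCP busy %, partition CPI) pairs taken from SMF 70 and SMF 113 intervals.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cpi_growth_pct(slope, intercept, low_busy=80, high_busy=90):
    """% growth in best-fit CPI between two machine utilization levels."""
    cpi_low = slope * low_busy + intercept
    cpi_high = slope * high_busy + intercept
    return (cpi_high / cpi_low - 1) * 100


# Invented sample points: one per interval in the 8AM-5PM window.
machine_busy_pct = [60, 70, 80, 90]
partition_cpi = [0.05 * x + 6.0 for x in machine_busy_pct]

slope, intercept = fit_line(machine_busy_pct, partition_cpi)
growth = cpi_growth_pct(slope, intercept)
print(f"best fit: CPI = {slope:.5f} * busy + {intercept:.4f}")
print(f"CPI growth from 80% to 90% busy: {growth:.1f}%")
```

With real data the points will scatter around the line, so it is worth plotting them before trusting the fitted slope.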



# Partition CPI Vs Total Machine GCP Busy

#### **Regression Best Fit**

#### **Y**=.05636**X**+6.6576

**CPI Vs Machine Busy Regression Fit** 





# **Regression Analysis and RNI Match Guideline**

### **Empirically Determined Regression Best Fit Analysis**

- Formula is unique to this customer analysis

For example, the difference between 70% and 90% machine utilization is 10% more CPU

### **Results per 10% change in machine utilization**

- Regression Best Fit: Partition average growth in best-fit CPI is 5.0%
- RNI Guideline: Average of 5 days is 4.2%

(4+4+5+4+4) / 5 = 4.2

| SYSID | Mon | Day | SH | Hour  | CPI  | Prb State | Est Instr Cmplx CPI | Est Finite CPI | Est SCPL1M | L1MP | L15P / L2P | L3P | L2LP / L4LP | L2RP / L4RP | MEMP | Rel Nest Intensity | LPARCPU | Eff GHz | Machine Type | Wkld |
|-------|-----|-----|----|-------|------|-----------|---------------------|----------------|------------|------|------------|-----|-------------|-------------|------|--------------------|---------|---------|--------------|------|
| SYSA  | MAY | 12  | P  | TOTAL | 8.35 | 0.2       | 3.35                | 5.00           | 108        | 4.6  | 69.7       | 0.0 | 18.0        | 3.8         | 8.5  | 0.91               | 135.1   | 4.4     | Z10          | AVG  |
| SYSA  | MAY | 13  | P  | TOTAL | 8.24 | 0.3       | 3.36                | 4.88           | 104        | 4.7  | 70.6       | 0.0 | 17.5        | 3.5         | 8.5  | 0.89               | 104.7   | 4.4     | Z10          | AVG  |
| SYSA  | MAY | 16  | P  | TOTAL | 6.93 | 0.7       | 3.11                | 3.82           | 117        | 3.3  | 71.0       | 0.0 | 15.0        | 2.7         | 11.3 | 1.06               | 54.1    | 4.4     | Z10          | HIGH |
| SYSA  | MAY | 17  | P  | TOTAL | 7.89 | 0.3       | 3.27                | 4.62           | 111        | 4.2  | 70.4       | 0.0 | 16.8        | 3.4         | 9.3  | 0.95               | 95.7    | 4.4     | Z10          | AVG  |
| SYSA  | MAY | 18  | P  | TOTAL | 7.79 | 0.2       | 3.24                | 4.55           | 107        | 4.3  | 70.0       | 0.0 | 17.8        | 3.8         | 8.4  | 0.90               | 147.2   | 4.4     | Z10          | AVG  |
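Both results quoted above can be reproduced with a few lines of arithmetic. The sketch assumes that X in the best-fit formula is total machine GCP busy in percent and Y is partition CPI (as the chart title suggests), and that each day's workload hint maps to the guideline percentages listed earlier (Low 3%, Avg 4%, High 5%). The names used are illustrative.

```python
# Best-fit line from the customer analysis: Y = .05636X + 6.6576
SLOPE, INTERCEPT = 0.05636, 6.6576


def best_fit_cpi(machine_busy_pct):
    return SLOPE * machine_busy_pct + INTERCEPT


# Growth in best-fit CPI from 80% to 90% machine utilization.
regression_growth = (best_fit_cpi(90) / best_fit_cpi(80) - 1) * 100

# RNI guideline: % change in CPU per 10% machine utilization.
GUIDELINE_PCT = {"LOW": 3, "AVG": 4, "HIGH": 5}
daily_hints = ["AVG", "AVG", "HIGH", "AVG", "AVG"]  # May 12, 13, 16, 17, 18
guideline_avg = sum(GUIDELINE_PCT[h] for h in daily_hints) / len(daily_hints)

print(f"Regression best fit: {regression_growth:.1f}% per 10% utilization")
print(f"RNI guideline:       {guideline_avg:.1f}% per 10% utilization")
```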

# New Uses

# Crypto Function Positioning and Usage by CPACF

# Crypto Functions and CPU MF Measurements

#### There are 4 Crypto Functions:

- Data Confidentiality
  - Symmetric
  - Asymmetric
- Data Integrity (Modification Detection / Modification Authentication)
- Key Management
- Financial PIN Functions

### These Functions can be implemented in System z as follows:



# What's New? DATALOSS Parameter



### OA36816 - DATALOSS Parameter Expanded to COUNTERS

- F HIS "DATALOSS" parameter expanded to COUNTERS and Sampling for hardware data loss
  - Was already valid for SAMPLING buffer overflow
  - Without APAR, HIS and SMF 113 recording stops until HIS restarted and modified to collect COUNTERS again
    - HIS025I DATA COLLECTION IS PREMATURELY ENDING. MACHINE REPORTED COUNTER DATA LOSS
- With APAR can now specify IGNORE for any COUNTER Data Loss
  - Options are to IGNORE or STOP: DATALOSS= IGNORE|STOP
    - If STOP, HIS and recording of SMF 113s stops (until HIS started and modified to enable COUNTERS)
- Recommendation is to use IGNORE (default), as SMF 113s will reflect the condition
  - Can be abbreviated DL=I
  - New message HIS034I is issued when DATALOSS=IGNORE and the condition is detected
  - If COUNTER data loss occurs with IGNORE specified, HIS flags the SMF 113 records for that interval
    - SMF113\_2\_CF bit 4 set "hardware has lost counter data during the current interval"
    - If no COUNTER data loss occurs in subsequent intervals, SMF113\_2\_CF bit 4 is not set
  - Advantage: no operational support is needed to continue collecting SMF 113s after COUNTER data loss
- Available since August 2011



# What's New? z/VM Supports CPU MF COUNTERS



### New z/VM capability supports CPU MF COUNTERS

### z/VM now supports CPU MF COUNTERS

- Provided by APAR VM64961
  - Available since August 2011
  - For z/VM 6.1 and z/VM 5.4 on z10s and z196s
- Also z/VM 6.2
- Future vision is to help identify z/VM workload characteristics and to provide better input for capacity planning and performance

# Sample z/VM CPU MF COUNTER Report

| SYSID | Mon | Day SH | Hour | Pool | CPI | Prb State | Est Instr Cmplx CPI | Est Finite CPI | Est SCPL1M | L1MP | L15P / L2P | L3P | L2LP / L4LP | L2RP / L4RP | MEMP | Rel Nest Intensity | LPARCPU | Eff GHz | Machine Type | LSPR Wkld Hint |
|-------|-----|--------|------|------|-----|-----------|---------------------|----------------|------------|------|------------|-----|-------------|-------------|------|--------------------|---------|---------|--------------|----------------|
| Shift Summary | | | | | | | | | | | | | | | | | | | | |
| GDLBOFVM | OCT | 26 N | TOTA | T | 2.43 | 0.0 | 2.42 | 0.01 | 112 | 0.0 | 76.0 | 0.0 | 21.9 | 0.1 | 2.1 | 0.37 | 41.6 | 4.4 | 2097 | LOW |
| GDLRCT1 | OCT | 27 P | TOTA | | 2.35 | | | | | 0.0 | 95.1 | 2.3 | 1.4 | | 0.1 | 0.09 | 12.2 | 5.2 | | LOW |
| | | | | | | | | | | | | | | | | | | | | |
| Hourly | | | | | | | | | | | | | | | | | | | | |
| GDLBOFVM | OCT | 26 N | 19.50 | T | 4.53 | 0.0 | 3.01 | 1.52 | 121 | 1.3 | 44.2 | 0.0 | 52.1 | 0.0 | 3.8 | 0.80 | 0.1 | 4.4 | 2097 | AVG |
| GDLBOFVM | OCT | 26 N | 19.60 | T | 4.55 | 0.0 | 2.99 | 1.56 | 124 | 1.3 | 44.5 | 0.0 | 51.4 | 0.0 | 4.1 | 0.82 | 0.4 | 4.4 | 2097 | AVG |
| GDLBOFVM | OCT | 26 N | 19.70 | T | 2.52 | 0.0 | 2.47 | 0.04 | 101 | 0.0 | 69.8 | 0.0 | 26.1 | 0.0 | 4.1 | 0.57 | 16.8 | 4.4 | 2097 | LOW |
| GDLBOFVM | OCT | 26 N | 19.80 | T | 2.45 | 0.0 | 2.44 | | 124 | 0.0 | 78.4 | 0.0 | 19.7 | 0.1 | 1.9 | | 202.3 | 4.4 | 2097 | LOW |
| GDLBOFVM | OCT | 26 N | 19.90 | | 2.43 | | | | 109 | 0.0 | 78.7 | 0.0 | 19.5 | 0.1 | 1.7 | 0.33 | 226.8 | 4.4 | | LOW |
| GDLBOFVM | OCT | 26 N | 20.00 | | 2.43 | | | | 106 | 0.0 | 79.0 | 0.0 | 19.4 | 0.1 | 1.5 | | 242.3 | 4.4 | | LOW |
| GDLBOFVM | OCT | 26 N | 20.10 | | 2.40 | 0.0 | | | 113 | 0.0 | 77.9 | 0.0 | 20.3 | 0.1 | 1.7 | 0.33 | 143.0 | 4.4 | | LOW |
| GDLRCT1 | OCT | 27 P | 12.80 | | 2.36 | | | | 26 | 0.1 | 95.6 | 2.1 | 1.1 | 1.1 | 0.1 | 0.08 | 34.8 | 5.2 | | LOW |
| GDLRCT1 | OCT | 27 P | 12.90 | | 2.35 | 0.0 | 2.31 | 0.04 | 28 | 0.1 | 94.8 | 2.4 | 1.5 | 1.2 | 0.1 | 0.09 | 87.3 | 5.2 | 2817 | LOW |
| Pool Summary | | | | | | | | | | | | | | | | | | | | |
| GDLBOFVM | OCT | 26 N | TOTA | | 2.43 | | | | 112 | 0.0 | 76.0 | 0.0 | 21.9 | 0.1 | 2.1 | 0.37 | 41.6 | 4.4 | 2097 | LOW |
| GDLRCT1 | OCT | 27 P | TOTA | | 2.29 | 0.0 | 2.27 | 0.02 | 33 | 0.1 | 89.9 | 5.0 | 2.8 | 2.1 | 0.2 | | 11.8 | 5.2 | | LOW |
| GDLRCT1 | OCT | 27 P | TOTA | | 12.38 | | | | 46 | 14.4 | 88.2 | 1.0 | 3.8 | | 0.2 | | 0.0 | 5.2 | | AVG |
| GDLRCT1 | OCT | 27 P | TOTA | | 10.09 | 0.0 | | | | 10.5 | 97.9 | 0.9 | 0.6 | | 0.0 | 0.04 | 0.4 | 5.2 | | AVG |
| GDLRCT1 | OCT | 27 P | TOTA | | 12.96 | | | | | 14.7 | 88.3 | 0.6 | 1.3 | | 0.5 | 0.44 | 0.0 | 5.2 | | AVG |
| GDLRCT1 | OCT | 27 P | TOTA | A 6 | 12.39 | 0.0 | 5.63 | 6.76 | 47 | 14.5 | 87.9 | 0.8 | 4.7 | 6.1 | 0.5 | 0.37 | 0.0 | 5.2 | 2817 | AVG |

These numbers come from a synthetic Benchmark and do not represent a production workload

### Workload Characterization Future Vision – z/VM Starting

- Future vision to help identify workload characteristics and to provide better input for capacity planning and performance
  - Step 1 in process: need samples from customer <u>production z/VM partitions</u> to develop workload categories (started in August 2011)
    - Have received data from over 20 z/VM partitions as of March 2012. Thanks!
    - Can be z10 or z196 (z/VM 5.4, z/VM 6.1 or z/VM 6.2)
    - Hope to develop a z/VM workload "match" like the one done for z/OS

**Looking for "Volunteers"** – (1 hour MONWRITE with D5 R13 enabled)

If interested, send a note to jpburg@us.ibm.com and rflewis@us.ibm.com for instructions. No deliverable will be returned.

Benefit: Opportunity to ensure your data is used to influence analysis

# Summary

# z10 and z196 CPU MF COUNTERS Summary

- Traditional metrics continue to provide the best view of Performance
  - CPU MF can help explain <u>why</u> a change occurred
- Workload Characterization for z/OS Capacity Sizing complete
  - Relative Nest Intensity calculation now provides an LSPR workload match to zPCR
- CPU MF has a very low overhead to run and is easy to implement
  - Less than 1/100 of a second for HIS address space in 15 minute interval
  - Customers continue to successfully run CPU MF COUNTERS in production today
- Recommend enabling CPU MF COUNTERS on z10s, z196s and z114s today!
  - To supplement current performance metrics (e.g. from SMF, RMF, DB2, CICS), turn on and leave on
  - APAR OA30486 required for z196s and recommended for z10s
  - For long term gathering recommend CTR=(B,E) to get Extended Counters
- CPU MF Overview and WSC Experiences Techdoc TC000066
  - http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TC000066
  - CPU MF presentation and a detailed write up for enabling CPU MF
- z/VM Supports CPU MF
  - Volunteers are still needed for our z/VM Workload Characterization study










# Thank You for attending!



### Acknowledgements

### Many people contributed to this presentation including:

| Riaz Ahmad        | <b>Richard Lewis</b> |
|-------------------|----------------------|
| Greg Boyd         | Gretchen Frye        |
| Jane Bartik       | Brian Wade           |
| Harv Emery        | Stephen Jones        |
| Gary King         | Romney White         |
| Frank Kyne        | Les Geer             |
| Marty Moroch      | Kevin Adams          |
| Steve Olenik      |                      |
| Bob Rogers        |                      |
| Bill Schray       |                      |
| Brian Smith       |                      |
| Bob St John       |                      |
| Elpida Tzortzatos |                      |
| Kathy Walsh       |                      |

# Back Up



# Crypto Exploiters and CPU MF Measurements

### Crypto Exploiters by System z implementation include:

| Software Routines                                                                   | CPACF                                                                                                                                               | Crypto Express Card                                                                                                                                                                                   |
|-------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| System SSL Handshake Phase 1 (2nd Choice)<br>System SSL Record Phase 2 (2nd Choice) | System SSL Record Phase 2 (1st Choice)                                                                                                              | System SSL Handshake Phase 1 (1st Choice)                                                                                                                                                             |
|                                                                                     | Data Encryption Tool for IMS and DB2                                                                                                                | Data Encryption Tool for IMS and DB2                                                                                                                                                                  |
|                                                                                     | IBM Encryption Facility (CLRTDES or CLEAES<br>option to encrypt with clear key)<br>IBM Encryption Facility (use Passphrase to<br>generate data key) | IBM Encryption Facility (ENCTDES option to<br>encrypt with secure key)<br>IBM Encryption Facility (use RSA option to<br>protect the data key)<br>Tape/DASD Encryption (with ICSF as your<br>keystore) |
|                                                                                     | DB2 BIFs                                                                                                                                            |                                                                                                                                                                                                       |

**CPU MF COUNTERS** can provide counts of the types of functions and the CPU time for the Crypto Functions that execute on the CPACF



# Sample Report – Crypto COUNTERS provide measurement of CPACF Crypto Co-Processor Usage

| PRNG PRNG PRNG PRNG PRNG PRNG PRNG PRNG | Function<br>Cycle Co<br>Blocked<br>Unction<br>Vole Co<br>Locked Co | n Count<br>ount<br>Function<br>Count<br>unt<br>Function<br>Cycle Co<br>Count<br>unt<br>Function<br>Cycle Co<br>Cycle Co<br>Count | for a count | 11 0 | nters<br>PUs      |                   |     | 0.<br>592.           | ***<br>0/Sec<br>0/Sec<br>0/Sec<br>73/Sec<br>47/Sec<br>0/Sec<br>0/Sec<br>39/Sec |
|-----------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|-------------------|-------------------|-----|----------------------|--------------------------------------------------------------------------------|

[Report excerpt: \*\*\* CRYPTO BUSY SUMMARY \*\*\*, listing the "Crypto Busy" percentage per function (AES among them) for the CPUs; the values shown were 0.00%, 0.00%, 2.55%, 0.00%, and 2.55%.]

#### This information may be useful in determining:

- A count of <u>how many CPACF encryption</u> functions were executed
- How much CPU time (cycles) was used

The encryption facility executed both SHA functions and TDES functions for this specific test.

Ran DASD dumps sequentially over a 20-minute duration with option ENCRYPT(CLRTDES).

These numbers come from a synthetic Benchmark and do not represent a production workload

- It is important to remember that <u>other Crypto functions may be executing in</u> <u>software and/or on Crypto Express Cards</u> (if installed & implemented). This is not measured by the CPU MF Crypto COUNTERS
- CPU MF Crypto COUNTERS can help assess how many of the Crypto Functions are occurring on the CPACF Co-Processors

See "A Synopsis of System z Crypto Hardware" by Greg Boyd **WP100810** <u>http://www.ibm.com/support/techdocs</u>



## CPU MF for z/VM Partitions

- The CPU Measurement Counter Facility provides for z/VM partitions:
  - Sets of counters for each logical processor that count events such as cycle, instruction, and cache directory-write counts
    - Same COUNTER information as z/OS partitions
  - Their accumulation is a relatively low-overhead activity and is performed automatically by the machine when the counters are authorized, enabled, and activated.
  - Authorization is controlled by a logical partition's Security settings in its activation profile. Enablement and activation are controlled by z/VM.

For long-term gathering, Basic and Extended Counters are recommended; enable only the Basic and Extended Counters Security settings

For z/VM Control Program, Problem State is 0 and CPACF is not utilized

- Enhancements in z/VM 6.1 and z/VM 5.4 provide the ability to collect CPU Measurement Facility counter data in the monitor stream.
- Support for the CPU-Measurement Sampling Facility and virtualization of the CPU-Measurement Facility interfaces for guest use are not provided.



### z/VM CPU MF operation support

- New support includes:
  - New command syntax for the MONITOR SAMPLE command, along with a new message.
  - A new command response for the QUERY MONITOR command.
  - One new monitor record.
    - Domain 5 (Processor), Record 13 (CPU-Measurement Facility Counters)
  - A modification to an existing monitor record.
    - Domain 1 (Monitor), Record 14 (Domain Detail)
  - A new message for <u>CP MONITOR SAMPLE ENABLE</u> <u>PROCESSOR</u> and <u>CP MONITOR SAMPLE ENABLE ALL</u>



# CPU MF COUNTERS Enablement

# CPU MF Requirements to utilize z10 and z196

- z196 or System z10 machine
  - z10 must be at GA2 Driver 76D Bundle #20 or higher

- z10: the z/OS LPAR being measured must be at z/OS 1.8 or higher with APARs:
  - OA25755, OA25750, and OA25773; also OA30486 for z/OS 1.10 and higher for new functionality
  - OA27623 is also recommended to add "CPU Speed" to SMF 113s and HIS COUNTERS output
  - Not currently supported for z/OS running as a z/VM guest
  - z/VM native prototype support in process
- z196: z/OS LPARs being measured at z/OS 1.9 or higher require APAR OA30486
  - z/OS 1.8 requires OA33052

#### Recommended New SAMPLING APARS

- z/OS Mapping
  - z/OS 1.9 APAR OA32113
  - z/OS 1.10 APAR OA32113 and APAR OA34485
  - z/OS 1.11 APAR OA30429 and OA34485
  - z/OS 1.12 APAR OA34485
- CICS Mapping
  - APAR PM08568 (for CTS 3.2) or APAR PM08573 (for CTS 4.1).

# Steps to utilize z10 and z196 CPU MF

**Operationally CPU MF works the same on z196** 

### Steps to utilize CPU MF

- Configure the z10 or z196 to collect CPU MF Data
  - Update LPAR Security Tabs (See appendix)
- Configure HIS on z/OS to collect CPU MF Data
  - Set up HIS Proc
  - Set up OMVS Directory
  - Collect SMF 113s via SMFPRMxx
- Collect CPU MF Data
  - Start HIS Modify with Begin/End for COUNTERS or SAMPLING
  - "F HIS,B,TT='Text',PATH='/his/',CTRONLY,CTR=ALL" or
  - "F HIS,B,TT='Text',PATH='/his/',CTRONLY,CTR=(B,E)"
- Analyze the CPU MF Data SMF 113s

CPU MF has very low overhead, is easy to implement, and produces a very small SMF record

For long term gathering recommend CTR=(B,E) to get Extended Counters!

Sample HIS proc:

//HIS PROC

//HIS EXEC PGM=HISINIT,REGION=0K,TIME=NOLIMIT

//SYSPRINT DD SYSOUT=\*
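As a small illustration, the Modify commands quoted in the steps above can be assembled from their parts. This is only a sketch: `build_his_begin` is a hypothetical helper name, and it covers just the operands shown on this slide (TT, PATH, CTRONLY, CTR).

```python
def build_his_begin(title, path=None, counters=("B", "E")):
    """Assemble an 'F HIS,B,...' command string for a COUNTERS-only run.

    Hypothetical helper: it only concatenates operands quoted on the slide
    and does not validate them against the MVS command syntax.
    """
    parts = ["F HIS,B", f"TT='{title}'"]
    if path is not None:
        parts.append(f"PATH='{path}'")
    parts.append("CTRONLY")
    if counters == "ALL":
        parts.append("CTR=ALL")
    else:
        parts.append("CTR=(" + ",".join(counters) + ")")
    return ",".join(parts)


if __name__ == "__main__":
    # Long-term gathering recommendation: Basic and Extended counters.
    print(build_his_begin("Text", path="/his/"))
    # All counter sets.
    print(build_his_begin("Text", path="/his/", counters="ALL"))
```

The default of `("B", "E")` mirrors the long-term gathering recommendation above.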

### OA30486 - New support for Sync Interval, PU Type and STATECHANGE

- APAR OA30486 with z/OS 1.12 GA will be rolled down to z/OS V1R11 and z/OS V1R10
  - Applicable for <u>z10s</u> and <u>z196s</u> for new functionality
  - A. New CPU MF capability to <u>SYNC the SMF 113 Interval</u> with other SMF records
    - SMFINTVAL=SYNC
      - Synchronize records with the SMF global recording interval
    - ..or <u>choose Interval time 1-60</u>
    - Recommendation is "SYNC":

Recommend SMFINTVAL=SYNC or SI=SYNC

- "F HIS,B,TT='Text',PATH='/his/',CTRONLY,CTR=(B,E),SMFINTVAL=SYNC"
- B. Identification of PU Type (GCP, zIIP or zAAP) in SMF 113 record
  - SMF113\_2\_CpuProcClass: '0' = GCP, '2' = zAAP, '4' = zIIP
- C. STATECHANGE (see next slide)
  - Both SMFINTVAL and STATECHANGE <u>can be abbreviated</u>, e.g. SI=SYNC, SC=SAVE
  - "F HIS,B,TT='Text',PATH='/his/',CTRONLY,CTR=(B,E),SI=SYNC,SC=SAVE"
- In SMF 113s the z196 processor is identified by

   SMF113\_2\_CTRVN2 = '2' for z196, '1' for z10

z196 Extended Counters have changed, use CTRVN2 to determine if z10 or z196
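The field values listed above can be turned into readable labels when post-processing SMF 113 data. The sketch below assumes the two fields have already been extracted from the record and converted to integers; the function name and the "unknown" fallback are illustrative choices, not part of any IBM tooling.

```python
# Value-to-label mappings taken from the slide text.
PROC_CLASS = {0: "GCP", 2: "zAAP", 4: "zIIP"}   # SMF113_2_CpuProcClass
MACHINE_BY_CTRVN2 = {1: "z10", 2: "z196"}       # SMF113_2_CTRVN2


def describe_record(cpu_proc_class, ctrvn2):
    """Return (PU type, machine generation) labels for one SMF 113 record.

    Assumes both inputs are already parsed integers; values not listed on
    the slide map to 'unknown' rather than raising an error.
    """
    pu_type = PROC_CLASS.get(cpu_proc_class, "unknown")
    machine = MACHINE_BY_CTRVN2.get(ctrvn2, "unknown")
    return pu_type, machine


if __name__ == "__main__":
    print(describe_record(0, 2))
    print(describe_record(4, 2))
```

Checking CTRVN2 first matters because the Extended Counter definitions differ between z10 and z196, so any formula using Extended Counters should be selected by machine generation.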



# HIS STATECHANGE

#### HIS detects and handles significant hardware events (state change)

- Replacement Capacity (Customer Initiated Upgrade)
- On/Off Capacity on demand
- z196 Power Save Mode

#### How HIS reacts depends on the STATECHANGE parameter specified

- STATECHANGE=STOP
  - Stop the collection run when the event is detected
- STATECHANGE=IGNORE
  - Continue the collection run as if the event never happened
- STATECHANGE=SAVE (Default)
  - Record the previous state of the system (Save all data)
    - Write and close the .CNT file
    - Close all .SMP files (1 per CPU)
    - Cut SMF Type 113 Records (1 per CPU)
  - Continue the collection run with the new state
    - Create new .SMP files (1 per CPU)
    - Cut SMF Type 113 Records (1 per CPU)

#### STATECHANGE information not directly reported in the SMF 113

You will see additional record(s) and an increase/decrease in CPIDs or "CPU Speed"

Recommend STATECHANGE=SAVE; it is the default, so it does not need to be specified

Verify with SMF 113s that "CPU Speed" or "Effective GHz" changed as expected
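One way to perform that verification is to scan the "CPU Speed" value per logical CPU across consecutive SMF 113 records and flag any change. The sketch below assumes records have already been reduced to `(timestamp, cpu_id, cpsp)` tuples in time order; the tuple layout and the sample values (including the lowered speed of 4300) are hypothetical.

```python
def speed_changes(records):
    """Report CPU Speed changes between consecutive records of each CPU.

    records: iterable of (timestamp, cpu_id, cpsp) tuples in time order,
    where cpsp is the SMF 113 CPU Speed in cycles per microsecond.
    Returns a list of (timestamp, cpu_id, old_cpsp, new_cpsp).
    """
    last_seen = {}
    changes = []
    for timestamp, cpu_id, cpsp in records:
        previous = last_seen.get(cpu_id)
        if previous is not None and previous != cpsp:
            changes.append((timestamp, cpu_id, previous, cpsp))
        last_seen[cpu_id] = cpsp
    return changes


if __name__ == "__main__":
    # Hypothetical sample: CPU 0 and CPU 1 drop from 5206 to 4300.
    sample = [
        ("15:30:00", 0, 5206), ("15:30:00", 1, 5206),
        ("15:35:00", 0, 4300), ("15:35:00", 1, 4300),
        ("15:40:00", 0, 4300), ("15:40:00", 1, 4300),
    ]
    for change in speed_changes(sample):
        print(change)
```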



### New HIS APAR OA30486 support for z196 – WSC Example

15:33:13.24 JPBURG 00000200 F HIS,B,TT='Z196 w/ TEST',CTRONLY,CTR=ALL,SMFINTVAL=SYNC

15:33:14.22 STC01226 00000000 HIS011I HIS DATA COLLECTION STARTED

| Time Stamp            | CPU # | CpuProcClass | CTRVN1 | CTRVN2 | CPSP |
|-----------------------|-------|--------------|--------|--------|------|
| 10 JUL 22 15:33:14.22 | 0     | 0            | 1      | 2      | 5206 |
| 10 JUL 22 15:33:14.22 | 1     | 0            | 1      | 2      | 5206 |
| 10 JUL 22 15:33:14.22 | 4     | 4            | 1      | 2      | 5206 |
| 10 JUL 22 15:33:14.22 | 5     | 4            | 1      | 2      | 5206 |
| 10 JUL 22 15:35:00.00 | 0     | 0            | 1      | 2      | 5206 |
| 10 JUL 22 15:35:00.00 | 1     | 0            | 1      | 2      | 5206 |
| 10 JUL 22 15:35:00.00 | 4     | 4            | 1      | 2      | 5206 |
| 10 JUL 22 15:35:00.00 | 5     | 4            | 1      | 2      | 5206 |

CpuProcClass: '0' = GCP, '4' = zIIP. CTRVN2: '2' = z196. CPSP: 5206 = '5.2' GHz (z196).

SMF 113 Synched with SMF Global Recording Interval - 5 Minutes



### New HIS APAR OA30486 support for z196 – WSC Example

12:19:11.82 JPBURG 00000200 F HIS,B,TT='Z196 TEST',CTRONLY,CTR=ALL,SMFINTVAL=1

12:19:11.82 STC32434 00000000 HIS011I HIS DATA COLLECTION STARTED



### SMF 113 Written every 1 Minute



### The Display HIS command (D HIS) provides useful status information

Information includes:

- Modify command used to enable CPU MF / HIS
- COUNTER sets enabled
- SMF interval

```
HIS015I 11.32.39 DISPLAY HIS 056
HIS      0025 ACTIVE
COMMAND: MODIFY HIS,B,TT='CMU MF COUNTERS ENABLED',CTRONLY,CTR=ALL,SI=
         SYNC
START TIME: 2010/09/23 09:52:22
END TIME: ----/--/-- --:--:--
COMPLETION STATUS: -----
FILE PREFIX: SYSHIS20100923.095222.
COUNTER VERSION NUMBER 1: 1    COUNTER VERSION NUMBER 2: 2
COMMAND PARAMETER VALUES USED:
 TITLE= CMU MF COUNTERS ENABLED
 PATH=
 COUNTER SET= BASIC, PROBLEM-STATE, CRYPTO-ACTIVITY, EXTENDED
 DURATION= NOLIMIT
 CTRONLY
 STATECHANGE= SAVE
 SMFINTVAL= SYNC
```



### **Old Experiences**



CPU MF can help provide <u>cache/memory resource</u> change insights





z10 HiperDispatch attempts to align Logical CPs with PUs in the same Book







From CPU MF, z10 HiperDispatch=YES May Decrease the L2 Remote %



- CPI - Cycles per Instruction
- PRBSTATE - % Problem State
- L1MP - Level 1 Miss Per 100 instructions
- L15P - % sourced from L1.5 cache
- L2LP - % sourced from Level 2 Local cache (on same book)
- L2RP - % sourced from Level 2 Remote cache (on different book)
- MEMP - % sourced from Memory
- LPARCPU - APPL% (GCPs, zAAPs, zIIPs) captured and uncaptured

Potential Workload Characterization: z10 L1 sourcing from cache/memory hierarchy



### HiperDispatch=Yes Customer Improvement on z10 721

| Day | Hour | CPI | Prb State | Est Instr Cmplx CPI | Est Finite CPI | Est SCPL1M | L1MP | L15P | L2LP | L2RP | MEMP | Rel Nest Intensity | LPARCPU | HD ? |
|-----|------|-----|-----------|---------------------|----------------|------------|------|------|------|------|------|--------------------|---------|------|
| 12 | 11.0 | 8.2 | 53.7 | 3.79 | | 115 | 3.9 | | 23.1 | 6.7 | 6.6 | | | |
| 11 | 11.0 | 7.5 | 52.8 | 3.74 | | 97 | 3.8 | | 19.8 | 3.7 | 6.3 | | | |
| HD=Yes % Improvement | | **1.10** | 1.02 | 1.01 | 1.20 | 1.19 | 1.01 | 0.91 | 1.17 | 1.80 | 1.05 | 1.17 | 1.07 | |

**CPI – Cycles per Instruction** 

Prb State - % Problem State

Est Instr Cmplx CPI – Estimated Instruction Complexity CPI (infinite L1)

Est Finite CPI – Estimated CPI from Finite cache/memory

Est SCPL1M – Estimated Sourcing Cycles per Level 1 Miss

L1MP – Level 1 Miss per 100 instructions

L15P – % sourced from L1.5 cache

L2LP – % sourced from Level 2 Local cache (on same book)

L2RP – % sourced from Level 2 Remote cache (on different book)

**MEMP - % sourced from Memory** 

Rel Nest Intensity – Reflects distribution and latency of sourcing from shared caches and memory

LPARCPU - APPL% (GCPs, zAAPs, zIIPs) captured and uncaptured

### HiperDispatch=YES resulted in a ~ +10% improvement as measured by CPU MF.

Additional measurements over multiple days from traditional CPU/Transaction metrics should be used to validate HD=No Vs. Yes results
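The improvement row in the table above appears to be the ratio of the first data row (day 12) to the second (day 11) for each metric. A minimal sketch of that arithmetic follows, using only the values that survive in the table; because the inputs are already rounded, the computed ratios agree with the slide only to within rounding.

```python
# Values copied from the two data rows of the table (rounded as shown there).
day12 = {"CPI": 8.2, "Est SCPL1M": 115, "L2LP": 23.1, "L2RP": 6.7, "MEMP": 6.6}
day11 = {"CPI": 7.5, "Est SCPL1M": 97, "L2LP": 19.8, "L2RP": 3.7, "MEMP": 6.3}


def improvement_ratios(first, second):
    """Ratio of each metric in `first` to the same metric in `second`."""
    return {name: first[name] / second[name] for name in first}


if __name__ == "__main__":
    for name, value in improvement_ratios(day12, day11).items():
        print(f"{name}: {value:.2f}")
```

A ratio above 1.0 means the metric was lower on day 11; for CPI, a ratio near 1.10 is what the slide summarizes as roughly a 10% improvement.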

#### Partition has 21 logical processors

• 2 additional partitions on the CEC

### z196 Customer Benchmark - CICS Threadsafe Vs QR



- Comprises CICS transactions and some Batch
  - All Batch is heavy Update and running on both LPARs
  - The CICS transactions are cloned pairs. One group is left to run in QR mode and the other is marked threadsafe in the CICS PPT definition. This test focused all the Quasi-Reentrant transactions in one LPAR and all the Threadsafe transactions in the other LPAR. Transaction concurrency was established in order to drive the LPARs to 90%+ utilization levels.

#### Threadsafe Vs QR Results

- CICS 110s
  - Increase of 52% of transactions
  - Decrease of 42% in CPU per Transaction
  - Decrease of average response time by 67% (3.0x)
- RMF 72s CICS Service Class
  - Ended Transactions up 2.4x
  - Response Time down 3.6x
- SMF 113s LPAR
  - CPI down 1.48x from 7.0 to 4.7
    - L1MP down by 1.5 from 8.5 to 7.0
    - L2P up 19.9% from 69.3% to 89.2%

CICS Threadsafe is an option that may help you reduce CPU cost for applicable transactions by reducing switches between different TCB types

CPU MF example to <u>supplement</u> CICS and RMF performance metrics

As a secondary data source to understand <u>why performance may have changed</u>

These numbers come from a synthetic Benchmark and do not represent a production workload

#### \*\*\* New - This is an evolving use of CPU MF \*\*\*

### CPU MF can help measure the impact of 1 MB Pages in your environment

| Test | CPI | PRBSTATE | Est Instr Cmplx CPI | Est Finite CPI | Est SCPL1M | L1MP | L15P | L2LP | L2RP | MEMP | Rel Nest Intensity | LPARCPU | GHz | Est. TLB1 Miss CPU% of Total CPU | Est. TLB1 Cycles per Miss | PTE% of all TLB1 Misses |
|------|-----|----------|---------------------|----------------|------------|------|------|------|------|------|--------------------|---------|-----|----------------------------------|---------------------------|-------------------------|
| DB2 V10 4K PageFix=YES | 4.46 | 1.29 | 2.63 | 1.83 | 26 | 7.13 | 94.72 | 4.64 | 0.01 | 0.63 | 0.09 | 28.2 | 4.4 | 16.0 | 83 | 19.2 |
| DB2 V10 1MB PageFix=YES | 4.26 | 1.13 | 2.58 | 1.68 | 23 | 7.25 | 96.56 | 3.03 | 0.01 | 0.41 | 0.06 | 33.9 | 4.4 | 15.6 | 65 | 13.7 |
| Ratio (4K / 1MB) | 1.05 | | | | | 0.98 | 0.98 | 1.53 | | | | | | 1.03 | 1.28 | 1.40 |

- DB2 10 for z/OS Beta provides ability to specify 1 MB Pages for DB2 Buffer Pools
- 1 MB Pages can help reduce TLB Page Table Entry Misses
- CPU MF can be used to help measure the 1 MB Page impact for your environment
  - DB2 10 for z/OS Beta Customer ran DB2 Batch job that exercised 4k and 1MB pages (PageFix=Yes). LFArea=40M
    - The batch job executed 30M Selects, 20M Inserts, and 10M Fetches
  - CPU MF showed the following but this is not necessarily representative of 1 MB Page results
    - 40% reduction in Page Table Entry % (PTE) of all TLB1 Misses
    - 28% reduction Est. TLB1 Cycles per Miss, 3% reduction in Est.TLB1 Miss CPU% of Total CPU
    - Lower CPI and Nest Intensity
    - DB2 Accounting report showed 1.4 % reduction in CPU time
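The percentages quoted above match the 4K-to-1MB ratios in the table (1.40, 1.28, 1.03) with 1 subtracted. As a hedged worked example, the sketch below computes both that ratio and the reduction expressed relative to the 4K baseline, which is a smaller number; which convention to report is a presentation choice, and the function names are illustrative.

```python
def ratio(before, after):
    """Ratio of the 4K value to the 1 MB value, as in the table's last row."""
    return before / after


def pct_reduction(before, after):
    """Reduction relative to the 4K baseline, in percent."""
    return (before - after) / before * 100


if __name__ == "__main__":
    # (4K value, 1 MB value) pairs copied from the table.
    metrics = {
        "PTE% of all TLB1 Misses": (19.2, 13.7),
        "Est. TLB1 Cycles per Miss": (83, 65),
        "Est. TLB1 Miss CPU% of Total CPU": (16.0, 15.6),
    }
    for name, (four_k, one_mb) in metrics.items():
        print(f"{name}: ratio {ratio(four_k, one_mb):.2f}, "
              f"reduction vs 4K {pct_reduction(four_k, one_mb):.1f}%")
```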

Warning: These numbers come from a synthetic Benchmark and do not represent a production workload

- As you implement 1 MB Page exploiters, use CPU MF to help measure the impact
  - Measure it in its intended Production LPAR
- See white paper "IBM System z10 Support for large pages"
  - http://www.research.ibm.com/journal/abstracts/rd/531/tzortzatos.html



### DB2 10 for z/OS Beta Customer – RMF for 1 MB Page

[RMF Paging Activity report (PAGE 2, z/OS, MODE = ESAME): central storage movement rates plus frame and slot counts for central storage, local page data sets, shared frames, and memory objects, including the 1 MB frames row.]


# Formulas – Additional TLB

| Metric – z10                       | <b>Calculation</b> – note all fields are <b>deltas</b> between intervals |
|------------------------------------|--------------------------------------------------------------------------|
| Est. TLB1 CPU Miss % of Total CPU  | ((E145+E146) / B0) * 100 * .31 \*                                        |
| Estimated TLB1 Cycles per TLB Miss | (E145+E146) / (E138+E139) * .31 \*                                       |
| PTE % of all TLB1 Misses           | (E140 / (E138+E139)) * 100                                               |

| Metric – z196                      | <b>Calculation</b> – note all fields are <b>deltas</b> between intervals |
|------------------------------------|--------------------------------------------------------------------------|
| Est. TLB1 CPU Miss % of Total CPU  |                                                                          |

#### Note these Formulas may change in the future

#### \* Updated March 2012

Est. TLB1 CPU Miss % of Total CPU - Estimated TLB CPU % of Total CPU

Estimated TLB1 Cycles per TLB Miss - Estimated Cycles per TLB Miss

PTE % of all TLB1 Misses – Page Table Entry % misses

B\* - Basic Counter Set - Counter Number

See "The Set-Program-Parameter and CPU-Measurement Facilities" SA23-2260-0 for full description

E\* - Extended Counters - Counter Number

See "IBM The CPU-Measurement Facility Extended Counters Definition for z10" SA23-2261-0 or "The CPU-Measurement Facility Extended Counters Definition for z10 and z196" SA23-2261-01 for full description
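The z10 formulas on this slide can be sketched as a small function. It assumes the caller supplies counter deltas between two intervals, that B0 is the cycle count from the Basic Counter Set, and that the 0.31 factor is the one given on the slide. The function name, the returned key names, and the sample inputs are hypothetical, and the slide notes that the formulas themselves may change.

```python
def tlb1_metrics_z10(b0, e138, e139, e140, e145, e146, factor=0.31):
    """Compute the three z10 TLB1 metrics from counter deltas.

    b0 is Basic counter 0 (assumed to be the cycle count); e138..e146 are
    Extended counter deltas. Returns None for a metric whose denominator
    is zero instead of raising ZeroDivisionError.
    """
    miss_cycles = e145 + e146
    misses = e138 + e139
    return {
        "tlb1_miss_cpu_pct": (miss_cycles / b0) * 100 * factor if b0 else None,
        "tlb1_cycles_per_miss": miss_cycles / misses * factor if misses else None,
        "pte_pct_of_tlb1_misses": (e140 / misses) * 100 if misses else None,
    }


if __name__ == "__main__":
    # Hypothetical deltas, chosen only to make the arithmetic easy to follow.
    result = tlb1_metrics_z10(b0=1_000_000, e138=600, e139=400,
                              e140=200, e145=150_000, e146=50_000)
    for name, value in result.items():
        print(name, value)
```

Passing deltas rather than raw counter values matters because the counters accumulate, so each metric describes only the interval between two records.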

# Appendix

# CPU MF – Lessons Learned since August 2010

- CPU MF Metrics continue to help understand <u>why</u> performance changed
  - LPAR Configuration Changes including HD=Yes/No
  - CICS QR Vs Threadsafe
  - 1 MB Vs 4k Pages
  - GHz measurement for State Changes including Power Savings Mode
- Customers continue to successfully run CPU MF COUNTERS
  - Over days/months without any reported performance impact, Turning on and leaving on
  - Volunteer Feedback: easy to enable, minimal time investment
  - For long term gathering on z196s recommend CTR=(B,E)
- Update to z196 formulas for Estimated SCPL1M and Estimated Finite CPI see slide 27
- **z196** 
  - Ensure HD=YES
  - L3 off chip and off book sourced from respective L4s see formulas on slide 26





# CPU MF – Lessons Learned since March 2010

- CPU MF Performance Metrics continue to help understand <u>why</u> performance changed
  - LPAR Configuration Changes including
    - HD= Yes/No
  - 1 MB Vs 4k Pages
  - GHz measurement for State Changes
- Customers continue to successfully run CPU MF COUNTERS collecting SMF 113s
  - Over days/months without any reported performance impact, Turning on and leaving on
  - Volunteer Feedback: easy to enable, minimal time investment
- SMF 113 Logical CPU IDs are equal to the SMF 70 Logical CPU IDs
  - Directly identifies GCPs, zIIPs or zAAPs in SMF 113s with <u>APAR OA30486</u> for z10s and z196
- LPAR Management Time is NOT included in LPARCPU time (SMF 113 Cycles)
- Utilize the Counter Version Number fields to map to technology
  - SMF113\_2\_CTRVN2 Crypto or Extended counter sets = "2" for z196 "1" for z10
- z/VM CPU MF native prototype in process
- D HIS command provides useful status information

# CPU MF – Lessons Learned since August 2009

- CPU MF Performance Metrics can be used to help understand <u>why</u> performance changed
- Customers are successfully running CPU MF COUNTERS collecting SMF 113s
  - Over days and months without any reported performance impact
  - Feedback from Volunteers is this is very easy to enable, with a minimal time investment
- SMF 113 Logical CPU IDs are equal to the SMF 70 Logical CPU IDs
  - Can match up SMF 113s & SMF 70s to identify GCPs, zIIPs or zAAPs
  - Can see the unique Vertical Polarity Logical CPs cache/memory characteristics
    - E.G. Vertical Mediums may have higher L2 Remote activity
- In multi-book z10 ECs there can be L2 Remote Activity even if <=12 GCPs
  - Because of I/O activity from SAPs as the data is initially stored in the Remote L2

#### Utilize the Counter Version Number fields to map to technology

- Number is increased for a change in meaning or number of counters
  - SMF113\_2\_CTRVN1 Basic or Problem-State counter sets
  - SMF113\_2\_CTRVN2 Crypto or Extended counter sets

### CPU MF Update – Lessons Learned since March 2009

- L1 Miss per 100 instructions can be determined from CPU MF COUNTERS
- z10 EC must be at bundle #20 or higher for CPU MF COUNTERS
- IRD considerations
  - If CPU goes offline, only activity within the interval is recorded in an Intermediate record, then
    - If no activity in follow on 15 minute interval(s), Intermediate record is not cut for the CPUID
      - No Final record when HIS is ended
    - When activity resumes, Intermediate record is written for CPUID
- New APAR OA27623 to add "CPU Speed" to SMF 113 and to HIS COUNTERS output
  - Processor speed for which the hardware event counters are recorded. Speed is in cycles / microsecond - "4404" for z10 EC
  - SMF 113 new field: SMF113\_2\_CPSP 4 byte binary
  - Simplifies conversion of Cycles into "Time"
- Customers are successfully running CPU MF COUNTERS (and collecting SMF 113s) over 24 hours
- Analyze the "major" LPARs on a z10 at the same time
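Because "CPU Speed" is expressed in cycles per microsecond, converting a cycle count into time is a single division. The sketch below illustrates this with the z10 EC value of 4404 quoted above; the function names are illustrative, and the inputs are assumed to be a cycle-count delta and the SMF113\_2\_CPSP value from the same record.

```python
def cycles_to_seconds(cycles, cpsp):
    """Convert a cycle count to seconds.

    cpsp is the SMF 113 CPU Speed in cycles per microsecond
    (for example 4404 for a z10 EC).
    """
    return cycles / (cpsp * 1_000_000)


def effective_ghz(cpsp):
    """Express CPU Speed (cycles per microsecond) as GHz."""
    return cpsp / 1000


if __name__ == "__main__":
    one_minute_of_cycles = 4404 * 1_000_000 * 60
    print(cycles_to_seconds(one_minute_of_cycles, 4404))
    print(effective_ghz(4404))
```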



### Documentation

- MVS Commands SA22-7627-19
  - Setting up hardware event data collection 1-39
- The Set-Program-Parameter and CPU-Measurement Facilities SA23-2260-0
  - Full description of Basic, Problem-State and Crypto Counter Sets
- IBM The CPU-Measurement Facility Extended Counters Definition for z10 SA23-2261-0
- IBM The CPU-Measurement Facility Extended Counters Definition for z10 and z196 SA23-2261-01
- WSC Short Stories and Tall Tales
  - SHARE Summer 2009 Denver Session 2136 John Burg
- CPU MF Overview and WSC Experiences Techdoc TC000041 available March 26 2010
  - http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TC000041
  - SHARE Winter 2010 presentation and detailed write up for enabling CPU MF John Burg
- ITSO Red Book reference Planned for 4QT 2010
  - *Exploiting System z LPAR Capacity Controls* SG24-7846. 2 Part Book:
    - Part 1 CPU MF
    - Part 2 HiperDispatch, Group Capacity Controls, hard/soft capping
    - Draft available ~April 1

http://www.redbooks.ibm.com/redbooks.nsf/home?ReadForm&page=drafts