Thursday, August 11, 2011: 11:00 AM-12:15 PM
Southern Hemisphere 3 (Walt Disney World Dolphin)
Speakers: Robert Abrams (IBM Corporation) and Sam Knutson (GEICO)
The presenters will discuss the capabilities available on z/OS to detect and diagnose soft failures.
- Describe soft failure detection
  - Built into z/OS components, such as XCF stalled member detection
  - Provided by health checks
  - Provided by z/OS PFA (Predictive Failure Analysis)
  - Provided by other vendor products
- Highlight the kinds of problems each type of soft failure detection handles well and handles poorly
  - Machine time scale vs. human time scale
  - Location in the stack
  - Detectable by performance metrics vs. non-performance metrics
- Insights from building PFA to help reduce the impact of soft failures
  - Automation of alerts is key
  - z/OS can survive or recover from most soft failures
  - Most metrics are very time sensitive
Tracks: z/OS Systems Programming