9807: Detecting and Diagnosing Problems When z/OS Thinks It Is OK (includes PFA User Experience)

Thursday, August 11, 2011: 11:00 AM-12:15 PM
Southern Hemisphere 3 (Walt Disney World Dolphin)
Speakers: Robert Abrams (IBM Corporation) and Sam Knutson (GEICO)
Handouts
  • SoftFailureDet_SHAREOrlando5.pdf (961.3 kB)
  • Detecting & avoiding problems when z/OS "thinks" it is running okay (1.3 MB)

    The presenters will discuss the multiple capabilities available on z/OS to detect and diagnose soft failures.

    • Describe soft failure detection
      • Built into z/OS components, such as XCF stalled member detection
      • Provided by health checks
      • Provided by z/OS Predictive Failure Analysis (PFA)
      • Provided by other vendor products
    • Highlight the kinds of problems each type of soft failure detection handles well and handles poorly
      • Machine time scale vs human time scale
      • Location in the stack
      • Detectable by performance metrics vs non-performance metrics
    • Insights from building PFA to help reduce the impact of soft failures
      • Automation of alerts is key
      • z/OS can survive / recover from most soft failures
      • Most metrics are very time sensitive

    Tracks: z/OS Systems Programming

    See more of Project: MVS Core Technologies
    See more of Program: MVS