
4405ch04 Continuous availability and manageability.fm Draft Document for Review September 2, 2008 5:05 pm

106 IBM Power 570 Technical Overview and Introduction

non-critical error is detected or if the error occurs in a resource that can be removed

from the system configuration, the booting process is designed to proceed to

completion. The errors are logged in the system nonvolatile random access memory

(NVRAM). When the operating system completes booting, the information is passed

from the NVRAM into the system error log where it is analyzed by error log analysis

(ELA) routines. Appropriate actions are taken to report the boot time error for

subsequent service if required.
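
The boot-time flow described above can be sketched as follows. This is only an illustrative model, not IBM firmware: the names BootError, boot, and drain_nvram_to_error_log are hypothetical, and the rule that a critical error on a resource that cannot be deconfigured stops the boot is an assumption, since the text only states when booting proceeds.

```python
from dataclasses import dataclass, field


@dataclass
class BootError:
    resource: str          # hypothetical resource name, e.g. "dimm3"
    critical: bool         # True if the error is critical
    deconfigurable: bool   # True if the resource can be removed from the configuration


@dataclass
class BootResult:
    completed: bool
    nvram: list = field(default_factory=list)  # errors staged during boot


def boot(errors):
    """Stage every boot-time error in NVRAM and keep booting when possible."""
    nvram = []
    for err in errors:
        nvram.append(err)
        # Assumption: only a critical error on a resource that cannot be
        # deconfigured prevents the boot from completing.
        if err.critical and not err.deconfigurable:
            return BootResult(completed=False, nvram=nvram)
    return BootResult(completed=True, nvram=nvram)


def drain_nvram_to_error_log(result):
    """Once the operating system is up, move staged errors to the error log."""
    if not result.completed:
        return []
    error_log = list(result.nvram)
    result.nvram.clear()
    return error_log
```

In this model, error log analysis would then run over the list returned by drain_nvram_to_error_log.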

One important Service Processor improvement allows the system administrator or service

representative dynamic access to the Advanced Systems Management Interface (ASMI)

menus. In previous generations of servers, these menus were only accessible when the

system was in standby power mode. Now, the menus are available from any Web

browser-enabled console attached to the Ethernet service network concurrent with normal

system operation. A user with the proper access authority and credentials can now

dynamically modify service defaults, interrogate Service Processor progress and error logs,

set and reset guiding light LEDs, and indeed access all Service Processor functions without

having to power-down the system to the standby state.

The Service Processor also manages the interfaces for connecting Uninterruptible Power

Source (UPS) systems to the POWER6 processor-based systems, performing Timed

Power-On (TPO) sequences, and interfacing with the power and cooling subsystem.

Error checkers

IBM POWER6 processor-based systems contain specialized hardware detection circuitry that

is used to detect erroneous hardware operations. Error checking hardware ranges from parity

error detection coupled with processor instruction retry and bus retry, to ECC correction on

caches and system buses. All IBM hardware error checkers have distinct attributes:

- Continual monitoring of system operations to detect potential calculation errors.

- Attempts to isolate physical faults based on run-time detection of each unique failure.

- The ability to initiate a wide variety of recovery mechanisms designed to correct the problem.

The POWER6 processor-based systems include extensive hardware and firmware

recovery logic.
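
As a rough software analogy for parity error detection coupled with retry, the sketch below checks an even-parity bit on each read and repeats the read when the check fails. The function names and the single-retry default are assumptions made for illustration; the real checkers are hardware circuits.

```python
def even_parity(bits):
    """Return the even-parity bit for a sequence of 0/1 values."""
    return sum(bits) % 2


def parity_ok(word, parity_bit):
    """True when the stored parity bit matches the data word."""
    return even_parity(word) == parity_bit


def read_with_retry(read_fn, max_retries=1):
    """Call read_fn until the parity check passes or retries are exhausted.

    read_fn is a hypothetical callable returning (word, parity_bit).
    Returns (word, attempts) on success.
    """
    attempts = 0
    while True:
        word, parity_bit = read_fn()
        attempts += 1
        if parity_ok(word, parity_bit):
            return word, attempts
        if attempts > max_retries:
            # A fault that survives the retry is treated as not transient.
            raise RuntimeError("uncorrected parity error")
```

A transient fault is masked by the retry, while a persistent one surfaces as an error that recovery logic would have to handle.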

Fault Isolation Registers

Error checker signals are captured and stored in hardware Fault Isolation Registers (FIRs).

The associated Who’s on First logic circuitry is used to limit the domain of an error to the first

checker that encounters the error. In this way, run-time error diagnostics can be deterministic

such that for every check station, the unique error domain for that checker is defined and

documented. Ultimately, the error domain becomes the Field Replaceable Unit (FRU) call,

and manual interpretation of the data is not normally required.
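
The idea of limiting an error's domain to the first checker that sees it can be modelled in a few lines. This is a simplified sketch: the class, the checker names, and the one-to-one mapping from checker to FRU are hypothetical and stand in for logic that is implemented in hardware.

```python
class FaultIsolationRegister:
    """Toy model of a FIR with Who's on First behavior."""

    def __init__(self, fru_by_checker):
        # Assumed mapping: each checker has one documented error domain (FRU).
        self.fru_by_checker = dict(fru_by_checker)
        self.first = None   # first checker to report, latched once
        self.fired = []     # every checker that reported, in order

    def signal(self, checker):
        """Record a checker firing; only the first one defines the domain."""
        if checker not in self.fru_by_checker:
            raise KeyError(f"unknown checker: {checker}")
        self.fired.append(checker)
        if self.first is None:
            self.first = checker

    def fru_call(self):
        """Return the FRU for the first checker, or None if nothing fired."""
        if self.first is None:
            return None
        return self.fru_by_checker[self.first]
```

Later checkers that fire as a side effect of the same fault are still recorded, but they do not change the FRU call.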

First Failure Data Capture (FFDC)

First Failure Data Capture (FFDC) is an error isolation technique that ensures that when a

fault is detected in a system through error checkers or other types of detection methods, the

root cause of the fault will be captured without the need to recreate the problem or run an

extended tracing or diagnostics program.

For the vast majority of faults, a good FFDC design means that the root cause will be

detected automatically without intervention of a service representative. Pertinent error data

related to the fault is captured and saved for analysis. In hardware, FFDC data is collected

from the fault isolation registers and Who’s on First logic. In firmware, this data consists of

return codes, function calls, and so on.
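
For the firmware side, one way to picture FFDC is a recorder that keeps a short trail of function calls and return codes and snapshots it when the first failure occurs. The sketch below assumes a nonzero return code means failure and uses hypothetical names throughout; it does not reflect the actual firmware implementation.

```python
from collections import deque


class FFDCRecorder:
    """Keep a bounded trail of calls and capture it at the first failure."""

    def __init__(self, depth=8):
        self.trail = deque(maxlen=depth)  # most recent (name, args, rc) entries
        self.capture = None               # snapshot taken at the first failure

    def call(self, fn, *args):
        """Run fn, record its name, arguments, and return code."""
        rc = fn(*args)
        self.trail.append((fn.__name__, args, rc))
        # Assumption: rc != 0 signals a failure. Only the first failure is
        # captured, so later errors cannot overwrite the root-cause data.
        if rc != 0 and self.capture is None:
            self.capture = list(self.trail)
        return rc
```

Because the snapshot is taken at the moment of failure, the data is available for analysis without recreating the problem.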
