Chapter 3 CPU Over-Temperature Safeguard

The CPU over-temperature safeguard (COS) is a Sun Enterprise xx00 platform feature for the Solaris 2.6 software environment and compatible versions, available on servers with the proper firmware support. COS ensures that the temperature on any CPU/memory board does not exceed the safe operating range.

COS Requirements

COS is not available if a Sun Enterprise xx00 server lacks the enabling firmware. In this case, the system displays the following messages during the boot sequence:

WARNING: Firmware does not support CPU power off
WARNING: Automatic CPU shutdown on over-temperature disabled
WARNING: Firmware does not support CPU restart from power off
WARNING: The ability to restart individual CPUs is disabled

When equipped with the proper firmware, the system displays the following during the boot sequence. Later firmware versions show similar output.

Board 0: OBP 3.2.8 1997/02/27 14:00 POST 3.5.1 1997/03/05 09:34
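
As an illustration, the boot-time warnings quoted above can be checked for mechanically in a saved copy of the console output. The following Python sketch is not part of the Solaris software; the function name cos_disabled_by_firmware and the idea of scanning a captured boot log are assumptions made for this example. It only looks for the two warning strings shown in this chapter that indicate COS is unavailable.

```python
# Substrings copied from the boot warnings quoted in this chapter.
# Real firmware or later Solaris releases may word these differently.
COS_DISABLED_MARKERS = (
    "Firmware does not support CPU power off",
    "Automatic CPU shutdown on over-temperature disabled",
)


def cos_disabled_by_firmware(boot_log: str) -> bool:
    """Return True if any line of the captured boot log contains a
    warning indicating that COS is unavailable (hypothetical helper)."""
    return any(
        marker in line
        for line in boot_log.splitlines()
        for marker in COS_DISABLED_MARKERS
    )
```

A result of False means only that neither warning was found in the captured text; it does not by itself confirm that the firmware supports COS.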
Factors in Overheating

Many external conditions can raise the CPU/memory board temperature and compound high-temperature problems, including:
Some Solaris software environment issues can also affect the CPU temperature, such as bound threads or having only one CPU/memory board in the system. These issues can cause a fallback to the existing shutdown behavior.

The CPU over-temperature safeguard does not affect the Solaris software environment in any way. COS operates only when the temperature of a CPU/memory board exceeds the safe operating range.

COS Operation

COS functions by monitoring the temperatures of all system CPUs. Warning messages are displayed on the system console if a CPU/memory board over-temperature condition occurs. The following example indicates an over-temperature condition for CPU/memory board 0:

WARNING: CPU/Memory board 0 is warm (temperature: 73C). Please check system cooling
NOTICE: Processor 0 powered off.
NOTICE: Processor 1 powered off.

Resolving an Over-Temperature Condition

When the COS feature detects a CPU over-temperature condition, it takes the CPU offline and powers it off. The system continues to operate with the offending CPU powered off. The CPUs are the chief source of heat on a CPU/memory board; removing that heat source lowers the temperature into the normal operating range. This prevents sudden downtime on the production server.

To Resolve an Over-Temperature Condition
Failure to Disengage CPUs

In some instances, the CPU power control cannot disengage the affected CPU(s) from the Solaris software environment. For example, if the high-temperature condition occurs when only one CPU/memory board with two processors is in the system, processor one will not go offline because it is the last processor in the system.

Failure to Power Off CPUs

If the attempted decoupling of the problem CPU from the Solaris software environment fails, the temperature may continue to increase. When the temperature reaches the hard upper operational temperature limit, the system shuts down. In this case, a message similar to the following is displayed:

WARNING: CPU/Memory board 0 is very hot (temperature: 83C)
WARNING: System shutdown scheduled in 20 seconds due to over-temperature condition on CPU/Memory board 0
WARNING: CPU/Memory board 0 still too hot (temperature: 83C). Overtemp shutdown started
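
For administrators who collect console output, the COS messages shown in this chapter can be sorted into events with a small script. The Python sketch below is an illustration only: the function classify_cos_message, the event names, and the dictionary layout are assumptions made for this example, and the patterns are modeled solely on the sample messages quoted above, which the chapter itself describes as examples of similar output.

```python
import re
from typing import Optional

# Patterns modeled on the example messages in this chapter; actual
# console wording may differ between firmware and Solaris releases.
_BOARD_TEMP = re.compile(
    r"CPU/Memory board (\d+) (is warm|is very hot|still too hot) "
    r"\(temperature: (\d+)C\)"
)
_SHUTDOWN = re.compile(r"System shutdown scheduled in (\d+) seconds")
_CPU_OFF = re.compile(r"Processor (\d+) powered off")

# Hypothetical event names chosen for this sketch.
_EVENT_NAMES = {
    "is warm": "warm",
    "is very hot": "very_hot",
    "still too hot": "still_too_hot",
}


def classify_cos_message(line: str) -> Optional[dict]:
    """Map one console line to a small event dictionary, or return
    None if the line matches none of the example message formats."""
    match = _BOARD_TEMP.search(line)
    if match:
        return {
            "event": _EVENT_NAMES[match.group(2)],
            "board": int(match.group(1)),
            "temperature_c": int(match.group(3)),
        }
    match = _SHUTDOWN.search(line)
    if match:
        return {"event": "shutdown_scheduled", "delay_s": int(match.group(1))}
    match = _CPU_OFF.search(line)
    if match:
        return {"event": "cpu_powered_off", "processor": int(match.group(1))}
    return None
```

Seeing a "warm" event followed by "cpu_powered_off" events corresponds to the normal COS response described above, while "very_hot" and "shutdown_scheduled" correspond to the fallback shutdown path.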