system crashes types

sexta-feira, 6 de janeiro de 2012

There are essentially three types of system crashes:
  • High Priority Machine Check (HPMC): This is normally the result of a piece of hardware causing a Group 1 interrupt, an HPMC. A Group 1 interrupt is the highest priority interrupt the system can generate. Such an interrupt signifies THE MOST serious event has just occurred. The interrupt will be handled by a processor and passed to the operating system for it to process further. When the operating system receives an HPMC, the only thing it can do is to cause the system to crash. This will produce a system crashdump. As an example, a double-bit memory error will cause an HPMC. Many other hardware-related events will cause an HPMC. There is a small chance that an HPMC could be caused by a software error, but the vast majority of HPMCs are caused by hardware problems.
    There is also a Low Priority Machine Check (LPMC). An LPMC does not necessarily cause the system to crash. An LPMC may be related to a hardware error that is recoverable, e.g., a single-bit memory error.
  • Transfer of Control (TOC): If a system hangs, i.e., you can't get any response from a ping, from the system console, the system has frozen, and you may decide to initiate a TOC from the system console by using the TC command from the Command Menu (pressing ctrl-b on the console or via the GSP). If you are using Serviceguard, the cmcld daemon may cause the system to TOC in the event of a cluster reformation. All of these situations are normally associated with some form of software problem (the Serviceguard issue may be related to a hardware problem in our networking, but it was software that initiated the TOC).
  • PANIC: A PANIC occurs when the kernel detects a situation that makes no logical sense, e.g., kernel data structures becoming corrupted or logical corruption in a software subsystem such as a filesystem trying to delete a file twice (freeing free frag). In such situations, the kernel decides that the safest thing to do is to cause the system to crash. A PANIC is normally associated with a software problem, although it could be an underlying hardware problem (the filesystem problem mentioned above may have been caused by a faulty disk).
In summary, an HPMC is probably a hardware problem, and a TOC or PANIC is probably some form of software problem.