Structured Troubleshooting Methods (36.1.4)
There are several structured troubleshooting approaches that can be used. Which one to use will depend on the situation. Each approach has its advantages and disadvantages. This section describes methods and provides guidelines for choosing the best method for a specific situation.
Bottom-Up
In bottom-up troubleshooting, you start with the physical layer and the physical components of the network, as shown in Figure 36-1, and move up through the layers of the OSI model until the cause of the problem is identified.
Figure 36-1 Bottom-Up Troubleshooting
Bottom-up troubleshooting is a good approach to use when the problem is suspected to be a physical one. Most networking problems reside at the lower levels, so implementing the bottom-up approach is often effective.
The disadvantage with the bottom-up troubleshooting approach is it requires that you check every device and interface on the network until the possible cause of the problem is found. Remember that each conclusion and possibility must be documented, so there can be a lot of paperwork associated with this approach. A further challenge is to determine which devices to start examining first.
Top-Down
As shown in Figure 36-2, top-down troubleshooting starts with the end-user applications and moves down through the layers of the OSI model until the cause of the problem has been identified.
Figure 36-2 Top-Down Troubleshooting
End-user applications of an end system are tested before tackling the more specific networking pieces. Use this approach for simpler problems, or when you think the problem is with a piece of software.
The disadvantage with the top-down approach is it requires checking every network application until the possible cause of the problem is found. Each conclusion and possibility must be documented. The challenge is to determine which application to start examining first.
Divide-and-Conquer
Figure 36-3 shows the divide-and-conquer approach to troubleshooting a networking problem. The network administrator selects a layer and tests in both directions from that layer.
Figure 36-3 Divide-and-Conquer Troubleshooting
In divide-and-conquer troubleshooting, you start by collecting user experiences of the problem, document the symptoms, and then, using that information, make an informed guess as to which OSI layer to start your investigation. When a layer is verified to be functioning properly, it can be assumed that the layers below it are functioning. The administrator can work up the OSI layers. If an OSI layer is not functioning properly, the administrator can work down the OSI layer model.
For example, if users cannot access the web server, but they can ping the server, then the problem is above Layer 3. If pinging the server is unsuccessful, then the problem is likely at a lower OSI layer.
This is one of the most basic troubleshooting techniques. The approach first discovers the traffic path all the way from source to destination. The scope of troubleshooting is reduced to just the links and devices that are in the forwarding path. The objective is to eliminate the links and devices that are irrelevant to the troubleshooting task at hand. This approach usually complements one of the other approaches.
This approach is also called swap-the-component because you physically swap the problematic device with a known, working one. If the problem is fixed, then the problem is with the removed device. If the problem remains, then the cause may be elsewhere.
In specific situations, this can be an ideal method for quick problem resolution, such as with a critical single point of failure. For example, a border router goes down. It may be more beneficial to simply replace the device and restore service rather than to troubleshoot the issue.
If the problem lies within multiple devices, it may not be possible to correctly isolate the problem.
This approach is also called the spot-the-differences approach and attempts to resolve the problem by changing the nonoperational elements to be consistent with the working ones. You compare configurations, software versions, hardware, or other device properties, links, or processes between working and nonworking situations and spot significant differences between them.
The weakness of this method is that it might lead to a working solution, without clearly revealing the root cause of the problem.
This approach is also called the shoot-from-the-hip troubleshooting approach. This is a less-structured troubleshooting method that uses an educated guess based on the symptoms of the problem. Success of this method varies based on your troubleshooting experience and ability. Seasoned technicians are more successful because they can rely on their extensive knowledge and experience to decisively isolate and solve network issues. With a less-experienced network administrator, this troubleshooting method may be too random to be effective.