Troubleshooting principles
Overview
This article walks you through best practices to follow while troubleshooting.
Many jobs revolve around preventative and, unfortunately, recovery practices due to the chaotic nature of technology.
How to troubleshoot your system
Troubleshooting can be summarized in six steps:
- Back up your system and files
- Identify the issue(s)
- Reproduce the issue(s)
- Gather information
  - Note the environment the problem occurred in
  - Capture any error messages
- Run tests to isolate the variable causing the problem
- Keep a log of the problem and resolution in case the issue recurs
The following steps will guide you through finding a resolution to your issue.
1. Back up your system and files
Before you begin troubleshooting, always back up your files. Troubleshooting usually involves trial-and-error testing, which can introduce more problems than you started with and leave you worse off than before. Being able to revert to the initial state saves time and effort.
While development work environments are recommended for all systems, they are essentially mandatory for any production system that can tolerate little to no downtime, and for any system that requires approval before changes are made.
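As a rough illustration of backing up before you change anything, the sketch below copies a folder into a timestamped backup and restores it later. The function names `back_up_directory` and `restore_directory` are hypothetical, and the sketch assumes a plain file copy is an adequate backup for the files involved; it is not a substitute for a full system backup tool.

```python
import shutil
from datetime import datetime
from pathlib import Path


def back_up_directory(source: Path, backup_root: Path) -> Path:
    """Copy `source` into a new timestamped folder under `backup_root`."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    destination = backup_root / f"{source.name}-{stamp}"
    # copytree raises an error if the destination already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(source, destination)
    return destination


def restore_directory(backup: Path, target: Path) -> None:
    """Replace `target` with the contents of `backup`."""
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(backup, target)
```

Call `back_up_directory` once before the first test, keep the returned path, and pass it to `restore_directory` if a test leaves things worse than before.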
2. Gather information and identify the issue(s)
The next step in troubleshooting is to identify exactly what isn't working as expected. Gather details about the environment of the system you are troubleshooting.
Common information to gather when troubleshooting a Windows environment:
- What has recently been changed?
- What is the Windows version and service pack?
- Which users are affected?
- If the issue involves a device, gather its make and model, its driver version, and whether the issue reproduces when the device is moved to another computer.
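Some of this information can be collected by a script rather than by hand. The following is a minimal sketch using Python's standard `platform` module; the function name `collect_environment` and the choice of fields are assumptions made for illustration. It does not report service packs, driver versions, or recent changes, which still need to be gathered separately.

```python
import platform
from datetime import datetime


def collect_environment() -> dict:
    """Return basic facts about the machine this script runs on."""
    return {
        "collected_at": datetime.now().isoformat(timespec="seconds"),
        "os": platform.system(),           # for example "Windows"
        "os_release": platform.release(),  # for example "10"
        "os_version": platform.version(),  # build or kernel version string
        "machine": platform.machine(),     # CPU architecture
        "hostname": platform.node(),
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Saving this output alongside your notes gives you a record of the environment at the time the problem occurred.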
3. Reproduce the issue(s)
It is crucial to repeat the exact steps that produced the issue(s), because the problem may depend on a specific order of events. Retracing your steps is often useful. Without reproducing an issue, you are unlikely to find a resolution. Once you can reproduce it, you will understand the problem better.
4. Capture error messages
Capture useful error messages that may pinpoint the course of action needed to resolve the issue. We recommend taking a screenshot of any error message and saving a copy for reference.
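When the error comes from a command-line tool rather than a dialog box, the text itself can be saved instead of a screenshot. The sketch below is one possible approach, assuming the failing program can be started from a script; `capture_command_errors` and the report file layout are hypothetical.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def capture_command_errors(command: list, report_file: Path) -> int:
    """Run `command` and append its exit code and error output to `report_file`."""
    result = subprocess.run(command, capture_output=True, text=True)
    with report_file.open("a", encoding="utf-8") as handle:
        handle.write(f"--- {datetime.now().isoformat(timespec='seconds')} ---\n")
        handle.write(f"command: {' '.join(command)}\n")
        handle.write(f"exit code: {result.returncode}\n")
        # stderr holds the error text the program printed, if any.
        handle.write(result.stderr)
        handle.write("\n")
    return result.returncode
```

Because the file is opened in append mode, repeated runs build up a history of error messages that you can compare between tests.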
5. Run tests, one at a time, to isolate the variable causing the issue
Try the simple fixes first. Check that cables are properly plugged in, check whether any basic settings have been changed, and try turning the system off and back on again.
If you've exhausted the simple fixes, it's time to isolate the variable that actually caused the problem. You do this by running tests that change only one thing at a time. Then, when you finally find the cause, you can roll back any changes you made that had no effect on the issue at hand.
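The idea of changing only one thing at a time can be sketched in code. In the example below, `isolate_cause`, the settings dictionary, and the `issue_reproduces` check are all hypothetical stand-ins; in practice the check would be whatever manual or automated test reproduces your issue.

```python
from typing import Callable


def isolate_cause(
    baseline: dict,
    candidate_changes: dict,
    issue_reproduces: Callable[[dict], bool],
) -> list:
    """Try each candidate change alone and return the ones that stop the issue."""
    fixes = []
    for setting, new_value in candidate_changes.items():
        # Start from a fresh copy of the baseline for every trial so that
        # earlier experiments never leak into later ones.
        trial = dict(baseline)
        trial[setting] = new_value
        if not issue_reproduces(trial):
            fixes.append(setting)
    return fixes


# Hypothetical example: the issue only occurs while power saving is enabled.
baseline = {"driver": "2.1", "usb_port": "front", "power_saving": True}
candidates = {"driver": "2.2", "usb_port": "rear", "power_saving": False}
print(isolate_cause(baseline, candidates, lambda cfg: cfg["power_saving"]))
```

Because every trial starts from the unchanged baseline, changes that made no difference are rolled back automatically and only the setting that matters is reported.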
If you're having trouble coming up with new things to test, find someone to talk to. Talking the issue through or explaining it to someone, even someone unfamiliar with the system, forces you to walk through it step by step rather than take anything for granted.
6. Keep a log of the problem and the resolution in case the issue recurs
It is always good practice to keep a log of problems and their resolutions in case they show up again. Then you may be able to apply a quick solution rather than walk through the testing phase again.
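A log like this can be as simple as a text file. As a minimal sketch, the functions below append entries to a file with one JSON record per line and search it by keyword; the names `record_resolution` and `find_resolutions` and the entry fields are assumptions chosen for illustration.

```python
import json
from datetime import date
from pathlib import Path


def record_resolution(log_file: Path, problem: str, resolution: str) -> None:
    """Append one problem and its resolution to the log."""
    entry = {
        "date": date.today().isoformat(),
        "problem": problem,
        "resolution": resolution,
    }
    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")


def find_resolutions(log_file: Path, keyword: str) -> list:
    """Return logged entries whose problem description contains `keyword`."""
    if not log_file.exists():
        return []
    matches = []
    with log_file.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            entry = json.loads(line)
            if keyword.lower() in entry["problem"].lower():
                matches.append(entry)
    return matches
```

When a familiar problem shows up, searching the log first may save you from repeating the testing phase.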