John Graham: “A bad workman blames his tools”:
1. Find the smallest possible test case that tickles the bug. The aim is to find the smallest and fastest way to reproduce the bug reliably. With heisenbugs this can be hard, but even a fast way to reproduce it some percentage of the time is valuable.
3. Automate that test case. It’s best if the test case can be automated so that it can be run again and again. This also means that the test case can become part of your program’s test suite once the bug is eliminated, which keeps the bug from quietly coming back.
4. Debug until you find the root cause. The root cause is vital. Unless you fully understand why the bug occurred you can’t be sure that you’ve actually fixed it. With heisenbugs it’s very easy to be fooled into thinking that you’ve eliminated them when all you’ve done is cover them up.
4. Fix it and verify the fix by rerunning the automated test case from step 2.
Read the entire piece.
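To make the steps above concrete, here is a minimal Python sketch of an automated reproduction harness for an intermittent bug. It is not from the quoted piece: `failure_rate`, `make_flaky_case`, and `chance_of_lucky_pass` are hypothetical names, and the "bug" is simulated with a seeded random number generator standing in for whatever timing or ordering problem a real heisenbug would have.

```python
import random
from typing import Callable


def failure_rate(test_case: Callable[[], None], runs: int = 200) -> float:
    """Run test_case `runs` times and return the fraction of runs that failed.

    A run counts as a failure when the test case raises AssertionError.
    """
    failures = 0
    for _ in range(runs):
        try:
            test_case()
        except AssertionError:
            failures += 1
    return failures / runs


def make_flaky_case(rng: random.Random, fail_probability: float) -> Callable[[], None]:
    """Hypothetical stand-in for a heisenbug: fails some percentage of the time."""
    def case() -> None:
        assert rng.random() >= fail_probability, "intermittent failure reproduced"
    return case


def chance_of_lucky_pass(fail_probability: float, runs: int) -> float:
    """Probability that an unfixed bug survives `runs` independent runs unseen."""
    return (1.0 - fail_probability) ** runs


if __name__ == "__main__":
    # Steps 1 and 2: a small, automated case that reproduces the bug some of the time.
    flaky = make_flaky_case(random.Random(0), fail_probability=0.3)
    print("failure rate before fix:", failure_rate(flaky, runs=500))

    # Step 4: rerun the same harness against the (here trivially) fixed code.
    print("failure rate after fix:", failure_rate(lambda: None, runs=500))

    # Why step 3 still matters: a rare bug can pass many runs by luck alone.
    print("chance of a lucky clean pass:", chance_of_lucky_pass(0.01, runs=100))
```

A clean run of the harness is evidence rather than proof. If a bug shows up in only 1% of runs, roughly a third of 100-run batches will pass without the bug being fixed at all, which is why the list insists on understanding the root cause before trusting the fix.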