Date: 2013-10-29 06:33 pm (UTC)
From: [personal profile] jducoeur
My worst-ever bug was while working at Looking Glass, and was kind of an "anti-Heisenbug". It was the legendary "clock slowing down" bug. The symptoms were very simple and clear, but incredibly bizarre: after playing the game, over the course of the next week or so, you'd begin to lose time from the Windows system clock. If left long enough, the system would slowly grind to a near-halt (to the point where even mouse movement became sluggish) -- despite the Windows Task Manager claiming that the CPU wasn't doing *anything*.

I lost over a *month* to that goddamn bug, and never did get Microsoft to admit that it was their fault, but my online research basically came to the conclusion that I'd hit a bug deep inside the OS, specifically in the sound libraries.

I don't remember the full details, but when you started using this particular media library, it would start a timer in the kernel that would occasionally wake up, scan an internal linked list, pop the first thing from the list and add a new one to the end. Problem was, if you killed the process without properly closing that library (because the program crashed or -- as happened all the time for us -- because you stopped it in the debugger), it would stop *removing* things from the list, but would keep *adding* to it. This only happened a few times per second, but it eventually added up to scanning a linear linked list that was *millions* of elements long, several times a second. And of course, since all of this was in the kernel, the OS didn't think anything at all was happening, but it would begin to miss OS events like mouse movement and clock ticks since it was spending all its time scanning the list.
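
To make the failure mode above concrete, here is a toy model in Python. It is only a sketch of the behavior as described, not the actual kernel code: the function name `simulate`, the `owner_alive_until` parameter and the use of a deque are all made up for illustration.

```python
from collections import deque


def simulate(ticks, owner_alive_until):
    """Toy model of the timer described above (hypothetical, not real OS code).

    On every wake-up the timer scans the whole list, pops the first node only
    while the owning process is still attached, and always appends a new node.
    Returns (final list length, total nodes visited by all scans).
    """
    nodes = deque()
    nodes_scanned = 0
    for tick in range(ticks):
        nodes_scanned += len(nodes)          # full linear scan on each wake-up
        if tick < owner_alive_until and nodes:
            nodes.popleft()                  # removal stops once the owner is gone
        nodes.append(tick)                   # the add keeps happening regardless
    return len(nodes), nodes_scanned


if __name__ == "__main__":
    print(simulate(1000, owner_alive_until=1000))  # library closed properly
    print(simulate(1000, owner_alive_until=0))     # process killed at once
```

With the owner attached the list stays at one node; once it is gone, the list grows by one per tick and the total scan work grows quadratically. At an assumed rate of five additions per second (the real rate isn't remembered), a week of uptime would leave roughly three million nodes, which lines up with the "millions" above.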

It was an "anti-Heisenbug" in that, as I eventually figured out, it *only* happened if you were debugging. Fixing it in the shipping product was trivial: I just made sure that we had an outer exception catch, and always shut down that damned Windows library properly. But even if we hadn't fixed it, I suspect it would never have been noticed in the field. It was aggravating to realize that the bug only really showed up when debugging, and that the only solution was, really and truly, to just reboot eff'ing Windows every now and then.
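
The shape of that fix, sketched in Python rather than the game's actual code: `MediaLibrary`, `run_game` and `game_loop` are hypothetical names standing in for the real sound library and main loop, which aren't shown here.

```python
class MediaLibrary:
    """Hypothetical stand-in for the sound library; not a real API."""

    def __init__(self):
        self.is_open = False

    def start(self):
        self.is_open = True

    def shutdown(self):
        self.is_open = False


def run_game(library, game_loop):
    """Run the game inside an outer catch, always shutting the library down."""
    library.start()
    try:
        game_loop()
    except Exception as exc:      # the outer exception catch
        print(f"game crashed: {exc!r}")
        return False
    finally:
        library.shutdown()        # runs on normal exit and on a crash
    return True
```

This only covers exits that the process itself gets to see. If a debugger simply kills the process, no in-process cleanup runs at all, which is why the debugging case was left with the reboot.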

(Of course, this was around the time that we found out that it was physically impossible to run Windows for more than two months before an internal timer would roll over and crash the OS. Far as we could tell, nobody had ever hit this because it was nearly impossible to successfully run Windows for that long...)
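
The comment doesn't say which timer this was. Assuming it is the widely reported case of an unsigned 32-bit counter of milliseconds since boot (an assumption, not something stated above), the rollover point works out to a bit under 50 days, so somewhat less than the two months recalled here. A quick check, with hypothetical helper names:

```python
from datetime import timedelta

COUNTER_BITS = 32  # assumption: unsigned 32-bit count of milliseconds since boot


def time_until_rollover(bits=COUNTER_BITS):
    """Uptime at which a millisecond counter of this width wraps to zero."""
    return timedelta(milliseconds=2 ** bits)


def elapsed_ms(start, now, bits=COUNTER_BITS):
    """Wrap-safe difference between two counter readings."""
    return (now - start) % (2 ** bits)


if __name__ == "__main__":
    print(time_until_rollover())          # 49 days and a little over 17 hours
    print(elapsed_ms(2 ** 32 - 10, 5))    # a reading taken across the wrap
```

Code that compares raw counter values instead of taking the wrapped difference is the usual way this kind of rollover turns into a hang.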