Some 25 years ago, I first released my game Cosmo Chicken, and I last updated it in 1998 to version 1.6. Earlier today, a quarter of a century later, version 1.7 was released. Why? Because it had a bug that crashed DOSBox when the game was run in it. I doubt anybody will ever play the patched version, but it's a matter of pride for me to fix my bugs no matter how old they are. It was also an opportunity to get back to basics, and a good reason to write a nostalgic blog post as a postmortem.
What was the bug?
When the game was run in certain versions of DOSBox (including the latest official version, 0.74-3), DOSBox would crash to the desktop whenever certain sounds played. If you had the DOSBox console window open, you might have seen the error message "Exit to error: DMA segbound wrapping (read)".
Earlier versions such as 0.73, any later versions of DOSBox that may yet be released, and DOSBox-X would not crash, and neither would real hardware. That doesn't mean there wasn't a bug in Cosmo Chicken's original code, though. It's just a happy circumstance that it wasn't a critical problem on those.
Segment:Offset memory addressing
To understand what happened, we need to take a look at how memory is handled in the 8086 CPU (and because of that, in the so-called "real mode" of all CPUs in the x86 family). This was the era of the infamous 640k memory barrier. The first x86 CPU had a 20-bit address bus, allowing it to address a maximum of 1 megabyte (1048576 bytes). In hexadecimal, that is 0x100000, with the individual bytes being numbered linearly from 0x00000 to 0xFFFFF. However, the registers on the 8086 were only 16 bits in size, which leaves them 4 bits short of being able to point to any address in memory using a single register.
Thus, a pointer had to consist of more than one register, and the solution Intel chose was to pair two of them: a segment register and an offset register. The segment registers were CS for the Code Segment, DS for the Data Segment, SS for the Stack Segment, and ES as an Extra Segment register. These were coupled with an offset register to form pointers. CS:IP coupled the Code Segment and Instruction Pointer to point to the currently executing instruction, and SS:SP coupled the Stack Segment and Stack Pointer to point to the top of the stack. For moving data around in memory, the string instructions read from DS:SI and wrote to ES:DI, DOS calls commonly took buffer pointers in DS:DX, and the BX register allowed some slightly more advanced addressing modes.
These register pairs weren't simply concatenated to form a linear address, which would have allowed 32-bit pointers and a total of 4 gigabytes of memory. That didn't show up until the introduction of the 32-bit 386 processor family; for the 8086, it would have been overkill. Instead, the segment register was shifted left by 4 bits and the value of the offset register was added to it to obtain a linear address. For instance, when ES has the value 0x2409 and BX has the value 0x1978, the address calculation is:
    0x24090   (ES shifted left by 4 bits)
  + 0x01978   (BX)
  ---------
    0x25A08
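The same calculation as a minimal Python sketch (illustrative only; the game itself was written in Turbo Pascal, and the function name is hypothetical). The final mask models the 8086's 20 address lines, so addresses past 1 MB wrap back to 0.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode translation: segment * 16 + offset, limited to 20 address bits."""
    # Shifting left by 4 bits multiplies the segment by 16 (one paragraph).
    # The mask keeps only 20 bits, as the 8086's address bus did.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(linear_address(0x2409, 0x1978)))  # 0x25a08
```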
As the lower three nibbles of the segment register and the upper three nibbles of the offset register overlapped in the calculation, there were actually 4096 different combinations of segment:offset values that all mapped to the same linear address. For example, ES=0x2499 with BX=0x1078, or ES=0x2408 with BX=0x1988, would point to the same linear address of 0x25A08.
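To see where the 4096 comes from, this sketch simply tries every 16-bit segment value and keeps those for which the required offset still fits in a 16-bit register. It ignores the 1 MB wrap-around, and `aliases` is a hypothetical helper name. For linear addresses below 0xFFF0, fewer segments qualify, so the count is smaller there.

```python
def linear_address(segment: int, offset: int) -> int:
    return (segment << 4) + offset


def aliases(linear: int) -> list[tuple[int, int]]:
    """Every segment:offset pair with segment * 16 + offset == linear."""
    pairs = []
    for segment in range(0x10000):
        offset = linear - (segment << 4)
        if 0 <= offset <= 0xFFFF:  # the offset must fit in a 16-bit register
            pairs.append((segment, offset))
    return pairs


pairs = aliases(0x25A08)
print(len(pairs))  # 4096
print(", ".join(f"{s:04X}:{o:04X}" for s, o in (pairs[0], pairs[-1])))  # 15A1:FFF8, 25A0:0008
```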
Now, the high nibble of the linear address (2, in this case) indicated that this address lay in a 64kB "page" numbered 2... that is, all the addresses ranging from 0x20000 to 0x2FFFF.
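In code, the page number is just the bits above the low 16, and each page covers 0x10000 consecutive addresses. A short sketch with hypothetical helper names:

```python
def page_of(linear: int) -> int:
    """64 KB page number: everything above the low 16 bits (the top nibble on an 8086)."""
    return linear >> 16


def page_range(page: int) -> tuple[int, int]:
    """First and last linear address inside a 64 KB page."""
    start = page << 16
    return start, start + 0xFFFF


p = page_of(0x25A08)
print(p, [hex(a) for a in page_range(p)])  # 2 ['0x20000', '0x2ffff']
```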
Sound Blaster DMA transfer
The original IBM PC was not very well equipped in the sound department, having a small speaker hooked up to a channel on the Intel 8253 PIT chip that would generate square waves at various frequencies and nothing else. Now, some enterprising souls figured out that, by rapidly (hundreds or thousands of times per second) changing the frequency of the generated square wave, they could generate far more complex sounds. This had to be done with very precise timing, though. As it all had to be done on that single CPU, it took a lot of the available processing power, leaving precious few cycles for actually running the game. Gamers wanted something better.
Image credit: Wdwd, CC BY-SA 3.0
In stepped Creative Labs with their Sound Blaster card. They took the Yamaha OPL2 chip that had previously been used on the AdLib Music Synthesizer Card and expanded on it by adding a channel for digitally sampled audio to the existing FM music synthesis. It's fair to say that this was a major revolution in PC gaming.
Now, to relieve the CPU of reading the sound data from memory and feeding it to the DAC, the Sound Blaster used Direct Memory Access, or DMA for short. A chip called the Intel 8237 DMA controller would be set up to read from a certain area of memory and transfer the data over a DMA channel. By default, the Sound Blaster listened on DMA channel 1. While the DMA controller feeds the data to the DAC, the main CPU can go about its business, and it is informed that the transfer has completed by an interrupt request, or IRQ. The default setting for early Sound Blasters was IRQ 7, while later models defaulted to IRQ 5.
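As a rough illustration of what "setting up" the 8237 involves, this Python sketch only computes the sequence of port writes a DOS program would issue with OUT for an 8-bit, single-cycle playback on channel 1; it performs no actual I/O. It assumes the standard PC/AT port assignments for the first 8237 (mask 0x0A, mode 0x0B, flip-flop 0x0C, channel 1 address 0x02, count 0x03, page 0x83). The function name is hypothetical, and the page/offset split it uses is the subject of the next paragraph.

```python
DMA_MASK_REG = 0x0A  # single-channel mask register
DMA_MODE_REG = 0x0B  # mode register
DMA_CLEAR_FF = 0x0C  # any write resets the low/high byte flip-flop
DMA1_ADDR = 0x02     # channel 1 address register (offset within the page)
DMA1_COUNT = 0x03    # channel 1 count register (holds length - 1)
DMA1_PAGE = 0x83     # channel 1 page register


def dma1_playback_writes(linear: int, length: int) -> list[tuple[int, int]]:
    """(port, value) writes for a memory-to-device transfer on DMA channel 1."""
    if not 1 <= length <= 0x10000:
        raise ValueError("an 8-bit 8237 transfer is 1..65536 bytes")
    page, offset, count = linear >> 16, linear & 0xFFFF, length - 1
    return [
        (DMA_MASK_REG, 0x04 | 1),   # mask channel 1 while reprogramming it
        (DMA_CLEAR_FF, 0x00),       # next 16-bit value starts with its low byte
        (DMA_MODE_REG, 0x48 | 1),   # single mode, address increment, read from memory, ch 1
        (DMA1_ADDR, offset & 0xFF),
        (DMA1_ADDR, offset >> 8),
        (DMA1_PAGE, page),
        (DMA1_COUNT, count & 0xFF),
        (DMA1_COUNT, count >> 8),
        (DMA_MASK_REG, 1),          # unmask channel 1
    ]
```

The sound card itself still has to be told to start consuming the data; that side is not shown here.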
This brings me to the root cause of the bug that crashed DOSBox. When setting up the DMA controller for a transfer, you did not give it a standard segment:offset pointer for the source address. Instead, it took a page and an offset into that page, which is why I explained the x86 memory segmentation model first. Again using the example address of 0x25A08, the DMA controller required it to be specified as offset 0x5A08 (a 16-bit word value) and page 0x02 (an 8-bit byte value). And here is the crucial part: it didn't allow a DMA transfer to begin in one page and end in another.
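A minimal sketch of both halves of that rule, with hypothetical helper names: converting a segment:offset pointer into the controller's (page, offset) form, and checking whether a buffer's first and last bytes fall in different pages.

```python
def to_dma_address(segment: int, offset: int) -> tuple[int, int]:
    """Convert a real-mode segment:offset pointer to the 8237's (page, offset) form."""
    linear = (segment << 4) + offset
    return linear >> 16, linear & 0xFFFF  # 8-bit page register, 16-bit address register


def straddles_page(segment: int, offset: int, length: int) -> bool:
    """True if a buffer of `length` bytes starts in one 64 KB page and ends in another."""
    first = (segment << 4) + offset
    last = first + length - 1
    return (first >> 16) != (last >> 16)


page, off = to_dma_address(0x2409, 0x1978)
print(f"page {page:#04x}, offset {off:#06x}")  # page 0x02, offset 0x5a08
print(straddles_page(0x2409, 0x1978, 0xA5F8))  # False: last byte is 0x2FFFF
print(straddles_page(0x2409, 0x1978, 0xA5F9))  # True: last byte is 0x30000
```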
Now, to be honest, I have no idea whether a DMA transfer straddling a page boundary caused any actual problems on real hardware, especially as by the time I wrote Cosmo Chicken, 386 and 486 machines were commonplace. Maybe they behaved differently with more modern DMA controllers, or maybe they didn't. Perhaps the DAC would have had garbage fed into it, which we'd hear as static. Perhaps (and I consider this the most likely scenario) the offset counter would wrap around from 0xFFFF to 0x0000 within the same page, which would also result in garbage being fed to the DAC. Perhaps the transfer would simply be aborted or never even started. And perhaps it would just work anyway. Whichever it was, none of those scenarios would have been very noticeable short of an outright crash, especially when the page boundary was overrun by just a few bytes.
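The wrap-around scenario is easy to model, assuming the page register is never incremented and only the 16-bit offset counter advances. This sketch (hypothetical function name) lists the linear addresses that would actually be read:

```python
def addresses_read(page: int, offset: int, length: int) -> list[int]:
    """Addresses read if the offset wraps within the page and the page stays fixed."""
    return [(page << 16) | ((offset + i) & 0xFFFF) for i in range(length)]


intended = 0x2FFFE  # a buffer starting two bytes before the end of page 2
actual = addresses_read(intended >> 16, intended & 0xFFFF, 4)
print([hex(a) for a in actual])  # ['0x2fffe', '0x2ffff', '0x20000', '0x20001']
```

Instead of continuing at 0x30000, the transfer jumps back to the start of page 2, which holds whatever unrelated data happens to be there.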
Update: there are actually some videos on YouTube of the game being played, a short video and a longer one, which suggest that my wrap-around guess is the correct one, at least on whatever they used to record them. Note that in those recordings, the bouncy sound when hitting the landing pad at too high a velocity is cut off and the explosion sound is played. Thanks to Rikki Daman for bringing those videos to my attention.
So yeah, there you have it. As DOSBox's error message indicated, at least one sound effect was loaded into memory at a place straddling a page boundary and starting playback made DOSBox give up in utter terror to the point where the whole emulator crashed.
My game code did not use Turbo Pascal's memory allocator, preferring the DOS memory allocator instead. Explaining the details of the allocator would be well beyond the scope of this blog post, but for the most part, let's simplify it to DOS making allocations as sequentially as possible. Note: this bug would have occurred with the TP memory allocator too, just perhaps with a different sound. From the programme's point of view, any given address handed out by either allocator may as well be random. But allocations happening one after the other with nothing being done in between (as commonly happens when loading data at game start) are not random. They tend to be sequential.
So, the fix I ended up implementing was that, if the block of memory allocated for a sound happens to straddle a page boundary, the newly allocated block is shrunk to end on that page boundary, but not released. Then, the allocation is retried. This most likely results in this second allocated block starting on the page boundary. Well, actually 16 bytes later, as there is an MCB header, but who's counting? It keeps doing this until an allocation succeeds and doesn't straddle the page boundary or there is no memory available. This ensures that the block that ends up being definitively allocated for the sound always falls within a memory page. At this point, it can also release the memory blocks that were held in reserve during this algorithm.
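The retry loop can be sketched in Python against a toy allocator. `ToyDosHeap` is entirely hypothetical: a first-fit allocator over a linear range that, like DOS, hands out memory in 16-byte paragraphs and puts a 16-byte header (standing in for the MCB) in front of each block. Its `shrink` and `free` stand in for DOS's resize (INT 21h, AH=4Ah) and free (AH=49h) calls. This is not the game's Turbo Pascal code, only an illustration of the technique.

```python
from typing import Optional

PARA = 16  # DOS hands out memory in 16-byte paragraphs
MCB = 16   # each block is preceded by a one-paragraph header (the MCB)


class ToyDosHeap:
    """Hypothetical first-fit allocator loosely modelled on DOS; not DOS's real algorithm."""

    def __init__(self, start: int, end: int):
        self.start, self.end = start, end
        self.blocks: dict[int, int] = {}  # data address -> size in bytes

    def allocate(self, size: int) -> Optional[int]:
        size = -(-size // PARA) * PARA      # round up to whole paragraphs
        cursor = self.start                 # first free byte after the previous block
        for addr in sorted(self.blocks):
            if (addr - MCB) - cursor >= MCB + size:
                break                       # the gap before this block is big enough
            cursor = addr + self.blocks[addr]
        else:
            if self.end - cursor < MCB + size:
                return None                 # no gap large enough anywhere
        data = cursor + MCB                 # data starts right after the header
        self.blocks[data] = size
        return data

    def shrink(self, addr: int, new_size: int) -> None:
        self.blocks[addr] = min(self.blocks[addr], -(-new_size // PARA) * PARA)

    def free(self, addr: int) -> None:
        del self.blocks[addr]


def alloc_within_page(heap: ToyDosHeap, size: int) -> Optional[int]:
    """Allocate `size` bytes that do not cross a 64 KB DMA page boundary."""
    held = []                               # straddling blocks parked during the retries
    try:
        while True:
            addr = heap.allocate(size)
            if addr is None:
                return None                 # out of memory
            if (addr >> 16) == ((addr + size - 1) >> 16):
                return addr                 # first and last byte share a page
            boundary = ((addr >> 16) + 1) << 16
            heap.shrink(addr, boundary - addr)  # end the block on the boundary, but keep it
            held.append(addr)
    finally:
        for block in held:                  # release the reserve blocks
            heap.free(block)
```

With one earlier allocation ending at 0x2FC10, a plain 4 KB request would land at 0x2FC20 and cross into page 3. The retry shrinks that block to end at 0x30000, parks it, and gets 0x30010 (the 16-byte MCB offset mentioned above) before releasing the parked block.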
Problem solved... 25 years late.