A Reply to DJ on Drivers

DJ laments there are no practical guides to driver writing.

DJ, allow me, as I so often do, to vocally disagree with you :). You seem to be making two closely related claims:

  1. There’s no information in OS textbooks on how to structure a driver to fit into any given operating system’s driver architecture
  2. There’s no information in OS textbooks on how drivers ought to interact with their hardware to make useful stuff happen

1. General operating systems textbooks are not the place to find this information, because if they were they wouldn’t be general operating systems textbooks.

Depth and breadth are usually a trade-off in books. If a book were so thorough and detailed in its treatment of Linux drivers that you could use it as your sole reference while writing one, it could not possibly do the same justice to drivers for Solaris, Free/Net/OpenBSD, AIX, Mac OS X/IOKit, Win32, WinNT, Vista, or any other operating system, and it surely can’t cover things like process scheduling, networking, virtual memory, filesystems, or any of those other things kernels sometimes do. The reason you read an operating systems book is to get a (necessarily) broad coverage of what all operating systems do.

If you want to get specific enough to one kernel and one subsystem that you can actually use it as a reference to build a driver, you probably want to put Modern Operating Systems back on the shelf and try Linux Device Drivers, a book focused on exactly what you see as lacking in general operating systems books.

Taking a mild divergence to head off a counterargument, it’s completely acceptable for the usual OS curriculum to exclude drivers, while including many other parts of operating systems, say, basics of concurrency. The difference in curriculum reflects the likelihood the average student will encounter the material during his career. The average programmer has very high odds of using concurrent systems sometime in his career, even more so today with the advent of multicore systems. Any more than the most basic concurrent programming (paranoid locks everywhere) involves some knowledge of what the OS is doing. In contrast, the average programmer will never be anywhere near driver code. Those who do write or maintain drivers take onto themselves responsibility to further research the field, the same way we expect those working with highly parallel systems to study a bit more than locks and semaphores.

2. I’m willing to accept that there’s little information available on actually driving hardware, but that’s because the information you’re looking for is very narrowly applicable, proprietary, distributed across the whole field, in rapid flux, and/or lacks any common principles or organization. Difficult to put that into a textbook with a ten- or twenty-year lifespan.

There’s so little information generally available about how to drive hardware because it’s intensely device-dependent. Once you get to the IO port level everything ends up pretty specific to the particular chip you’re talking to. Sure, ATA’s standard, so it’s not hard to drive a hard disk. But how about the ATA controller chip? How do you ask it to send a command and get the result? Not so standard.
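A toy sketch of that last point, in Python rather than real driver code. The only real-world value here is 0xEC, the ATA opcode for IDENTIFY DEVICE; the two controller chips, their port offsets, and the GO bit are invented for illustration. The command byte is identical, but the steps needed to get it through each chip are not.

```python
ATA_IDENTIFY = 0xEC  # real ATA opcode (IDENTIFY DEVICE); everything else is invented


class ControllerA:
    """Hypothetical chip: a write to the command port is sent immediately."""
    CMD = 0x7

    def __init__(self):
        self.sent = []  # commands that reached the (imaginary) disk

    def outb(self, port, value):
        if port == self.CMD:
            self.sent.append(value)


class ControllerB:
    """Hypothetical chip: latch the command first, then set a GO bit."""
    LATCH, CTRL, GO = 0x0, 0x1, 0x01

    def __init__(self):
        self.sent = []
        self._latch = None

    def outb(self, port, value):
        if port == self.LATCH:
            self._latch = value
        elif port == self.CTRL and value & self.GO and self._latch is not None:
            self.sent.append(self._latch)
            self._latch = None


def send_command_a(chip, command):
    # Driver for chip A: one port write does the job.
    chip.outb(ControllerA.CMD, command)


def send_command_b(chip, command):
    # Driver for chip B: two writes, in a fixed order.
    chip.outb(ControllerB.LATCH, command)
    chip.outb(ControllerB.CTRL, ControllerB.GO)


a, b = ControllerA(), ControllerB()
send_command_a(a, ATA_IDENTIFY)
send_command_b(b, ATA_IDENTIFY)

# Driver A's sequence aimed at chip B: the write hits a port chip B
# does not decode, so nothing is ever sent to the disk.
wrong = ControllerB()
wrong.outb(ControllerA.CMD, ATA_IDENTIFY)
```

Both `a.sent` and `b.sent` end up holding the same command byte, while `wrong.sent` stays empty: the disk protocol is shared, the path to it is not.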

The impression I get from the field (see disclaimer below) is that everybody kinda builds the hw/sw interface to be as simple as possible for the firmware and hardware guys to build. This varies fairly widely with different guys and different organizational knowledge. Firmware and hardware are extremely expensive and difficult to build correctly, and anything that can be simplified should be; anything that can be offloaded to (testable, debuggable, patchable) software should be. When everyone builds their hardware the way easiest for themselves and their organization and offloads everything else to software, everybody’s hardware will look completely different and will have completely different expectations of the driver’s role. Consequently, everybody’s drivers will look completely different.

A secondary reason for lack of standardization is that until recently most first-party drivers for mass-market hardware were closed-source. When your competitors’ specs are closed to you and your own spec is closed to them, standardization is just not possible. This is changing, though, and we’ll see what happens. There have also been notable exceptions, when an entire industry segment decided to stop being dumb and standardize. EHCI stands out as a shining example. (IDE/ATA/SATA as standard disk access might qualify, though they have their origins in ancient disk protocols defined by the operating system, not the disk vendor, and any disk had to meet that protocol or you couldn’t use it. Standardization was imposed by the OS, not agreed upon by the vendors.)

In such an environment, OS books can’t really get any more specific than “Here’s how you read from or write to an I/O port” for the same reason they can’t really get any more specific than “Here’s how you read or write to a network socket”. To tell you how to use, say, FTP, is to tie themselves to today’s practice and take up paper and author’s time that could be better devoted to general principles of networking. They probably ought to describe PIO and DMA, for the same reason they describe IP and TCP. But the OS books are and should be mute on how to drive the hardware. Those rules are different between every chipmaker and every chip, and change every year and every rev. They belong in datasheets and reference manuals, not general textbooks.

Conclusion

For now, every operating system has a different driver architecture, so a general operating systems textbook can’t and shouldn’t devote too much attention to any given system. Furthermore, every piece of hardware has a different host interface, so a general operating systems textbook can’t and a driver reference book sometimes shouldn’t devote too much attention to any given piece of hardware.

The way to change this is for different kernels to agree on a driver architecture, or for different hardware makers to agree on a host interface. The first is not likely to change anytime soon; Linus is not about to chuck the bottom quarter of 2.6, Apple’s kernel team hasn’t shown signs of life lately, Sun still hates everyone, and Microsoft is still dumb.

What is changing is that host interfaces are standardizing. USB was a great leap forward in this regard. Other than the UHCI/OHCI misstep, EHCI means anybody’s host adapter works the same as anyone else’s, and device classes mean anybody’s device works the same as anyone else’s. No more does every serial and parallel widget in the store need its own driver. Disk interfaces have long been standardized (out of necessity). I thought Ethernet NICs had a broadly-supported standard, but I can’t track it down.

Disclaimer: I’m very new to hardware design, I’ve never written a driver, and I’ve only been through the first-level undergrad operating systems course. I calls ‘em as I sees ‘em, but I may not see ‘em very well from where I stand. I welcome discussion.

One Response to “A Reply to DJ on Drivers”

  1. Rick Auricchio says:

    Joe:

    This is too much trouble to type into the comment field on your blog. Feel free to post it however you wish. [posted here -joe]

    — Standardization —

    I’ve worked on a driver standardization committee: Taligent’s proposed Object-Oriented Device Driver Model (OODDM), in 1995. While the participating companies agreed in general terms that drivers have common OS interfaces and needs, we soon found that each company wanted to differentiate their system from the others.

    Sun wanted to add functions so that the drivers could work better on their systems. But how can one write a standard driver that uses Sun Feature X and not need that feature on Apple systems? What about Apple Feature Z that isn’t on a Sun system?

    A colleague not involved in the discussions (Eryk Vershen of Apple) said “Standardization removes an area of competition.” This quote has stuck with me since he said it back then. The entire standardization effort failed, because nobody wanted to limit his system to the least-common-denominator driver capabilities.

    — Hardware —

    Everybody builds the cheapest chips possible, especially computer vendors who have control of drivers and the OS. Virtually no Wintel vendor varies from the commonly available chipsets, because if they do something wrong, Microsoft will simply laugh at them. Apple, Sun, and other computer/OS vendors, however, are free to build hardware as they see fit. So their hardware has no equivalent in other vendors’ computers.

    — My usual rant on Apple’s constantly changing hardware —

    Apple has consistently changed its I/O devices and controllers. When Macs had built-in SCSI hardware, the controller chips changed almost with every model of Macintosh. Often the hardware designers chose an off-the-shelf part, but on a few occasions the SCSI controller was designed in-house. With each change of chip, the driver had to change. Other changes included the addition of various direct-memory-access (DMA) circuits and control chips; these DMA schemes also varied with the computer model.

    The SCSI subsystem thus had to support many permutations of controller chip and DMA scheme, and, at times, a completely different interrupt circuit.

    — Ignorant Hardware Designers —

    They weren’t stupid, but they didn’t understand software. A constant source of irritation to driver authors was the inability to read settings from control registers. Why should I have to keep a shadow copy of the register bits in memory? In other cases, reading a register changed a status situation. (For example, reading an input-data register on a serial chip would clear the data-ready status bit. Reading an interrupt controller’s status register cleared all pending interrupts.) Reading of any register should never alter its contents nor change state; let the programmer explicitly use a write to cause changes.
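The two annoyances described above can be modelled in a few lines of Python. This is a simulation under stated assumptions, not any real chip: the register classes, the 8-bit width, the reset value of 0, and the `latched` attribute (visible only to this test bench, not to a driver) are all hypothetical.

```python
class WriteOnlyControlRegister:
    """Hypothetical 8-bit control register that cannot be read back."""

    def __init__(self):
        self.latched = 0  # test-bench view of the hardware state

    def write(self, value):
        self.latched = value & 0xFF

    def read(self):
        raise OSError("control register is write-only")


class ShadowedControl:
    """Driver-side wrapper that keeps a shadow copy of the register bits."""

    def __init__(self, reg):
        self._reg = reg
        self._shadow = 0  # assumption: the hardware resets to 0

    def set_bits(self, mask):
        self._shadow |= mask
        self._reg.write(self._shadow)

    def clear_bits(self, mask):
        self._shadow &= ~mask & 0xFF
        self._reg.write(self._shadow)

    @property
    def value(self):
        return self._shadow


class ReadToClearStatus:
    """Hypothetical status register: reading it clears every pending bit."""

    def __init__(self):
        self._pending = 0

    def raise_irq(self, bit):
        self._pending |= 1 << bit

    def read(self):
        value, self._pending = self._pending, 0
        return value


# Shadow copy: the driver can change one bit without reading the register.
ctrl = WriteOnlyControlRegister()
drv = ShadowedControl(ctrl)
drv.set_bits(0x01)
drv.set_bits(0x04)
drv.clear_bits(0x01)

# Read-to-clear: a second read has already lost the pending interrupts.
status = ReadToClearStatus()
status.raise_irq(0)
status.raise_irq(3)
first = status.read()
second = status.read()
```

After this runs, `drv.value` and `ctrl.latched` are both 0x04, `first` is 0b1001, and `second` is 0. The shadow copy only stays correct if every write goes through the wrapper, which is exactly the bookkeeping burden the complaint is about.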

    — Bored Hardware Designers —

    It often seemed that the hardware designers couldn’t leave well enough alone. What was wrong with the SCSI chip in the prior model? Just because you’re bored—or too lazy to ask anyone—it’s no excuse to change chips. Is the new chip that much cheaper than the old?

    — Time Constraints —

    All too often, the hardware designers would create a design without consulting driver authors, then “throw the spec over the wall” to them. “Here’s the spec for the SCSI controller. Oh, by the way, we’re going to silicon in four weeks.” So the software developer must read the spec; ask for clarification of typos; ask for further explanation; visualize writing a driver; consider all error conditions and recovery from those conditions. Then, when the inevitable problems arise with the design, the software developer must attempt to convince the hardware guys to change the design—when they thought they were done, all set to send it out for manufacturing.

    — The Costs Nobody Saw —

    Nobody at Apple seemed to realize that the so-called cost savings on the hardware side were eaten up on the software side. Saving two dollars on a quarter-million computers is a half-million dollars. But three software engineers working for six months eats up much of that money. Worse, when the system ships behind schedule, marketing and advertising costs eat the rest. In some cases, missing an important sales deadline can cost far more: consider the cost of missing a holiday season or a school purchasing period.

    Another cost, though not an immediate cash cost, was the lack of progress in the operating system and its drivers. How could one work on innovative audio and video when the disk drive wouldn’t even work reliably? There was no time to create fancy devices and drivers when the mundane took too much time to get right.

    Had someone in a high position looked at these savings vs. additional costs, they’d have stopped the hardware changes.
