Friday, December 22, 2006

BH #2 - Daring DMA's

Direct memory access (DMA) is a feature of modern computers that allows certain hardware subsystems to access system memory directly, for reading and/or writing, independently of the main processor. Many hardware systems use DMA, including disk drive controllers, graphics cards, network cards, and sound cards. Computers that have DMA channels can transfer data to and from devices with much less CPU overhead than computers without a DMA channel.


Without DMA, the CPU is typically occupied for the entire time it is performing a data transfer. With DMA, the CPU initiates the transfer, does other operations while the transfer is in progress, and receives an interrupt from the DMA controller when the operation has completed. This is especially useful in real-time computing applications, where stalling the CPU behind a data transfer is unacceptable.


DMA is an essential feature of all modern computers. It allows devices to transfer data without subjecting the CPU to a heavy overhead. Otherwise, the CPU would have to copy each piece of data from the source to the destination. This is slower than copying normal blocks of memory, since access to I/O devices over a peripheral bus is slower than access to normal system RAM. During this time the CPU would be unavailable for any other tasks involving CPU bus access, although it could continue doing any work which did not require bus access.


A DMA transfer essentially copies a block of memory from one device to another. While the CPU initiates the transfer, it does not execute it. For so-called "third party" DMA, as is normally used with the ISA bus, the transfer is performed by a DMA controller which is typically part of the motherboard chipset. More advanced bus designs such as PCI usually use bus mastering DMA, where the device takes control of the bus and performs the transfer itself.


A typical usage of DMA is copying a block of memory between system RAM and a buffer on the device. Such operations do not stall the processor, which can instead be scheduled to perform other tasks. DMA transfers are essential to high-performance embedded systems. They are also essential to zero-copy implementations of peripheral device drivers, and to functionality such as network packet routing, audio playback and streaming video.


DMA Engines

In addition to hardware interaction, DMA can also be used to offload expensive memory operations, such as large copies or scatter-gather operations, from the CPU to a dedicated DMA engine. While normal memory copies are typically too small to be worth offloading on today's desktop computers, they are frequently offloaded on embedded devices due to their more limited resources. Newer Intel Xeon processors also include a DMA engine technology called I/OAT, meant to improve network performance with high-throughput network interfaces, such as gigabit Ethernet devices, in particular. However, benchmarks of this approach on Linux indicate no more than a 10% improvement in CPU utilization.



Examples

ISA: For example, a PC's ISA DMA controller has 8 DMA channels, of which 7 are available for use (the eighth is used to cascade the two controller chips together). Each DMA channel has associated with it a 16-bit address register and a 16-bit count register. To initiate a data transfer, the device driver sets up the DMA channel's address and count registers together with the direction of the data transfer, read or write. It then instructs the DMA hardware to begin the transfer. When the transfer is complete, the device interrupts the CPU.


Scatter-gather DMA allows the transfer of data to and from multiple memory areas in a single DMA transaction. It is equivalent to the chaining together of multiple simple DMA requests. Again, the motivation is to off-load multiple input/output interrupt and data copy tasks from the CPU.


DRQ stands for DMA request; DACK for DMA acknowledge. These symbols are generally seen on hardware schematics of computer systems with DMA functionality. They represent electronic signaling lines between the CPU and DMA controller.


Programmed I/O (PIO) mode for hard disks has a serious flaw: transferring data between the hard disk and the rest of the system this way requires a fair bit of overhead, as well as the care and attention of the system's CPU. Clearly, a better solution is to take the CPU out of the picture entirely, and have the hard disk and system memory communicate directly. Direct memory access or DMA is the generic term for a transfer protocol where a peripheral device transfers information directly to or from memory, without the system processor being required to perform the transaction. DMA has been used on the PC for years over the ISA bus, for devices like sound cards and the floppy disk interface. Conventional DMA uses regular DMA channels, which are a standard system resource.

Several different DMA modes have been defined for the IDE/ATA interface; they are grouped into two categories. The first set of modes are the single word DMA modes. When these modes are used, each transfer moves just a single word of data (a word is the techie term for two bytes, and recall that the IDE/ATA interface is 16 bits wide). There are (or were!) three single word DMA modes, all defined in the original ATA standard:


DMA Mode             Cycle Time (ns)   Maximum Transfer Rate (MB/s)   Defining Standard
Single Word Mode 0         960                    2.1                      ATA
Single Word Mode 1         480                    4.2                      ATA
Single Word Mode 2         240                    8.3                      ATA


(Maximum transfer rate is double the reciprocal of the specific cycle time for each mode.) Obviously, these are not impressive transfer rate numbers by today's standards. Performing transfers of a single word at a time is horribly inefficient -- each and every transfer requires overhead to set up the transfer. For that reason, single word DMA modes were quickly supplanted by multiword DMA modes. As the name implies, under these modes a "burst" of transfers occurs in rapid succession, one word after the other, saving the overhead of setting up a separate transfer for each word. Here are the multiword DMA transfer modes:


DMA Mode             Cycle Time (ns)   Maximum Transfer Rate (MB/s)   Defining Standard
Multiword Mode 0           480                    4.2                      ATA
Multiword Mode 1           150                   13.3                      ATA-2
Multiword Mode 2           120                   16.7                      ATA-2


Since multiword DMA transfers are more efficient, and also have higher maximum transfer rates, single word DMA modes were quickly abandoned after ATA-2 was widely adopted -- they were actually removed from the ATA standards in ATA-3. So all DMA accesses today (including Ultra DMA) are actually multiword; the term "multiword" is now often assumed and no longer specifically mentioned.


Another important issue with DMA is that there are in fact two different ways of doing DMA transfers. Conventional DMA is what is called third-party DMA, which means that the DMA controllers on the motherboard coordinate the DMA transfers. (The "third party" is the DMA controller.) Unfortunately, these DMA controllers are old and very slow -- they are basically unchanged since the earliest days of the PC. They are also pretty much tied to the old ISA bus, which was abandoned for hard disk interfaces for performance reasons. When multiword DMA modes 1 and 2 began to become popular, so did the use of the high-speed PCI bus for IDE/ATA controller cards. At that point, the old way of doing DMA transfers had to be changed.


Modern IDE/ATA hard disks use first-party DMA transfers. The term "first party" means that the peripheral device itself does the work of transferring data to and from memory, with no external DMA controller involved. This is also called bus mastering, because when such transfers are occurring the device becomes the "master of the bus". Bus mastering allows the hard disk and memory to work without relying on the old DMA controller built into the system, or needing any support from the CPU. It requires the use of the PCI bus -- older buses like MCA also supported bus mastering but are no longer in common use. Bus-mastering DMA allows for the efficient transfer of data between the hard disk and system memory, and it keeps CPU utilization (the amount of work the CPU must do during a transfer) low.


Interestingly, despite the obvious advantages of bus mastering DMA, the use of bus-mastering multiword DMA mode 2 never really caught on. There are several reasons for this. The most important was the poor state of support for the technology for the first couple of years. Using PIO required no work and was very simple; DMA was not supported by the first version of Windows 95, so special drivers had to be used. Problems with implementing bus mastering DMA on systems in the 1996 to 1998 time frame were numerous: buggy drivers, software that didn't work properly, CD-ROM drives that wouldn't work with the drivers, and so on. In the face of these problems, there wasn't much incentive to make the switch to DMA. Sure, the lower CPU utilization was good, but since the maximum DMA mode's speed was the same as that of the highest PIO mode (16.7 MB/s), there wasn't a great perception that DMA offered much of an advantage over PIO. Given little upside potential, many people stayed away from using DMA, to avoid the compatibility and stability problems that sometimes resulted.


Bus mastering DMA finally came into its own when the industry moved on to Ultra DMA. Once Ultra DMA/33 doubled the interface transfer rate, DMA had an obvious speed advantage over PIO in addition to its other efficiency improvements. Support for DMA was also cleaned up and made native in Windows 9x, and most of the problems with the old drivers were eliminated. Today, the use of Ultra DMA is the standard in the industry.


Ultra DMA (UDMA) Modes

With the increase in performance of hard disks over the last few years, the use of programmed I/O modes became a hindrance to performance. As a result, focus was placed on the use of direct memory access (DMA) modes. In particular, bus mastering DMA on the PCI bus became mainstream due to its efficiency advantages. If you have not yet done so, you should read the description of the various DMA modes and how bus mastering DMA works; this will help you understand this page much better.


Of course, hard disks get faster and faster, and the maximum speed of multiword DMA mode 2, 16.7 MB/s, quickly became insufficient for the fastest drives. However, the engineers who went to work to speed up the interface discovered that this was no simple task. The IDE/ATA interface, and the flat ribbon cable it used, were designed for slow data transfer -- about 5 MB/s. Increasing the speed of the interface (by reducing the cycle time) caused all sorts of signaling problems related to interference. So instead of making the interface run faster, a different approach had to be taken: improving the efficiency of the interface itself. The result was the creation of a new class of DMA transfer modes, called the Ultra DMA modes.


The key technological advance introduced to IDE/ATA in Ultra DMA was double transition clocking. Before Ultra DMA, one transfer of data occurred on each clock cycle, triggered by the rising edge of the interface clock, or "strobe". With Ultra DMA, data is transferred on both the rising and falling edges of the clock. Double transition clocking, along with some other minor changes made to the signaling technique to improve efficiency, allowed the data throughput of the interface to be doubled for any given clock speed. In order to improve the integrity of this now faster interface, Ultra DMA also introduced the use of cyclic redundancy checking, or CRC, on the interface. The device sending data uses the CRC algorithm to calculate redundant information from each block of data sent over the interface. This "CRC code" is sent along with the data. On the other end of the interface, the recipient of the data does the same CRC calculation and compares its result to the code the sender delivered. If there is a mismatch, this means data was corrupted somehow, and the block of data is resent. (CRC is similar in concept and operation to the way error checking is done on system memory.) If errors occur frequently, the system may determine that there are hardware issues and thus drop down to a slower Ultra DMA mode, or even disable Ultra DMA operation.


The first implementation of Ultra DMA was specified in the ATA/ATAPI-4 standard and included three Ultra DMA modes, providing up to 33 MB/s of throughput. Several newer, faster Ultra DMA modes were added in subsequent years. This table shows all of the current Ultra DMA modes, along with their cycle times and maximum transfer rates:


Ultra DMA Mode   Cycle Time (ns)   Maximum Transfer Rate (MB/s)   Defining Standard
Mode 0                 240                   16.7                 ATA/ATAPI-4
Mode 1                 160                   25.0                 ATA/ATAPI-4
Mode 2                 120                   33.3                 ATA/ATAPI-4
Mode 3                  90                   44.4                 ATA/ATAPI-5
Mode 4                  60                   66.7                 ATA/ATAPI-5
Mode 5                  40                  100.0                 ATA/ATAPI-6
Mode 6                  30                  133.3                 ATA/ATAPI-7
Mode 7?                ---                    ---                 ATA/ATAPI-8


Note: The ATA/ATAPI-7 documentation has been split into three volumes: one for the hard disk commands, one for the traditional parallel ATA interface and one for the SATA-1 interface. Information on obtaining documentation for these standards may be found at http://www.t13.org



The cycle time shows the speed of the interface clock; the clock's frequency is the reciprocal of this number. The maximum transfer rate is four times the reciprocal of the cycle time -- double transition clocking means each cycle carries two transfers, and each transfer moves two bytes (16 bits). Only modes 2, 4, 5 and 6 have seen real use in drives, probably because each is the fastest mode of its published standard. Ultra DMA mode 6 is the latest, and is implemented in all currently shipping drives.

SATA 1 drives are often described as supporting "UDMA 150". In other words, they can theoretically push data at an amazing speed of 1.5 Gb/s. SATA 2 drives can theoretically push data at 3.0 Gb/s. However, the SATA throughput calculations are a bit different, and will not be covered in this article. The increased speeds are mainly possible because the simpler cabling design allows much faster clock speeds. ATA/ATAPI-8 hasn't been finalized as of July 2006.


Note: In common parlance, drives that use Ultra DMA are often called "Ultra ATA/xx", where "xx" is the speed of the interface. So few people talk about current drives being "Ultra DMA mode 5"; instead, they say the drives are "Ultra ATA/100".


Double transition clocking is what allows Ultra DMA mode 2 to have a maximum transfer rate of 33.3 MB/s despite having a clock cycle time identical to "regular DMA" multiword mode 2, which has half that maximum. Now, you may be asking yourself: if they had to go to double transition clocking to get to 33.3 MB/s, how did they get to 66 MB/s, and then 100 MB/s? Well, they did in fact speed up the interface after all. :^) But the use of double transition clocking let them do it at half the clock speed they would otherwise have needed. Without double transition clocking, Ultra DMA mode 5 would have required a cycle time of 20 nanoseconds instead of 40, making implementation much more difficult. SATA drives use a much faster clock speed, so double transition clocking doesn't apply to them (or at least it doesn't with SATA 1 drives; I currently don't have enough information about SATA 2 drives).


Even with the advantage of double transition clocking, going above 33 MB/s finally exceeded the capabilities of the old 40-conductor standard IDE cable. To use Ultra DMA modes over 2, a special, 80-conductor IDE cable is required. This cable uses the same 40 pins as the old cables, but adds 40 ground lines between the original 40 signals to separate those lines from each other and prevent interference and data corruption. (The 80-conductor cable was actually specified in ATA/ATAPI-4 along with the first Ultra DMA modes, but it was "optional" for modes 0, 1 and 2.)


Today, all modern systems that use IDE/ATA drives should be using one of the Ultra DMA modes. There are several specific requirements for running Ultra DMA:



  1. Hard Disk Support: The hard disk itself must support Ultra DMA. In addition, the appropriate Ultra DMA mode must be enabled on the drive.

  2. Controller Support: A controller capable of Ultra DMA transfers must be used. This can be either the interface controller built into the motherboard, or an add-in IDE/ATA interface card.

  3. Operating System Support: The BIOS and/or operating system must support Ultra DMA transfers, and the hard disk must be set to operate in Ultra DMA in the operating system.

  4. 80-Conductor Cable: For Ultra DMA modes over 2, an 80-conductor cable must be used. If an 80-conductor cable is not detected by the system, 66 MB/s or 100 MB/s operation will be disabled.

On new systems there are few issues with running Ultra DMA, because the hardware is all new and designed to run in Ultra DMA mode. With older systems, things are a bit more complex. In theory, new drives should be backwards compatible with older controllers, and putting an Ultra DMA drive on an older PC should cause it to automatically run in a slower mode, such as PIO mode 4. Unfortunately, certain motherboards don't function well when an Ultra DMA drive is connected, and this may result in lockups or errors. A BIOS upgrade from the motherboard manufacturer is a good idea, if you are able to do this. Otherwise, you may need to use a special Ultra DMA software utility (available from the drive manufacturer) to tell the hard disk not to try to run in Ultra DMA mode. The same utility can be used to enable Ultra DMA mode on a drive that is set not to use it. You should use the utility specific to whatever make of drive you have.



-=databat=-
