Advanced Systems Programming - Lesson 24
Our ‘xmit1000.c’ driver
Implementing a ‘packet-transmit’
capability with the Intel 82573L
network interface controller
Remember ‘echo’ and ‘cat’?
• Your device-driver module (named ‘uart.c’)
was supposed to allow two programs that
are running on a pair of adjacent PCs to
communicate via a “null-modem” cable
Transmitting:
$ echo Hello > /dev/uart
$ _
Receiving:
$ cat /dev/uart
Hello _
‘keep it simple’
• Let’s try to implement a ‘write()’ routine for
our Intel Pro/1000 ethernet controllers that
will provide the same basic functionality as
we achieved with our serial UART driver
• It should allow us to transmit a message
by using the familiar UNIX ‘cat’ command
to redirect output to a character device-file
• Our device-file will be named ‘/dev/nic’
Driver’s components
• my_fops: this ‘struct’ holds one function-pointer (write)
• my_write(): this function will program the actual data-transfer
• my_get_info(): this function will allow us to inspect the transmit-descriptors
• module_init(): this function will detect and configure the hardware, define page-mappings, allocate and initialize the descriptors, start the ‘transmit’ engine, create the pseudo-file and register ‘my_fops’
• module_exit(): this function will do needed ‘cleanup’ when it’s time to unload our driver: turn off the ‘transmit’ engine, free the memory, delete the page-table entries, remove the pseudo-file, and unregister ‘my_fops’
‘kzalloc()’
• Linux kernels since 2.6.13 offer this convenient
function for allocating pre-zeroed kernel memory
• It has the same syntax as the ‘kmalloc()’ function
(described in our texts), but adds the after-effect
of zeroing out the newly-allocated memory-area
• Thus it does two logically distinct actions (often
coupled anyway) within a single function-call
void *kmem = kmalloc( region_size, GFP_KERNEL );
memset( kmem, 0x00, region_size );
/* can be replaced with */
void *kmem = kzalloc( region_size, GFP_KERNEL );
Single page-frame option
One 4KB page-frame is divided into two regions:
• Packet-Buffer (3-KB), reused for successive transmissions
• Descriptor-Buffer (1-KB), room for up to 64 descriptors of 16 bytes each
Our Tx-Descriptor ring
• An array of 8 transmit-descriptors (descriptor 0 through descriptor 7)
• One packet-buffer: our ‘reusable’ transmit-buffer (1536 bytes)
• Two indices into the descriptor-array: TAIL and HEAD
After writing the data into our packet-buffer, and writing its length to
the current TAIL descriptor, our driver will advance the TAIL index; the
NIC responds by reading the current HEAD descriptor, fetching its data,
then advancing the HEAD index as it sends our data out over the wire.
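The TAIL/HEAD handshake described above can be modelled in ordinary user-space C. This is only a sketch: the struct and function names are hypothetical, and the two indices merely stand in for the NIC's TDT and TDH registers. It assumes the 8-descriptor ring shown on the slide and that both indices wrap around to 0 after descriptor 7.

```c
#include <assert.h>
#include <stdint.h>

#define N_TX_DESC 8   /* ring size taken from the slide */

/* Hypothetical software model of registers TDH and TDT (not real hardware) */
struct tx_ring_model {
    uint32_t head;    /* next descriptor the NIC will fetch   */
    uint32_t tail;    /* next descriptor the driver will fill */
};

/* Driver side: hand one filled descriptor over to the NIC */
void driver_advance_tail(struct tx_ring_model *r)
{
    assert(r != 0);
    r->tail = (r->tail + 1) % N_TX_DESC;
}

/* NIC side: consume one pending descriptor; returns 1 if one was sent */
int nic_advance_head(struct tx_ring_model *r)
{
    assert(r != 0);
    if (r->head == r->tail)
        return 0;     /* HEAD has caught up with TAIL: nothing to send */
    r->head = (r->head + 1) % N_TX_DESC;
    return 1;
}

/* Count of descriptors handed to the NIC but not yet transmitted */
uint32_t pending(const struct tx_ring_model *r)
{
    assert(r != 0);
    return (r->tail + N_TX_DESC - r->head) % N_TX_DESC;
}
```

In this model HEAD equal to TAIL means the ring is empty, so at most 7 of the 8 descriptors can be pending at once.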
‘/proc/xmit1000’
• This pseudo-file can be examined anytime
to find out what values (if any) the NIC has
‘written back’ into the transmit-descriptors
(i.e., the descriptor-status information) and
current values in registers TDH and TDT:
$ cat /proc/xmit1000
Direct Memory Access
• The NIC is able to ‘fetch’ descriptors from
the host-system’s memory (and also can read
the data from our packet-buffer) as well as
‘store’ a status-report back into the host’s
memory by temporarily becoming the Bus
Master (taking control of the system-bus
away from the CPU so that it can perform
the ‘fetch’ and ‘store’ operations directly,
without CPU involvement or interference)
Configuration registers
CTRL       Device Control
CTRL_EXT   Extended Device Control
TIPG       Transmit Inter-Packet Gap
TCTL       Transmit Control
TDBAL      Transmit Descriptor-queue Base-Address (LOW)
TDBAH      Transmit Descriptor-queue Base-Address (HIGH)
TDLEN      Transmit Descriptor-queue Length
TDH        Transmit Descriptor-queue HEAD
TDT        Transmit Descriptor-queue TAIL
TXDCTL     Transmit Descriptor-queue Control
The ‘initialization’ sequence
• Detect the network interface controller
• Obtain its i/o-memory address and size
• Remap the i/o-memory into kernel-space
• Allocate memory for buffer and descriptors
• Initialize the array of transmit-descriptors
• Reset the NIC and configure its operations
• Create the ‘/proc/xmit1000’ pseudo-file
• Register our ‘write()’ driver-method
The ‘cleanup’ sequence
• Usually the steps here mirror those in the
initialization sequence, but in reverse
order:
• Unregister the device-driver’s file-operations
• Delete the ‘/proc/xmit1000’ pseudo-file
• Disable the NIC’s ‘transmit’ engine
• Release the allocated kernel-memory
• Unmap the NIC’s i/o-memory region
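The 'reverse order' rule also applies when initialization fails partway, since whatever was already set up has to be undone before returning. A common C idiom for this is a chain of 'goto' labels. The sketch below is a user-space model only: the step names and the 'fail_at' parameter are hypothetical stand-ins for the real kernel calls, and a text trace records the order in which steps run.

```c
#include <assert.h>
#include <string.h>

static char trace[64];   /* records the order of the steps, for inspection */

static void step(const char *tag)
{
    assert(strlen(trace) + strlen(tag) < sizeof trace);
    strcat(trace, tag);
}

/* Hypothetical setup step: fails when its number n equals fail_at */
static int do_step(int n, int fail_at, const char *tag)
{
    if (n == fail_at)
        return -1;
    step(tag);
    return 0;
}

/* Model of module_init(): each failure undoes only what already succeeded */
int model_init(int fail_at)
{
    trace[0] = '\0';
    if (do_step(1, fail_at, "map ") < 0)   goto fail_map;
    if (do_step(2, fail_at, "alloc ") < 0) goto fail_alloc;
    if (do_step(3, fail_at, "proc ") < 0)  goto fail_proc;
    return 0;
fail_proc:
    step("free ");       /* undo 'alloc' */
fail_alloc:
    step("unmap ");      /* undo 'map'   */
fail_map:
    return -1;
}

/* Model of module_exit(): the same undo steps, last-created first */
void model_exit(void)
{
    step("rmproc ");
    step("free ");
    step("unmap ");
}
```

Calling model_init(0) runs every step; a nonzero argument makes that step fail so the unwinding path can be observed in 'trace'.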
Our ‘write()’ algorithm
• Get index of the current TAIL descriptor
• Confine the amount of user-data
• Copy user-data into the packet-buffer
• Setup the packet’s Ethernet Header
• Setup packet-length in the TAIL descriptor
• Now hand over this descriptor to the NIC
(by advancing the value in register TDT)
• Tell the kernel how many bytes were sent
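The steps of this algorithm can be traced in a user-space model. This is a sketch, not the actual 'my_write()' from 'xmit1000.c': all names are hypothetical, 'memcpy()' stands in for the kernel's 'copy_from_user()', and the header contents (broadcast destination, zeroed source address, payload length in the type/length field, 1500-byte payload limit) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_TX_DESC    8      /* ring size from the slides                   */
#define BUF_SIZE     1536   /* reusable packet-buffer size from the slides */
#define ETH_HDR_LEN  14     /* destination + source + type/length          */
#define MAX_PAYLOAD  1500   /* assumption: standard Ethernet payload limit */

struct nic_model {                       /* hypothetical stand-in for the NIC */
    uint8_t  buf[BUF_SIZE];              /* the one packet-buffer             */
    uint16_t pkt_length[N_TX_DESC];      /* length field of each descriptor   */
    uint32_t tdt;                        /* models register TDT               */
};

/* Returns the number of user bytes accepted, as a write() method would */
long model_write(struct nic_model *nic, const void *user, size_t len)
{
    assert(nic != 0 && user != 0);
    uint32_t tail = nic->tdt;                     /* get the TAIL index      */
    if (len > MAX_PAYLOAD)                        /* confine the amount      */
        len = MAX_PAYLOAD;
    memcpy(nic->buf + ETH_HDR_LEN, user, len);    /* copy the user-data      */
    memset(nic->buf, 0xFF, 6);                    /* header: broadcast dest  */
    memset(nic->buf + 6, 0x00, 6);                /* placeholder source      */
    nic->buf[12] = (uint8_t)(len >> 8);           /* length, big-endian      */
    nic->buf[13] = (uint8_t)(len & 0xFF);
    nic->pkt_length[tail] = (uint16_t)(ETH_HDR_LEN + len);  /* set length    */
    nic->tdt = (tail + 1) % N_TX_DESC;            /* hand over: advance TDT  */
    return (long)len;                             /* bytes 'sent'            */
}
```

Returning fewer bytes than were requested is legitimate for a write() method: the caller (for example 'cat') simply calls write() again with the remainder.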
Recall Tx-Descriptor Layout
offset   bit 31 ................................................ bit 0
0x0      Buffer-Address low (bits 31..0)
0x4      Buffer-Address high (bits 63..32)
0x8      CMD | CSO | Packet Length (in bytes)
0xC      special | CSS | reserved =0 | status
Buffer-Address = the packet-buffer’s 64-bit address in physical memory
Packet-Length = number of bytes in the data-packet to be transmitted
CMD = Command-field
CSO/CSS = Checksum Offset/Start (in bytes)
STA = Status-field
Suggested C syntax
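typedef struct {
    unsigned long long base_addr;   // packet-buffer's 64-bit physical address
    unsigned short pkt_length;      // Packet Length (in bytes)
    unsigned char cksum_off;        // CSO = Checksum Offset
    unsigned char desc_cmd;         // CMD = Command-field
    unsigned char desc_stat;        // STA = Status-field
    unsigned char cksum_org;        // CSS = Checksum Start
    unsigned short special;
} TX_DESCRIPTOR;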
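Because the NIC reads these fields at fixed byte offsets, it can help to let the compiler confirm that the structure really occupies 16 bytes with no hidden padding. The sketch below restates the same fields with fixed-width types; the type name 'TX_DESCRIPTOR_CHECKED' is hypothetical, and the 'static_assert' lines assume a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t base_addr;    /* offset 0x0: Buffer-Address (64 bits) */
    uint16_t pkt_length;   /* offset 0x8: Packet Length            */
    uint8_t  cksum_off;    /* offset 0xA: CSO                      */
    uint8_t  desc_cmd;     /* offset 0xB: CMD                      */
    uint8_t  desc_stat;    /* offset 0xC: STA                      */
    uint8_t  cksum_org;    /* offset 0xD: CSS                      */
    uint16_t special;      /* offset 0xE: special                  */
} TX_DESCRIPTOR_CHECKED;

/* Compilation fails if the layout differs from the descriptor diagram */
static_assert(sizeof(TX_DESCRIPTOR_CHECKED) == 16, "descriptor must be 16 bytes");
static_assert(offsetof(TX_DESCRIPTOR_CHECKED, pkt_length) == 0x8, "length at 0x8");
static_assert(offsetof(TX_DESCRIPTOR_CHECKED, desc_cmd) == 0xB, "CMD at 0xB");
static_assert(offsetof(TX_DESCRIPTOR_CHECKED, desc_stat) == 0xC, "STA at 0xC");
static_assert(offsetof(TX_DESCRIPTOR_CHECKED, special) == 0xE, "special at 0xE");
```

With natural alignment every field already falls on its own boundary, so no packing attribute should be needed on common compilers.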
Transmit IPG (0x0410), 82573L
bits     field
31..30   R =0
29..20   IPG After Deferral (Recommended value = 7)
19..10   IPG Part 1 (Recommended value = 8)
9..0     IPG Back-To-Back (Recommended value = 8)
IPG = Inter-Packet Gap
This register controls the Inter-Packet Gap timer for the Ethernet controller.
Note that the recommended TIPG register-value to achieve IEEE 802.3
compliant minimum transfer IPG values in full- and half-duplex operations
would be 00702008 (hexadecimal), equal to (7<<20) | (8<<10) | (8<<0).
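The recommended value can be assembled from its three fields rather than written as a bare constant. In this sketch the helper name 'make_tipg' is hypothetical, and the 10-bit field widths are inferred from the shift amounts in the formula above.

```c
#include <assert.h>
#include <stdint.h>

#define TIPG_FIELD_MASK 0x3FFu   /* each field is assumed to be 10 bits wide */

/* Combine the three Inter-Packet Gap fields into one register value */
uint32_t make_tipg(uint32_t after_deferral, uint32_t part1, uint32_t back_to_back)
{
    return ((after_deferral & TIPG_FIELD_MASK) << 20)
         | ((part1          & TIPG_FIELD_MASK) << 10)
         | ((back_to_back   & TIPG_FIELD_MASK) <<  0);
}

/* The slide's recommended settings, checked at compile time */
static_assert(((7u << 20) | (8u << 10) | (8u << 0)) == 0x00702008u,
              "recommended TIPG value");
```

In a driver the result of make_tipg( 7, 8, 8 ) would then be written to the register at offset 0x0410.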
Transmit Control (0x0400), 82573L
bits     field     meaning
31..29   R =0      reserved
28       MULR      Multiple Request Support
27..26   TXCSCMT   TxDescriptor Minimum Threshold
25       UNORTX    Underrun No Re-Transmit
24       RTLC      Retransmit on Late Collision
23       R =0      reserved
22       SWXOFF    Software XOFF Transmission
21..12   COLD      Collision Distance (=0x3F)
11..4    CT        Collision Threshold (=0xF)
3        PSP       Pad Short Packets
2        0         reserved
1        EN        Transmit Enable
0        R =0      reserved
Our driver’s elections
int tx_control = 0;
tx_control |= (0<<1); // EN-bit (Enable Transmit Engine)
tx_control |= (1<<3); // PSP-bit (Pad Short Packets)
tx_control |= (15<<4); // CT=15 (Collision Threshold)
tx_control |= (63<<12); // COLD=63 (Collision Distance)
tx_control |= (0<<22); // SWXOFF-bit (Software XOFF Tx)
tx_control |= (1<<24); // RTLC-bit (Re-Transmit on Late Collision)
tx_control |= (0<<25); // UNORTX-bit (Underrun No Re-Transmit)
tx_control |= (0<<26); // TXCSCMT=0 (Tx-descriptor Min Threshold)
tx_control |= (0<<28); // MULR-bit (Multiple Request Support)
iowrite32( tx_control, io + E1000_TCTL ); // Transmit Control register
Here’s a C programming style that ‘documents’ the programmer’s choices.
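The same choices can be written with named bit positions, which makes the resulting register value easy to verify. This is a sketch: the 'TCTL_*_SHIFT' names and both helper functions are hypothetical, while the bit positions and field values are those listed on the slide. Since the slide leaves the EN-bit at 0, the sketch assumes the transmit engine is started in a separate, later step.

```c
#include <assert.h>
#include <stdint.h>

enum {                       /* bit positions as listed on the slide */
    TCTL_EN_SHIFT   = 1,     /* Transmit Enable                      */
    TCTL_PSP_SHIFT  = 3,     /* Pad Short Packets                    */
    TCTL_CT_SHIFT   = 4,     /* Collision Threshold                  */
    TCTL_COLD_SHIFT = 12,    /* Collision Distance                   */
    TCTL_RTLC_SHIFT = 24     /* Re-Transmit on Late Collision        */
};

/* Build the Transmit Control value with the engine still disabled */
uint32_t make_tctl(void)
{
    uint32_t v = 0;
    v |= 1u  << TCTL_PSP_SHIFT;    /* PSP = 1   */
    v |= 15u << TCTL_CT_SHIFT;     /* CT = 15   */
    v |= 63u << TCTL_COLD_SHIFT;   /* COLD = 63 */
    v |= 1u  << TCTL_RTLC_SHIFT;   /* RTLC = 1  */
    return v;
}

/* Assumed later step: start the transmit engine by setting the EN-bit */
uint32_t tctl_enable(uint32_t v)
{
    return v | (1u << TCTL_EN_SHIFT);
}

static_assert((63u << 12) == 0x3F000u, "COLD field occupies bits 21..12");
```

With these settings make_tctl() yields 0x0103F0F8, and setting the EN-bit afterwards gives 0x0103F0FA.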
An ‘e1000.c’ anomaly?
• The official Linux kernel is delivered with a
device-driver that supports several of Intel’s
‘Pro/1000’ gigabit ethernet controllers
• Often this driver will get loaded by default
during the system’s startup procedures
• But it will interfere with your own driver if
you try to write a substitute for ‘e1000.ko’
• So you will want to remove it with ‘rmmod’
Side-effect of ‘rmmod’
• We’ve observed an unexpected side-effect
of ‘unloading’ the ‘e1000.ko’ device-driver
• The PCI Configuration Space’s command
register gets modified in a way that keeps
the NIC from working with your own driver
• Specifically, the Bus Mastering capability
gets disabled (by clearing bit #2 in the PCI
Configuration Space’s word at address 4)
What to do about it?
• This effect doesn’t arise on our ‘anchor’
cluster machines, but you may encounter
it when you try using our demo elsewhere
• Here’s the simple “fix” to turn Bus Master
capability back on (in your ‘module_init()’)
u16 pci_cmd; // declares a 16-bit variable
pci_read_config_word( devp, 4, &pci_cmd ); // read current word
pci_cmd |= (1<<2); // turn on the Bus Master enabled-bit
pci_write_config_word( devp, 4, pci_cmd ); // write modification
In-class demo
• We demonstrate our ‘xmit1000.c’ driver on
an ‘anchor’ machine, with some help from
a companion-module (named ‘recv1000.c’)
which is soon-to-be discussed in class
Transmitting:
$ echo Hello > /dev/nic
$ _
Receiving:
$ cat /dev/nic
Hello _
(the two stations shown are ‘anchor01’ and ‘anchor05’, connected by the LAN)
In-class exercise
• Open three or more terminal-windows on
your PC’s graphical desktop, and log in to
a different ‘anchor’ machine in each one
• Install the ‘xmit1000.ko’ module on one of
the anchor machines, and then install our
‘recv1000.ko’ module on the other stations
• Execute the ‘cat /dev/nic’ command on the
receiver-stations, and then run an ‘echo’
command on the transmitter-station