Advanced Systems Programming - Lesson 8: A race-cure case study
A race-cure case study
A look at how some standard
software tools can illuminate what
is happening inside Linux
Our recent ‘race’ example
• Our ‘cmosram.c’ device-driver included a
‘race condition’ in its ‘read()’ and ‘write()’
functions, since accessing any CMOS
memory-location is a two-step operation,
and thus is a ‘critical section’ in our code:
outb( reg_id, 0x70 );
datum = inb( 0x71 );
• Once the first step in this sequence is
taken, the second step needs to follow
No interventions!
• To guarantee the integrity of each access
to CMOS memory, we must prohibit every
possibility that another control-thread may
intervene and access that same i/o-port
• The main ways in which an intervention by
another ‘thread’ might happen are:
– The current CPU could get ‘interrupted’; or
– Another CPU could access the same i/o-port
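To see why an intervention matters, here is a small user-space sketch (not driver code) that imitates the index-port and data-port with ordinary variables. All the names in it (sim_outb_index, sim_inb_data, interleaved_read, uninterrupted_read) are hypothetical, chosen just for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated CMOS chip: an index register (standing in for port 0x70)
   selects which byte the data port (standing in for 0x71) returns.
   All names here are hypothetical, for illustration only. */
static uint8_t cmos_memory[128];
static uint8_t cmos_index;

static void sim_outb_index(uint8_t reg_id)
{
    assert(reg_id < 128);
    cmos_index = reg_id;
}

static uint8_t sim_inb_data(void)
{
    return cmos_memory[cmos_index];
}

/* The two-step access with nothing in between. */
static uint8_t uninterrupted_read(uint8_t a)
{
    sim_outb_index(a);          /* step 1: select the location */
    return sim_inb_data();      /* step 2: fetch its value     */
}

/* Thread A wants location 'a', but thread B's first step
   sneaks in between A's two steps. Returns what A reads. */
static uint8_t interleaved_read(uint8_t a, uint8_t b)
{
    sim_outb_index(a);          /* A: step 1                     */
    sim_outb_index(b);          /* B: step 1 (the intervention)  */
    return sim_inb_data();      /* A: step 2 gets B's location   */
}
```

If locations 0 and 2 hold different values, then interleaved_read( 0, 2 ) hands thread A the contents of location 2, even though thread A asked for location 0.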
Linux’s solution
• Linux provides a function that an LKM can
call which is designed to ensure ‘exclusive
access’ to a CMOS memory-location:
datum = rtc_cmos_read( reg_id );
• By using this function, a programmer does
not have to expend time and mental effort
analyzing the race-condition and devising
a suitable ‘cure’ for it
But how does it work?
• As computer science students, we are not
satisfied with just using convenient ‘black-
box’ solutions which we don’t understand
• Such purported ‘solutions’ may not always
accomplish everything that they claim – if
they perform correctly today, they still may
fail in some way in the future (if hardware
changes); we don’t want to be helpless!
Is ‘open source’ enough?
• In theory we could try to track down the
actual behavior of the ‘rtc_cmos_read()’
function, by reading Linux’s source-code
• But is that really a practical approach?
• In some cases the answer might be ‘yes’,
but in other situations it might be ‘no’!
• Life is short, and the kernel source-files
are very numerous – with many layers
‘LXR’ can help
• The Linux Cross-Reference tool offers a
way to automate searching kernel source
• This tool is online (see our website’s link
under ‘Resources’) and it is hosted on a
server in Norway:
• Here you just click on “Browse the Code”
From:
unsigned char rtc_cmos_read(unsigned char addr)
{
unsigned char val;
lock_cmos_prefix( addr );
outb_p( addr, RTC_PORT(0) );
val = inb_p( RTC_PORT(1) );
lock_cmos_suffix( addr );
return val;
}
EXPORT_SYMBOL( rtc_cmos_read );
Another approach
• There is an alternative to searching kernel
source files -- which may well be faster
• We can use some standard command-line
tools, including ‘objdump’ and ‘grep’
• In this approach, we look at the compiled
kernel’s object-file, named ‘vmlinux’, found
normally in the ‘/usr/src/linux’ subdirectory
• Using ‘objdump’ that file can be parsed!
‘objdump’ can disassemble
• Change the current working directory:
$ cd /usr/src/linux
• Then, to disassemble the ‘vmlinux’ kernel
file we can use this command:
$ objdump -d vmlinux
• But the amount of output will be huge, so
it’s hard to find the part we’re interested in
‘grep’ can do filtering
• If we want to see the ‘rtc_cmos_read’ code
we could use ‘grep’ to eliminate irrelevant
parts of the disassembly-output:
$ objdump -d vmlinux | grep rtc_cmos_read
• But we still see too many lines of output
(because the ‘rtc_cmos_read()’ function
gets called at many places in the kernel)
‘System.map’
• We can use a special textfile, located in
the ‘/boot’ directory, which tells us where
each ‘exported’ kernel-symbol will reside
at run-time in the virtual address-space
• You can use ‘cat’ to look at this textfile:
$ cat /boot/System.map
• And you can use ‘grep’ to find only the
symbol you care about:
$ cat /boot/System.map | grep rtc_cmos_read
Example on our machines
$ cat /boot/System.map-2.6.22.5cslabs | grep rtc_cmos_read
c0105574 T rtc_cmos_read
c029b8a8 r __ksymtab_rtc_cmos_read
c02a0bff r __kstrtab_rtc_cmos_read
Note that the usual ‘symbolic link’ is missing from the ‘/boot’ directory
on our class and lab machines, so you have to type a longer name.
With superuser privileges this could be fixed using the ‘ln’ command:
root# ln -s System.map-2.6.22.5cslabs System.map
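Notice that ‘grep’ matched three lines, because it looks for a substring. If we wanted only the exact symbol, a few lines of C could parse each ‘address type name’ line and compare the name in full. This is just a sketch: the struct and function names (map_entry, parse_map_line, is_symbol) are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of a System.map-style file (hypothetical type). */
struct map_entry {
    unsigned long address;
    char type;
    char name[64];
};

/* Parse a line of the form "address type name".
   Returns 1 on success, 0 if the line does not have that shape. */
static int parse_map_line(const char *line, struct map_entry *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "%lx %c %63s",
                  &out->address, &out->type, out->name) == 3;
}

/* Exact-name comparison, unlike grep's substring match. */
static int is_symbol(const struct map_entry *e, const char *wanted)
{
    return strcmp(e->name, wanted) == 0;
}
```

With the three lines shown above as input, only the ‘T’ entry at c0105574 passes is_symbol( &e, "rtc_cmos_read" ).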
Now we know where to look
• From the ‘System.map’ we learn where in
the kernel our ‘rtc_cmos_read()’ function
will reside
• We can ‘extract’ that function’s code, for
study purposes, using these steps:
– Save the complete ‘vmlinux’ disassembly
– Use ‘grep’ to find its starting-address
– Use ‘vi’ to delete earlier and later instructions
• Step 1: saving the ‘vmlinux’ disassembly
$ objdump -d /usr/src/linux/vmlinux > ~/vmlinux.asm
• Step 2: finding our function’s entry-point
$ cat ~/vmlinux.asm | grep -n c0105574
What we discover
• Find the line that shows this virtual address (with colon),
and tell us which line-number it’s on:
$ cat vmlinux.asm | grep -n c0105574:
• OK, here’s that line, and this is its line-number (6812):
6812:c0105574: 53 push %ebx
Use a text-editor
• Remove all the lines in your ‘vmlinux.asm’
textfile whose line-numbers precede 6812
• Scroll down, to find where your function
ends (i.e., find its return-instruction ‘ret’):
c01055b7: c3 ret
• Delete all the lines that follow the ‘return’
The complete function
c0105574 <rtc_cmos_read>:
c0105574: 53 push %ebx
c0105575: 9c pushf
c0105576: 5b pop %ebx
c0105577: fa cli
c0105578: 64 8b 15 08 20 30 c0 mov %fs:0xc0302008,%edx
c010557f: 0f b6 c8 movzbl %al,%ecx
c0105582: 42 inc %edx
c0105583: c1 e2 08 shl $0x8,%edx
c0105586: 09 ca or %ecx,%edx
c0105588: a1 3c 99 30 c0 mov 0xc030993c,%eax
c010558d: 85 c0 test %eax,%eax
c010558f: 75 f7 jne c0105588
c0105591: f0 0f b1 15 3c 99 30 lock cmpxchg %edx,0xc030993c
c0105598: c0
c0105599: 85 c0 test %eax,%eax
c010559b: 75 eb jne c0105588
c010559d: 88 c8 mov %cl,%al
c010559f: e6 70 out %al,$0x70
c01055a1: e6 80 out %al,$0x80
c01055a3: e4 71 in $0x71,%al
c01055a5: e6 80 out %al,$0x80
c01055a7: c7 05 3c 99 30 c0 00 movl $0x0,0xc030993c
c01055ae: 00 00 00
c01055b1: 53 push %ebx
c01055b2: 9d popf
c01055b3: 0f b6 c0 movzbl %al,%eax
c01055b6: 5b pop %ebx
c01055b7: c3 ret
Some ‘magic’ numbers
• There are some hexadecimal constants in this
code-disassembly which we probably will not
understand without more research
– This memory-address: 0xc030993c
– This i/o-port address: 0x80
– This memory-address: %fs:0xc0302008
• There’s also a jump-target, but we do have
some help in deciphering what it means:
jne c0105588 <rtc_cmos_read+0x14>
The ‘cmpxchg’ instruction
• The ‘cmpxchg’ instruction performs these CPU
actions in a single operation:
cmpxchg source, destination
– The destination-operand is compared with the
accumulator-register’s value, and the eflags-bits are
adjusted to reflect this comparison’s result
– If ZF is set, the value of the source-operand is copied
to the destination-operand; otherwise, the destination
operand is copied to the accumulator register
• A ‘lock’ prefix blocks other CPUs’ bus-access while the instruction executes
‘spinlock’
Before the code’s ‘critical section’ we have this:
c0105588: a1 3c 99 30 c0 mov 0xc030993c,%eax
c010558d: 85 c0 test %eax,%eax
c010558f: 75 f7 jne c0105588
c0105591: f0 0f b1 15 3c 99 30 lock cmpxchg %edx,0xc030993c
c0105598: c0
c0105599: 85 c0 test %eax,%eax
c010559b: 75 eb jne c0105588
Then we have the function’s ‘critical section’ of code:
c010559d: 88 c8 mov %cl,%al
c010559f: e6 70 out %al,$0x70
c01055a1: e6 80 out %al,$0x80
c01055a3: e4 71 in $0x71,%al
c01055a5: e6 80 out %al,$0x80
(I/O-port 0x80 has an ‘undefined’ system function; it is used here for time-delay)
And then after the code’s ‘critical section’ we have this:
c01055a7: c7 05 3c 99 30 c0 00 movl $0x0,0xc030993c
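The same test-then-cmpxchg pattern can be tried out in user-space with C11 atomics. The sketch below is an approximation of what the disassembly does, not the kernel’s own source: the names demo_lock, make_owner_tag, demo_try_lock, demo_lock_acquire and demo_unlock are hypothetical, and the interrupt-disabling (pushf/cli/popf) is not modeled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* 0 means 'free'; any other value identifies the current owner. */
static _Atomic uint32_t demo_lock = 0;

/* Build a nonzero owner-tag the way the disassembly does:
   (cpu_number + 1) shifted left 8 bits, OR'ed with the register-id. */
static uint32_t make_owner_tag(uint32_t cpu_number, uint8_t reg_id)
{
    return ((cpu_number + 1) << 8) | reg_id;
}

/* One attempt to take the lock; returns 1 if we now own it. */
static int demo_try_lock(uint32_t tag)
{
    uint32_t expected = 0;

    assert(tag != 0);
    if (atomic_load(&demo_lock) != 0)       /* mov, test, jne   */
        return 0;
    /* like 'lock cmpxchg': store tag only if the lock is still 0 */
    return atomic_compare_exchange_strong(&demo_lock, &expected, tag);
}

/* Keep retrying until the lock is ours (this is the 'spin'). */
static void demo_lock_acquire(uint32_t tag)
{
    while (!demo_try_lock(tag))
        ;
}

/* Release the lock, like the 'movl $0x0' after the critical section. */
static void demo_unlock(void)
{
    atomic_store(&demo_lock, 0);
}
```

The plain load before the compare-exchange mirrors the inner ‘jne’ loop: a waiting CPU just keeps reading the lock-word, and only attempts the locked instruction once the word looks free.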
The ‘System.map’ again
• The ‘System.map’ shows what the other
mysterious memory-addresses mean:
$ cat /boot/System.map-2.6.22.5cslabs | grep c030993c
c030993c B cmos_lock
$ cat /boot/System.map-2.6.22.5cslabs | grep c0302008
c0302008 D per_cpu__cpu_number
• We see that memory-address c030993c
has the label ‘cmos_lock’ (supporting our
previous conclusion about a ‘spinlock’);
also we get a ‘clue’ about 0xc0302008
What is ‘per_cpu’ data?
• With SMP systems there is often a need
for each CPU to have its own version of
some program-variable’s value
• One example: each CPU needs a unique
identification-number (used in scheduling
tasks for ‘load-balancing’ and respecting
‘processor-affinity’, and keeping track of
which CPU now owns a particular ‘lock’)
• That’s what ‘per_cpu__cpu_number’ is
Role of segmentation
• Linux has a clever way of allowing CPUs
to access their ‘per_cpu’ variables using
the same name for different locations
• This can be arranged by exploiting the
CPU’s memory-segmentation architecture
• The FS segment-register is used by the
kernel to reference identically-named, but
differently positioned, storage-locations
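The idea of ‘same name, different location’ can be imitated in ordinary C by giving each simulated CPU its own base-pointer and adding a common offset. This is only a toy model under made-up names (demo_area, demo_fs_base, demo_fs_ref, DEMO_CPU_NUMBER_SLOT); real segmentation is done by the hardware, not by pointer arithmetic in software.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CPUS  2
#define DEMO_SLOTS 4

/* Each simulated CPU owns a private memory-area ... */
static uint32_t demo_area[DEMO_CPUS][DEMO_SLOTS];

/* ... and a private 'FS base' that points at its own area. */
static uint32_t *demo_fs_base[DEMO_CPUS] = { demo_area[0], demo_area[1] };

/* The same offset names the variable on every CPU (hypothetical slot). */
enum { DEMO_CPU_NUMBER_SLOT = 2 };

/* Model of an '%fs:offset' reference executed on the given CPU:
   the effective location is that CPU's segment-base plus the offset. */
static uint32_t *demo_fs_ref(int cpu, int slot)
{
    assert(cpu >= 0 && cpu < DEMO_CPUS);
    assert(slot >= 0 && slot < DEMO_SLOTS);
    return demo_fs_base[cpu] + slot;
}
```

Both CPUs use the identical expression demo_fs_ref( cpu, DEMO_CPU_NUMBER_SLOT ), yet each one reaches its own storage-location because the base differs.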
Each CPU has its own GDT
• The Operating System sets up a Global
Descriptor Table for each CPU; it’s an
array of memory-segment descriptors:
bits 63..32: segment-base[ 31..24 ] | G, D flag-bits | segment-limit[ 19..16 ] | segment access rights | segment-base[ 23..16 ]
bits 31..0: segment-base[ 15..0 ] | segment-limit[ 15..0 ]
‘segment-base’ tells where the memory-area begins, ‘segment-limit’
tells how far the memory-area extends, and ‘access rights’ specifies
how the memory-area will be used by the CPU (e.g., user or kernel)
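Because the base and limit fields are scattered across the quadword, it helps to have a few helper functions that reassemble them. The sketch below follows the standard x86 descriptor layout; the function names are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reassemble the 32-bit segment-base from its three pieces. */
static uint32_t descriptor_base(uint64_t d)
{
    return (uint32_t)(((d >> 16) & 0xFFFF)           /* base[ 15..0 ]  */
                    | (((d >> 32) & 0xFF) << 16)     /* base[ 23..16 ] */
                    | (((d >> 56) & 0xFF) << 24));   /* base[ 31..24 ] */
}

/* Reassemble the 20-bit segment-limit from its two pieces. */
static uint32_t descriptor_limit(uint64_t d)
{
    return (uint32_t)((d & 0xFFFF)                   /* limit[ 15..0 ]  */
                    | (((d >> 48) & 0xF) << 16));    /* limit[ 19..16 ] */
}

/* The G (granularity) flag is bit 55. */
static int descriptor_granularity(uint64_t d)
{
    return (int)((d >> 55) & 1);
}

/* The access-rights byte occupies bits 47..40. */
static uint8_t descriptor_access_rights(uint64_t d)
{
    return (uint8_t)((d >> 40) & 0xFF);
}

/* Highest valid offset: with G=1 the limit counts 4KB pages. */
static uint32_t descriptor_last_offset(uint64_t d)
{
    uint32_t limit = descriptor_limit(d);

    assert(limit <= 0xFFFFF);
    return descriptor_granularity(d) ? ((limit << 12) | 0xFFF) : limit;
}
```

For example, the quadword 0x00CF9A000000FFFF decodes to base 0, limit 0xFFFFF with G=1 (so the segment spans the full 4GB), and access-rights byte 0x9A.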
In-class exercise #1
• Install our ‘dram.c’ device-driver, so you
can run our ‘showgdt.cpp’ application
• You will see a CPU’s memory-descriptors
(displayed as quadwords in hex format)
• You will probably see a slightly different
table when you run ‘showgdt’ again – if
Linux schedules it on a different CPU
What’s in register FS?
• You can use our ‘newinfo.cpp’ utility to
quickly create an LKM that displays the
values in the CPU’s segment-registers:
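// using ‘global variables’ simplifies the inline assembly language
short _cs, _ds, _es, _fs, _gs, _ss; // global variables
int my_get_info( )
{
    int len;
    // copy two of the segment-registers into our global variables
    asm(" mov %cs, _cs \n mov %ds, _ds ");
    // ‘buf’ is the output-buffer, assumed to be declared elsewhere in the LKM
    len = sprintf( buf, "CS=%04X DS=%04X \n", _cs, _ds );
    return len;
}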
In-class exercise #2
• Use the value in the FS segment-register
to look up that segment’s ‘base-address’
(different base-address on different CPU)
• Convert the ‘virtual’ base-address to its
corresponding ‘physical’ base-address
• Use our ‘fileview’ utility to look at what’s
stored in physical memory at those spots
• Check the location: %fs:0xc0302008
‘virtual-to-physical’
• If a virtual address is not in the ‘high’ area
(i.e., if it’s below 0xF8000000), then it is
easy to calculate its physical address by
doing a simple subtraction
virtual address-space (4GB):
0x00000000 to 0xBFFFFFFF: user space (3GB)
0xC0000000 to 0xFFFFFFFF: kernel space (1GB)
0xF8000000 and above: High Memory Area (HMA), at the top of kernel space
Subtract 0xC0000000 from virtual address
to get physical address – but NOT in HMA
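The subtraction rule is simple enough to capture in two short functions. This is a sketch for the 3GB/1GB layout described on this slide; the names KERNEL_BASE, HIGH_MEM_BASE, is_directly_mapped and virt_to_phys_simple are chosen for this example and are not the kernel’s own.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_BASE   0xC0000000u   /* where kernel space begins       */
#define HIGH_MEM_BASE 0xF8000000u   /* where the High Memory Area begins */

/* True for kernel virtual addresses below the High Memory Area,
   which are the only ones the subtraction rule applies to. */
static int is_directly_mapped(uint32_t vaddr)
{
    return vaddr >= KERNEL_BASE && vaddr < HIGH_MEM_BASE;
}

/* Physical address = virtual address minus 0xC0000000. */
static uint32_t virt_to_phys_simple(uint32_t vaddr)
{
    assert(is_directly_mapped(vaddr));
    return vaddr - KERNEL_BASE;
}
```

For instance, the kernel virtual address 0xC0302008 corresponds to physical address 0x00302008 under this rule.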