Advanced Systems Programming - Lesson 9
Intel’s ‘cmpxchg’ instruction
How does the Linux kernel’s
‘cmos_lock’ mechanism work?
Review of the i386 registers
General Registers (32-bits): EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
Segment Registers (16-bits): CS, DS, ES, FS, GS, SS
Program Control and Status Registers (32-bits): EIP, EFLAGS
The x86 ‘system’ registers
Control Registers (32-bits): CR0, CR1, CR2, CR3, CR4, CR5, CR6, CR7
Debug Registers (32-bits): DR0, DR1, DR2, DR3, DR4, DR5, DR6, DR7
LDTR, TR (16-bits)
GDTR, IDTR (48-bits)
(a shaded box in the diagram means ‘unimplemented’)
How often is ‘cmpxchg’ used?
$ cat vmlinux.asm | grep cmpxchg
c01046de: f0 0f b1 15 3c 99 30 lock cmpxchg %edx,0xc030993c
c0105591: f0 0f b1 15 3c 99 30 lock cmpxchg %edx,0xc030993c
c01055d9: f0 0f b1 15 3c 99 30 lock cmpxchg %edx,0xc030993c
c010b895: f0 0f b1 11 lock cmpxchg %edx,(%ecx)
c010b949: f0 0f b1 0b lock cmpxchg %ecx,(%ebx)
c0129a9f: f0 0f b1 0b lock cmpxchg %ecx,(%ebx)
c0129acf: f0 0f b1 0b lock cmpxchg %ecx,(%ebx)
c012d377: f0 0f b1 0e lock cmpxchg %ecx,(%esi)
c012d41a: f0 0f b1 0e lock cmpxchg %ecx,(%esi)
c012d968: f0 0f b1 16 lock cmpxchg %edx,(%esi)
c012e568: f0 0f b1 2e lock cmpxchg %ebp,(%esi)
c012e57a: f0 0f b1 2e lock cmpxchg %ebp,(%esi)
c012e58a: f0 0f b1 2e lock cmpxchg %ebp,(%esi)
c012e83f: f0 0f b1 13 lock cmpxchg %edx,(%ebx)
c012e931: f0 0f b1 0a lock cmpxchg %ecx,(%edx)
c012ea94: f0 0f b1 11 lock cmpxchg %edx,(%ecx)
c012ecf4: f0 0f b1 13 lock cmpxchg %edx,(%ebx)
c012f08e: f0 0f b1 4b 18 lock cmpxchg %ecx,0x18(%ebx)
c012f163: f0 0f b1 11 lock cmpxchg %edx,(%ecx)
c013cb60: f0 0f b1 0e lock cmpxchg %ecx,(%esi)
c0148b3c: f0 0f b1 29 lock cmpxchg %ebp,(%ecx)
c0150d0f: f0 0f b1 3b lock cmpxchg %edi,(%ebx)
c0150d87: f0 0f b1 31 lock cmpxchg %esi,(%ecx)
c0199c5e: f0 0f b1 0b lock cmpxchg %ecx,(%ebx)
c024b06f: f0 0f b1 0b lock cmpxchg %ecx,(%ebx)
c024b2fe: f0 0f b1 51 18 lock cmpxchg %edx,0x18(%ecx)
c024b321: f0 0f b1 51 18 lock cmpxchg %edx,0x18(%ecx)
c024b34b: f0 0f b1 4b 18 lock cmpxchg %ecx,0x18(%ebx)
c024b960: f0 0f b1 53 18 lock cmpxchg %edx,0x18(%ebx)
Here’s the occurrence that we studied in the ‘rtc_cmos_read()’ kernel-function (the one at address c0105591), plus 28 other times!
Intel’s documentation
• You can find out what any of the Intel x86
instructions does by consulting the official
software developer’s manual, online at:
• Our course-webpage has a link to this site
that you can just click (under ‘Resources’)
• The instruction-set reference comes in two parts:
– Volume 2A: for instruction mnemonics A through M
– Volume 2B: for instruction mnemonics N through Z
Example: ‘cmpxchg’
• Operation of the ‘cmpxchg’ instruction is
described (on 3 pages) in Volume 2A
• There’s an English-sentence description,
and also a description in ‘pseudo-code’
• You probably do not want to print out this
complete volume (.pdf) – over 700 pages!
• (You could order a printed copy from Intel)
Instruction format
• Intel’s assembly language syntax differs
from the GNU/Linux syntax (known as
‘AT&T syntax’ with roots in UNIX history)
• When AT&T syntax is used, the ‘cmpxchg’
instruction has this layout:
[lock] cmpxchg reg, reg/mem
– [lock]: optional ‘prefix’ (used for SMP)
– cmpxchg: mnemonic opcode
– reg: source operand
– reg/mem: destination operand
‘effects’ and ‘affects’
• According to Intel’s manual, the ‘cmpxchg’
instruction also uses two ‘implicit’ operands
(i.e., operands not mentioned in the instruction)
– The CPU’s accumulator register
– The CPU’s EFLAGS register
• The accumulator-register (EAX) is both a
source-operand and a destination-operand
• The six status-bits in the EFLAGS register will
get modified, as a ‘side-effect’ of this instruction
‘cmpxchg’ description
• This instruction compares the accumulator
with the destination-operand (so the ZF-bit
in EFLAGS gets assigned accordingly)
• Then:
– If (accumulator == destination)
{ ZF ← 1; destination ← source; }
– If (accumulator != destination)
{ ZF ← 0; accumulator ← destination; }
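The two cases above can be mirrored in a small C model. This is only a sketch: the struct and the function name ‘model_cmpxchg’ are ours (not Intel’s or the kernel’s), it handles 32-bit operands only, and it is not atomic. It just shows which values move where.

```c
#include <stdint.h>

/* Hypothetical software model of 32-bit 'cmpxchg src, dest' (AT&T operand order).
 * Not atomic: it only reproduces the data movement described on the slide. */
struct cmpxchg_result {
    uint32_t eax;   /* accumulator after the instruction */
    uint32_t dest;  /* destination operand after the instruction */
    int      zf;    /* zero flag after the instruction */
};

static struct cmpxchg_result model_cmpxchg(uint32_t eax, uint32_t src, uint32_t dest)
{
    struct cmpxchg_result r = { eax, dest, 0 };

    if (eax == dest) {      /* accumulator matches destination */
        r.zf = 1;
        r.dest = src;       /* destination <- source */
    } else {                /* accumulator differs from destination */
        r.zf = 0;
        r.eax = dest;       /* accumulator <- destination */
    }
    return r;
}
```

Calling ‘model_cmpxchg(0, 0x105, 0)’ takes the first branch (the destination becomes 0x105), while ‘model_cmpxchg(0, 0x105, 0x20a)’ takes the second (the accumulator becomes 0x20a and the destination is left alone).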
An instruction-instance
• In our recent disassembly of Linux’s kernel
function ‘rtc_cmos_read()’, this ‘cmpxchg’
instruction-instance was used:
lock cmpxchg %edx, cmos_lock
prefix opcode source-operand destination-operand
Note: Keep in mind that the accumulator %eax will affect what happens!
So we need to consider this instruction within its surrounding context
The complete function
c0105574 <rtc_cmos_read>:
c0105574: 53 push %ebx
c0105575: 9c pushf
c0105576: 5b pop %ebx
c0105577: fa cli
c0105578: 64 8b 15 08 20 30 c0 mov %fs:0xc0302008,%edx
c010557f: 0f b6 c8 movzbl %al,%ecx
c0105582: 42 inc %edx
c0105583: c1 e2 08 shl $0x8,%edx
c0105586: 09 ca or %ecx,%edx
c0105588: a1 3c 99 30 c0 mov 0xc030993c,%eax
c010558d: 85 c0 test %eax,%eax
c010558f: 75 f7 jne c0105588
c0105591: f0 0f b1 15 3c 99 30 lock cmpxchg %edx,0xc030993c
c0105598: c0
c0105599: 85 c0 test %eax,%eax
c010559b: 75 eb jne c0105588
c010559d: 88 c8 mov %cl,%al
c010559f: e6 70 out %al,$0x70
c01055a1: e6 80 out %al,$0x80
c01055a3: e4 71 in $0x71,%al
c01055a5: e6 80 out %al,$0x80
c01055a7: c7 05 3c 99 30 c0 00 movl $0x0,0xc030993c
c01055ae: 00 00 00
c01055b1: 53 push %ebx
c01055b2: 9d popf
c01055b3: 0f b6 c0 movzbl %al,%eax
c01055b6: 5b pop %ebx
c01055b7: c3 ret
The ‘preparation’ steps
• The instructions that precede ‘cmpxchg’
will set up register EDX (source operand)
and register EAX (the x86 ‘accumulator’)
• Several instructions are used to set up a
value in EDX, and result in this layout:
EDX:
– bits 31..8: the current processor’s value for ‘per_cpu__cpu_number’, plus 1
(this part is guaranteed to be non-zero!)
– bits 7..0: the CMOS register’s index
(this might be zero)
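The EDX layout can be written out as arithmetic. The sketch below follows the ‘inc’, ‘shl’ and ‘or’ instructions from the disassembly; the function name ‘make_lock_value’ and its parameter names are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: builds the value that the kernel code leaves in EDX.
 * cpu_number stands in for 'per_cpu__cpu_number'; cmos_index is the byte in AL. */
static uint32_t make_lock_value(uint32_t cpu_number, uint8_t cmos_index)
{
    uint32_t value = cpu_number + 1;   /* inc %edx: non-zero for any realistic CPU number */
    value <<= 8;                       /* shl $0x8,%edx: move it into bits 31..8 */
    value |= cmos_index;               /* or %ecx,%edx: index goes into bits 7..0 */
    return value;
}
```

For CPU 0 reading CMOS register 0 the result is 0x100, so the value is non-zero even when both inputs are zero. That is what lets a non-zero ‘cmos_lock’ mean ‘locked’.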
The ‘cmos_lock’ variable
• This global variable is initialized to zero,
meaning that access to CMOS memory
locations is not currently ‘locked’
• If some CPU stores a non-zero value in
this variable’s memory-location, it means
that access to CMOS memory is ‘locked’
• The kernel needs to ensure that only one
CPU at a time can set this ‘lock’
The ‘most likely’ scenario
• One of the CPUs wishes to access CMOS
memory – so it needs to test ‘cmos_lock’
to be sure that access is now ‘unlocked’
(i.e., cmos_lock == 0 is true)
• The CPU copies the ‘cmos_lock’ variable
into EAX, where it can then be tested
using the ‘test %eax, %eax’ instruction
• A conditional-jump follows the test
The ‘busy-wait’ loop
# Here is a ‘busy-wait’ loop, used to wait for the CMOS access to be ‘unlocked’
spin: mov cmos_lock, %eax # copy lock-variable to accumulator
test %eax, %eax # was CMOS access ‘unlocked’?
jnz spin # if it wasn’t, then check it again
# A CPU will fall through to here if ‘unlocked’ access was detected,
# and that CPU will now attempt to set the ‘lock’ – in other words, it
# will try to assign a non-zero value to the ‘cmos_lock’ variable.
# But there’s a potential ‘race’ here – the ‘cmos_lock’ might have been
# zero when it was copied, but it could have been changed by now
# and that’s why we need to execute ‘lock cmpxchg’ at this point
Busy-waiting will be brief
spin: # see if the lock-variable is clear
mov cmos_lock, %eax
test %eax, %eax
jnz spin
# ok, now we try to grab the lock
lock cmpxchg %edx, cmos_lock
# did another CPU grab it first?
test %eax, %eax
jnz spin
If our CPU wins the ‘race’, the (non-zero) value from source-operand EDX will
have been stored into the (previously zero) ‘cmos_lock’ memory-location, but
the (previously zero) accumulator EAX will not have been modified; hence our
CPU will not jump back, but will fall through and execute the ‘critical section’ of
code (just a few instructions), then will promptly clear the ‘cmos_lock’ variable.
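The same spin-then-grab pattern can be sketched in portable C11, where ‘atomic_compare_exchange_strong’ plays the role of ‘lock cmpxchg’ (on x86, compilers typically emit that instruction for it). The names ‘cmos_lock_model’, ‘lock_cmos_model’ and ‘unlock_cmos_model’ are ours. This is a user-space illustration, not the kernel’s code, and it omits the interrupt disabling done by ‘cli’.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's 'cmos_lock'; zero means 'unlocked'. */
static _Atomic uint32_t cmos_lock_model;

/* Spin until we own the lock. my_value must be non-zero (the EDX value). */
static void lock_cmos_model(uint32_t my_value)
{
    for (;;) {
        /* spin: see if the lock-variable is clear */
        while (atomic_load(&cmos_lock_model) != 0)
            ;
        /* try to grab the lock: store my_value only if it is still zero */
        uint32_t expected = 0;
        if (atomic_compare_exchange_strong(&cmos_lock_model, &expected, my_value))
            return;   /* we won the race */
        /* another CPU got there first; 'expected' now holds its value, so spin again */
    }
}

/* Release the lock, like 'movl $0x0, cmos_lock'. */
static void unlock_cmos_model(void)
{
    atomic_store(&cmos_lock_model, 0);
}
```

A caller would bracket its critical section with ‘lock_cmos_model(value)’ and ‘unlock_cmos_model()’. Reading the variable in the inner loop before attempting the exchange keeps the waiting CPUs from issuing locked bus operations while the lock is held.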
The ‘less likely’ case
spin: # see if the lock-variable is clear
mov cmos_lock, %eax
test %eax, %eax
jnz spin
# ok, now we try to grab the lock
lock cmpxchg %edx, cmos_lock
# did another CPU grab it first?
test %eax, %eax
jnz spin
If our CPU loses the ‘race’, because another CPU changed ‘cmos_lock’ to some
non-zero value after we had fetched our copy of it, then the (now non-zero) value
from the ‘cmos_lock’ destination-operand will have been copied into EAX, and so
the final conditional-jump shown above will take our CPU back into the spin-loop,
where it will resume busy-waiting until the ‘winner’ of the race clears ‘cmos_lock’.
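A single grab attempt, and what ends up in the accumulator afterwards, can be sketched with C11 atomics. The helper name ‘try_grab’ is hypothetical; its local variable ‘eax’ only imitates the register, and it starts at zero because the CPU has just seen the lock as clear.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: one 'lock cmpxchg' attempt on *lock.
 * Returns what EAX would hold afterwards: 0 if we took the lock,
 * otherwise the non-zero value another CPU had already stored. */
static uint32_t try_grab(_Atomic uint32_t *lock, uint32_t my_value)
{
    uint32_t eax = 0;   /* we believed the lock was free */
    atomic_compare_exchange_strong(lock, &eax, my_value);
    return eax;         /* unchanged on success, the owner's value on failure */
}
```

If the lock already holds 0x20a, then ‘try_grab(&lock, 0x105)’ returns 0x20a and leaves the lock untouched, which corresponds to the jump back into the spin-loop. If the lock holds 0, the call returns 0 and the lock now holds 0x105.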
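Flowchart
start
1. Set up a non-zero value in EDX
2. EAX ← cmos_lock
3. EAX is zero? (no: go back to step 2; yes: continue)
4. EAX equals cmos_lock?
– yes: ZF ← 1; cmos_lock ← EDX
– no: ZF ← 0; EAX ← cmos_lock
5. EAX is zero? (no: go back to step 2; yes: continue)
6. critical section
7. cmos_lock ← 0
finish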
‘btr’/‘bts’ versus ‘cmpxchg’
• In an earlier lesson we used the ‘btr’/‘bts’
instructions to achieve ‘mutual exclusion’,
whereas Linux uses ‘cmpxchg’ to do that
• We think ‘btr’/‘bts’ is easier to understand,
so why do you think the Linux developers
would prefer to use ‘cmpxchg’ instead?
In-class exercise #1
• Was it really necessary to insert a second
‘test %eax, %eax’ following ‘cmpxchg’?
• Can you design a simple LKM that would
verify your answer to that question?
• The Intel documentation does not state
precisely how other EFLAGS status-bits
(besides ZF) are affected by ‘cmpxchg’,
only that they reflect the comparison of
‘accumulator’ and ‘destination’ operands
• Usually the CPU implements comparison-
of-operands by performing a subtraction
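If the comparison really is a subtraction, the six status-bits it would produce can be modeled in C and checked against what the CPU reports. This is a sketch under that assumption. The function name ‘sub_flags’ is ours, it covers 32-bit operands only, and it says nothing about which operand ‘cmpxchg’ treats as the minuend.

```c
#include <stdint.h>

/* Bit masks of the six status flags within EFLAGS. */
enum { CF = 1 << 0, PF = 1 << 2, AF = 1 << 4,
       ZF = 1 << 6, SF = 1 << 7, OF = 1 << 11 };

/* Hypothetical model: status flags produced by the 32-bit subtraction a - b. */
static uint32_t sub_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;          /* result wraps modulo 2^32 */
    uint32_t flags = 0;
    int ones = 0;

    for (int i = 0; i < 8; i++)  /* count the 1-bits in the low byte of the result */
        ones += (r >> i) & 1;

    if (a < b)                 flags |= CF;   /* borrow out of bit 31 */
    if ((ones & 1) == 0)       flags |= PF;   /* low byte has even parity */
    if ((a & 0xf) < (b & 0xf)) flags |= AF;   /* borrow out of bit 3 */
    if (r == 0)                flags |= ZF;   /* operands were equal */
    if (r & 0x80000000u)       flags |= SF;   /* result is negative when signed */
    if (((a ^ b) & (a ^ r)) & 0x80000000u)
                               flags |= OF;   /* signed overflow */
    return flags;
}
```

Because ‘sub_flags(0, 1)’ and ‘sub_flags(1, 0)’ differ (the first sets CF, the second does not), comparing a model like this with the EFLAGS value captured after a real ‘cmpxchg’ shows which way the operands are subtracted.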
EFLAGS (bits 31..19 not shown)
bit:   18  17  16  15  14  13-12  11  10   9   8   7   6   5   4   3   2   1   0
flag:  AC  VM  RF   0  NT  IOPL   OF  DF  IF  TF  SF  ZF   0  AF   0  PF   1  CF
In-class exercise #2
• Can you decide what Intel means by “the
comparison operation”, by writing suitable
code that examines the effect on EFLAGS
of ‘cmpxchg opnd1, opnd2’ and these two
plausible alternatives:
cmp opnd1, opnd2
cmp opnd2, opnd1