Advanced Systems Programming - Lesson 9: Intel’s ‘cmpxchg’ instruction

Intel’s documentation • You can find out what any of the Intel x86 instructions does by consulting the official software developer’s manual. • Our course-webpage has a link to this site that you can just click (under ‘Resources’) • The instruction-set reference is two parts: – Volume 2A: for opcodes A through M – Volume 2B: for opcodes N through Z Example: ‘cmpxchg’ • Operation of the ‘cmpxchg’ instruction is described (on 3 pages) in Volume 2A • There’s an English-sentence description, and also a description in ‘pseudo-code’ • You probably do not want to print out this complete volume (.pdf) – over 700 pages! • (You could order a printed copy from Intel)

22 trang | Chia sẻ: candy98 | Lượt xem: 1313 | Lượt tải: 0

Bạn đang xem trước 20 trang tài liệu Advanced Systems Programming - Lesson 9: Intel’s ‘cmpxchg’ instruction, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên

Intel’s ‘cmpxchg’ instruction How does the Linux kernel’s ‘cmos_lock’ mechanism work? Review of the i386 registers EAX EBX ECX EDX ESI EDI EBP ESP General Registers (32-bits) CS DS ES FS GS SS Segment Registers (16-bits) EIP EFLAGS Program Control and Status Registers (32 bits) The x86 ‘system’ registers CR0 CR1 CR2 CR3 CR4 CR5 CR6 CR7 Control Registers (32-bits) means ‘unimplemented’ DR0 DR1 DR2 DR3 DR4 Debug Registers (32-bits) DR5 DR6 DR7 LDTR TR GDTR IDTR (16-bits) (48-bits) How often is ‘cmpxchg’ used? $ cat vmlinux.asm | grep cmpxchg c01046de: f0 0f b1 15 3c 99 30 lock cmpxchg %edx,0xc030993c c0105591: f0 0f b1 15 3c 99 30 lock cmpxchg %edx,0xc030993c c01055d9: f0 0f b1 15 3c 99 30 lock cmpxchg %edx,0xc030993c c010b895: f0 0f b1 11 lock cmpxchg %edx,(%ecx) c010b949: f0 0f b1 0b lock cmpxchg %ecx,(%ebx) c0129a9f: f0 0f b1 0b lock cmpxchg %ecx,(%ebx) c0129acf: f0 0f b1 0b lock cmpxchg %ecx,(%ebx) c012d377: f0 0f b1 0e lock cmpxchg %ecx,(%esi) c012d41a: f0 0f b1 0e lock cmpxchg %ecx,(%esi) c012d968: f0 0f b1 16 lock cmpxchg %edx,(%esi) c012e568: f0 0f b1 2e lock cmpxchg %ebp,(%esi) c012e57a: f0 0f b1 2e lock cmpxchg %ebp,(%esi) c012e58a: f0 0f b1 2e lock cmpxchg %ebp,(%esi) c012e83f: f0 0f b1 13 lock cmpxchg %edx,(%ebx) c012e931: f0 0f b1 0a lock cmpxchg %ecx,(%edx) c012ea94: f0 0f b1 11 lock cmpxchg %edx,(%ecx) c012ecf4: f0 0f b1 13 lock cmpxchg %edx,(%ebx) c012f08e: f0 0f b1 4b 18 lock cmpxchg %ecx,0x18(%ebx) c012f163: f0 0f b1 11 lock cmpxchg %edx,(%ecx) c013cb60: f0 0f b1 0e lock cmpxchg %ecx,(%esi) c0148b3c: f0 0f b1 29 lock cmpxchg %ebp,(%ecx) c0150d0f: f0 0f b1 3b lock cmpxchg %edi,(%ebx) c0150d87: f0 0f b1 31 lock cmpxchg %esi,(%ecx) c0199c5e: f0 0f b1 0b lock cmpxchg %ecx,(%ebx) c024b06f: f0 0f b1 0b lock cmpxchg %ecx,(%ebx) c024b2fe: f0 0f b1 51 18 lock cmpxchg %edx,0x18(%ecx) c024b321: f0 0f b1 51 18 lock cmpxchg %edx,0x18(%ecx) c024b34b: f0 0f b1 4b 18 lock cmpxchg %ecx,0x18(%ebx) c024b960: f0 0f b1 53 18 lock cmpxchg %edx,0x18(%ebx) Here’s the occurrence that we studied in the ‘rtc_cmos_read()’ kernel-function plus 28 other times! Intel’s documentation • You can find out what any of the Intel x86 instructions does by consulting the official software developer’s manual, online at: • Our course-webpage has a link to this site that you can just click (under ‘Resources’) • The instruction-set reference is two parts: – Volume 2A: for opcodes A through M – Volume 2B: for opcodes N through Z Example: ‘cmpxchg’ • Operation of the ‘cmpxchg’ instruction is described (on 3 pages) in Volume 2A • There’s an English-sentence description, and also a description in ‘pseudo-code’ • You probably do not want to print out this complete volume (.pdf) – over 700 pages! • (You could order a printed copy from Intel) Instruction format • Intel’s assembly language syntax differs from the GNU/Linux syntax (known as ‘AT&T syntax’ with roots in UNIX history) • When AT&T syntax is used, the ‘cmpxchg’ instruction has this layout: [lock] cmpxchg reg, reg/mem optional ‘prefix’ (used for SMP) mnemonic opcode source operand destination operand ‘effects’ and ‘affects’ • According to Intel’s manual, the ‘cmpxchg’ instruction also uses two ‘implicit’ operands (i.e., operands not mentioned in the instruction) – The CPU’s accumulator register – The CPU’s EFLAGS register • The accumulator-register (EAX) is both a source-operand and a destination-operand • The six status-bits in the EFLAGS register will get modified, as a ‘side-effect’ this instruction ‘cmpxchg’ description • This instruction compares the accumulator with the destination-operand (so the ZF-bit in EFLAGS gets assigned accordingly) • Then: – If (accumulator == destination) { ZF  1; destination  source; } – If (accumulator != destination) { ZF  0; accumulator  destination; } An instruction-instance • In our recent disassembly of Linux’s kernel function ‘rtc_cmos_read()’, this ‘cmpxchg’ instruction-instance was used: lock cmpxchg %edx, cmos_lock prefix opcode source-operand destination-operand Note: Keep in mind that the accumulator %eax will affect what happens! So we need to consider this instruction within it’s surrounding context The complete function c0105574 : c0105574: 53 push %ebx c0105575: 9c pushf c0105576: 5b pop %ebx c0105577: fa cli c0105578: 64 8b 15 08 20 30 c0 mov %fs:0xc0302008,%edx c010557f: 0f b6 c8 movzbl %al,%ecx c0105582: 42 inc %edx c0105583: c1 e2 08 shl $0x8,%edx c0105586: 09 ca or %ecx,%edx c0105588: a1 3c 99 30 c0 mov 0xc030993c,%eax c010558d: 85 c0 test %eax,%eax c010558f: 75 f7 jne c0105588 c0105591: f0 0f b1 15 3c 99 30 lock cmpxchg %edx,0xc030993c c0105598: c0 c0105599: 85 c0 test %eax,%eax c010559b: 75 eb jne c0105588 c010559d: 88 c8 mov %cl,%al c010559f: e6 70 out %al,$0x70 c01055a1: e6 80 out %al,$0x80 c01055a3: e4 71 in $0x71,%al c01055a5: e6 80 out %al,$0x80 c01055a7: c7 05 3c 99 30 c0 00 movl $0x0,0xc030993c c01055ae: 00 00 00 c01055b1: 53 push %ebx c01055b2: 9d popf c01055b3: 0f b6 c0 movzbl %al,%eax c01055b6: 5b pop %ebx c01055b7: c3 ret The ‘preparation’ steps • The instructions that preceed ‘cmpxchg’ will setup register EDX (source operand) and register EAX (the x86 ‘accumulator’) • Several instructions are used to set up a value in EDX, and result in this layout: The current processor’s value for ‘per_cpu__cpu_number’ plus 1 EDX: CMOS register’s index 31 8 7 0 this might be zero but this part is guaranteed to be non-zero! The ‘cmos_lock’ variable • This global variable is initialized to zero, meaning that access to CMOS memory locations is not currently ‘locked’ • If some CPU stores a non-zero value in this variable’s memory-location, it means that access to CMOS memory is ‘locked’ • The kernel needs to insure that only one CPU at a time can set this ‘lock’ The ‘most likely’ senario • One of the CPUs wishes to access CMOS memory – so it needs to test ‘cmos_lock’ to be sure that access is now ‘unlocked’ (i.e., cmos_lock == 0 is true) • The CPU copies the ‘cmos_lock’ variable into the EAX, where it can then be tested using the ‘test %eax, %eax’ instruction • A conditional-jump follows the test The ‘busy-wait’ loop # Here is a ‘busy-wait’ loop, used to wait for the CMOS access to be ‘unlocked’ spin: mov cmos_lock, %eax # copy lock-variable to accumulator test %eax, %eax # was CMOS access ‘unlocked’? jnz spin # if it wasn’t, then check it again # A CPU will fall through to here if ‘unlocked’ access was detected, # and that CPU will now attempt to set the ‘lock’ – in other words, it # will try to assign a non-zero value to the ‘cmos_lock’ variable. # But there’s a potential ‘race’ here – the ‘cmos_lock’ might have been # zero when it was copied, but it could have been changed by now # and that’s why we need to execute ‘lock cmpxchg’ at this point Busy-waiting will be brief spin: # see if the lock-variable is clear mov cmos_lock, %eax test %eax, %eax jnz spin # ok, now we try to grab the lock lock cmpxchg %edx, cmos_lock # did another CPU grab it first? test %eax, %eax jnz spin If our CPU wins the ‘race’, the (non-zero) value from source-operand EDX will have been stored into the (previously zero) ‘cmos_lock’ memory-location, but the (previously zero) accumulator EAX will not have been modified; hence our CPU will not jump back, but will fall through and execute the ‘critical section’ of code (just a few instructions), then will promptly clear the ‘cmos_lock’ variable. The ‘less likely’ case spin: # see if the lock-variable is clear mov cmos_lock, %eax test %eax, %eax jnz spin # ok, now we try to grab the lock lock cmpxchg %edx, cmos_lock # did another CPU grab it first? test %eax, %eax jnz spin If our CPU loses the ‘race’, because another CPU changed ‘cmos_lock’ to some non-zero value after we had fetched our copy of it, then the (now non-zero) value from the ‘cmos_lock’ destination-operand will have been copied into EAX, and so the final conditional-jump shown above will take our CPU back into the spin-loop, where it will resume busy-waiting until the ‘winner’ of the race clears ‘cmos_lock’. yes Setup nonzero value in EDX EAX  cmos_lock EAX is zero? no EAX is zero? EAX equals cmos_lock ? ZF  1 cmos_lock  EDX ZF  0 EAX  cmos_lock yes no no critical section yes cmos_lock  0 start finish flowchart ‘btr’/’bts’ versus ‘cmpxchg’ • In an earlier lesson we used the ‘btr’/’bts’ instructions to achieve ‘mutual exclusion’, whereas Linux uses ‘cmpxchg’ to do that • We think ‘btr’/’bts’ is easier to understand, so why do you think the Linux developers would prefer to use ‘cmpxchg’ instead? In-class exercise #1 • Was it really necessary to insert a second ‘test %eax, %eax’ following ‘cmpxchg’? • Can you design a simple LKM that would verify your answer to that question? • The Intel documentation does not state precisely how other EFLAGS status-bits (besides ZF) are affected by ‘cmpxchg’, only that they reflect the comparison of ‘accumulator’ and ‘destination’ operands • Usually the CPU implements comparison- of-operands by performing a subtraction EFLAGS C F 1 P F 0 A F 0 Z F S F T F I F D F O F IOPL N T R F V M A C 31 11 10 9 8 7 6 5 4 3 2 1 0 In-class exercise #2 • Can you decide what Intel means by “the comparison operation”, by writing suitable code that examines the effect on EFLAGS of ‘cmpxchg opnd1, opnd2’ and these two plausable alternatives: cmp opnd1, opnd2 cmp opnd2, opnd1