Principles of Database Management Systems - Notes 2: Hardware

 Hardware: disks  Access times  Optimizations  Other topics  Storage costs  Using secondary storage  Disk failures

pdf44 trang | Chia sẻ: candy98 | Lượt xem: 491 | Lượt tải: 0download
Bạn đang xem trước 20 trang tài liệu Principles of Database Management Systems - Notes 2: Hardware, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
1Principles of Database Management Systems Notes 2: Hardware Based on lecture notes by Hector Garcia-Molina 2Outline  Hardware: disks  Access times  Optimizations  Other topics  Storage costs  Using secondary storage  Disk failures 3Hardware DBMS Data Storage 4P M C Typical Computer Secondary Storage ...... 5Processor Fast, slow, reduced instruction set, with cache, pipelined Speed: 100   20000 MIPS Memory Fast, slow, non-volatile, read-only, Access time: 10-6  10-9 sec. 1 s  1 ns 6Secondary storage Many flavors: - Disk Floppy Hard drive Optical, CD-ROM Arrays - Flash - Tape Reel Cartridge Robots 7Focus on: “Typical Disk” Terms: Platter, Head, Actuator, Cylinder, Track, Sector (physical), Block (logical), Spindle 8Top View 9“Typical” Numbers Diameter: 1 inch  15 inches Cylinders: 100  2000 Surfaces: 1 (CDs)  (Tracks/cyl) 2 (floppies)  30 Sector Size: 512B  50K Capacity: 360 KB (old floppy)  4000 GB 10 Disk Access Time block X in memory ? I want block X 11 Time = Seek Time + Rotational Delay + Transfer Time + Other 12 Seek Time 3 or 5x x 1 N Cylinders Traveled Time 13 Average Random Seek Time   SEEKTIME (i  j) S = N(N-1) N N i=1 j=1 ji “Typical” S: 10 ms  40 ms 14 Rotational Delay Head here Block I want 15 Average Rotational Delay R = 1/2 revolution “typical” R = 8.33 ms (3600 RPM) 16 Transfer Rate: t  “typical” t: 1  80 MB/second  transfer time: block size / t 17 Other Delays  CPU time to issue I/O  Contention for controller  Contention for bus, memory “Typical” value: 0 18  So far: random block access  What about: reading “next” block? Time to get = Block size + Negligible block t - skip gap - switch track - once in a while, next cylinder 19 Rule of Random I/O: Expensive Thumb Sequential I/O: Much less  E.g.: 1 KB block  Random I/O:  20 ms  Sequential I/O:  1 ms 20 Cost for Writing similar to Reading unless we want to verify! need to add (full) rotation + Block size t 21 • To Modify a block? To modify block: (a) Read block (b) Modify in memory (c) Write block [(d) Verify?] 22 Block Address:  Physical device  Cylinder #  Surface #  Sector 23 Complication: Bad Blocks • Messy to handle • May map via software to integer sequence 1 2 . Map Actual block addresses . m 24 Optimizations (in controller or OS)  Disk scheduling algorithms  E.g., elevator algorithm  Track (or larger) buffer  Pre-fetch  Arrays  Mirrored disks  On disk cache 25 Double Buffering Problem: Have a file  Sequence of blocks B1, B2 Have a program  Process B1  Process B2  Process B3 .. . 26 Single Buffer Solution (1) Read B1  Buffer (2) Process Data in Buffer (3) Read B2  Buffer (4) Process Data in Buffer ... 27 Say P = time to process/block R = time to read in 1 block n = # blocks Single buffer time = n(P+R) 28 Double Buffering Memory: Disk: A B C D GE F 29 Double Buffering Memory: Disk: A B C D GE F A 30 Double Buffering Memory: Disk: A B C D GE F B done process A 31 Double Buffering Memory: Disk: A B C D GE F C process B done 32 Say P  R What is processing time? P = Processing time/block R = IO time/block n = # blocks  Double buffering time = R + nP  Single buffering time = n(R+P) 33 Disk Arrays  RAIDs (various flavors)  Block Striping  Mirrored logically one disk 34 On Disk Cache P M C ... ... cache cache 35 Block Size Selection?  Big block  Amortize I/O cost  Big block  Read in more useless stuff! and takes longer to read  As memory prices drop, blocks get bigger ... Unfortunately... 36 Storage Cost 10-9 10-6 10-3 10-0 103 access time (sec) 1015 1013 1011 109 107 105 103 cache electronic main electronic secondary magnetic optical disksonline tape nearline tape & optical disks offline tape ty p ic a l ca p a ci ty ( b yt e s) from Gray & Reuter 37 Storage Cost 10-9 10-6 10-3 10-0 103 access time (sec) 104 102 100 10-2 10-4 cache electronic mainelectronic secondary magnetic optical disks online tape nearline tape & optical disksoffline tape d o lla rs /M B from Gray & Reuter 38 Using secondary storage effectively  Example: Sorting data on disk  Conclusion:  I/O costs dominate Design algorithms to reduce I/O  Also: How big should blocks be? 39 Disk Failures  Partial  Total  Intermittent  Permanent 40 Coping with Disk Failures  Detection e.g. Checksum  Correction  Redundancy 41 At what level do we cope?  Single disk e.g., error correcting codes (ECC)  Disk array Logical Physical 42 Operating System e.g., Stable Storage Logical Block Copy A Copy B 43 Database System  e.g., Log Current DB Last week’s DB 44 Summary  Secondary storage, mainly disks  I/O times  I/Os should be avoided, especially random ones.. S