Efficient Metadata Update Techniques for Storage Systems

Authors
Lu, Maohua
Issue Date
1-Aug-10
Type
Dissertation
Language
en_US
Abstract
The simple read/write interface exposed by traditional disk I/O systems is inadequate for low-locality, update-intensive workloads: it limits the flexibility of the disk I/O system in scheduling disk access requests and leads to inefficient use of buffer memory and disk bandwidth. We propose a novel disk I/O subsystem architecture called Batching mOdifications with Sequential Commit (BOSC), which is optimized for workloads characterized by intensive random updates. BOSC improves sustained disk update throughput by aggregating disk update operations in memory and committing them to disk sequentially.

We demonstrate the benefits of BOSC by adapting it to three different storage systems. The first is Mariner, an iSCSI-based continuous data protection system designed to provide comprehensive data protection on commodity hardware while offering the same performance as systems without such protection. With BOSC handling metadata updates, Mariner's throughput degrades by less than 10% compared with a configuration that performs no metadata updating.

Flash-based storage is the second system to which we applied BOSC. Because of the physics underlying flash memory technology and the coarse address-mapping granularity used in the on-board flash translation layer (FTL), commodity flash disks exhibit poor random write performance. We designed LFSM, a Log-structured Flash Storage Manager, which eliminates this problem by combining data logging with BOSC-based metadata updating. LFSM reduces the average write latency of a commodity flash disk by a factor of more than six under standard benchmarks.

As a third example, we applied BOSC to a scalable data de-duplication system for incremental backups. Each input block is de-duplicated by comparing its fingerprint, a collision-resistant hash value, against existing fingerprints. A range-based block group, called a segment, is the basic unit for preserving the data locality of incremental backups. We propose four novel techniques that improve de-duplication throughput with minimal impact on the data de-duplication ratio (DDR), and employ BOSC to eliminate the performance bottleneck of committing segment updates to disk.
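To make the batching idea concrete, the following is a minimal Python sketch of BOSC-style update aggregation and sequential commit. It is illustrative only: the class and method names, the 4 KB bucket granularity, and the use of a plain file standing in for a disk are assumptions, not the dissertation's actual in-kernel implementation.

```python
from collections import defaultdict

BUCKET_SIZE = 4096  # assumed on-disk bucket granularity


class Bosc:
    """Queue random updates per bucket, then commit them in one sequential sweep."""

    def __init__(self, device_path):
        self.device_path = device_path    # an existing file standing in for a disk
        self.pending = defaultdict(list)  # bucket index -> queued (offset, payload)

    def update(self, offset, payload):
        # Instead of issuing a random write immediately, queue the update
        # under the bucket it belongs to.
        self.pending[offset // BUCKET_SIZE].append((offset, payload))

    def commit(self):
        # Visit buckets in ascending order so the disk sees one
        # mostly-sequential pass rather than scattered random writes.
        with open(self.device_path, "r+b") as dev:
            for bucket in sorted(self.pending):
                for offset, payload in self.pending[bucket]:
                    dev.seek(offset)
                    dev.write(payload)
        self.pending.clear()
```

Queuing also lets many updates destined for the same disk region be absorbed by a single pass, which is where the sustained-throughput gain comes from.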
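The data-logging side of LFSM can be sketched in the same hedged spirit: random logical writes are appended to a sequential on-disk log, and an in-memory map records where each logical block currently lives. The names below are hypothetical, and the sketch omits that in LFSM the map updates are themselves metadata updates committed through BOSC.

```python
class LogStructuredStore:
    """Turn random block writes into sequential appends to a log file."""

    def __init__(self, log_path, block_size=4096):
        self.log = open(log_path, "ab+")  # sequential log; "ab+" also allows reads
        self.block_size = block_size
        self.mapping = {}                 # logical block number -> byte offset in log

    def write(self, lbn, data):
        assert len(data) == self.block_size
        self.log.seek(0, 2)                   # position at the end of the log
        self.mapping[lbn] = self.log.tell()   # record the block's new home
        self.log.write(data)                  # the random write becomes an append

    def read(self, lbn):
        self.log.seek(self.mapping[lbn])  # follow the map to the latest copy
        return self.log.read(self.block_size)
```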
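Finally, a minimal sketch of fingerprint-based de-duplication with segment grouping, under assumed parameters (SHA-1 fingerprints, 128-block segments) that merely stand in for the dissertation's actual choices:

```python
import hashlib

SEGMENT_BLOCKS = 128  # assumed number of blocks per segment


def fingerprint(block: bytes) -> bytes:
    # Any collision-resistant hash works; SHA-1 is an assumption here.
    return hashlib.sha1(block).digest()


def deduplicate(blocks, index):
    """Yield only unseen blocks; `index` maps fingerprint -> segment id.

    Grouping fingerprints by segment keeps the lookups for one incremental
    backup clustered together, preserving data locality as described above.
    """
    for i, block in enumerate(blocks):
        fp = fingerprint(block)
        if fp not in index:
            index[fp] = i // SEGMENT_BLOCKS  # remember which segment holds it
            yield block
        # duplicates are dropped; a real system would record a reference
```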
Publisher
The Graduate School, Stony Brook University: Stony Brook, NY.