A Context Aware Block Layer: The Case for Block Layer Deduplication

Issue Date

1-May-12

Authors

Mudrankit, Amar

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Abstract

The context of data is important for the optimal performance of data management systems such as deduplication. In typical operating systems, the block layer of the I/O stack is unaware of the context of the data it operates on. Thanks to the simplicity and modularity of its interface, the block layer is one of the best places to implement data deduplication. We designed an interface between file systems and the block layer that allows a file system to pass the context of the data to the underlying deduplication system at the block layer. This context takes the form of a "hint" that conveys information useful to the block-layer deduplication system so that it can optimize its operation. For example, a hint can indicate which data is worth deduplicating, which data should not be deduplicated at all, or that an impending set of I/O operations is likely to generate many duplicates. With hints, we observed a 1.5x to 2x reduction in I/Os and a 10% improvement in CPU utilization for metadata-intensive workloads, compared to a context-unaware deduplication system at the block layer. Our hinting system degraded the deduplication ratio by only 3% to 5%. To implement hints, we had to change less than 0.6% of the Linux kernel, and we changed approximately 600 lines of code (LoC) in two file systems (Ext3 and NILFS2). Our block-layer deduplication system is about 4,000 LoC of standalone kernel code.
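To make the idea of a hint concrete, the following is a minimal user-space sketch in Python, not the kernel implementation described in the thesis. It models a deduplicating block store whose write path accepts an optional hint from the layer above. The names `Hint`, `NO_DEDUP`, `DedupBlockStore`, and `hash_lookups` are hypothetical and chosen only for illustration, and the choice of SHA-256 as the fingerprint is likewise an assumption.

```python
import hashlib
from enum import Enum, auto


class Hint(Enum):
    """Hypothetical hint values; illustrative names, not the thesis's interface."""
    NONE = auto()      # no context supplied: deduplicate as usual
    NO_DEDUP = auto()  # e.g. file system metadata that is unlikely to repeat


class DedupBlockStore:
    """Toy in-memory stand-in for a block-layer deduplication target."""

    def __init__(self):
        self.fingerprints = {}  # content hash -> physical block number
        self.physical = []      # contents of each physical block
        self.mapping = {}       # logical block number -> physical block number
        self.hash_lookups = 0   # hashing/index work performed so far

    def _allocate(self, data):
        # Append a new physical block and return its number.
        self.physical.append(data)
        return len(self.physical) - 1

    def write(self, lbn, data, hint=Hint.NONE):
        if hint is Hint.NO_DEDUP:
            # Hinted write: skip hashing and the index lookup entirely.
            self.mapping[lbn] = self._allocate(data)
            return
        self.hash_lookups += 1
        digest = hashlib.sha256(data).digest()
        pbn = self.fingerprints.get(digest)
        if pbn is None:
            # First time this content is seen: store it and index it.
            pbn = self._allocate(data)
            self.fingerprints[digest] = pbn
        self.mapping[lbn] = pbn

    def read(self, lbn):
        return self.physical[self.mapping[lbn]]


store = DedupBlockStore()
store.write(0, b"A" * 4096)                         # unique data block
store.write(1, b"A" * 4096)                         # duplicate, shares a block
store.write(2, b"M" * 4096, hint=Hint.NO_DEDUP)     # hinted metadata block
store.write(3, b"M" * 4096, hint=Hint.NO_DEDUP)     # hinted, stored again
```

In this toy model the two hinted writes bypass fingerprinting, which saves work on the write path, but identical hinted blocks are stored twice. That mirrors, in spirit only, the trade-off the abstract reports between reduced overhead and a slightly lower deduplication ratio.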

Description

55 pg.
