A Context Aware Block Layer: The Case for Block Layer Deduplication
MetadataShow full item record
The context of data is important for optimal performance of data management systems like deduplication. In typical operating systems, the block layer of the I/O stack is unaware of the context of the data it is operating on. Thanks to the simplicity and modularity of the block layer interface, it is one of the best places to implement data deduplication. We designed an interface between file systems and the block layer that allows a file system to pass the context of the data to the underlying deduplication system at the block layer. This context is in the form of a ``hint'' to convey information that is useful for the block-layer deduplication system, so that it can optimize its operation. For example, the hint can indicate what data is worthy of deduplication, what data should not be deduplicated at all, or that an impending set of I/O operations are likely to generate lot of duplicates. With hints, we observed a 1.5--2x reduction in I/Os and a 10% improvement in CPU utilization for metadata-intensive workloads, compared to a context-unaware deduplication system at the block layer. Our hinting system degraded the deduplication ratio by only 3--5%. To implement hints, we had to change fewer than 0.6% of the Linux kernel, and we changed approximately 600 LoC of file system code in two file systems (Ext3 and NILFS2). Our block-layer deduplication system is about 4,000 LoC of standalone kernel code.