The recent commoditization of USB-based flash disks, used mainly in digital cameras, mobile music/video players, and cell phones, has led many pundits and technologists to predict that flash memory-based disks will become the mass storage of choice in mainstream laptop computers within two to three years. In fact, some ultra-mobile PCs, such as AsusTek's Eee PC, already use a flash disk as their only mass storage device. Given the much better performance characteristics of flash disks and the enormous economies of scale behind the technology, it appears inevitable that flash disks will replace magnetic disks as the main persistent storage technology, at least in some classes of computers.

Compared with magnetic disks, flash disks consume less power, take less space, and are more reliable because they contain no mechanical parts. Moreover, flash disks generally offer much better latency and throughput because they work like a RAM chip and incur no head-positioning overhead. However, existing flash disk technology has two major drawbacks that render it largely a niche technology at this point. First, flash disks remain quite expensive, approximately $10-15/GByte, which is at least 20 times the cost of magnetic disks. Indeed, at this price point, it is not uncommon for a flash disk to cost as much as the computer it is installed in. Second, a flash disk outperforms a magnetic disk only when the input workload consists of sequential reads, random reads, or sequential writes. Under a random write workload, a flash disk's performance is at best comparable to that of a magnetic disk, and in some cases actually worse. We believe the cost issue will diminish over time as the PC industry shifts its storage technology investment from magnetic to flash disks. The random write performance problem, however, is rooted in the way flash memory cells are modified, and thus cannot be easily addressed.
This document describes the design and implementation of a log-structured flash storage manager (LFSM) that effectively addresses this problem.

A flash memory chip is typically organized into a set of erasure units (EUs), typically 256 KBytes each, which are the basic units of erasure. Each EU in turn consists of a set of 512-byte sectors, which are the basic units of read and write. After an EU is erased, writes to its sectors can proceed without triggering another erasure as long as their target addresses are disjoint. That is, once a sector has been written, it must be erased before it can be written a second time. Because of this peculiar property of flash memory, random writes to a storage area mapped to an EU may trigger repeated copying of the storage area to a free EU and erasure of the original EU holding the storage area, and thus incur significant performance overhead.

Moreover, flash disks typically come with a flash translation layer (FTL), implemented in firmware, that maps the logical disk sectors exposed to software to physical disk sectors and performs various optimizations such as wear leveling, which equalizes the physical write frequency across EUs. This logical-to-physical map would require 64 million entries if it kept track of individual 512-byte sectors on a 32-GByte flash disk. To reduce the map's memory requirement, commodity flash disks increase the mapping granularity, sometimes to the level of an entire EU. As a result of this coarser mapping granularity, two temporally separate writes to the same mapping unit, say an EU, will trigger a copy and erasure operation if the target address of the second write is not larger than that of the first, because a commodity flash disk cannot always tell whether a disk sector in an EU has already been written.
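To make the cost of this rule concrete, the following toy model (our own illustration, not any vendor's actual firmware) applies it to a single coarsely mapped EU: a write whose sector number is not larger than the highest sector written since the last erasure forces a copy-and-erase cycle.

```python
# Toy model of one coarsely mapped erasure unit (EU).  Illustrative
# only: real FTL firmware is considerably more sophisticated.
class ToyEU:
    def __init__(self):
        self.high = -1      # highest sector number written since last erase
        self.erases = 0     # copy-and-erase cycles triggered so far

    def write(self, sector):
        if sector <= self.high:
            # Second write not "larger" than an earlier one: the disk
            # must copy live data to a free EU and erase this one.
            self.erases += 1
            self.high = -1
        self.high = max(self.high, sector)

seq, rnd = ToyEU(), ToyEU()
for s in range(8):
    seq.write(s)            # strictly ascending writes: no erasures
for s in (5, 1, 6, 2, 7, 0, 4, 3):
    rnd.write(s)            # same sectors in random order: 4 erasures
print(seq.erases, rnd.erases)   # → 0 4
```

Under this model, a sequential pass over the EU is free, while the same eight writes in random order pay half of them back as copy-and-erase cycles.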
That is, once the N-th sector of a mapping unit has been written, any attempt to write a sector whose sector number is less than or equal to N will require an erasure, even if the target sector itself has never been written. Consequently, coarser mapping granularity further aggravates flash disks' random write performance problem.

To address the random write performance problem, LFSM converts all random writes into sequential writes to a set of unified logs by introducing an additional level of indirection above the FTL. Because all commercial flash disks have good sequential write performance, LFSM effectively solves the random write performance problem for these disks in a uniform way, without requiring any modifications to their hardware implementations. With this log-structured storage organization, LFSM must overcome two major challenges. First, LFSM still faces random writes, because it must maintain a separate map for the level of indirection it introduces, and writes to this map are random. LFSM minimizes the performance overhead of these random writes with a technique called BUSC: batching updates with sequential commit. Second, to minimize the amount of copying whenever it reclaims an EU, LFSM must allocate EUs to logical blocks in such a way that logical blocks assigned to the same EU have similar lifetimes and each EU's utilization ratio is stable, i.e., unlikely to change in the future.
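A minimal sketch of the log-structured conversion follows. The names (`LogStore`, `l2p`) are our own, not LFSM's actual interfaces: random writes to logical block addresses become sequential appends to a log, while an in-memory map records where the newest copy of each logical block lives. The map updates are themselves the random writes that BUSC batches and commits sequentially; that machinery, along with log space reclamation, is omitted here.

```python
# Sketch of turning random logical writes into sequential log appends.
# Hypothetical illustration; LFSM's real design uses a set of unified
# logs, persists its map via BUSC, and reclaims stale log space.
class LogStore:
    def __init__(self):
        self.log = []       # the log: grows by sequential appends only
        self.l2p = {}       # logical block address -> position in log

    def write(self, lba, data):
        self.l2p[lba] = len(self.log)   # remap the block to the log tail
        self.log.append(data)           # always a sequential write

    def read(self, lba):
        return self.log[self.l2p[lba]]

store = LogStore()
store.write(7, "old")       # a "random" logical address pattern...
store.write(2, "x")
store.write(7, "new")       # ...lands as appends at positions 0, 1, 2
print(store.read(7))        # → new (the map finds the latest copy)
```

The flash disk underneath sees only the append stream, which is exactly the sequential write workload it handles well; the old copy at position 0 becomes garbage to be reclaimed later.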