1803d6920SChris Mumford**LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.** 2803d6920SChris Mumford 3dd1c3c35SLars-Magnus Skog[](https://travis-ci.org/google/leveldb) 4dd1c3c35SLars-Magnus Skog 5803d6920SChris MumfordAuthors: Sanjay Ghemawat ([email protected]) and Jeff Dean ([email protected]) 6803d6920SChris Mumford 7803d6920SChris Mumford# Features 8803d6920SChris Mumford * Keys and values are arbitrary byte arrays. 9803d6920SChris Mumford * Data is stored sorted by key. 10803d6920SChris Mumford * Callers can provide a custom comparison function to override the sort order. 11803d6920SChris Mumford * The basic operations are `Put(key,value)`, `Get(key)`, `Delete(key)`. 12803d6920SChris Mumford * Multiple changes can be made in one atomic batch. 13803d6920SChris Mumford * Users can create a transient snapshot to get a consistent view of data. 14803d6920SChris Mumford * Forward and backward iteration is supported over the data. 15edf2939cSVenilton FalvoJr * Data is automatically compressed using the [Snappy compression library](http://google.github.io/snappy/). 16803d6920SChris Mumford * External activity (file system operations etc.) is relayed through a virtual interface so users can customize the operating system interactions. 170e0f0741SPaul Irish 180e0f0741SPaul Irish# Documentation 19*d0883b60Scmumford [LevelDB library documentation](https://github.com/google/leveldb/blob/master/doc/index.md) is online and bundled with the source code. 20803d6920SChris Mumford 21803d6920SChris Mumford 22803d6920SChris Mumford# Limitations 23803d6920SChris Mumford * This is not a SQL database. It does not have a relational data model, it does not support SQL queries, and it has no support for indexes. 24803d6920SChris Mumford * Only a single process (possibly multi-threaded) can access a particular database at a time. 25803d6920SChris Mumford * There is no client-server support builtin to the library. An application that needs such support will have to wrap their own server around the library. 26803d6920SChris Mumford 274753c9b6Scmumford# Contributing to the leveldb Project 284753c9b6ScmumfordThe leveldb project welcomes contributions. leveldb's primary goal is to be 294753c9b6Scmumforda reliable and fast key/value store. Changes that are in line with the 304753c9b6Scmumfordfeatures/limitations outlined above, and meet the requirements below, 314753c9b6Scmumfordwill be considered. 324753c9b6Scmumford 334753c9b6ScmumfordContribution requirements: 344753c9b6Scmumford 354753c9b6Scmumford1. **POSIX only**. We _generally_ will only accept changes that are both 364753c9b6Scmumford compiled, and tested on a POSIX platform - usually Linux. Very small 374753c9b6Scmumford changes will sometimes be accepted, but consider that more of an 384753c9b6Scmumford exception than the rule. 394753c9b6Scmumford 404753c9b6Scmumford2. **Stable API**. We strive very hard to maintain a stable API. Changes that 414753c9b6Scmumford require changes for projects using leveldb _might_ be rejected without 424753c9b6Scmumford sufficient benefit to the project. 434753c9b6Scmumford 444753c9b6Scmumford3. **Tests**: All changes must be accompanied by a new (or changed) test, or 454753c9b6Scmumford a sufficient explanation as to why a new (or changed) test is not required. 464753c9b6Scmumford 474753c9b6Scmumford## Submitting a Pull Request 484753c9b6ScmumfordBefore any pull request will be accepted the author must first sign a 494753c9b6ScmumfordContributor License Agreement (CLA) at https://cla.developers.google.com/. 504753c9b6Scmumford 514753c9b6ScmumfordIn order to keep the commit timeline linear 524753c9b6Scmumford[squash](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History#Squashing-Commits) 534753c9b6Scmumfordyour changes down to a single commit and [rebase](https://git-scm.com/docs/git-rebase) 544753c9b6Scmumfordon google/leveldb/master. This keeps the commit timeline linear and more easily sync'ed 554753c9b6Scmumfordwith the internal repository at Google. More information at GitHub's 564753c9b6Scmumford[About Git rebase](https://help.github.com/articles/about-git-rebase/) page. 574753c9b6Scmumford 58803d6920SChris Mumford# Performance 59803d6920SChris Mumford 60803d6920SChris MumfordHere is a performance report (with explanations) from the run of the 61803d6920SChris Mumfordincluded db_bench program. The results are somewhat noisy, but should 62803d6920SChris Mumfordbe enough to get a ballpark performance estimate. 63803d6920SChris Mumford 64803d6920SChris Mumford## Setup 65803d6920SChris Mumford 66803d6920SChris MumfordWe use a database with a million entries. Each entry has a 16 byte 67803d6920SChris Mumfordkey, and a 100 byte value. Values used by the benchmark compress to 68803d6920SChris Mumfordabout half their original size. 69803d6920SChris Mumford 70803d6920SChris Mumford LevelDB: version 1.1 71803d6920SChris Mumford Date: Sun May 1 12:11:26 2011 72803d6920SChris Mumford CPU: 4 x Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz 73803d6920SChris Mumford CPUCache: 4096 KB 74803d6920SChris Mumford Keys: 16 bytes each 75803d6920SChris Mumford Values: 100 bytes each (50 bytes after compression) 76803d6920SChris Mumford Entries: 1000000 77803d6920SChris Mumford Raw Size: 110.6 MB (estimated) 78803d6920SChris Mumford File Size: 62.9 MB (estimated) 79803d6920SChris Mumford 80803d6920SChris Mumford## Write performance 81803d6920SChris Mumford 82803d6920SChris MumfordThe "fill" benchmarks create a brand new database, in either 83803d6920SChris Mumfordsequential, or random order. The "fillsync" benchmark flushes data 84803d6920SChris Mumfordfrom the operating system to the disk after every operation; the other 85803d6920SChris Mumfordwrite operations leave the data sitting in the operating system buffer 86803d6920SChris Mumfordcache for a while. The "overwrite" benchmark does random writes that 87803d6920SChris Mumfordupdate existing keys in the database. 88803d6920SChris Mumford 89803d6920SChris Mumford fillseq : 1.765 micros/op; 62.7 MB/s 90803d6920SChris Mumford fillsync : 268.409 micros/op; 0.4 MB/s (10000 ops) 91803d6920SChris Mumford fillrandom : 2.460 micros/op; 45.0 MB/s 92803d6920SChris Mumford overwrite : 2.380 micros/op; 46.5 MB/s 93803d6920SChris Mumford 94803d6920SChris MumfordEach "op" above corresponds to a write of a single key/value pair. 95803d6920SChris MumfordI.e., a random write benchmark goes at approximately 400,000 writes per second. 96803d6920SChris Mumford 97803d6920SChris MumfordEach "fillsync" operation costs much less (0.3 millisecond) 98803d6920SChris Mumfordthan a disk seek (typically 10 milliseconds). We suspect that this is 99803d6920SChris Mumfordbecause the hard disk itself is buffering the update in its memory and 100803d6920SChris Mumfordresponding before the data has been written to the platter. This may 101803d6920SChris Mumfordor may not be safe based on whether or not the hard disk has enough 102803d6920SChris Mumfordpower to save its memory in the event of a power failure. 103803d6920SChris Mumford 104803d6920SChris Mumford## Read performance 105803d6920SChris Mumford 106803d6920SChris MumfordWe list the performance of reading sequentially in both the forward 107803d6920SChris Mumfordand reverse direction, and also the performance of a random lookup. 108803d6920SChris MumfordNote that the database created by the benchmark is quite small. 109803d6920SChris MumfordTherefore the report characterizes the performance of leveldb when the 110803d6920SChris Mumfordworking set fits in memory. The cost of reading a piece of data that 111803d6920SChris Mumfordis not present in the operating system buffer cache will be dominated 112803d6920SChris Mumfordby the one or two disk seeks needed to fetch the data from disk. 113803d6920SChris MumfordWrite performance will be mostly unaffected by whether or not the 114803d6920SChris Mumfordworking set fits in memory. 115803d6920SChris Mumford 116803d6920SChris Mumford readrandom : 16.677 micros/op; (approximately 60,000 reads per second) 117803d6920SChris Mumford readseq : 0.476 micros/op; 232.3 MB/s 118803d6920SChris Mumford readreverse : 0.724 micros/op; 152.9 MB/s 119803d6920SChris Mumford 120803d6920SChris MumfordLevelDB compacts its underlying storage data in the background to 121803d6920SChris Mumfordimprove read performance. The results listed above were done 122803d6920SChris Mumfordimmediately after a lot of random writes. The results after 123803d6920SChris Mumfordcompactions (which are usually triggered automatically) are better. 124803d6920SChris Mumford 125803d6920SChris Mumford readrandom : 11.602 micros/op; (approximately 85,000 reads per second) 126803d6920SChris Mumford readseq : 0.423 micros/op; 261.8 MB/s 127803d6920SChris Mumford readreverse : 0.663 micros/op; 166.9 MB/s 128803d6920SChris Mumford 129803d6920SChris MumfordSome of the high cost of reads comes from repeated decompression of blocks 130803d6920SChris Mumfordread from disk. If we supply enough cache to the leveldb so it can hold the 131803d6920SChris Mumforduncompressed blocks in memory, the read performance improves again: 132803d6920SChris Mumford 133803d6920SChris Mumford readrandom : 9.775 micros/op; (approximately 100,000 reads per second before compaction) 134803d6920SChris Mumford readrandom : 5.215 micros/op; (approximately 190,000 reads per second after compaction) 135803d6920SChris Mumford 136803d6920SChris Mumford## Repository contents 137803d6920SChris Mumford 1387fa20948ScmumfordSee [doc/index.md](doc/index.md) for more explanation. See 1397fa20948Scmumford[doc/impl.md](doc/impl.md) for a brief overview of the implementation. 140803d6920SChris Mumford 141803d6920SChris MumfordThe public interface is in include/*.h. Callers should not include or 142803d6920SChris Mumfordrely on the details of any other header files in this package. Those 143803d6920SChris Mumfordinternal APIs may be changed without warning. 144803d6920SChris Mumford 145803d6920SChris MumfordGuide to header files: 146803d6920SChris Mumford 147803d6920SChris Mumford* **include/db.h**: Main interface to the DB: Start here 148803d6920SChris Mumford 149803d6920SChris Mumford* **include/options.h**: Control over the behavior of an entire database, 150803d6920SChris Mumfordand also control over the behavior of individual reads and writes. 151803d6920SChris Mumford 152803d6920SChris Mumford* **include/comparator.h**: Abstraction for user-specified comparison function. 153803d6920SChris MumfordIf you want just bytewise comparison of keys, you can use the default 154803d6920SChris Mumfordcomparator, but clients can write their own comparator implementations if they 155803d6920SChris Mumfordwant custom ordering (e.g. to handle different character encodings, etc.) 156803d6920SChris Mumford 157803d6920SChris Mumford* **include/iterator.h**: Interface for iterating over data. You can get 158803d6920SChris Mumfordan iterator from a DB object. 159803d6920SChris Mumford 160803d6920SChris Mumford* **include/write_batch.h**: Interface for atomically applying multiple 161803d6920SChris Mumfordupdates to a database. 162803d6920SChris Mumford 163803d6920SChris Mumford* **include/slice.h**: A simple module for maintaining a pointer and a 164803d6920SChris Mumfordlength into some other byte array. 165803d6920SChris Mumford 166803d6920SChris Mumford* **include/status.h**: Status is returned from many of the public interfaces 167803d6920SChris Mumfordand is used to report success and various kinds of errors. 168803d6920SChris Mumford 169803d6920SChris Mumford* **include/env.h**: 170803d6920SChris MumfordAbstraction of the OS environment. A posix implementation of this interface is 171803d6920SChris Mumfordin util/env_posix.cc 172803d6920SChris Mumford 173803d6920SChris Mumford* **include/table.h, include/table_builder.h**: Lower-level modules that most 174803d6920SChris Mumfordclients probably won't use directly 175