**LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.**

[![Build Status](https://travis-ci.org/google/leveldb.svg?branch=master)](https://travis-ci.org/google/leveldb)

Authors: Sanjay Ghemawat ([email protected]) and Jeff Dean ([email protected])

# Features
  * Keys and values are arbitrary byte arrays.
  * Data is stored sorted by key.
  * Callers can provide a custom comparison function to override the sort order.
  * The basic operations are `Put(key,value)`, `Get(key)`, `Delete(key)`.
  * Multiple changes can be made in one atomic batch.
  * Users can create a transient snapshot to get a consistent view of data.
  * Forward and backward iteration is supported over the data.
  * Data is automatically compressed using the [Snappy compression library](http://google.github.io/snappy/).
  * External activity (file system operations etc.) is relayed through a virtual interface so users can customize the operating system interactions.
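
The ordering features above (sorted storage, a caller-supplied comparison function, and forward and backward iteration) can be modeled with the C++ standard library. The sketch below uses `std::map` purely as a stand-in; it is not LevelDB's API, and `ReverseBytewise`, `KeysForward`, and `KeysBackward` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical custom comparison function: sorts keys in descending
// bytewise order instead of the default ascending order.
struct ReverseBytewise {
  bool operator()(const std::string& a, const std::string& b) const {
    return a > b;
  }
};

// Collects keys by iterating forward over an ordered map.
template <typename Map>
std::vector<std::string> KeysForward(const Map& m) {
  std::vector<std::string> keys;
  for (auto it = m.begin(); it != m.end(); ++it) {
    keys.push_back(it->first);
  }
  return keys;
}

// Collects keys by iterating backward over an ordered map.
template <typename Map>
std::vector<std::string> KeysBackward(const Map& m) {
  std::vector<std::string> keys;
  for (auto it = m.rbegin(); it != m.rend(); ++it) {
    keys.push_back(it->first);
  }
  return keys;
}
```

In this model, keys inserted as `b`, `a`, `c` come back as `a`, `b`, `c` when iterating forward under the default ordering, and in the opposite order either when iterating backward or when the map is declared with `ReverseBytewise`.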

# Documentation
  [LevelDB library documentation](https://github.com/google/leveldb/blob/master/doc/index.md) is online and bundled with the source code.

# Limitations
  * This is not a SQL database.  It does not have a relational data model, it does not support SQL queries, and it has no support for indexes.
  * Only a single process (possibly multi-threaded) can access a particular database at a time.
  * There is no client-server support built in to the library.  An application that needs such support will have to wrap its own server around the library.

# Contributing to the leveldb Project
The leveldb project welcomes contributions. leveldb's primary goal is to be
a reliable and fast key/value store. Changes that are in line with the
features/limitations outlined above, and meet the requirements below,
will be considered.

Contribution requirements:

1. **POSIX only**. We _generally_ will only accept changes that are both
   compiled and tested on a POSIX platform, usually Linux. Very small
   changes will sometimes be accepted, but consider that the exception
   rather than the rule.

2. **Stable API**. We strive very hard to maintain a stable API. Changes that
   require projects using leveldb to change their code _might_ be rejected
   unless they bring sufficient benefit to the project.

3. **Tests**. All changes must be accompanied by a new (or changed) test, or
   a sufficient explanation as to why a new (or changed) test is not required.

## Submitting a Pull Request
Before a pull request can be accepted, the author must first sign a
Contributor License Agreement (CLA) at https://cla.developers.google.com/.

To keep the commit timeline linear,
[squash](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History#Squashing-Commits)
your changes down to a single commit and [rebase](https://git-scm.com/docs/git-rebase)
on google/leveldb/master. A linear timeline is also easier to sync
with the internal repository at Google. More information is available on GitHub's
[About Git rebase](https://help.github.com/articles/about-git-rebase/) page.

# Performance

Here is a performance report (with explanations) from a run of the
included db_bench program.  The results are somewhat noisy, but should
be enough to get a ballpark performance estimate.

## Setup

We use a database with a million entries.  Each entry has a 16-byte
key and a 100-byte value.  Values used by the benchmark compress to
about half their original size.

    LevelDB:    version 1.1
    Date:       Sun May  1 12:11:26 2011
    CPU:        4 x Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
    CPUCache:   4096 KB
    Keys:       16 bytes each
    Values:     100 bytes each (50 bytes after compression)
    Entries:    1000000
    Raw Size:   110.6 MB (estimated)
    File Size:  62.9 MB (estimated)
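
The two size estimates are consistent with simple arithmetic over the entry count, assuming that "MB" here means 1,048,576 bytes and that no per-entry overhead is counted. The helper below is a minimal sketch of that arithmetic; `EstimatedSizeMiB` is a hypothetical name, not a db_bench function.

```cpp
#include <cassert>
#include <cstdint>

// Estimated size, in units of 1,048,576 bytes, of `entries` key/value pairs.
// Assumption: the estimate counts only key and value bytes.
double EstimatedSizeMiB(std::int64_t entries, int key_bytes, int value_bytes) {
  assert(entries >= 0 && key_bytes >= 0 && value_bytes >= 0);
  const double total_bytes =
      static_cast<double>(entries) * (key_bytes + value_bytes);
  return total_bytes / 1048576.0;
}
```

Under these assumptions, `EstimatedSizeMiB(1000000, 16, 100)` is roughly 110.6 (the raw size) and `EstimatedSizeMiB(1000000, 16, 50)` is roughly 62.9 (the file size with values compressed to half).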

## Write performance

The "fill" benchmarks create a brand new database, in either
sequential or random order.  The "fillsync" benchmark flushes data
from the operating system to the disk after every operation; the other
write operations leave the data sitting in the operating system buffer
cache for a while.  The "overwrite" benchmark does random writes that
update existing keys in the database.

    fillseq      :       1.765 micros/op;   62.7 MB/s
    fillsync     :     268.409 micros/op;    0.4 MB/s (10000 ops)
    fillrandom   :       2.460 micros/op;   45.0 MB/s
    overwrite    :       2.380 micros/op;   46.5 MB/s

Each "op" above corresponds to a write of a single key/value pair.
For example, the random write benchmark ("fillrandom") runs at approximately
400,000 writes per second.
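
The relationship between the reported micros/op and the derived rates can be sketched as follows. This is an illustration only: `OpsPerSecond` and `MiBPerSecond` are hypothetical helpers, and the throughput calculation assumes each op moves 116 bytes (a 16-byte key plus a 100-byte value) and that "MB" means 1,048,576 bytes.

```cpp
#include <cassert>

// Converts a per-operation latency in microseconds to operations per second.
double OpsPerSecond(double micros_per_op) {
  assert(micros_per_op > 0.0);
  return 1e6 / micros_per_op;
}

// Converts a per-operation latency to throughput in units of 1,048,576 bytes
// per second, given the number of bytes written per operation.
double MiBPerSecond(double micros_per_op, int bytes_per_op) {
  assert(bytes_per_op >= 0);
  return OpsPerSecond(micros_per_op) * bytes_per_op / 1048576.0;
}
```

For the fillrandom row, `OpsPerSecond(2.460)` is a little over 406,000, which matches the rounded figure of 400,000 writes per second, and `MiBPerSecond(2.460, 116)` comes to roughly 45.0.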

Each "fillsync" operation costs much less (about 0.3 milliseconds)
than a disk seek (typically 10 milliseconds).  We suspect that this is
because the hard disk itself is buffering the update in its memory and
responding before the data has been written to the platter.  This may
or may not be safe, depending on whether the hard disk has enough
power to save its memory in the event of a power failure.

## Read performance

We list the performance of reading sequentially in both the forward
and reverse direction, and also the performance of a random lookup.
Note that the database created by the benchmark is quite small.
Therefore the report characterizes the performance of leveldb when the
working set fits in memory.  The cost of reading a piece of data that
is not present in the operating system buffer cache will be dominated
by the one or two disk seeks needed to fetch the data from disk.
Write performance will be mostly unaffected by whether or not the
working set fits in memory.

    readrandom  : 16.677 micros/op;  (approximately 60,000 reads per second)
    readseq     :  0.476 micros/op;  232.3 MB/s
    readreverse :  0.724 micros/op;  152.9 MB/s

LevelDB compacts its underlying storage data in the background to
improve read performance.  The results listed above were measured
immediately after a lot of random writes.  The results after
compactions (which are usually triggered automatically) are better.

    readrandom  : 11.602 micros/op;  (approximately 85,000 reads per second)
    readseq     :  0.423 micros/op;  261.8 MB/s
    readreverse :  0.663 micros/op;  166.9 MB/s

Some of the high cost of reads comes from repeated decompression of blocks
read from disk.  If we supply enough cache to leveldb so it can hold the
uncompressed blocks in memory, the read performance improves again:

    readrandom  : 9.775 micros/op;  (approximately 100,000 reads per second before compaction)
    readrandom  : 5.215 micros/op;  (approximately 190,000 reads per second after compaction)

## Repository contents

See [doc/index.md](doc/index.md) for more explanation. See
[doc/impl.md](doc/impl.md) for a brief overview of the implementation.

The public interface is in include/*.h.  Callers should not include or
rely on the details of any other header files in this package.  Those
internal APIs may be changed without warning.

Guide to header files:

* **include/db.h**: Main interface to the DB. Start here.

* **include/options.h**: Control over the behavior of an entire database,
and also control over the behavior of individual reads and writes.

* **include/comparator.h**: Abstraction for user-specified comparison function.
If you want just bytewise comparison of keys, you can use the default
comparator, but clients can write their own comparator implementations if they
want custom ordering (e.g. to handle different character encodings).

* **include/iterator.h**: Interface for iterating over data. You can get
an iterator from a DB object.

* **include/write_batch.h**: Interface for atomically applying multiple
updates to a database.

* **include/slice.h**: A simple module for maintaining a pointer and a
length into some other byte array.

* **include/status.h**: Status is returned from many of the public interfaces
and is used to report success and various kinds of errors.

* **include/env.h**:
Abstraction of the OS environment.  A POSIX implementation of this interface is
in util/env_posix.cc.

* **include/table.h, include/table_builder.h**: Lower-level modules that most
clients probably won't use directly.