20070725

hooray

http://www.google.com/search?client=opera&rls=en&q=BigTable&sourceid=opera&ie=utf-8&oe=utf-8

BigTable is a compressed, high-performance, proprietary database built
on Google File System (GFS), Chubby Lock Service, and a few other Google
programs; it is currently not distributed or used outside of Google. It
began in 2004[1] and is now used by a number of Google applications, such
as MapReduce, which is often used for generating and modifying data stored
in BigTable;[2] Google Reader,[3] Google Maps,[4] Google Print, "My Search
History", Google Earth, Blogger.com, Google Code hosting, Orkut,[4] and
YouTube.[5] Google's reasons for developing its own database include
licensing costs, scalability, and better control of performance
characteristics.[6]

It is a fast and extremely large-scale database system, with a focus on
quick reads from columns rather than rows. It is designed to scale into
the petabyte range across hundreds or thousands of machines, and to make
it easy to add more machines to the system and automatically start taking
advantage of those resources without any reconfiguration.[7]

Each table has multiple dimensions (one of which is a field for time,
allowing versioning). Tables are optimized for GFS by being split into
multiple tablets: segments of the table, split along a row chosen such
that each tablet will be ~200 megabytes in size.
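
To make the data model concrete, here is a minimal sketch (in Python) of a
table whose cells are addressed by row key, column, and timestamp, with the
timestamp dimension providing the versioning mentioned above; the class and
method names are illustrative, not Google's actual API.

import time
from collections import defaultdict

class Table:
    """Toy BigTable-style table: cells are addressed by
    (row key, column, timestamp), so keeping several timestamps per
    cell gives the "time" dimension / versioning described above."""

    def __init__(self):
        # row key -> column -> list of (timestamp, value), newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, ts=None):
        ts = ts if ts is not None else time.time()
        versions = self.rows[row_key][column]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row_key, column, ts=None):
        # ts=None returns the newest version; otherwise the newest
        # version written at or before the requested timestamp.
        for version_ts, value in self.rows[row_key][column]:
            if ts is None or version_ts <= ts:
                return value
        return None

t = Table()
t.put("com.example.www", "contents:html", "<html>v1</html>", ts=1)
t.put("com.example.www", "contents:html", "<html>v2</html>", ts=2)
print(t.get("com.example.www", "contents:html"))        # newest: v2
print(t.get("com.example.www", "contents:html", ts=1))  # as of ts=1: v1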

When sizes threaten to grow beyond a specified limit, the tablets are
compressed using the algorithms BMDiff and Zippy, which are described as
less space-optimal than LZW but more efficient in terms of computing
time.
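
BMDiff and Zippy aren't available under those names in standard libraries,
so the stand-in below uses zlib at its fastest level just to illustrate the
behaviour described above (compress a tablet once it threatens to exceed a
size limit, favouring speed over compression ratio); the constant and
function name are made up for this sketch.

import zlib

TABLET_SOFT_LIMIT = 200 * 1024 * 1024  # ~200 MB, matching the split size above

def maybe_compress(tablet_bytes, limit=TABLET_SOFT_LIMIT):
    # Below the limit, leave the tablet uncompressed.
    if len(tablet_bytes) < limit:
        return tablet_bytes, False
    # zlib level 1 favours speed over ratio, the same trade-off the text
    # attributes to BMDiff/Zippy; higher levels spend more CPU for
    # smaller output.
    return zlib.compress(tablet_bytes, 1), True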

The locations of tablets in GFS are recorded as database entries in
multiple special tablets, called "META1" tablets. META1 tablets are found
by querying the single "META0" tablet, which typically has a machine to
itself, since it is often queried by clients for the location of the
META1 tablet that in turn records where the actual data is located. Like
GFS's master server, the META0 tablet is not generally a bottleneck: the
processor time and bandwidth needed to discover and transmit META1
locations are minimal, and clients aggressively cache locations to
minimize queries.
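
The lookup path just described (client asks META0, which points at a META1
tablet, which in turn points at the data tablet's location in GFS) together
with the client-side cache might be sketched roughly as below; the classes,
the range encoding, and caching by single row key are simplifications
invented for illustration, not Google's actual interfaces.

import bisect

class LocationTablet:
    """Maps the start keys of sorted row ranges to a location string."""
    def __init__(self, entries):
        # entries: sorted list of (start_row_key, location)
        self.start_keys = [k for k, _ in entries]
        self.locations = [loc for _, loc in entries]

    def lookup(self, row_key):
        # Find the last range whose start key is <= row_key.
        i = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.locations[max(i, 0)]

class Client:
    def __init__(self, meta0, meta1_tablets):
        self.meta0 = meta0                  # the single META0 tablet
        self.meta1_tablets = meta1_tablets  # name -> META1 LocationTablet
        self.cache = {}                     # row key -> data tablet location

    def locate(self, row_key):
        # Aggressive caching is what keeps META0 from becoming a bottleneck.
        if row_key in self.cache:
            return self.cache[row_key]
        meta1_name = self.meta0.lookup(row_key)                    # hop 1
        data_loc = self.meta1_tablets[meta1_name].lookup(row_key)  # hop 2
        self.cache[row_key] = data_loc
        return data_loc

meta0 = LocationTablet([("", "meta1-a"), ("n", "meta1-b")])
meta1 = {"meta1-a": LocationTablet([("", "gfs://tablet-001")]),
         "meta1-b": LocationTablet([("n", "gfs://tablet-117")])}
c = Client(meta0, meta1)
print(c.locate("com.example.www"))  # gfs://tablet-001, then served from cache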