Google's Bigtable vs. A Relational Database












I don't know much about Google's Bigtable but am wondering what the difference between Google's Bigtable and relational databases like MySQL is. What are the limitations of both?


Bigtable is Google's invention for handling the massive amounts of information the company regularly deals with. A Bigtable dataset can grow to immense size (many petabytes), with storage distributed across a large number of servers. Systems using Bigtable include Google's web index and Google Earth.

According to Google's whitepaper on the subject:

A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
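To make that definition concrete, here is a minimal in-memory sketch of a sparse, sorted map keyed by (row, column, timestamp) with byte-string values. It only illustrates the data model from the quote. The class name `TinyBigtable`, its methods, and the sample rows are all made up for this example, and nothing here reflects how Bigtable is actually implemented or distributed.

```python
import bisect


class TinyBigtable:
    """Toy model of a sorted (row, column, timestamp) -> bytes map."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key -> uninterpreted bytes

    def put(self, row, column, timestamp, value):
        # Negate the timestamp so newer versions of a cell sort first.
        key = (row, column, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column):
        """Return the most recent value for a cell, or None if it is absent."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


# Hypothetical sample data: only the cells that exist are stored (sparse).
t = TinyBigtable()
t.put("com.example/index", "contents:html", 1, b"<html>v1</html>")
t.put("com.example/index", "contents:html", 2, b"<html>v2</html>")
t.put("com.example/index", "anchor:other.org", 1, b"Example")
t.put("org.other/home", "contents:html", 1, b"<html>other</html>")

latest = t.get("com.example/index", "contents:html")
com_rows = list(t.scan("com", "d"))
```

Because the keys are kept sorted, reading a contiguous range of rows is a single walk over neighbouring entries, which is the property a sorted map gives you that a plain hash table does not.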

The internal mechanics of Bigtable versus, say, MySQL are so dissimilar as to make comparison difficult, and the intended goals don't overlap much either. But you can think of Bigtable a bit like a single-table database. Imagine, for example, the difficulties you would run into if you tried to implement Google's entire web search system with a MySQL database -- Bigtable was built around solving those problems.

Bigtable datasets can be queried from services like AppEngine using a language called GQL ("gee-kwal"), which is based on a subset of SQL. Conspicuously missing from GQL is any sort of JOIN command. Because of the distributed nature of a Bigtable database, performing a join between two tables would be terribly inefficient. Instead, the programmer has to implement such logic in the application, or design the application so as not to need it.
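As a rough illustration of what "implement such logic in the application" can look like, the sketch below joins posts to authors by hand: one pass to collect the keys, one batched lookup, then stitching the results together in memory. The `authors` and `posts` data and the `fetch_authors` helper are hypothetical stand-ins for datastore calls, not a real AppEngine or GQL API.

```python
# Hypothetical data: two separate "tables", each reachable only by key lookup.
authors = {
    "a1": {"name": "Ada"},
    "a2": {"name": "Grace"},
}
posts = [
    {"id": "p1", "author_id": "a1", "title": "Sorted maps"},
    {"id": "p2", "author_id": "a2", "title": "Compilers"},
    {"id": "p3", "author_id": "a1", "title": "Engines"},
]


def fetch_authors(author_ids):
    """Stand-in for a batched get-by-key call against the datastore."""
    return {aid: authors[aid] for aid in author_ids if aid in authors}


def posts_with_authors(post_rows):
    """Join posts to authors in application code instead of with a JOIN."""
    wanted = {p["author_id"] for p in post_rows}  # deduplicate the keys
    found = fetch_authors(wanted)                 # one batched lookup
    return [
        {"title": p["title"], "author": found[p["author_id"]]["name"]}
        for p in post_rows
        if p["author_id"] in found                # behaves like an inner join
    ]


joined = posts_with_authors(posts)
```

Batching the key lookups matters here: fetching each author separately inside the loop would turn one join into one round trip per post.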


Google's Bigtable and other similar projects (e.g., CouchDB, HBase) are database systems designed so that data is mostly denormalized (i.e., duplicated and grouped).

The main advantages are:

- Join operations are less costly because of the denormalization.
- Replication/distribution of data is less costly because of data independence (i.e., if you want to distribute data across two nodes, you probably won't have the problem of having an entity on one node and a related entity on another, because related data is grouped together).
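A small sketch of what "duplicated and grouped" means in practice, under the assumption of a simple blog-style dataset (the `posts` records and both function names are invented for illustration). The author's name is copied into every post, so rendering a post is a single key lookup with no join. The price is that changing the name means rewriting every copy.

```python
# Hypothetical denormalized records: each post carries a copy of the
# author fields that the read path needs.
posts = {
    "p1": {"title": "Sorted maps", "author_id": "a1", "author_name": "Ada"},
    "p2": {"title": "Compilers", "author_id": "a2", "author_name": "Grace"},
    "p3": {"title": "Engines", "author_id": "a1", "author_name": "Ada"},
}


def render_post(post_id):
    """Read path: one key lookup, no second table involved."""
    post = posts[post_id]
    return f'{post["title"]} by {post["author_name"]}'


def rename_author(author_id, new_name):
    """Write path: every duplicated copy has to be updated."""
    touched = 0
    for post in posts.values():
        if post["author_id"] == author_id:
            post["author_name"] = new_name
            touched += 1
    return touched


before = render_post("p1")
updated = rename_author("a1", "Ada L.")
after = render_post("p3")
```

Since each record is self-contained, it can be placed on any node without needing a related record to live next to it.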

Systems of this kind are suited to applications that need near-linear scaling (i.e., you add more nodes to the system and performance increases proportionally). In an RDBMS like MySQL or Oracle, once you start adding nodes, joining two tables that are not on the same node becomes more expensive. This becomes important when you are dealing with high volumes.

RDBMSs are nice because of the richness of their storage model (tables, joins, foreign keys). Distributed databases are nice because of the ease of scaling.