Practical example for each type of database (real cases)


Problem Description

There are several types of databases for different purposes; however, MySQL is normally used for everything, because it is the best-known database. To give an example from my company: a big-data application had a MySQL database from its initial stage, which is unbelievable and will bring serious consequences to the company. Why MySQL? Just because no one knows how (and when) to use another DBMS.

So, my question is not about vendors, but types of databases. Can you give me a practical example of a specific situation (or app) for each type of database where it is highly recommended?

Example:

• A social network should use the type X because of Y.

• MongoDB or CouchDB can't support transactions, so a document DB is not good for a banking or auction-site app.

And so on...


Relational: MySQL, PostgreSQL, SQLite, Firebird, MariaDB, Oracle DB, SQL Server, IBM DB2, IBM Informix, Teradata

Object: ZODB, db4o, Eloquera, Versant, Objectivity/DB, VelocityDB

Graph databases: AllegroGraph, Neo4j, OrientDB, InfiniteGraph, GraphBase, SparkleDB, FlockDB, BrightstarDB

Key-value stores: Amazon DynamoDB, Redis, Riak, Voldemort, FoundationDB, LevelDB, BangDB, KAI, hamsterdb, Tarantool, Maxtable, HyperDex, Genomu, Memcachedb

Column family: Bigtable, HBase, Hypertable, Cassandra, Apache Accumulo

RDF stores: Apache Jena, Sesame

Multimodel databases: ArangoDB, Datomic, OrientDB, FatDB, AlchemyDB

Document: MongoDB, CouchDB, RethinkDB, RavenDB, Terrastore, JasDB, RaptorDB, djondb, EJDB, DensoDB, Couchbase

XML databases: BaseX, Sedna, eXist

Hierarchical: InterSystems Caché, GT.M (thanks to @Laurent Parenteau)

Recommended Answer

I found two impressive articles about this subject. All credit goes to highscalability.com. The information in this answer is transcribed from these articles:

35+ Use Cases For Choosing Your Next NoSQL Database

What The Heck Are You Actually Using NoSQL For?


If Your Application Needs...

• complex transactions because you can't afford to lose data or if you would like a simple transaction programming model then look at a Relational or Grid database.

• Example: an inventory system that might want full ACID. I was very unhappy when I bought a product and they said later they were out of stock. I did not want a compensated transaction. I wanted my item! (A minimal transaction sketch follows this list.)

• to scale then NoSQL or SQL can work. Look for systems that support scale-out, partitioning, live addition and removal of machines, load balancing, automatic sharding and rebalancing, and fault tolerance.

• to always be able to write to a database because you need high availability then look at Bigtable Clones which feature eventual consistency.

• to handle lots of small continuous reads and writes, that may be volatile, then look at Document or Key-value or databases offering fast in-memory access. Also, consider SSD.

• to implement social network operations then you first may want a Graph database or second, a database like Riak that supports relationships. An in-memory relational database with simple SQL joins might suffice for small data sets. Redis' set and list operations could work too.

• to operate over a wide variety of access patterns and data types then look at a Document database, they generally are flexible and perform well.

• powerful offline reporting with large datasets then look at Hadoop first and second, products that support MapReduce. Supporting MapReduce isn't the same as being good at it.

• to span multiple data-centers then look at Bigtable Clones and other products that offer a distributed option that can handle the long latencies and are partition tolerant.

• to build CRUD apps then look at a Document database, they make it easy to access complex data without joins.

• built-in search then look at Riak.

• to operate on data structures like lists, sets, queues, publish-subscribe then look at Redis. Useful for distributed locking, capped logs, and a lot more. (A sketch follows this list.)

• programmer friendliness in the form of programmer-friendly data types like JSON, HTTP, REST, Javascript then first look at Document databases and then Key-value databases.

• transactions combined with materialized views for real-time data feeds then look at VoltDB. Great for data-rollups and time windowing.

• enterprise-level support and SLAs then look for a product that makes a point of catering to that market. Membase is an example.

• to log continuous streams of data that may have no consistency guarantees necessary at all then look at Bigtable Clones because they generally work on distributed file systems that can handle a lot of writes.

• to be as simple as possible to operate then look for a hosted or PaaS solution because they will do all the work for you.

• to be sold to enterprise customers then consider a Relational Database because they are used to relational technology.

• to dynamically build relationships between objects that have dynamic properties then consider a Graph Database because often they will not require a schema and models can be built incrementally through programming.

• to support large media then look at storage services like S3. NoSQL systems tend not to handle large BLOBs, though MongoDB has a file service.

• to bulk upload lots of data quickly and efficiently then look for a product that supports that scenario. Most will not because they don't support bulk operations.

• an easier upgrade path then use a fluid schema system like a Document Database or a Key-value Database because it supports optional fields, adding fields, and field deletions without the need to build an entire schema migration framework.

• to implement integrity constraints then pick a database that supports SQL DDL, implement them in stored procedures, or implement them in application code.

• a very deep join depth then use a Graph Database because they support blisteringly fast navigation between entities.

• to move behavior close to the data so the data doesn't have to be moved over the network then look at stored procedures of one kind or another. These can be found in Relational, Grid, Document, and even Key-value databases.

• to cache or store BLOB data then look at a Key-value store. Caching can be used for bits of web pages, or to save complex objects that were expensive to join in a relational database, to reduce latency, and so on.

• a proven track record like not corrupting data and just generally working then pick an established product and when you hit scaling (or other issues) use one of the common workarounds (scale-up, tuning, memcached, sharding, denormalization, etc).

• fluid data types because your data isn't tabular in nature, or requires a flexible number of columns, or has a complex structure, or varies by user (or whatever), then look at Document, Key-value, and Bigtable Clone databases. Each has a lot of flexibility in their data types.

• other business units to run quick relational queries so you don't have to reimplement everything then use a database that supports SQL.

• to operate in the cloud and automatically take full advantage of cloud features then we may not be there yet.

• support for secondary indexes so you can look up data by different keys then look at relational databases and Cassandra's new secondary index support.

• create an ever-growing set of data (really BigData) that rarely gets accessed then look at Bigtable Clones, which will spread the data over a distributed file system.

• to integrate with other services then check if the database provides some sort of write-behind syncing feature so you can capture database changes and feed them into other systems to ensure consistency.

• fault tolerance then check how durable writes are in the face of power failures, partitions, and other failure scenarios.

• to push the technological envelope in a direction nobody seems to be going then build it yourself because that's what it takes to be great sometimes.

• to work on a mobile platform then look at CouchDB / Mobile Couchbase.
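
To make the inventory example above concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for any relational database (the schema is hypothetical): the stock decrement and the order row share one ACID transaction, so a buyer can never pay for an item that was already sold out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock  (product_id INTEGER PRIMARY KEY, qty INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, product_id INTEGER);
    INSERT INTO stock VALUES (1, 1);
""")

def buy(product_id):
    with conn:  # one ACID transaction: commit on success, rollback on error
        cur = conn.execute(
            "UPDATE stock SET qty = qty - 1 WHERE product_id = ? AND qty > 0",
            (product_id,),
        )
        if cur.rowcount == 0:
            raise RuntimeError("out of stock")  # nothing gets committed
        conn.execute("INSERT INTO orders (product_id) VALUES (?)", (product_id,))

buy(1)    # succeeds
# buy(1)  # would raise "out of stock": the second unit never existed
```

And for the data-structures bullet, a sketch with the redis-py client (it assumes a Redis server on localhost; all key names are invented):

```python
import redis

r = redis.Redis()  # assumes a Redis server on localhost:6379

# A list as a capped log: push the newest entry, keep only the last 100.
r.lpush("app:log", "user 42 logged in")
r.ltrim("app:log", 0, 99)

# Sets for membership-style operations.
r.sadd("followers:alice", "bob", "carol")
print(r.sismember("followers:alice", "bob"))  # True

# Publish-subscribe: fire-and-forget to any listening consumer.
r.publish("events", "cache invalidated")
```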


General Use Cases (NoSQL)

Bigness. NoSQL is seen as a key part of a new data stack supporting: big data, big numbers of users, big numbers of computers, big supply chains, big science, and so on. When something becomes so massive that it must become massively distributed, NoSQL is there, though not all NoSQL systems are targeting big. Bigness can be across many different dimensions, not just using a lot of disk space.

Massive write performance. This is probably the canonical usage based on Google's influence. High volume. Facebook needs to store 135 billion messages a month (in 2010). Twitter, for example, has the problem of storing 7 TB of data per day (in 2010), with the prospect of this requirement doubling multiple times per year. This is the "data too big to fit on one node" problem. At 80 MB/s it takes about a day to store 7 TB, so writes need to be distributed over a cluster, which implies key-value access, MapReduce, replication, fault tolerance, consistency issues, and all the rest. For faster writes, in-memory systems can be used.
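
The "about a day" figure is easy to verify; a quick back-of-envelope check in Python:

```python
# 7 TB written at a sustained 80 MB/s (taking 1 TB = 1024 * 1024 MB):
seconds = 7 * 1024 * 1024 / 80
print(seconds / 3600)  # ~25.5 hours, i.e. roughly one day on a single node
```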

Fast key-value access. This is probably the second most cited virtue of NoSQL in the general mind set. When latency is important it's hard to beat hashing on a key and reading the value directly from memory or in as little as one disk seek. Not every NoSQL product is about fast access; some are more about reliability, for example. But what people have wanted for a long time was a better memcached, and many NoSQL systems offer that.
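
As a toy illustration of why this access pattern is so cheap (a pure-Python sketch, with the node count and data invented): hash the key, pick a node, do a single lookup. There is no query planner and no join anywhere in the path.

```python
import hashlib

NODES = [dict() for _ in range(4)]  # stand-ins for four storage nodes

def node_for(key):
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

def put(key, value):
    node_for(key)[key] = value

def get(key):
    return node_for(key).get(key)  # one hash, one lookup

put("user:42", '{"name": "alice"}')
print(get("user:42"))
```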

Flexible schema and flexible datatypes. NoSQL products support a whole range of new data types, and this is a major area of innovation in NoSQL. We have: column-oriented, graph, advanced data structures, document-oriented, and key-value. Complex objects can be easily stored without a lot of mapping. Developers love avoiding complex schemas and ORM frameworks. Lack of structure allows for much more flexibility. We also have program- and programmer-friendly compatible datatypes like JSON.

Schema migration. Schemalessness makes it easier to deal with schema migrations without so much worrying. Schemas are in a sense dynamic because they are imposed by the application at run-time, so different parts of an application can have a different view of the schema.
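
A minimal sketch of what "imposed by the application at run-time" looks like, using the pymongo client (it assumes a MongoDB server on localhost; the database, collection, and field names are all invented):

```python
from pymongo import MongoClient

users = MongoClient().demo.users  # assumes MongoDB on localhost:27017

users.insert_one({"name": "alice"})                           # old shape
users.insert_one({"name": "bob", "tags": ["admin", "beta"]})  # new shape

for doc in users.find():
    # The "migration" lives in application code: absent fields get defaults.
    print(doc["name"], doc.get("tags", []))
```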

Write availability. Do your writes need to succeed no matter what? Then we can get into partitioning, CAP, eventual consistency and all that jazz.

Easier maintainability, administration and operations. This is very product specific, but many NoSQL vendors are trying to gain adoption by making it easy for developers to adopt them. They are spending a lot of effort on ease of use, minimal administration, and automated operations. This can lead to lower operations costs as special code doesn't have to be written to scale a system that was never intended to be used that way.

No single point of failure. Not every product is delivering on this, but we are seeing a definite convergence on relatively easy to configure and manage high availability with automatic load balancing and cluster sizing. A perfect cloud partner.

Generally available parallel computing. We are seeing MapReduce baked into products, which makes parallel computing something that will be a normal part of development in the future.

Programmer ease of use. Accessing your data should be easy. While the relational model is intuitive for end users, like accountants, it's not very intuitive for developers. Programmers grok keys, values, JSON, Javascript stored procedures, HTTP, and so on. NoSQL is for programmers. This is a developer-led coup. The response to a database problem can't always be to hire a really knowledgeable DBA, get your schema right, denormalize a little, etc.; programmers would prefer a system that they can make work for themselves. It shouldn't be so hard to make a product perform. Money is part of the issue. If it costs a lot to scale a product then won't you go with the cheaper product, that you control, that's easier to use, and that's easier to scale?

Use the right data model for the right problem. Different data models are used to solve different problems. Much effort has been put into, for example, wedging graph operations into a relational model, but it doesn't work. Isn't it better to solve a graph problem in a graph database? We are now seeing a general strategy of trying to find the best fit between a problem and solution.

Avoid hitting the wall. Many projects hit some type of wall in their project. They've exhausted all options to make their system scale or perform properly and are wondering what next? It's comforting to select a product and an approach that can jump over the wall by linearly scaling using incrementally added resources. At one time this wasn't possible. It took custom built everything, but that's changed. We are now seeing usable out-of-the-box products that a project can readily adopt.

Distributed systems support. Not everyone is worried about scale or performance over and above that which can be achieved by non-NoSQL systems. What they need is a distributed system that can span datacenters while handling failure scenarios without a hiccup. NoSQL systems, because they have focussed on scale, tend to exploit partitions, tend not to use heavy strict consistency protocols, and so are well positioned to operate in distributed scenarios.

Tunable CAP tradeoffs. NoSQL systems are generally the only products with a "slider" for choosing where they want to land on the CAP spectrum. Relational databases pick strong consistency which means they can't tolerate a partition failure. In the end, this is a business decision and should be decided on a case by case basis. Does your app even care about consistency? Are a few drops OK? Does your app need strong or weak consistency? Is availability more important or is consistency? Will being down be more costly than being wrong? It's nice to have products that give you a choice.
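
Here is what the "slider" can look like in code, with Cassandra as one example (a hedged sketch: it assumes a reachable cluster, and the keyspace and events table are hypothetical). The consistency level is chosen per statement, not fixed by the database:

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("demo")  # hypothetical keyspace

fast_write = SimpleStatement(
    "INSERT INTO events (id, payload) VALUES (uuid(), 'click')",
    consistency_level=ConsistencyLevel.ANY,     # lean toward availability
)
safe_read = SimpleStatement(
    "SELECT * FROM events LIMIT 10",
    consistency_level=ConsistencyLevel.QUORUM,  # lean toward consistency
)

session.execute(fast_write)
rows = session.execute(safe_read)
```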

More Specific Use Cases

• Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, clickstreams, etc.

• Syncing online and offline data. This is a niche CouchDB has targeted.

• Fast response times under all loads.

• Avoiding heavy joins for when the query load for complex joins becomes too large for an RDBMS.

• Soft real-time systems where low latency is critical. Games are one example.

• Applications where a wide variety of different write, read, query, and consistency patterns need to be supported. There are systems optimized for 50% reads 50% writes, 95% writes, or 95% reads. Read-only applications needing extreme speed and resiliency, simple queries, and that can tolerate slightly stale data. Applications requiring moderate performance, read/write access, simple queries, completely authoritative data. A read-only application with complex query requirements.

• Load balancing to accommodate data and usage concentrations and to help keep microprocessors busy.

• Real-time inserts, updates, and queries.

• Hierarchical data like threaded discussions and parts explosion.

• Dynamic table creation.

• Two-tier applications where low latency data is made available through a fast NoSQL interface, but the data itself can be calculated and updated by high latency Hadoop apps or other low priority apps.

• Sequential data reading. The right underlying data storage model needs to be selected. A B-tree may not be the best model for sequential reads.

• Slicing off a part of a service that may need better performance/scalability onto its own system. For example, user logins may need to be high performance and this feature could use a dedicated service to meet those goals.

• Caching. A high-performance caching tier for websites and other applications. An example is a cache for the Data Aggregation System used by the Large Hadron Collider. Voting.

• Real-time page view counters (see the Redis sketch after this list).

• User registration, profile, and session data.

• Document, catalog management, and content management systems. These are facilitated by the ability to store complex documents as a whole rather than organized as relational tables. Similar logic applies to inventory, shopping carts, and other structured data types.

• Archiving. Storing a large continual stream of data that is still accessible on-line. Document-oriented databases with a flexible schema can handle schema changes over time.

• Analytics. Use MapReduce, Hive, or Pig to perform analytical queries and scale-out systems that support high write loads.

• Working with heterogeneous types of data, for example, different media types at a generic level.

• Embedded systems. They don’t want the overhead of SQL and servers, so they use something simpler for storage.

• A "market" game, where you own buildings in a town. You want the building list of someone to pop up quickly, so you partition on the owner column of the building table, so that the select is single-partitioned. But when someone buys the building of someone else you update the owner column along with price.

• JPL is using SimpleDB to store rover plan attributes. References are kept to a full plan blob in S3. (source)

• Federal law enforcement agencies tracking Americans in real-time using credit cards, loyalty cards and travel reservations.

• Fraud detection by comparing transactions to known patterns in real-time.

• Helping diagnose the typology of tumors by integrating the history of every patient.

• In-memory database for high-update situations, like a website that displays everyone's "last active" time (for chat maybe). If users are performing some activity once every 30 sec, then you will pretty much be at your limit with about 5000 simultaneous users.

• Handling lower-frequency multi-partition queries using materialized views while continuing to process high-frequency streaming data.

• Priority queues (see the Redis sketch after this list).

• Running calculations on cached data, using a program-friendly interface, without having to go through an ORM.

• Uniq a large dataset using simple key-value columns.

• To keep querying fast, values can be rolled up into different time slices.

• Computing the intersection of two massive sets, where a join would be too slow.

• A timeline a la Twitter.
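
Several of the bullets above (the page-view counter, the "last active" time, the timeline, priority queues, and set intersection) map directly onto small Redis commands. A hedged redis-py sketch, assuming a local server, with all key names invented:

```python
import time
import redis

r = redis.Redis()  # assumes a Redis server on localhost:6379

# Real-time page view counter: one atomic increment per hit.
r.incr("views:/home")

# "Last active" presence for chat, expiring automatically after 60 s.
r.setex("active:user:42", 60, int(time.time()))

# A Twitter-style timeline: newest first, capped at 1000 entries.
r.lpush("timeline:user:42", "post:123")
r.ltrim("timeline:user:42", 0, 999)

# A priority queue on a sorted set: the lowest score pops first.
r.zadd("jobs", {"send-email": 2, "resize-image": 1})
print(r.zpopmin("jobs"))  # resize-image comes out first

# Intersection of two large sets, with no relational join anywhere.
r.sadd("bought:widgets", "u1", "u2", "u3")
r.sadd("bought:gadgets", "u2", "u3", "u4")
print(r.sinter("bought:widgets", "bought:gadgets"))
```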

Redis use cases, VoltDB use cases, and more can be found here.

Other Recommended Answer

This question is almost impossible to answer because of its generality. I think you are looking for some sort of easy "problem = solution" answer. The problem is that each "problem" becomes more and more unique as it becomes a business.

What do you call a social network? Twitter? Facebook? LinkedIn? Stack Overflow? They all use different solutions for different parts, and many solutions can exist that use a polyglot approach. Twitter has a graph-like concept, but there are only first-degree connections: followers and following. LinkedIn, on the other hand, thrives on showing how people are connected beyond the first degree. These are two different processing and data needs, but both are "social networks".

If you have a "social network" but don't do any discovery mechanisms, then you can most likely use any basic key-value store. If you need high performance, horizontal scale, and will have secondary indexes or full-text search, you could use Couchbase.

If you are doing machine learning on top of the log data you are gathering, you can integrate Hadoop with Hive or Pig, or Spark/Shark. Or you can do a lambda architecture and use many different systems with Storm.
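
As one possible shape for that pipeline, a hedged PySpark sketch (the file path and column names are invented; it assumes a Spark installation) that aggregates raw logs into per-user features for a downstream model:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("log-feature-prep").getOrCreate()

logs = spark.read.json("hdfs:///logs/2014/*.json")  # hypothetical path

features = (
    logs.groupBy("user_id")
        .agg(
            F.count("*").alias("events"),           # total activity
            F.countDistinct("page").alias("pages")  # breadth of activity
        )
)

features.write.parquet("hdfs:///features/user_activity")  # model input
```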

If you are doing discovery via graph-like queries that go beyond second-degree vertices and also filter on edge properties, you will likely consider a graph database on top of your primary store. However, graph databases aren't good choices for a session store, or as general-purpose stores, so you will need a polyglot solution to be efficient.
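
To make "beyond second degree with edge filters" concrete, here is a toy pure-Python traversal over an invented adjacency structure; this is exactly the kind of query a graph database answers natively and a key-value or relational store struggles with:

```python
from collections import deque

# Toy graph: node -> list of (neighbor, edge properties). All data invented.
GRAPH = {
    "alice": [("bob", {"since": 2009}), ("carol", {"since": 2011})],
    "bob":   [("dave", {"since": 2005})],
    "carol": [("erin", {"since": 2012})],
    "dave":  [("frank", {"since": 2010})],
    "erin":  [],
    "frank": [],
}

def connections(start, max_depth=3, min_since=2008):
    """Breadth-first search to max_depth, keeping edges that pass the filter."""
    seen, found = {start}, []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for neighbor, props in GRAPH[node]:
            if props["since"] >= min_since and neighbor not in seen:
                seen.add(neighbor)
                found.append((neighbor, depth + 1))
                queue.append((neighbor, depth + 1))
    return found

print(connections("alice"))  # third-degree reach with an edge-property filter
```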

What is the data velocity? The scale? How do you want to manage it? What expertise do you have available in the company or startup? There are a number of reasons this is not a simple question and answer.