When NOT to use Cassandra?









There has been a lot of talk related to Cassandra lately.

Twitter, Digg, Facebook, etc. all use it.

When does it make sense to:

  • use Cassandra,
  • not use Cassandra, and
  • use an RDBMS instead of Cassandra?


There is no silver bullet: everything is built to solve specific problems and has its own pros and cons. It is up to you to work out what your problem statement is and which solution fits it best.

I will try to answer your questions one by one, in the order you asked them. Since Cassandra belongs to the NoSQL family of databases, it's important to understand why you would use a NoSQL database before I answer your questions.

Why use NoSQL

With an RDBMS, the choice is fairly easy, because the databases in this category (MySQL, Oracle, MS SQL, PostgreSQL, and so on) offer almost the same kind of solution, oriented toward ACID properties. With NoSQL the decision is harder, because each NoSQL database offers a different solution and you have to understand which one best suits your app/system requirements. For example, MongoDB fits use cases where your system demands a schema-less document store. HBase might fit search engines, log analysis, or anywhere you need to scan huge, two-dimensional, join-less tables. Redis provides in-memory storage for a variety of data structures such as lists, sets, sorted sets, and hashes, and can be a good fit for real-time leaderboards and pub-sub systems. Similarly, the other databases in this category (including Cassandra) fit different problem statements. Now let's move on to the original questions and answer them one by one.

When to use Cassandra

Being part of the NoSQL family, Cassandra offers a solution for problems where one of your requirements is a very write-heavy system and you want a fairly responsive reporting system on top of the stored data. Consider web analytics, where log data is stored for each request and you want to build an analytics platform around it to count hits per hour, by browser, by IP, etc. in real time. You can refer to this blog post to understand more about the use cases where Cassandra fits in.
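To make the web analytics example concrete, here is a minimal sketch of the "hits per hour, by browser, by IP" counting logic. It is plain in-memory Python, not Cassandra code: the `HitCounters` class and `hour_bucket` helper are hypothetical names invented for illustration, and the (hour, dimension) keys only mimic how such counters might be grouped in a write-heavy store.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_bucket(ts: float) -> str:
    """Truncate a Unix timestamp to its UTC hour, e.g. '1970-01-01T00'."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H")


class HitCounters:
    """Hypothetical in-memory stand-in for a write-heavy counter store."""

    def __init__(self) -> None:
        # One counter per reporting dimension, keyed by (hour bucket, value).
        self.by_browser: Counter = Counter()
        self.by_ip: Counter = Counter()

    def record(self, ts: float, browser: str, ip: str) -> None:
        # Each request is a cheap increment; nothing is read back on write.
        bucket = hour_bucket(ts)
        self.by_browser[(bucket, browser)] += 1
        self.by_ip[(bucket, ip)] += 1

    def hits_by_browser(self, bucket: str, browser: str) -> int:
        # Reports become direct key lookups on the pre-aggregated counters.
        return self.by_browser[(bucket, browser)]
```

The point of the design is that every request turns into a few increments, and reports are lookups on already-aggregated keys rather than scans over raw logs.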

When to use an RDBMS instead of Cassandra

Cassandra is a NoSQL database and does not provide ACID transactions or relational features. If you have a strong requirement for ACID properties (for example, for financial data), Cassandra would not be a good fit. You can work around that, but you will end up writing a lot of application code to simulate ACID properties, which will badly hurt your time to market. Managing that kind of system on Cassandra would also be complex and tedious.

When not to use Cassandra

I don't think it needs to be answered if the above explanation makes sense.


When evaluating distributed data systems, you have to consider the CAP theorem: of consistency, availability, and partition tolerance, you can pick only two.

Cassandra is an available, partition-tolerant system that supports eventual consistency. For more information see this blog post I wrote: Visual Guide to NoSQL Systems.
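One common way to reason about eventual consistency in replicated stores is the quorum overlap rule: if a write is acknowledged by W replicas and a read consults R replicas out of N, the read is guaranteed to touch at least one replica holding the latest write only when R + W > N. The sketch below just encodes that arithmetic; the function name and parameters are hypothetical and are not part of any Cassandra API.

```python
def read_overlaps_latest_write(replicas: int, write_acks: int, read_acks: int) -> bool:
    """Return True if every read set must intersect every write set.

    replicas   -- N, the number of copies of each piece of data
    write_acks -- W, replicas that must acknowledge a write
    read_acks  -- R, replicas consulted on a read
    """
    if not (1 <= write_acks <= replicas and 1 <= read_acks <= replicas):
        raise ValueError("W and R must each be between 1 and N")
    # Pigeonhole argument: R + W > N forces at least one shared replica.
    return write_acks + read_acks > replicas
```

With N = 3, choosing W = 1 and R = 1 favors availability and latency but only gives eventual consistency, while W = 2 and R = 2 makes reads overlap the latest acknowledged write.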


Cassandra is the answer to a particular problem: what do you do when you have so much data that it does not fit on one server? How do you store all your data on many servers without breaking your bank account or driving your developers insane? Facebook gets 4 terabytes of new compressed data EVERY DAY, and this number will most likely more than double within a year.

If you do not have this much data, or if you have millions to pay for an Enterprise Oracle/DB2 cluster installation and the specialists required to set it up and maintain it, then you are fine with a SQL database.

However, Facebook no longer uses Cassandra and now uses MySQL almost exclusively, moving the partitioning up into the application stack for faster performance and better control.
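To illustrate what "storing all your data on many servers" can look like, here is a minimal consistent-hashing sketch: each node owns a position (token) on a ring, and a key belongs to the first node whose token follows the key's hash. This is a simplified illustration of the general technique, not Cassandra's actual partitioner; the `HashRing` class, the `token` helper, the node names, and the choice of MD5 are all assumptions made for the example.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a 64-bit position on the ring (MD5 is an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical ring with one token per node."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        # Sort nodes by their ring position so lookups can use binary search.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First node clockwise from the key's position, wrapping around at the end.
        i = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return self._ring[i][1]
```

A useful property of this layout is that adding a node only takes keys away from its neighbor on the ring: every other key stays where it was, which keeps rebalancing cheap as the cluster grows.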