使用MongoDB而不是MS SQL Server的优点和缺点[英] Pros and Cons of using MongoDB instead of MS SQL Server

本文是小编为大家收集整理的关于使用MongoDB而不是MS SQL Server的优点和缺点的处理/解决方法,可以参考本文帮助大家快速定位并解决问题,中文翻译不准确的可切换到English标签页查看源文。

问题描述

我是NOSQL World的新手,并考虑将我的MS SQL Server数据库替换为MongoDB.我的应用程序(用.NET C#编写)与IP摄像机进行交互,并记录来自相机的每个图像的元数据,并将其记录到MS SQL数据库中.平均而言,我每天为每个相机插入约86400个记录,在当前的数据库架构中,我为单独的摄像机图像创建了单独的表格,例如camera_1_images,camera_2_images ... camera_n_images.单图记录由简单的元数据信息组成.像自动,filepath,creationdate.为了添加更多详细信息,我的应用程序启动了每个摄像机的单独过程(.EXE),并且每个过程在数据库的相对表中插入1个记录.

我需要(MongoDB)专家的建议,以下问题:

  1. 告诉MongoDB是否适合持有此类数据,最终将与时间范围进行查询(例如,在指定小时之间检索特定摄像头的所有图像)?关于我的案件的基于文档的架构设计的任何建议吗?

  2. 服务器的规格(CPU,RAM,磁盘)应该是什么?有任何建议吗?

  3. 我应该考虑此情况的碎片/复制(考虑到书面性能与同步副本集的同时)?

  4. 在同一台计算机上使用多个数据库是否有任何好处,以便一个数据库可以保存所有摄像头的当天图像,而第二个数据库将用于存档前一天的图像?我正在考虑有关分裂读取和写入单独的数据库的文章.因为所有读取请求都可以由第二个数据库提供,并将其写入第一个数据库.它会受益吗?如果是,那么任何想法以确保始终同步两个数据库.

请欢迎其他任何建议.

推荐答案

我本人是NOSQL数据库的首发.因此,我以牺牲潜在的投票为代价来回答这一点,但这对我来说将是一次很棒的学习经历.

在尽力回答您的问题之前,我应该说如果MS SQL Server对您的运行良好,然后坚持下去.你还没有 提到了您要使用mongodb的任何有效的理由 您以文档为导向的DB了解了它.而且我看到了 您的捕获几乎相同的元数据 每个相机,即您的模式是动态的.

  • 判断MongoDB是否适合持有此类数据,最终将与时间范围询问(例如,在指定的小时之间检索特定相机的所有图像)?关于我的案件的基于文档的架构设计的任何建议?

mongoDB是一个面向文档的DB,擅长查询在汇总中(您称其为文档).由于您已经将每个相机的数据存储在自己的表中,因此在MongoDB中,您将为每个相机创建一个单独的 Collection . 这是如何执行日期范围查询的方式.

  • 服务器(CPU,RAM,磁盘)的规格应该是什么?有任何建议吗?

所有NOSQL数据库均建立在商品硬件上扩展.但是,顺便说一下,您可能会考虑通过扩展来提高性能.您可以从合理的机器开始,并且随着负载的增加,您可以继续添加更多服务器(扩展).您无需计划和购买高端服务器.

  • 我应该考虑针对这种情况的碎片/复制(同时考虑以书面形式与同步副本集)?

mongodb 锁定整个db (但是其他操作的收益率),适用于具有比写入更多读取的系统.因此,这取决于您的系统.碎片有多种方式,应特定于域.不可能一个通用的答案.但是,可以给出一些示例,例如通过地理,分支等.

.

还阅读 cap Thalorem的简单英语简介

通过回答有关sharding的评论的更新

根据他们的 documentation ,您应该考虑部署一个碎片集群,如果:

  • 您的数据集接近或超过系统中单个节点的存储容量.
  • 您系统主动工作集的大小将很快超过系统的最大RAM量的容量.
  • 您的系统具有大量的写入活动,一个mongoDB实例无法快速写入数据以满足需求,以及所有其他 方法尚未减少争论.

因此,基于最后一点.自动碎片功能旨在扩展写入.在这种情况下,您有一个per shard 的写锁,而不是per 数据库.但是我的是理论上的答案.我建议您从10Gen.com组中咨询.

其他推荐答案

告诉MongoDB是否适合持有此类数据,最终 将与时间范围进行查询(例如,检索A的所有图像 指定小时之间的特定摄像头)?

这个Quiestion太主观了,无法回答.从众多SQL解决方案的个人经验(具有讽刺意味的不是SQL),我会说它们同样同样出色,如果做对的话.

也:

服务器(CPU,RAM,磁盘)的规格应该是什么?有任何建议吗?

取决于只有您知道的太多变量,但是一小群商品硬件效果很好.我真的不能对这个问题做出事实回答,这将取决于您的测试.

至于架构,我会选择结构的文档:

{
    _id: {},
    camera_name: "my awesome camera",
    images: [
        { 
            url: "http://I_like_S3_here.amazons3.com/my_image.png" ,
            // All your other fields per image
        }
    ]
}

,只要您不再嵌入更深的时间,就应该很容易地曼坦和更新,但是它可能会变得有些痛苦,但是,这取决于您的查询.

不仅这样

我应该考虑在这种情况下进行分解/复制(同时考虑以书面形式与同步副本集)?

可能,许多人认为他们需要碎片,而实际上他们只需要在设计数据库方面更加聪明即可. MongoDB是非常自由的形式,因此有很多方法可以做错事,但是话虽如此,还有很多正确的方法.我个人会一直牢记.复制也可能非常有用.

在同一台计算机上使用多个数据库是否有任何好处,以便一个数据库可以容纳所有摄像机的当天图像,而第二个数据库将用于存档前一天的图像?

,即使蒙古犬写锁在数据库级别(目前),我会说:否.正确的文档结构和正确的碎片/复制(如果需要)也应该能够在基于单个文档的集合中处理此操作(S)在单个dB下.不仅如此,您还可以将群集中的写入和读取到某些服务器,从而在群集中的某些机器之间创建并发情况.我将在DB分离上促进正确使用词果的并发功能.

编辑

再次阅读了问题后,我从解决方案中省略了您每天为每个相机插入80k+图像.因此,我实际上是在称为images的集合中每个图像的嵌入式选项,然后是camera集合,然后像在SQL中一样查询两者.

碎片images收集应该在camera_id上同样容易.

还要确保您将工作设置用于服务器.

其他推荐答案

告诉MongoDB是否适合持有此类数据,最终 将与时间范围进行查询(例如,检索A的所有图像 指定小时之间的特定相机)?任何建议 我的案件的基于文档的架构设计?

mongodb可以做到这一点.为了获得更好的性能,您可以在时间字段上设置索引.

服务器(CPU,RAM,磁盘)的规格应该是什么?有任何建议吗?

我认为RAM和磁盘很重要.

  • 如果您不想做sharding到scale out,则应考虑更大尺寸的磁盘,以便将所有数据存储在其中.
  • 您的热数据应该适合您的RAM.如果不是,那么您应该考虑更大的RAM,因为MongoDB的性能主要取决于RAM.

我应该考虑针对这种情况的碎片/复制(而 考虑以书面形式与同步副本集的表现)?

我不知道您有很多相机,即使是1000个插入物/秒,总共1000张相机也应该很容易容易出现.如果您要插入性能,我认为您不需要进行碎片(除了数据大小太大,以至于必须将它们分为几台机器).

另一个问题是您应用程序的读取频率.它很高,然后您可以在此处考虑碎片或复制. 如果您仅在一个时间范围内的一个相机上查询,则可以使用(Timestamp + Camera_id)作为碎片键.

在同一台计算机上使用多个数据库是否有任何好处,因此 一个数据库将对所有相机的当天图像保存,并且 第二个将用于存档前一天的图像?

您可以将表分成两个集合(archive和current).并仅在archive上设置索引,如果您仅在archive上查询日期.没有创建索引的开销,current收集应受益于插入.

您可以编写每日程序将current数据转储到archive中.

本文地址:https://www.itbaoku.cn/post/627392.html

问题描述

I am new to NoSQL world and thinking of replacing my MS Sql Server database to MongoDB. My application (written in .Net C#) interacts with IP Cameras and records meta data for each image coming from Camera, into MS SQL Database. On average, i am inserting about 86400 records per day for each camera and in current database schema I have created separate table for separate Camera images, e.g. Camera_1_Images, Camera_2_Images ... Camera_N_Images. Single image record consists of simple metadata info. like AutoId, FilePath, CreationDate. To add more details to this, my application initiates separate process (.exe) for each camera and each process inserts 1 record per second in relative table in database.

I need suggestions from (MongoDB) experts on following concerns:

  1. to tell if MongoDB is good for holding such data, which eventually will be queried against time ranges (e.g. retrieve all images of a particular camera between a specified hour)? Any suggestions about Document Based schema design for my case?

  2. What should be the specs of server (CPU, RAM, Disk)? any suggestion?

  3. Should i consider Sharding/Replication for this scenario (while considering the performance in writing to synch replica sets)?

  4. Are there any benefits of using multiple databases on same machine, so that one database will hold images of current day for all cameras, and the second one will be used to archive previous day images? I am thinking on this with respect to splitting reads and writes on separate databases. Because all read requests might be served by second database and writes to first one. Will it benefit or not? If yes then any idea to ensure that both databases are synced always.

Any other suggestions are welcomed please.

推荐答案

I am myself a starter on NoSQL databases. So I am answering this at the expense of potential down votes but it will be a great learning experience for me.

Before trying my best to answer your questions I should say that if MS SQL Server is working well for you then stick with it. You have not mentioned any valid reason WHY you want to use MongoDB except the fact that you learnt about it as a document oriented db. Moreover I see that you have almost the same set of meta-data you are capturing for each camera i.e. your schema is dynamic.

  • to tell if MongoDB is good for holding such data, which eventually will be queried against time ranges (e.g. retrieve all images of a particular camera between a specified hour)? Any suggestions about Document Based schema design for my case?

MongoDB being a document oriented db, is good at querying within an aggregate (you call it document). Since you already are storing each camera's data in its own table, in MongoDB you will have a separate collection created for each camera. Here is how you perform date range queries.

  • What should be the specs of server (CPU, RAM, Disk)? any suggestion?

All NoSQL data bases are built to scale-out on commodity hardware. But by the way you have asked the question, you might be thinking of improving performance by scaling-up. You can start with a reasonable machine and as the load increases, you can keep adding more servers (scaling-out). You no need to plan and buy a high end server.

  • Should i consider Sharding/Replication for this scenario (while considering the performance in writing to synch replica sets)?

MongoDB locks the entire db for a single write (but yields for other operations) and is meant for systems which have more reads than writes. So this depends upon how your system is. There are multiple ways of sharding and should be domain specific. A generic answer is not possible. However some examples can be given like sharding by geography, by branches etc.

Also read A plain english introduction to CAP Theorem

Updated with answer to the comment on sharding

According to their documentation, You should consider deploying a sharded cluster, if:

  • your data set approaches or exceeds the storage capacity of a single node in your system.
  • the size of your system’s active working set will soon exceed the capacity of the maximum amount of RAM for your system.
  • your system has a large amount of write activity, a single MongoDB instance cannot write data fast enough to meet demand, and all other approaches have not reduced contention.

So based upon the last point yes. The auto-sharding feature is built to scale writes. In that case, you have a write lock per shard, not per database. But mine is a theoretical answer. I suggest you take consultation from 10gen.com group.

其他推荐答案

to tell if MongoDB is good for holding such data, which eventually will be queried against time ranges (e.g. retrieve all images of a particular camera between a specified hour)?

This quiestion is too subjective for me to answer. From personal experience with numerous SQL solutions (ironically not MS SQL) I would say they are both equally as good, if done right.

Also:

What should be the specs of server (CPU, RAM, Disk)? any suggestion?

Depends on too many variables that only you know, however a small cluster of commodity hardware works quite well. I cannot really give a factual response to this question and it will come down to your testing.

As for a schema I would go for a document of the structure:

{
    _id: {},
    camera_name: "my awesome camera",
    images: [
        { 
            url: "http://I_like_S3_here.amazons3.com/my_image.png" ,
            // All your other fields per image
        }
    ]
}

This should be quite easy to mantain and update so long as you are not embedding much deeper since then it could become a bit of pain, however, that depends upon your queries.

Not only that but this should be good for sharding since you have all the data you need in one document, if you were to shard on _id you could probably get the perfect setup here.

Should i consider Sharding/Replication for this scenario (while considering the performance in writing to synch replica sets)?

Possibly, many people assume they need to shard when in reality they just need to be more intelligent in how they design the database. MongoDB is very free form so there are a lot of ways to do it wrong, but that being said, there are also a lot of ways of dong it right. I personally would keep sharding in mind. Replication can be very useful too.

Are there any benefits of using multiple databases on same machine, so that one database will hold images of current day for all cameras, and the second one will be used to archive previous day images?

Even though MongoDBs write lock is on DB level (currently) I would say: No. The right document structure and the right sharding/replication (if needed) should be able to handle this in a single document based collection(s) under a single DB. Not only that but you can direct writes and reads within a cluster to certain servers so as to create a concurrency situation between certain machines in your cluster. I would promote the correct usage of MongoDBs concurrency features over DB separation.

Edit

After reading the question again I omitted from my solution that you are inserting 80k+ images for each camera a day. As such instead of the embedded option I would actually make a row per image in a collection called images and then a camera collection and query the two like you would in SQL.

Sharding the images collection should be just as easy on camera_id.

Also make sure you take you working set into consideration with your server.

其他推荐答案

to tell if MongoDB is good for holding such data, which eventually will be queried against time ranges (e.g. retrieve all images of a particular camera between a specified hour)? Any suggestions about Document Based schema design for my case?

MongoDB can do this. For better performance, you can set an index on your time field.

What should be the specs of server (CPU, RAM, Disk)? any suggestion?

I think RAM and Disk would be important.

  • If you don't want to do sharding to scale out, you should consider a larger size of disk so you can store all your data in it.
  • Your hot data should can fit into your RAM. If not, then you should consider a larger RAM because the performance of MongoDB mainly depends on RAM.

Should i consider Sharding/Replication for this scenario (while considering the performance in writing to synch replica sets)?

I don't know many cameras do you have, even 1000 inserts/second with total 1000 cameras should still be easy to MongoDB. If you are concerning insert performance, I don't think you need to do sharding(Except the data size are too big that you have to separate them into several machines).

Another problem is the read frequency of your application. It it is very high, then you can consider sharding or replication here. And you can use (timestamp + camera_id) as your sharding key if your query only on one camera in a time range.

Are there any benefits of using multiple databases on same machine, so that one database will hold images of current day for all cameras, and the second one will be used to archive previous day images?

You can separate the table into two collections(archive and current). And set index only on archive if you only query date on archive. Without the overhead of index creation, the current collection should benefit with insert.

And you can write a daily program to dump the current data into archive.