Database blobs vs Disk stored files

Source: https://www.itbaoku.cn/post/597628.html

Question

So I have this requirement that says the app must let users upload and download about 6000 files per month (mostly PDF, DOC, XLS).

I was thinking about the optimal solution for this. The question is whether I should use BLOBs in my database or a simple file hierarchy for writing and reading these files.

The app architecture is based on Java 1.6, Spring 3.1, Dojo and Informix 10.x.

So I'm here to ask for advice based on your experience.

Recommended answer

If you have other data in the database related to these files, storing the files in the file system makes things more complex:

  1. Backups have to be done separately.
  2. Transactions have to be implemented separately (as far as that is even possible for file system operations).
  3. Integrity checks between the database and the file system structure do not come out of the box.
  4. No cascades: removing a user does not automatically remove that user's pictures.
  5. You first have to query the database for the file's path and then fetch the file from the file system.
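To make points 3 to 5 concrete, here is a minimal sketch of the file system approach. It is not taken from the original answer: the class name `FileSystemDocumentStore`, the `id % 100` directory layout and the in-memory `pathTable` (standing in for a real database table of document id to path) are all assumptions made for illustration. It sticks to Java 6 level APIs to match the stack in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: files on disk, paths kept in a (simulated) database table. */
class FileSystemDocumentStore {
    private final File rootDir;
    // Assumption: stands in for a database table mapping document id -> relative path.
    private final Map<Long, String> pathTable = new HashMap<Long, String>();

    FileSystemDocumentStore(File rootDir) {
        this.rootDir = rootDir;
    }

    /** Writes the file, then records its path. The two steps are NOT one transaction. */
    void save(long id, String fileName, byte[] content) throws IOException {
        // fileName is trusted here; real code would have to sanitize it.
        String relativePath = (id % 100) + File.separator + id + "_" + fileName;
        File target = new File(rootDir, relativePath);
        File parent = target.getParentFile();
        if (!parent.isDirectory() && !parent.mkdirs()) {
            throw new IOException("Cannot create directory " + parent);
        }
        OutputStream out = new FileOutputStream(target);
        try {
            out.write(content);
        } finally {
            out.close();
        }
        pathTable.put(id, relativePath);
    }

    /** Two-step read: look up the path first, then read the file from disk. */
    byte[] load(long id) throws IOException {
        String relativePath = pathTable.get(id);
        if (relativePath == null) {
            throw new IOException("No such document: " + id);
        }
        InputStream in = new FileInputStream(new File(rootDir, relativePath));
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        } finally {
            in.close();
        }
    }

    /** No cascade: the "row" and the file have to be removed by hand. */
    boolean delete(long id) {
        String relativePath = pathTable.remove(id);
        if (relativePath == null) {
            return false;
        }
        return new File(rootDir, relativePath).delete();
    }

    /** Manual integrity check: does the recorded path still point at a real file? */
    boolean isConsistent(long id) {
        String relativePath = pathTable.get(id);
        return relativePath != null && new File(rootDir, relativePath).isFile();
    }
}
```

If `save` fails after the file is written but before the path is recorded, an orphan file is left behind, which is the kind of inconsistency that a BLOB column inside a single database transaction avoids.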

What is good about a file system based solution is that it is sometimes handy to be able to access the files directly, for example to copy some of the images somewhere else. Storing binary data can of course also dramatically increase the size of the database. In any case, both solutions need more disk storage somewhere.

Of course, storing the files in the database can demand more DB resources than are currently available. In general there can be a significant performance hit, especially if the choice is between a local file system and a remote DB. In your case (6000 files monthly) raw performance will not be a problem, but latency can be.

Other recommended answer

When asking what's the "best" solution, it's a good idea to include your evaluation criteria - speed, cost, simplicity, maintenance etc.

The answer Mikko Maunu gave is pretty much on the money. I haven't used Informix in 20 years, but most databases are a little slow when dealing with BLOBs - especially the step of getting the BLOB into and out of the database can be slow.

That problem tends to get worse as more users access the system simultaneously, especially if they use a web application - the application server has to work quite hard to get the files in and out of the database, probably consumes far more memory for those requests than normal, and probably takes longer to complete the file-related requests than for "normal" pages.

This can lead to the web server slowing down under only moderate load. If you choose to store the documents in your database, I'd strongly recommend running some performance tests to see if you have a problem - this kind of solution tends to expose flaws in your setup that wouldn't otherwise come to light (slow network connection to your database server, insufficient RAM in your web servers, etc.).
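One common way to limit the memory cost mentioned above is to copy the document in fixed-size chunks instead of loading it into a single byte array. The sketch below is an illustration, not part of the original answer: `StreamCopy` is a hypothetical helper, and it assumes the source is an `InputStream` such as the one returned by JDBC's `ResultSet.getBinaryStream`. Whether a given driver really streams the BLOB or buffers it internally depends on the driver, so this should be measured rather than assumed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: chunked copy so only one buffer is held in memory at a time. */
final class StreamCopy {
    private StreamCopy() {
    }

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // read() returns -1 at end of stream; it may return fewer bytes than requested.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }
}
```

In a web application the `OutputStream` would typically be the HTTP response stream, so the per-request memory stays close to the buffer size regardless of how large the document is.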

To avoid this, I've stored the "master" copies of the documents in the database, so they all get backed up together, and I can ask the database questions like "do I have all the documents for user x?". However, I've used a cache on the web server to avoid reading documents from the database more often than I needed to. This works well if you have a "write once, read many times" solution like a content management system, where the cache can earn its keep.
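The answer does not say how that cache was built, so the following is only one possible shape for it: a small in-memory LRU cache in front of the database. `DocumentCache`, its `Loader` interface (standing in for the actual BLOB read) and the `maxEntries` limit are all hypothetical names chosen for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical read-through LRU cache sitting in front of the database. */
final class DocumentCache {
    /** Assumption: stands in for "read this document's BLOB from the database". */
    interface Loader {
        byte[] load(long documentId);
    }

    private final Loader loader;
    private final Map<Long, byte[]> entries;
    private int misses = 0;

    DocumentCache(Loader loader, final int maxEntries) {
        this.loader = loader;
        // accessOrder = true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the entry that was touched longest ago.
        this.entries = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached bytes, loading them through the Loader on a miss. */
    synchronized byte[] get(long documentId) {
        byte[] cached = entries.get(documentId);
        if (cached == null) {
            misses++;
            cached = loader.load(documentId);
            entries.put(documentId, cached);
        }
        return cached;
    }

    /** Number of times the Loader (that is, the database) had to be asked. */
    synchronized int missCount() {
        return misses;
    }
}
```

This only pays off when documents rarely change after upload. If documents can be updated, the cache also needs an invalidation step, and a cache bounded by total bytes rather than entry count may suit large files better.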