Java File IO vs Local database

Article URL: https://www.itbaoku.cn/post/597351.html

Problem description

I am working on a project that involves parsing through a LARGE amount of data rapidly. Currently this data is on disk and broken down into a directory hierarchy:

(Folder: DataSource) -> (Files: Day1, Day2, Day3...Day1000...)
(Folder: DataSource2) -> (Files: Day1, Day2, Day3...Day1000...) 
...
(Folder: DataSource1000) -> ...
...

Each Day file consists of entries that need to be accessed very quickly.

My initial plan was to use traditional file I/O in Java to access these files, but upon further reading I began to fear that this might be too slow.

In short, what is the fastest way I can selectively load entries from my filesystem from varying DataSources and Days?

Recommended answer

The issue could be solved both ways, but it depends on a few factors.

Go for file I/O:

  1. if the volume is less than millions of rows
  2. if you don't run complicated queries, as Jon Skeet said
  3. if you fetch rows by using the folder name ("DataSource") as the key

Go for a DB:

  1. if you see your program reading through millions of records
  2. if you need complicated selections, such as fetching multiple rows with a single SELECT
  3. if you know how to create a basic table structure for a DB
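
For the file I/O case, where the folder name and the day file act as the key, a minimal sketch might look like the following. It assumes the <root>/<DataSource>/<Day> layout from the question and one entry per line; the class name DayFileReader, its methods, and the line-based format are hypothetical, not something the question specifies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper: assumes the layout <root>/<dataSource>/<day> and
// one text entry per line (neither is confirmed by the question).
final class DayFileReader {
    private final Path root;

    DayFileReader(Path root) {
        this.root = root;
    }

    // The folder and file names are the lookup key, so locating the file
    // needs no search; only the contents of that one file are scanned.
    List<String> load(String dataSource, String day, Predicate<String> filter)
            throws IOException {
        Path file = root.resolve(dataSource).resolve(day);
        List<String> matches = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (filter.test(line)) {
                    matches.add(line);
                }
            }
        }
        return matches;
    }
}
```

A call such as new DayFileReader(root).load("DataSource2", "Day3", line -> line.startsWith("a=")) reads exactly one file. Any selection that spans many sources or days has to open each file in turn, which is where the DB option starts to pay off.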

Other recommended answer

Depending on the architecture you are using, you can implement caching in different ways. JBoss has a built-in JBoss Cache, and there is also third-party open-source software that provides caching, such as Redis or Ehcache, depending on your needs. Basically, a cache stores objects in memory; some caches passivate and activate objects on demand, so when memory is exhausted an object is written to a physical file on disk, and the caching mechanism can easily activate it again by unmarshalling it. Caching also reduces the number of database connections your program has to hold. There are other caches, but these are some that I've worked with.
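
The core idea, keeping recently used objects in memory and letting the least recently used ones go, can be sketched in plain Java. This is a stand-in for illustration only: it is not the API of JBoss Cache, Ehcache, or Redis, the class name LruCache is hypothetical, and it simply drops evicted entries instead of passivating them to disk.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-process LRU cache built on LinkedHashMap.
// Not thread-safe; evicted entries are discarded, not written to disk.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most
        // recently accessed, which is what LRU eviction needs.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // removes the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a key such as "DataSource2/Day3" and the parsed entries as the value, repeated requests for the same day file would skip the disk read. A real caching library adds what this sketch leaves out: expiry, invalidation, overflow to disk, and safe concurrent access.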

Other recommended answer

what is the fastest way I can selectively load entries from my filesystem from varying DataSources and Days?

"Selectively" means filtering, so my answer is a localhost database. Generally speaking, if you filter, sort, paginate, or extract distinct records from a large number of records, it's hard to beat a localhost SQL server. You get a query optimizer (nobody does that in Java), a cache (which requires effort in Java, especially the invalidation), database indexes (I have not seen that done in Java either), and so on. It's possible to implement these things manually, but then you are writing a database in Java.
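
To make the "writing a database in Java" point concrete, here is a rough sketch of the hand-rolled equivalent of a single index: a sorted map from day number to entries that supports a range lookup. The class DayIndex, its methods, and the table and column names in the comment are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical hand-rolled index over a numeric "day" key.
final class DayIndex {
    // TreeMap keeps keys sorted, so range lookups avoid a full scan.
    private final NavigableMap<Integer, List<String>> entriesByDay = new TreeMap<>();

    void add(int day, String entry) {
        entriesByDay.computeIfAbsent(day, d -> new ArrayList<>()).add(entry);
    }

    // Roughly what a database does for (hypothetical table and columns):
    //   SELECT entry FROM entries WHERE day BETWEEN ? AND ? ORDER BY day
    List<String> between(int fromDay, int toDay) {
        List<String> result = new ArrayList<>();
        for (List<String> entries
                : entriesByDay.subMap(fromDay, true, toDay, true).values()) {
            result.addAll(entries);
        }
        return result;
    }
}
```

This covers one access path on one column. A database also keeps such structures consistent on updates, persists them, and decides per query whether using them is worthwhile, which is the work the answer suggests not redoing by hand.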

On top of this, you gain access to higher-level SQL functions such as window aggregates, so in most cases there is no need to post-process data in Java.