SQLite or flat text file?

Article URL: https://www.itbaoku.cn/post/597749.html

Question

I process a lot of text/data that I exchange between Python, R, and sometimes MATLAB.

My go-to is the flat text file, but I also use SQLite occasionally to store the data and access it from each program (not MATLAB yet, though). I don't use GROUP BY, AVG, etc. in SQL as much as I do these operations in R, so I don't necessarily need the database operations.

For applications that require exchanging data among programs to make use of the libraries available in each language, is there a good rule of thumb for which data exchange format/method to use (even XML, NetCDF, or HDF5)?

I know that for Python -> R there is rpy or rpy2, but I am asking in a more general sense: I use many computers that don't all have rpy2, and I also use a few other pieces of scientific analysis software that need access to the data at various times (the stages of processing and analysis are also separated).

Recommended answer

If all the languages support SQLite, use it. The power of SQL might not be useful to you right now, but it probably will be at some point, and it saves you from having to rewrite things later when you decide you want to query your data in more complicated ways.

SQLite will also probably be substantially faster if you only want to access certain bits of data in your datastore, since doing that with a flat text file is difficult without reading the whole file in (though it's not impossible).
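As a rough illustration of pulling out "certain bits" of the data, here is a minimal sketch using Python's standard sqlite3 module. The table name, column names, and sample values are made up for the example, and an in-memory database stands in for a real file that several programs would share.

```python
import sqlite3

# Hypothetical table and columns, used only for illustration.
# Pass a file path instead of ":memory:" to share the data between programs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (station TEXT, day INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [("A", 1, 10.0), ("A", 2, 12.0), ("B", 1, 7.5), ("B", 2, 8.5)],
)
# An index lets SQLite locate matching rows without scanning the whole table.
conn.execute("CREATE INDEX idx_station ON measurements (station)")
conn.commit()


def rows_for_station(connection, station):
    """Return (day, value) pairs for one station, ordered by day."""
    cursor = connection.execute(
        "SELECT day, value FROM measurements WHERE station = ? ORDER BY day",
        (station,),
    )
    return cursor.fetchall()


station_b = rows_for_station(conn, "B")
print(station_b)
```

With only four rows the index makes no practical difference; it is shown because it is the mechanism that lets a larger table be queried without a full scan.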

Other recommended answer

A flat text file (e.g. in CSV format) would be the most portable solution. Almost every program/library can work with this format: R and Python have good CSV support, and if your data set isn't too large you can even import the CSV into Excel for smaller tasks.
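A minimal sketch of the CSV route in Python, using only the standard csv module. The file name, column names, and values are invented for the example; a file written this way should be readable from R with read.csv.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; values are kept as strings on purpose (see below).
rows = [
    {"station": "A", "day": "1", "value": "10.0"},
    {"station": "B", "day": "1", "value": "7.5"},
]

path = os.path.join(tempfile.mkdtemp(), "measurements.csv")

# newline="" is what the csv module documentation recommends for file objects.
with open(path, "w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=["station", "day", "value"])
    writer.writeheader()
    writer.writerows(rows)

with open(path, newline="") as handle:
    loaded = list(csv.DictReader(handle))

# CSV carries no type information, so every field comes back as a string.
print(loaded)
```

The lack of types is the main thing to watch when exchanging CSV between languages: each reader has to decide for itself which columns are numbers.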

However, text files are unwieldy for larger data sets, since you need to read them completely for almost all operations (depending on the structure of your data).
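To make the "read them completely" point concrete, the sketch below selects one subset from a CSV stream and counts how many rows had to be parsed to do it. The data and column names are hypothetical, and io.StringIO stands in for an open file.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be an open file.
csv_text = "station,day,value\nA,1,10.0\nA,2,12.0\nB,1,7.5\nB,2,8.5\n"


def select_station(handle, station):
    """Scan every row of a CSV stream, keeping only one station's rows."""
    rows_read = 0
    matches = []
    for row in csv.DictReader(handle):
        rows_read += 1
        if row["station"] == station:
            matches.append((int(row["day"]), float(row["value"])))
    return matches, rows_read


matches, rows_read = select_station(io.StringIO(csv_text), "B")
print(matches, rows_read)
```

Every row is parsed even though only half of them match, and that cost grows with the size of the file rather than with the size of the subset.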

SQLite allows you to filter the data very easily (even without much SQL expertise) and, as you already mentioned, can do some computation on its own (AVG, SUM, ...). Using the Firefox plug-in SQLiteManager, you can work with the DB on every computer without any installation/configuration trouble and thus easily manage your data (import/export, filter).
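The filtering and built-in computation mentioned here can be combined in a single query. A minimal sketch, again with an invented table and values, assuming Python's standard sqlite3 module:

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (station TEXT, day INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [("A", 1, 10.0), ("A", 2, 12.0), ("B", 1, 7.5), ("B", 2, 8.5)],
)
conn.commit()

# Filter with WHERE, then let SQLite compute per-station aggregates.
summary = conn.execute(
    "SELECT station, AVG(value), SUM(value) "
    "FROM measurements WHERE day >= ? "
    "GROUP BY station ORDER BY station",
    (1,),
).fetchall()
print(summary)
```

The same SQL string could be sent from R or any other language with a SQLite binding, so the filtering logic does not need to be rewritten per language.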

So I would recommend using SQLite for larger data sets that need a lot of filtering to extract the data you need. For smaller data sets, or if there is no need to select subsets of your data, a flat (CSV) text file should be fine.
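If a data set that started life as a CSV file later grows into the "larger" category, moving it into SQLite is a short step. The following is one possible sketch, not a prescribed method; the helper name load_csv_into_sqlite, the table layout, and the sample data are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in practice this would be an open file.
csv_text = "station,day,value\nA,1,10.0\nB,1,7.5\nB,2,8.5\n"


def load_csv_into_sqlite(handle, connection):
    """Copy rows from a CSV stream into a SQLite table, converting types explicitly."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS measurements "
        "(station TEXT, day INTEGER, value REAL)"
    )
    reader = csv.DictReader(handle)
    connection.executemany(
        "INSERT INTO measurements VALUES (?, ?, ?)",
        ((r["station"], int(r["day"]), float(r["value"])) for r in reader),
    )
    connection.commit()


conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(io.StringIO(csv_text), conn)
count = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0]
print(count)
```

The explicit int() and float() conversions are where the column types, which CSV does not record, get pinned down once for every program that later reads the database.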