key-value store for time series data?


Problem Description

I've been using SQL Server to store historical time series data for a couple hundred thousand objects, observed about 100 times per day. I'm finding that queries (give me all values for object XYZ between time t1 and time t2) are too slow (for my needs, slow is more than a second). I'm indexing by timestamp and object ID.

I've entertained the thought of using something like a key-value store such as MongoDB instead, but I'm not sure if this is an "appropriate" use of this sort of thing, and I couldn't find any mention of using such a database for time series data. Ideally, I'd be able to do the following queries:

  • retrieve all the data for object XYZ between time t1 and time t2
  • do the above, but return one data point per day (first, last, closest to time t...)
  • retrieve all data for all objects for a particular timestamp

The data should be ordered, and ideally it should be fast to write new data as well as update existing data.

It seems like my desire to query by object ID as well as by timestamp might necessitate having two copies of the database indexed in different ways to get optimal performance... Has anyone had experience building a system like this with a key-value store, HDF5, or something else? Or is this totally doable in SQL Server and I'm just not doing it right?

Recommended Answer

It sounds like MongoDB would be a very good fit. Updates and inserts are super fast, so you might want to create a document for every event, such as:

{
   object: XYZ,
   ts : new Date()
}

Then you can index the ts field and queries will also be fast. (By the way, you can create multiple indexes on a single collection.)
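For the query shapes this answer uses, a compound index on object plus ts lets both the range scan and the sort run off a single index, and a ts-only index covers the all-objects-at-one-timestamp query. A sketch in the mongo shell (assuming the collection is named `data`, as in the queries here; `createIndex` is the modern spelling, older shells used `ensureIndex`):

```javascript
// Compound index: supports {object: ..., ts: {...}} range queries and
// sorting by ts within a single object, without a second copy of the data.
db.data.createIndex({ object: 1, ts: 1 })

// Separate ts-only index: covers "all objects at a particular timestamp".
db.data.createIndex({ ts: 1 })
```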

How to do your three queries:

retrieve all the data for object XYZ between time t1 and time t2

db.data.find({object : XYZ, ts : {$gt : t1, $lt : t2}})

do the above, but return one data point per day (first, last, closest to time t...)

// first
db.data.find({object : XYZ, ts : {$gt : new Date(/* start of day */)}}).sort({ts : 1}).limit(1)
// last
db.data.find({object : XYZ, ts : {$lt : new Date(/* end of day */)}}).sort({ts : -1}).limit(1)

For closest to some time, you'd probably need a custom JavaScript function, but it's doable.
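One workable pattern for the closest-to-t case is two queries (last event at or before t, first event at or after t) plus a small client-side comparison, rather than a server-side function. The comparison itself is plain JavaScript; a sketch, where the `closestTo` helper and the sample documents are made up for illustration:

```javascript
// Pick whichever of two candidate events is nearer to the target time t.
// `before` is the last event with ts <= t, `after` the first with ts >= t;
// either may be null if no event exists on that side.
function closestTo(t, before, after) {
  if (!before) return after;
  if (!after) return before;
  return (t - before.ts) <= (after.ts - t) ? before : after;
}

// Hypothetical events, timestamps as epoch millis for simplicity:
const before = { object: "XYZ", ts: 1000 };
const after  = { object: "XYZ", ts: 1700 };
console.log(closestTo(1200, before, after).ts); // 1000 (before is nearer)
```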

retrieve all data for all objects for a particular timestamp

db.data.find({ts : timestamp})

Feel free to ask on the user list if you have any questions; someone might know an easier way of getting closest-to-a-time events.

Another Recommended Answer

This is why databases specific to time series data exist - relational databases simply aren't fast enough for large time series.

I've used Fame quite a lot at investment banks. It's very fast, but I imagine very expensive. However, if your application requires the speed, it might be worth looking into.

Another Recommended Answer

There is an open source time-series database under active development (.NET only for now) that I wrote. It can store massive amounts (terabytes) of uniform data in a "binary flat file" fashion. All usage is stream-oriented (forward or reverse). We actively use it for stock tick storage and analysis at our company.
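The "binary flat file" idea is worth unpacking: with fixed-size records kept sorted by timestamp, one series per file, a t1-to-t2 query is just a binary search for the first record at or after t1 followed by a sequential read. A minimal in-memory sketch of that mechanism (the record layout and helper names are invented for illustration, not the library's API):

```javascript
// Each record is 16 bytes: 8-byte integer timestamp + 8-byte float value,
// stored sorted by timestamp (one series per file/buffer).
const REC = 16;

// Pack an array of [timestamp, value] pairs into a flat binary buffer.
function writeSeries(records) {
  const buf = Buffer.alloc(records.length * REC);
  records.forEach(([ts, val], i) => {
    buf.writeBigUInt64LE(BigInt(ts), i * REC);
    buf.writeDoubleLE(val, i * REC + 8);
  });
  return buf;
}

// Binary search for the index of the first record with timestamp >= t.
function lowerBound(buf, t) {
  let lo = 0, hi = buf.length / REC;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (buf.readBigUInt64LE(mid * REC) < BigInt(t)) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Read all records with t1 <= ts < t2 as [timestamp, value] pairs.
function range(buf, t1, t2) {
  const out = [];
  for (let i = lowerBound(buf, t1); i < buf.length / REC; i++) {
    const ts = Number(buf.readBigUInt64LE(i * REC));
    if (ts >= t2) break;
    out.push([ts, buf.readDoubleLE(i * REC + 8)]);
  }
  return out;
}

const buf = writeSeries([[100, 1.5], [200, 2.5], [300, 3.5]]);
console.log(range(buf, 150, 300)); // [ [ 200, 2.5 ] ]
```

Because lookup cost is one binary search plus a contiguous read, this stays fast regardless of how large the file grows, which is what makes the flat-file layout attractive for tick data.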

I am not sure this will be exactly what you need, but it will allow you to cover the first two points: getting values from t1 to t2 for any series (one series per file), or taking just one data point.

https://code.google.com/p/timeseriesdb/

// Create a new file for MyStruct data.
// Use BinCompressedFile<,> for compressed storage of deltas
using (var file = new BinSeriesFile<UtcDateTime, MyStruct>("data.bts"))
{
   file.UniqueIndexes = true; // enforces index uniqueness
   file.InitializeNewFile(); // create file and write header
   file.AppendData(data); // append data (stream of ArraySegment<>)
}

// Read needed data.
using (var file = (IEnumerableFeed<UtcDateTime, MyStruct>) BinaryFile.Open("data.bts", false))
{
    // Enumerate one item at a time, at most 10 items, starting at 2011-1-1
    // (can also get one segment at a time with StreamSegments)
    foreach (var val in file.Stream(new UtcDateTime(2011, 1, 1), maxItemCount: 10))
        Console.WriteLine(val);
}