Multi-threaded architecture for a feeder application


Question

This is my first post here, so apologies if this isn't structured well.

We have been tasked to design a tool that will:

  • Read a file (of account IDs), CSV format
  • Download the account data file from the web for each account (by Id) (REST API)
  • Pass the file to a converter that will produce a report (financial predictions etc) [~20ms]
  • If the prediction threshold is within limits, run a parser to analyse the data [400ms]
  • Generate a report for the analysis above [80ms]
  • Upload all files generated to the web (REST API)

Now all those individual points are relatively easy to do. I'm interested in finding out how best to architect something to handle this and to do it fast & efficiently on our hardware.

We have to process roughly 2 million accounts. The square brackets give an idea of how long each step takes on average. I'd like to use the maximum resources available on the machine - a 24-core Xeon. It's not a memory-intensive process.
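For scale, a rough worst-case estimate (my own numbers, assuming every account passes the threshold and network time overlaps with CPU time):

```csharp
using System;

// Back-of-envelope only: assumes every account hits the parser (worst case)
// and ignores download/upload latency, which can overlap with CPU work.
double perAccountSeconds = 0.020 + 0.400 + 0.080; // convert + parse + report
long accounts = 2_000_000;

double serialHours = accounts * perAccountSeconds / 3600.0; // ~278 h on one core
double parallelHours = serialHours / 24.0;                  // ~11.6 h on 24 cores

Console.WriteLine($"{serialHours:F0} h serial, {parallelHours:F1} h on 24 cores");
```

So a single-threaded run won't finish overnight, but a well-parallelized one should.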

Would using TPL and creating each of these as a task be a good idea? Each has to happen sequentially but many can be done at once. Unfortunately the parsers are not multi-threading aware and we don't have the source (it's essentially a black box for us).

My thoughts were something like this - assuming we're using TPL:

  • Load account data (essentially a CSV import or SQL SELECT)
  • For each Account (Id):
    • Download the data file for each account
    • ContinueWith using the data file, send to the converter
    • ContinueWith check threshold, send to parser
    • ContinueWith Generate Report
    • ContinueWith Upload outputs
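In code, one way to express "sequential per account, many accounts at once" looks roughly like this (the loop body is a stand-in for the real download/convert/parse/report/upload calls, and pinning MaxDegreeOfParallelism to the 24 cores is an assumption to tune):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var accountIds = Enumerable.Range(1, 100).ToList(); // stand-in for the CSV import
var processed = new ConcurrentBag<int>();

var options = new ParallelOptions { MaxDegreeOfParallelism = 24 };
Parallel.ForEach(accountIds, options, accountId =>
{
    // For one account the steps run in order:
    //   download -> convert (~20 ms) -> threshold check
    //   -> parse (~400 ms) -> report (~80 ms) -> upload
    // Parallel.ForEach keeps up to 24 accounts in flight at once.
    processed.Add(accountId);
});

Console.WriteLine(processed.Count); // 100
```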

Does that sound feasible or am I not understanding it correctly? Would it be better to break down the steps a different way?

I'm a bit unsure on how to handle issues with the parser throwing exceptions (it's very picky) or when we get failures uploading.
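My current thinking for those failures is a small retry wrapper around the flaky steps, with accounts that still fail after the retries collected for a rerun - a sketch (the helper is hypothetical, not from any library):

```csharp
using System;
using System.Threading;

// Retry a flaky step a few times with a linear back-off before giving up.
static T WithRetry<T>(Func<T> step, int attempts = 3, int delayMs = 500)
{
    for (int i = 1; ; i++)
    {
        try { return step(); }
        catch (Exception) when (i < attempts)
        {
            Thread.Sleep(delayMs * i); // simple linear back-off
        }
    }
}

int calls = 0;
// Demo: a step that fails twice, then succeeds on the third attempt.
int result = WithRetry(() =>
{
    calls++;
    if (calls < 3) throw new InvalidOperationException("parser hiccup");
    return 42;
}, attempts: 3, delayMs: 1);

Console.WriteLine($"{result} after {calls} attempts"); // 42 after 3 attempts
```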

All this is going to be in a scheduled job that will run after-hours as a console application.

Answer

I would think about using some kind of message bus. That way you can separate the steps, and if one of them fails (for example because the REST service is unreachable for a while) you can store the message and process it later.

Depending on what you use as a message bus, you can introduce threading with it.

In my opinion, a higher-level abstraction like a service bus makes it easier to design workflows, handle exceptional states, and so on.

Also, because the parts can run independently, they don't block each other.

One easy way could be to use ServiceStack messaging with the Redis ServiceBus.

Some advantages quoted from there:

  • Message-based design allows for easier parallelization and introspection of computations

  • DLQ messages can be introspected, fixed and later replayed after server updates and rejoin normal message workflow
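The decoupling described above can be mocked up in-process with a BlockingCollection standing in for the queue between two stages (with a real bus such as Redis MQ the queue lives outside the process, so pending messages survive a crash or an unreachable REST service):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// The "download" stage publishes account ids; the "convert" stage consumes
// them. BlockingCollection is only an in-process stand-in for a durable queue.
var converterQueue = new BlockingCollection<int>(boundedCapacity: 1000);
var converted = new ConcurrentBag<int>();

var consumer = Task.Run(() =>
{
    foreach (var accountId in converterQueue.GetConsumingEnumerable())
        converted.Add(accountId); // stand-in for the ~20 ms convert step
});

for (int id = 1; id <= 50; id++)
    converterQueue.Add(id);       // publish: "account id is ready to convert"
converterQueue.CompleteAdding();  // no more messages
consumer.Wait();

Console.WriteLine(converted.Count); // 50
```

The bounded capacity also gives natural back-pressure: a fast download stage cannot outrun a slow parse stage indefinitely.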

Another answer

I think the easy way to start with multiple threads in your case would be to put the entire operation for each account id in a thread (or better, on the ThreadPool). With the approach proposed below, I don't think you will need to coordinate between the threads.

Something like this to put the data on the thread pool queue:

// Account ids loaded via the CSV import or SQL SELECT
var accountIds = new List<int>();
foreach (var accountId in accountIds)
{
    ThreadPool.QueueUserWorkItem(ProcessAccount, accountId);
}
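One caveat before the function itself: QueueUserWorkItem returns immediately and gives you no completion handle, so a scheduled console app could exit with work still queued. A CountdownEvent is one way to wait (Task.Run plus Task.WaitAll would also do):

```csharp
using System;
using System.Linq;
using System.Threading;

var accountIds = Enumerable.Range(1, 100).ToList(); // stand-in for the CSV import
int processedCount = 0;

using var done = new CountdownEvent(accountIds.Count);
foreach (var accountId in accountIds)
{
    ThreadPool.QueueUserWorkItem(state =>
    {
        try
        {
            Interlocked.Increment(ref processedCount); // stand-in for ProcessAccount(state)
        }
        finally
        {
            done.Signal(); // count down even when the account fails
        }
    }, accountId);
}
done.Wait(); // don't let Main return until every account is handled

Console.WriteLine(processedCount); // 100
```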

And this is the function that will process each account:

public static void ProcessAccount(object accountId)
{
    // Download the data file for this account
    // Send the data file to the converter
    // Check the threshold and, if within limits, send to the parser
    // Generate the report
    // Upload the outputs
}
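Filling in that skeleton: an unhandled exception on a pool thread tears the whole process down, so each account needs its own try/catch, with failures queued for a rerun. The step bodies below are placeholders, and the forced failure exists only to make the demo observable:

```csharp
using System;
using System.Collections.Concurrent;

var failed = new ConcurrentQueue<int>();

void ProcessAccount(object state)
{
    int accountId = (int)state;
    try
    {
        // Download the data file for this account
        // Send the data file to the converter
        // If the threshold is within limits, run the parser
        // Generate the report and upload the outputs
        if (accountId == 13) throw new InvalidOperationException("picky parser");
    }
    catch (Exception ex)
    {
        // Swallow, log, and queue for a rerun so one bad account
        // doesn't kill the whole batch (or the process).
        failed.Enqueue(accountId);
        Console.Error.WriteLine($"Account {accountId} failed: {ex.Message}");
    }
}

ProcessAccount(13);
ProcessAccount(7);
Console.WriteLine(failed.Count); // 1
```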