Problem description
I'm working on a fairly simple CGI with Python. I'm about to put it into Django, etc. The overall setup is pretty standard server side (i.e. computation is done on the server):
- User uploads data files and clicks "Run" button
- Server forks jobs in parallel behind the scenes, using lots of RAM and processor power. ~5-10 minutes later (average use case), the program terminates, having created a file of its output and some .png figure files.
- Server displays web page with figures and some summary text
I don't think there are going to be hundreds or thousands of people using this at once. However, because the computation takes a fair amount of RAM and processor power (each instance forks the most CPU-intensive task using Python's Pool), I wondered whether it would be worth the trouble to use a queueing system. I came across a Python module called beanstalkc, but its page said it was an "in-memory" queueing system.
What does "in-memory" mean in this context? I worry about memory, not just CPU time, and so I want to ensure that only one job runs (or is held in RAM, whether it receives CPU time or not) at a time.
Also, I was trying to decide whether
- the result page (served by the CGI) should tell you your position in the queue (until the job runs, and then display the actual results page)
OR
- the user should submit their email address to the CGI, which will email them the link to the results page when it is complete.
What do you think is the appropriate design methodology for a light traffic CGI for a problem of this sort? Advice is much appreciated.
Recommended answer
Definitely use Celery. You can run an AMQP server, or I think you can use the database as the queue for the messages. It lets you run tasks in the background, and it can use multiple worker machines to do the processing if you want. It can also run database-backed cron jobs if you use django-celery.
It's as simple as this to run a task in the background:
    @task
    def add(x, y):
        return x + y
In one of my projects, it distributes the work over 4 machines, and it works great.
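To make the "only one job at a time" requirement from the question concrete, here is a minimal sketch of the underlying idea using only the Python standard library. It is not Celery and not the answerer's setup: `SingleWorkerQueue` and its methods are hypothetical names invented for illustration, and the queue lives in the web process's memory, so pending jobs would be lost on a restart. With Celery, the rough equivalent would be a worker limited to a single concurrent process.

```python
import queue
import threading
import uuid


class SingleWorkerQueue:
    """Hypothetical helper: runs submitted jobs one at a time, in order."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._pending = []   # ids of jobs waiting or running, oldest first
        self._results = {}   # job id -> return value (or the exception raised)
        self._lock = threading.Lock()
        # A single worker thread means at most one job runs at any moment.
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, func, *args):
        """Queue a job and return an id the results page can poll with."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._pending.append(job_id)
        self._jobs.put((job_id, func, args))
        return job_id

    def position(self, job_id):
        """0 means running or next in line; None means finished or unknown."""
        with self._lock:
            if job_id in self._pending:
                return self._pending.index(job_id)
        return None

    def result(self, job_id):
        """Return the stored outcome, or None if the job has not finished."""
        with self._lock:
            return self._results.get(job_id)

    def wait(self):
        """Block until every submitted job has finished."""
        self._jobs.join()

    def _run(self):
        while True:
            job_id, func, args = self._jobs.get()
            try:
                outcome = func(*args)
            except Exception as exc:  # keep the worker alive if a job fails
                outcome = exc
            with self._lock:
                self._results[job_id] = outcome
                self._pending.remove(job_id)
            self._jobs.task_done()
```

A results page could call `position(job_id)` on each refresh to show the user's place in line, then switch to the figures once `result(job_id)` returns something. The email option from the question would instead send the link at the end of `_run`.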