Question
Are there any alternative paradigms to MapReduce (Google, Hadoop)? Are there other reasonable ways to split and merge big problems?
Recommended answer
Definitely. Check out, for example, Bulk Synchronous Parallel (BSP). Map/Reduce is in fact a very restricted way of breaking down problems, but that restriction is what makes it manageable in a framework like Hadoop. The question is whether it is less trouble to press your problem into a Map/Reduce setting, or whether it is easier to create a domain-specific parallelization scheme and take care of all the implementation details yourself. Pig, in fact, is only an abstraction layer on top of Hadoop that automates many standard transformations of problems from not-Map-Reduce-y to Map-Reduce-compatible.
Edit 26.1.13: Found a nice up-to-date overview here
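To make the "very restricted" shape concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases in plain Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real framework would distribute each phase across machines.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record; the mapper yields (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    # Hypothetical mapper: emit (word, 1) for every word in a line of text.
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key, values):
    # Hypothetical reducer: add up the counts collected for one word.
    return sum(values)


def word_count(lines):
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), sum_reducer)


if __name__ == "__main__":
    print(word_count(["a b a", "B c"]))
```

The restriction is visible in the signatures: the mapper sees one record at a time, and the reducer sees one key at a time. Anything that needs a different communication pattern has to be rephrased to fit, or split into several chained jobs.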
Other recommended answer
Phil Colella identified seven numerical methods for scientific computing, based on the patterns of scattering and gathering data between processing nodes, and called them 'dwarfs'. Others have since added to the list; the extended list is available at the Dwarf Mine:
- Dense Linear Algebra
- Sparse Linear Algebra
- Spectral Methods
- N-Body Methods
- Structured Grids
- Unstructured Grids
- MapReduce
- Combinational Logic
- Graph Traversal
- Dynamic Programming
- Backtrack and Branch-and-Bound
- Graphical Models
- Finite State Machines
Other recommended answer
Update (August 2014): Stratosphere is now called Apache Flink (incubating).
Have a look at Stratosphere. It is another Big Data runtime that offers more operators (map, reduce, join, union, cross, iterate, ...). It also lets you define advanced data flow graphs (with Hadoop MR, you would have to chain jobs).
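As a rough illustration of why extra operators help, the sketch below combines a two-input join with a per-key reduce in a single dataflow. It is plain Python and not Stratosphere's API; `hash_join`, `spend_per_user`, and the sample record layout are assumptions made up for this example.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Two-input operator: pair every left row with the right rows sharing its key."""
    index = defaultdict(list)
    for row in left:
        index[left_key(row)].append(row)
    return [(l, r) for r in right for l in index.get(right_key(r), [])]


def spend_per_user(users, orders):
    """Dataflow: join users with orders on user id, then reduce order totals by name."""
    joined = hash_join(users, orders, lambda u: u[0], lambda o: o[0])
    totals = defaultdict(float)
    for (_user_id, name), (_order_user_id, amount) in joined:
        totals[name] += amount
    return dict(totals)


if __name__ == "__main__":
    users = [(1, "ann"), (2, "bob")]                      # (user_id, name)
    orders = [(1, 10.0), (1, 5.0), (2, 7.5), (3, 1.0)]    # (user_id, amount)
    print(spend_per_user(users, orders))
```

With only map and reduce available, the join itself would typically become one job (tagging records by source and grouping on the key) and the aggregation a second job chained after it.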
Stratosphere also supports BSP with its graph processing abstraction (called Spargel).
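For readers unfamiliar with BSP, here is a small single-process sketch of the superstep idea, using vertex-centric label propagation to find connected components. It is not Spargel's API; the function name `bsp_min_label` and the message layout are assumptions chosen for illustration. Each superstep consists of local computation on the messages received, sending new messages, and a barrier after which those messages become visible.

```python
def bsp_min_label(vertices, edges):
    """Label every vertex with the smallest vertex id in its connected component."""
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    labels = {v: v for v in vertices}

    # Superstep 0: every vertex sends its own label to all of its neighbors.
    inbox = {v: [] for v in vertices}
    for v in vertices:
        for n in neighbors[v]:
            inbox[n].append(labels[v])

    # Keep running supersteps until no messages are in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in vertices}
        for v, messages in inbox.items():
            # Local computation: a vertex reads only its own state and its inbox.
            if messages and min(messages) < labels[v]:
                labels[v] = min(messages)
                # Communication: notify neighbors only when the label improved.
                for n in neighbors[v]:
                    outbox[n].append(labels[v])
        # Barrier: messages sent in this superstep are delivered in the next one.
        inbox = outbox

    return labels


if __name__ == "__main__":
    print(bsp_min_label([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
```

Unlike a chain of Map/Reduce jobs, the iteration and the per-vertex state are part of the model itself, which is what makes BSP a natural fit for graph algorithms.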
If you like to read scientific papers, have a look at "Nephele/PACTs: A Programming Model and Execution Framework for Web-Scale Analytical Processing", which explains the theoretical background of the system.
Another system in the field is Spark, which has its own model (RDDs). Since BSP has been mentioned here, also have a look at GraphLab, which offers an alternative to BSP.