Hadoop vs Teradata: what is the difference?

Question

I've worked with Teradata. I've never used Hadoop, but since yesterday I have been doing some research on it. From their descriptions the two seem quite interchangeable, but some papers say they serve different purposes. Everything I have found so far is vague, and I am confused.

Does anybody have experience with both of them? What are the significant differences between them?

Simple example: I want to build an ETL process that will transform billions of rows of raw data and organize them into a DWH, then run some resource-intensive analysis on them. Why use TD? Why Hadoop? Or why not?

Recommended answer

I think the article titled 'MapReduce and Parallel DBMSs: Friends or Foes?' does quite a good job of describing the situations where each technology works best. In a nutshell, Hadoop is excellent for storing unstructured data and running parallel transformations to 'sanitize' incoming data, whereas DBMSs excel at executing complex queries quickly.

Another answer

Hadoop, Hadoop with Extensions, RDBMS Feature/Property Comparison

I am not an expert in this area, but the Coursera course Introduction to Data Science has a lecture titled "Comparing MapReduce and Databases", as well as a lecture on parallel databases within the MapReduce section of the course.

Here is a summary from these lectures on the comparison of MapReduce vs. RDBMS (not necessarily a parallel RDBMS). One point to remember is that the comparison is different if you include extensions to Hadoop such as Pig, Hive, etc. I put in parentheses the MapReduce extensions that add some of this functionality.

Some functionality/properties that an RDBMS has but native MapReduce lacks:

  • Declarative query languages (Pig, Hive)
  • Schemas (Hive, Pig, DryadLINQ, Hadapt)
  • Logical data independence
  • Indexing (HBase)
  • Algebraic optimization (Pig, Dryad, Hive)
  • Caching/materialized views
  • ACID/transactions
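To make the first item on the list concrete, here is a minimal sketch of the same aggregation written twice: once as explicit map, shuffle, and reduce steps, and once as a declarative SQL query. It is only an illustration, not how Hadoop or Teradata are actually programmed: the sample data, the function names, and the `sales` table are all made up for this example, and SQLite stands in for a real RDBMS.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, amount) sales records.
ROWS = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 1)]


def map_phase(rows):
    """Emit one (key, value) pair per input record."""
    for region, amount in rows:
        yield region, amount


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def totals_mapreduce(rows):
    """The programmer spells out every step of the computation."""
    return reduce_phase(shuffle(map_phase(rows)))


def totals_sql(rows):
    """Same aggregation stated declaratively; the engine chooses the plan."""
    conn = sqlite3.connect(":memory:")  # SQLite is a stand-in RDBMS here
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(totals_mapreduce(ROWS))
    print(totals_sql(ROWS))
```

Both functions return the same totals per region. The difference is in who decides how the work is done: in the first version you write the grouping and summing yourself, while in the second you only state the result you want, which is the gap that extensions such as Pig and Hive try to close on the Hadoop side.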

Some properties that MapReduce has relative to a regular RDBMS (not necessarily a parallel RDBMS):

  • High Scalability
  • Fault-tolerance
  • “One-person deployment”

Another answer

I've been asked this question several times, and the answer that I usually give is a car analogy (which is pretty silly because I'm not a car person, but it seems to work):

  • Teradata is the car/DBMS for the masses - it is reliable, mature, works well and is there when you need it. It is difficult (compared to Hadoop) to customise and add functionality to the base product.
  • Hadoop is the car/DBMS for the enthusiast - it isn't as reliable or mature, but it works well so long as you attend to it. It is easy (compared to Teradata) to customise and add functionality to the base product.

Put another way, Teradata is the reliable workhorse where you put your mission-critical processes (operational reporting, enterprise reporting, decision support, etc.). Hadoop is the place where you can do a lot of this stuff, but don't be surprised if you come in one morning and find that your regulatory reports can't be produced because someone applied a patch or you've suddenly got a "too many small files" problem.

To loop back to the analogy, if you don't want to be too techy and the manufacturer's product (DBMS and/or car) works for you out of the box, Teradata is a good option. On the other hand, if you like to tinker under the hood, swap out the carburettor (or whatever), adjust the gear ratios, tweak the fuel-air mixture depending on whether you are driving in the country or the city, bolt on a turbocharger, and your family complains about how long you spend in the garage on weekends - Hadoop is the place for you.

IMHO, most, if not all, organisations need both. I hope this helps :-)