Best way to send apache-spark logging to redis/logstash on an Amazon EMR cluster

Question

I spark-submit jobs on an Amazon EMR cluster. I'd like all Spark logging to be sent to redis/logstash. What is the proper way to configure Spark under EMR to do this?

  • Keep log4j: Add a bootstrap action to modify /home/hadoop/spark/conf/log4j.properties to add an appender? However, this file already contains a lot of stuff and is a symlink to the Hadoop conf file. I don't want to fiddle too much with it, as it already contains some rootLoggers. Which appender would do best? ryantenney/log4j-redis-appender + logstash/log4j-jsonevent-layout OR pavlobaron/log4j2redis? (A rough sketch of such an appender config follows after this list.)

  • Migrate to slf4j+logback: Exclude slf4j-log4j12 from spark-core, add log4j-over-slf4j ... and use a logback.xml with a com.cwbase.logback.RedisAppender? Looks like this will be problematic with dependencies. Will it hide the log4j.rootLoggers already defined in log4j.properties? (A sketch of such a logback.xml also follows after this list.)

  • Anything else I missed?
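For the first option, here is a rough sketch of what a bootstrap action could append to the existing log4j.properties rather than replacing the file. The class and property names (com.ryantenney.log4j.RedisAppender with host/port/key, and net.logstash.log4j.JSONEventLayoutV1 as the layout) are my reading of the ryantenney/log4j-redis-appender and logstash/log4j-jsonevent-layout READMEs, so verify them against the projects; the Redis endpoint is a placeholder.

```properties
# Sketch of lines a bootstrap action could append to
# /home/hadoop/spark/conf/log4j.properties.
# Appender class and property names are assumptions taken from the
# appenders' documentation; check them before relying on this.
log4j.appender.redis=com.ryantenney.log4j.RedisAppender
# Placeholder Redis endpoint
log4j.appender.redis.host=my-redis-host
log4j.appender.redis.port=6379
log4j.appender.redis.key=logstash
# Emit logstash-compatible JSON events
log4j.appender.redis.layout=net.logstash.log4j.JSONEventLayoutV1
```

The new appender still has to be referenced from the rootLogger/rootCategory line the file already defines (e.g. appending ", redis" to it), which is the part the bootstrap action would have to patch in place.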
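For the second option, a minimal logback.xml sketch, assuming the com.cwbase.logback.RedisAppender configuration keys (host, port, key, plus optional type/tags) match the logback-redis-appender README; the Redis host and key are placeholders:

```xml
<!-- logback.xml sketch; element names inside the appender are assumptions
     based on the logback-redis-appender docs, and the endpoint/key are
     placeholders. -->
<configuration>
  <appender name="REDIS" class="com.cwbase.logback.RedisAppender">
    <host>my-redis-host</host>
    <port>6379</port>
    <key>logstash</key>
    <type>spark</type>
    <tags>emr</tags>
  </appender>

  <root level="INFO">
    <appender-ref ref="REDIS" />
  </root>
</configuration>
```

In sbt the exclusion would look roughly like exclude("org.slf4j", "slf4j-log4j12") on the spark-core dependency, with log4j-over-slf4j added as a separate dependency.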

What are your thoughts on this?

Update

Looks like I can't get the second option to work. Running tests is just fine, but using spark-submit (with --conf spark.driver.userClassPathFirst=true) always ends up with the dreaded "Detected both log4j-over-slf4j.jar AND slf4j-log4j12.jar on the class path, preempting StackOverflowError."

Recommended answer

I would set up an extra daemon for that on the cluster.
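One way to read that: run a log-shipping daemon (for instance a logstash agent installed via a bootstrap action) on the cluster nodes, and let it tail the Spark/YARN log files and push them to Redis instead of touching Spark's log4j setup at all. A minimal logstash config sketch, where the log paths and the Redis host/key are placeholders to adapt to wherever your logs actually land:

```
# Sketch of a logstash agent config for a daemon running on each EMR node.
# Log paths and the Redis host/key below are placeholders.
input {
  file {
    path => ["/home/hadoop/spark/logs/*", "/mnt/var/log/hadoop/userlogs/*/*"]
  }
}
output {
  redis {
    host      => "my-redis-host"
    data_type => "list"
    key       => "logstash"
  }
}
```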