Apache Spark logging to Kafka with Log4j v1 Kafka appender

Problem Description

I've already trawled the web for answers here, and I cannot get anything to work, so maybe somebody has a fresh perspective.

  • I'm trying to write logs to a Kafka topic from inside an Apache Spark 2.2 application.
  • Because Spark still uses Log4j v1, I have to try and get the v1 Kafka appender to work, instead of being able to use the default Kafka appender provided with Log4j v2.
  • I can do this in a little demo app running via IntelliJ, using the following library (from build.sbt):

    // Old version of Kafka needed for v1 Log4j appender
    libraryDependencies += "org.apache.kafka" %% "kafka" % "0.8.2.2"

  • But I cannot find a way to get this to run via e.g. spark-shell or spark-submit.

  • I can configure the appender in Spark's log4j.properties using the same settings as in my dummy app (a rough sketch of those settings is shown after this list).

  • But when the Spark shell starts up, it seems to fire up the logger before it loads any extra JARs, and then immediately throws an error because it can't find the Kafka appender:

    log4j:ERROR Could not instantiate class [kafka.producer.KafkaLog4jAppender]. java.lang.ClassNotFoundException: kafka.producer.KafkaLog4jAppender

  • I have tried all kinds of options, in the Spark config files or on the CLI, to get the JARs to load first, e.g. --jars, --files, --driver-class-path, setting spark.driver.extraClassPath and spark.executor.extraClassPath in spark-defaults.conf, etc.
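
For reference, the appender settings being referred to would look roughly like the following sketch. This is hypothetical and not the asker's actual file; it assumes the old kafka.producer.KafkaLog4jAppender class from that 0.8.2.2 dependency, a broker at localhost:9092 and a topic named spark-logs:

# Sketch only: v1 Kafka appender wired into Spark's log4j.properties
# (add "kafka" to the existing rootLogger/rootCategory line)
log4j.appender.kafka=kafka.producer.KafkaLog4jAppender
log4j.appender.kafka.brokerList=localhost:9092
log4j.appender.kafka.topic=spark-logs
log4j.appender.kafka.layout=org.apache.log4j.PatternLayout
log4j.appender.kafka.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n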

Nothing seems to work, so has anybody ever got this working, i.e. Spark 2.2 logging to Kafka via Log4j? If so, can you suggest the right config options to allow me to do this?

By the way, there are several similar questions here on SO, but none of them has solved the problem for me, so please don't mark this as a duplicate.

Thanks for any tips you can offer!

Recommended Answer

kafka-log4j-appender with Spark

I managed to use spark-submit 2.1.1 in cluster mode with kafka-log4j-appender 2.3.0, but I believe other versions will behave similarly.


Preparation

First of all, I think it is really helpful to read the logs, so you need to be able to read both the application's YARN logs and the spark-submit output. Sometimes, when the application hung in the ACCEPTED state (because of a Kafka producer misconfiguration), it was necessary to read the logs from the Hadoop YARN application overview.

So whenever I was starting my app, it was very important to grab

19/08/01 10:52:46 INFO yarn.Client: Application report for application_1564028288963_2380 (state: RUNNING)

line, and then download all the logs from YARN once it had completed:

yarn logs -applicationId application_1564028288963_2380
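
A small, hypothetical helper for that routine, assuming the spark-submit output was redirected to a file called submit.log (the file name is illustrative only):

# Sketch: pull the application id out of the submit output and fetch its YARN logs
APP_ID=$(grep -o 'application_[0-9_]*' submit.log | head -n 1)
yarn logs -applicationId "$APP_ID" > "${APP_ID}.log"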

OK, let's try!


Provide kafka-log4j-appender for Spark

Basically, Spark is missing kafka-log4j-appender.

Generally, you should be able to provide kafka-log4j-appender in your fat jar. I had some previous experience with a similar problem where that did not work, simply because in a cluster environment your classpath is overridden by Spark. So if it does not work for you either, move on.
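
For what it's worth, the fat-jar route would just mean adding the appender to your build, e.g. in build.sbt. The coordinates below are the same ones used with --packages later; this is only a sketch of the approach that, as noted above, may not survive Spark's classpath handling in cluster mode:

// Sketch: bundle the appender (and its kafka-clients dependency) into the fat jar
libraryDependencies += "org.apache.kafka" % "kafka-log4j-appender" % "2.3.0"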

Option A. Manually download jars:

kafka-log4j-appender-2.3.0.jar
kafka-clients-2.3.0.jar

You actually need both, because the appender won't work without the clients.
Place them on the same machine you fire spark-submit from.
The benefit is that you can name them however you like.

Now for client mode

JARS='/absolute/path/kafka-log4j-appender-2.3.0.jar,/absolute/path/kafka-clients-2.3.0.jar'
JARS_CLP='/absolute/path/kafka-log4j-appender-2.3.0.jar:/absolute/path/kafka-clients-2.3.0.jar'
JARS_NAMES='kafka-log4j-appender-2.3.0.jar:kafka-clients-2.3.0.jar'

spark-submit \
    --deploy-mode client \
    --jars "$JARS"
    --conf "spark.driver.extraClassPath=$JARS_CLP" \
    --conf "spark.executor.extraClassPath=$JARS_NAMES" \

Or for cluster mode

spark-submit \
    --deploy-mode cluster \
    --jars "$JARS"
    --conf "spark.driver.extraClassPath=$JARS_NAMES" \
    --conf "spark.executor.extraClassPath=$JARS_NAMES" \

Option B. Use --packages to download the jars from Maven:

I think this is more convenient, but you have to get the names exactly right.

You need to look for these kinds of lines during the run:

19/11/15 19:44:08 INFO yarn.Client: Uploading resource file:/srv/cortb/home/atais/.ivy2/jars/org.apache.kafka_kafka-log4j-appender-2.3.0.jar -> hdfs:///user/atais/.sparkStaging/application_1569430771458_10776/org.apache.kafka_kafka-log4j-appender-2.3.0.jar
19/11/15 19:44:08 INFO yarn.Client: Uploading resource file:/srv/cortb/home/atais/.ivy2/jars/org.apache.kafka_kafka-clients-2.3.0.jar -> hdfs:///user/atais/.sparkStaging/application_1569430771458_10776/org.apache.kafka_kafka-clients-2.3.0.jar

and note down how the jars are named inside the application_1569430771458_10776 folder on HDFS.
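
If you would rather confirm those names directly than scrape the submit output, a hypothetical check against the staging directory shown above (it only exists while the application is running) could be:

# Sketch only: list the jars YARN staged for this application
hdfs dfs -ls hdfs:///user/atais/.sparkStaging/application_1569430771458_10776/ | grep kafka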

Now for client mode

JARS_CLP='/srv/cortb/home/atais/.ivy2/jars/org.apache.kafka_kafka-log4j-appender-2.3.0.jar:/srv/cortb/home/atais/.ivy2/jars/org.apache.kafka_kafka-clients-2.3.0.jar'
KAFKA_JARS='org.apache.kafka_kafka-log4j-appender-2.3.0.jar:org.apache.kafka_kafka-clients-2.3.0.jar'

spark-submit \
    --deploy-mode client \
    --packages "org.apache.kafka:kafka-log4j-appender:2.3.0"
    --conf "spark.driver.extraClassPath=$JARS_CLP" \
    --conf "spark.executor.extraClassPath=$KAFKA_JARS" \

Or for cluster mode

spark-submit \
    --deploy-mode cluster \
    --packages "org.apache.kafka:kafka-log4j-appender:2.3.0"
    --conf "spark.driver.extraClassPath=$KAFKA_JARS" \
    --conf "spark.executor.extraClassPath=$KAFKA_JARS" \

The above should already work.
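
As a quick sanity check (not part of the original answer), you could consume the target topic and watch the Spark log lines arrive. The broker address broker-1:9092 and topic spark-logs below are assumptions; substitute whatever your appender is configured with:

# Sketch: watch the topic your appender writes to
kafka-console-consumer.sh --bootstrap-server broker-1:9092 --topic spark-logs --from-beginning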

Extra steps

If you want to provide your own log4j.properties, follow my tutorial on that here: https://stackoverflow.com/a/55596389/1549135
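
And for completeness, a minimal sketch of what such a log4j.properties might contain when using kafka-log4j-appender 2.3.0. The class name org.apache.kafka.log4jappender.KafkaLog4jAppender, the broker address broker-1:9092 and the topic spark-logs are assumptions here; double-check the property names against the appender version you actually ship:

# Sketch only: route Spark's Log4j v1 logging to Kafka, keeping a console appender as well
log4j.rootLogger=INFO, console, kafka

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

log4j.appender.kafka=org.apache.kafka.log4jappender.KafkaLog4jAppender
log4j.appender.kafka.brokerList=broker-1:9092
log4j.appender.kafka.topic=spark-logs
log4j.appender.kafka.layout=org.apache.log4j.PatternLayout
log4j.appender.kafka.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n

# Keep the Kafka client's own logging off the Kafka appender to avoid a feedback loop
log4j.logger.org.apache.kafka=INFO, console
log4j.additivity.org.apache.kafka=false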