Problem description
I've already trawled the web for answers here, and I cannot get anything to work, so maybe somebody has a fresh perspective.
- I'm trying to write logs to a Kafka topic from inside an Apache Spark 2.2 application.
- Because Spark still uses Log4j v1, I have to try and get the v1 Kafka appender to work, instead of being able to use the default Kafka appender provided with Log4j v2.
I can do this in a little demo app running via IntelliJ, using the following library (from build.sbt):
// Old version of Kafka needed for v1 Log4j appender
libraryDependencies += "org.apache.kafka" %% "kafka" % "0.8.2.2"
But I cannot find a way to get this to run via e.g. spark-shell or spark-submit.
I can configure the appender in Spark's log4j.properties using the same settings as in my dummy app.
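For reference, the relevant appender section of that config looks roughly like this; the broker address and topic name below are placeholders, and the property names are the ones the 0.8.x appender is normally configured with (console is the appender Spark's default log4j.properties already defines):

# v1 appender class from the kafka 0.8.2.2 artifact; broker and topic are placeholders
log4j.appender.KAFKA=kafka.producer.KafkaLog4jAppender
log4j.appender.KAFKA.brokerList=localhost:9092
log4j.appender.KAFKA.topic=spark-logs
log4j.appender.KAFKA.layout=org.apache.log4j.PatternLayout
log4j.appender.KAFKA.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
# ...attached to whichever logger should ship to Kafka, e.g. the root logger
log4j.rootLogger=INFO, console, KAFKA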
But when the Spark shell starts up, it seems to fire up the logger before it loads any extra JARs, and then immediately throws an error because it can't find the Kafka appender:
log4j:ERROR Could not instantiate class [kafka.producer.KafkaLog4jAppender].
java.lang.ClassNotFoundException: kafka.producer.KafkaLog4jAppender
I have tried all kinds of options, in the Spark config files or on the CLI, to get the JARs to load up first, e.g. --jars, --files, --driver-class-path, setting spark.driver.extraClassPath and spark.executor.extraClassPath in spark-defaults.conf, and so on.
Nothing seems to work, so has anybody ever got this working, i.e. Spark 2.2 logging to Kafka via Log4j? If so, can you suggest the right config options to let me do this?
By the way, there are several similar questions here on SO, but none of them has solved the problem for me, so please don't mark this as a duplicate.
Thanks for any tips you can offer!
Recommended answer
kafka-log4j-appender with Spark
I managed to use spark-submit 2.1.1 in cluster mode with kafka-log4j-appender 2.3.0, but I believe other versions will behave similarly.
Preparation
First of all, I think it is really helpful to read the logs, so you need to be able to read both the application YARN logs and the spark-submit output. Sometimes, when the application hung in the ACCEPTED state (because of a Kafka producer misconfiguration), it was necessary to read the logs from the Hadoop YARN application overview.
So whenever I started my app, it was very important to grab the
19/08/01 10:52:46 INFO yarn.Client: Application report for application_1564028288963_2380 (state: RUNNING)
line and, once the application had completed, download all the logs from YARN:
yarn logs -applicationId application_1564028288963_2380
OK, let's try!
Provide kafka-log4j-appender for Spark
Basically, Spark is missing kafka-log4j-appender.
Generally, you should be able to provide kafka-log4j-appender in your fat jar. I have had previous experience with a similar problem where that did not work, simply because in a cluster environment your classpath is overridden by Spark. So if it does not work for you either, move on.
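For the record, bundling it in the fat jar is just a matter of declaring the dependency; in sbt that would be roughly the following line (the coordinates match the --packages option used later in this answer):

// kafka-log4j-appender; kafka-clients should come in as a transitive dependency
libraryDependencies += "org.apache.kafka" % "kafka-log4j-appender" % "2.3.0"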
Option A. Manually download jars:
kafka-log4j-appender-2.3.0.jar
kafka-clients-2.3.0.jar
You actually need both, because the appender won't work without the clients.
Place them on the same machine you fire spark-submit from.
The benefit is that you can name them however you like.
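If it helps, both jars can be fetched from Maven Central; the URLs below simply follow the standard repository layout for these coordinates, so double-check them before relying on this sketch:

# Download the appender and its client dependency from Maven Central
wget https://repo1.maven.org/maven2/org/apache/kafka/kafka-log4j-appender/2.3.0/kafka-log4j-appender-2.3.0.jar
wget https://repo1.maven.org/maven2/org/apache/kafka/kafka-clients/2.3.0/kafka-clients-2.3.0.jar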
Now for client mode
JARS='/absolute/path/kafka-log4j-appender-2.3.0.jar,/absolute/path/kafka-clients-2.3.0.jar'
JARS_CLP='/absolute/path/kafka-log4j-appender-2.3.0.jar:/absolute/path/kafka-clients-2.3.0.jar'
JARS_NAMES='kafka-log4j-appender-2.3.0.jar:kafka-clients-2.3.0.jar'

spark-submit \
  --deploy-mode client \
  --jars "$JARS" \
  --conf "spark.driver.extraClassPath=$JARS_CLP" \
  --conf "spark.executor.extraClassPath=$JARS_NAMES" \
Or for cluster mode
spark-submit \
  --deploy-mode cluster \
  --jars "$JARS" \
  --conf "spark.driver.extraClassPath=$JARS_NAMES" \
  --conf "spark.executor.extraClassPath=$JARS_NAMES" \
Option B. Use --packages to download the jars from Maven:
I think this is more convenient, but you have to get the names exactly right.
You need to look for these kinds of lines during the run:
19/11/15 19:44:08 INFO yarn.Client: Uploading resource file:/srv/cortb/home/atais/.ivy2/jars/org.apache.kafka_kafka-log4j-appender-2.3.0.jar -> hdfs:///user/atais/.sparkStaging/application_1569430771458_10776/org.apache.kafka_kafka-log4j-appender-2.3.0.jar
19/11/15 19:44:08 INFO yarn.Client: Uploading resource file:/srv/cortb/home/atais/.ivy2/jars/org.apache.kafka_kafka-clients-2.3.0.jar -> hdfs:///user/atais/.sparkStaging/application_1569430771458_10776/org.apache.kafka_kafka-clients-2.3.0.jar
and note down how the jars are named inside the application_1569430771458_10776 folder on HDFS.
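If you want to double-check what ended up there, listing the staging directory shows the exact file names that go on the classpath (the path is taken from the upload lines above; your user and application id will differ):

# List the jars Spark staged for this application on HDFS
hdfs dfs -ls hdfs:///user/atais/.sparkStaging/application_1569430771458_10776/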
Now for client mode
JARS_CLP='/srv/cortb/home/atais/.ivy2/jars/org.apache.kafka_kafka-log4j-appender-2.3.0.jar:/srv/cortb/home/atais/.ivy2/jars/org.apache.kafka_kafka-clients-2.3.0.jar'
KAFKA_JARS='org.apache.kafka_kafka-log4j-appender-2.3.0.jar:org.apache.kafka_kafka-clients-2.3.0.jar'

spark-submit \
  --deploy-mode client \
  --packages "org.apache.kafka:kafka-log4j-appender:2.3.0" \
  --conf "spark.driver.extraClassPath=$JARS_CLP" \
  --conf "spark.executor.extraClassPath=$KAFKA_JARS" \
Or for cluster mode
spark-submit \
  --deploy-mode cluster \
  --packages "org.apache.kafka:kafka-log4j-appender:2.3.0" \
  --conf "spark.driver.extraClassPath=$KAFKA_JARS" \
  --conf "spark.executor.extraClassPath=$KAFKA_JARS" \
The above should already work.
Extra steps
If you want to provide your own logging properties file (log4j.properties), follow my tutorial on that here: https://stackoverflow.com/a/55596389/1549135
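As a rough idea of what such a file can contain with the 2.3.0 appender, here is a minimal sketch; the appender class comes from kafka-log4j-appender, while the broker list and topic are placeholders you would replace with your own:

# Root logger: console plus Kafka
log4j.rootLogger=INFO, console, kafka

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n

# Appender class shipped in kafka-log4j-appender-2.3.0.jar; broker list and topic are placeholders
log4j.appender.kafka=org.apache.kafka.log4jappender.KafkaLog4jAppender
log4j.appender.kafka.brokerList=broker-1:9092,broker-2:9092
log4j.appender.kafka.topic=spark-logs
log4j.appender.kafka.layout=org.apache.log4j.PatternLayout
log4j.appender.kafka.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n

# Keep the Kafka client's own logging away from the Kafka appender to avoid feedback loops
log4j.logger.org.apache.kafka=WARN, console
log4j.additivity.org.apache.kafka=false

Note that the class name here differs from the old kafka.producer.KafkaLog4jAppender used by the 0.8.x artifact from the question, so make sure the class you reference matches the jar you actually ship.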