Problem Description
I want to create a custom logger that writes messages from the executors to a specific folder on a cluster node. I have edited my log4j.properties file in SPARK_HOME/conf/ like this:
log4j.rootLogger=${root.logger}
root.logger=WARN,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
shell.log.level=WARN
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
log4j.logger.org.apache.spark.repl.Main=${shell.log.level}
log4j.logger.org.apache.spark.api.python.PythonGatewayServer=${shell.log.level}

# My logger to write useful messages to a local file
log4j.logger.jobLogger=INFO, RollingAppenderU
log4j.appender.RollingAppenderU=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppenderU.File=/var/log/sparkU.log
log4j.appender.RollingAppenderU.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppenderU.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppenderU.layout.ConversionPattern=[%p] %d %c %M - %m%n
log4j.appender.fileAppender.MaxFileSize=1MB
log4j.appender.fileAppender.MaxBackupIndex=1
I want the jobLogger to save its output to /var/log/sparkU.log. I created a small program in Python that prints some specific messages:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext, SparkSession
from pyspark.sql.types import *

spark = SparkSession \
    .builder \
    .master("yarn") \
    .appName("test custom logging") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

# Reach the JVM-side log4j classes through the Py4J gateway
log4jLogger = spark.sparkContext._jvm.org.apache.log4j
log = log4jLogger.LogManager.getLogger("jobLogger")
log.info("Info message")
log.warn("Warn message")
log.error("Error message")
and I submit it like this:
/usr/bin/spark-submit --master yarn --deploy-mode client /mypath/test_log.py
When I use deploy mode client, the file is written at the desired location. When I use deploy mode cluster, the local file is not written, but the messages can be found in the YARN logs. In both modes, however, I also get this error in the YARN logs (output from the YARN logs in cluster mode):
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /var/log/sparkU.log (Permission denied)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:142)
        at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
        at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:223)
        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
        at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
        at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
        at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:117)
        at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:102)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.initializeLogIfNecessary(ApplicationMaster.scala:746)
        at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.log(ApplicationMaster.scala:746)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:761)
        at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
log4j:ERROR Either File or DatePattern options are not set for appender [RollingAppenderU].
18/01/15 12:13:00 WARN spark.SparkContext: Support for Java 7 is deprecated as of Spark 2.0.0
18/01/15 12:13:02 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
18/01/15 12:13:04 INFO jobLogger: Info message
18/01/15 12:13:04 WARN jobLogger: Warn message
18/01/15 12:13:04 ERROR jobLogger: Error message
So I have two questions:
- Why is the first error message (java.io.FileNotFoundException) printed? I suspect it comes from the application master's logger, but how can I stop it from printing this error? I want only the executors to use the file logger.
- Is it possible to use cluster mode and still write to a specific file on one of the machines? I was wondering if I could somehow enter a path like host:port/myPath/spark.log so that all the executors would write to that one file on a single machine.

Thanks in advance for any response.
Recommended Answer
I was able to use a custom logger to append to a local file on YARN in cluster mode.
First of all, I made the log4j file available in the same directory on every cluster worker node (e.g. /home/myUser/log4j.custom.properties), and I also created a folder on those nodes under my user path to hold the logs (e.g. /home/myUser/sparkLogs).
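For illustration, here is a minimal sketch of how the file and folder could be pushed to every worker; the hostnames are hypothetical placeholders, and any provisioning tool would work just as well:

# Hypothetical hostnames -- replace with your actual worker nodes
for host in worker1 worker2 worker3; do
  # create the user-owned log folder on the node
  ssh "$host" "mkdir -p /home/myUser/sparkLogs"
  # copy the custom log4j configuration into the user's home
  scp /home/myUser/log4j.custom.properties "$host":/home/myUser/
done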
After that, in the submit command, I pass that file as the driver's log4j configuration with --driver-java-options, and this does the trick. I use this submit command (the log4j file is the same as before):
/usr/bin/spark2-submit --driver-java-options "-Dlog4j.configuration=file:///home/myUser/log4j.custom.properties" --master yarn --deploy-mode client --driver-memory nG --executor-memory nG --executor-cores n /home/myUser/sparkScripts/myCode.py
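The answer says the log4j file is "the same as before", but for the driver JVM to be able to create the log file, the appender must point at a writable path. A minimal sketch of the relevant appender section, assuming it is redirected into the user-owned folder created above (the path is my assumption, not necessarily the answerer's exact file):

# Same jobLogger as in the question, but writing into the user-owned
# folder so the driver JVM has permission to create the file
log4j.logger.jobLogger=INFO, RollingAppenderU
log4j.appender.RollingAppenderU=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppenderU.File=/home/myUser/sparkLogs/sparkU.log
log4j.appender.RollingAppenderU.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppenderU.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppenderU.layout.ConversionPattern=[%p] %d %c %M - %m%n

If the executors should also log through this configuration, a commonly used approach (not part of the original answer) is to ship the file to each container with --files /home/myUser/log4j.custom.properties and add --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.custom.properties" to the submit command; keep in mind that each executor can only write to paths that exist and are writable on its own node.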