Configuring Apache Spark Logging with Maven and logback and finally throwing message to Loggly
Question
I'm having trouble getting my Spark application to ignore Log4j in order to use Logback. One of the reasons I'm trying to use Logback is for the Loggly appender it supports.
I have the following dependencies and exclusions in my pom file (versions are in the dependency management of my main parent pom):
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-core</artifactId>
</dependency>
<dependency>
<groupId>org.logback-extensions</groupId>
<artifactId>logback-ext-loggly</artifactId>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>log4j-over-slf4j</artifactId>
</dependency>
I have referenced these two articles:
Separating application logs in Logback from Spark Logs in log4j
Configuring Apache Spark Logging with Scala and logback
I first tried using the following (when running spark-submit):
--conf "spark.driver.userClassPathFirst=true"
--conf "spark.executor.userClassPathFirst=true"
but received the error:
Exception in thread "main" java.lang.LinkageError: loader constraint violation: when resolving method "org.slf4j.impl.StaticLoggerBinder.getLoggerFactory()Lorg/slf4j/ILoggerFactory;" the class loader (instance of org/apache/spark/util/ChildFirstURLClassLoader) of the current class, org/slf4j/LoggerFactory, and the class loader (instance of sun/misc/Launcher$AppClassLoader) for the method's defining class, org/slf4j/impl/StaticLoggerBinder, have different Class objects for the type org/slf4j/ILoggerFactory used in the signature
I would like to get it working with the above, but I also looked at trying the below:
--conf "spark.driver.extraClassPath=$libs"
--conf "spark.executor.extraClassPath=$libs"
but since I'm passing my uber jar to spark-submit both locally AND (on an Amazon EMR cluster), I really can't specify a library file location that will be local to my machine. Since the uber jar contains the files, is there a way for it to use those files? Am I forced to copy these libraries to the master/nodes on the EMR cluster when the Spark app finally runs from there?
The first approach, using userClassPathFirst, seems like the best route though.
Answer 1
Score: 1
So I solved the issue; there were several problems going on.
In order to get Spark to allow Logback to work, the solution that worked for me was a combination of items from the articles I posted above, plus a cert file problem.
The cert file I was passing into spark-submit was incomplete and was overriding the base truststore certs. This was causing a problem sending HTTPS messages to Loggly.
Part 1 change:
Update maven to shade org.slf4j (as stated in an answer by @matemaciek)
<dependencies>
...
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
<version>1.2.3</version>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-core</artifactId>
<version>1.2.3</version>
</dependency>
<dependency>
<groupId>org.logback-extensions</groupId>
<artifactId>logback-ext-loggly</artifactId>
<version>0.1.5</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>log4j-over-slf4j</artifactId>
<version>1.7.30</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
<configuration>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<manifestEntries>
<Main-Class>com.TestClass</Main-Class>
</manifestEntries>
</transformer>
</transformers>
<relocations>
<relocation>
<pattern>org.slf4j</pattern>
<shadedPattern>com.shaded.slf4j</shadedPattern>
</relocation>
</relocations>
</configuration>
</plugin>
</plugins>
</build>
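As a quick sanity check that the relocation took effect, listing the contents of the shaded jar should show the slf4j classes under the relocated package (the jar path below is the one used in the submit commands in Part 3, and logback.xml will only show up here if it sits in src/main/resources):
jar tf com/target/testproject-0.0.1.jar | grep -E "com/shaded/slf4j|logback.xml"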
Part 1a: the logback.xml
<configuration debug="true">
<appender name="logglyAppender" class="ch.qos.logback.ext.loggly.LogglyAppender">
<endpointUrl>https://logs-01.loggly.com/bulk/TOKEN/tag/TAGS/</endpointUrl>
<pattern>${hostName} %d{yyyy-MM-dd HH:mm:ss,SSS}{GMT} %p %t %c %M - %m%n</pattern>
</appender>
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>${hostName} %d{yyyy-MM-dd HH:mm:ss,SSS}{GMT} %p %t %c %M - %m%n</pattern>
</encoder>
</appender>
<root level="info">
<appender-ref ref="logglyAppender" />
<appender-ref ref="STDOUT" />
</root>
</configuration>
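Note that TOKEN and TAGS in the endpointUrl are placeholders for the Loggly customer token and tag list, and ${hostName} has to resolve to a property defined somewhere in the configuration (as far as I know logback only defines HOSTNAME out of the box; an undefined property shows up literally as hostName_IS_UNDEFINED in the output).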
Part 2 change: The MainClass
import org.slf4j.*;
public class TestClass {
static final Logger log = LoggerFactory.getLogger(TestClass.class);
public static void main(String[] args) throws Exception {
log.info("this is a test message");
}
}
Part 3 change:
I was submitting the Spark application like this (example):
spark-submit --deploy-mode client --class com.TestClass --conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=c:/src/testproject/rds-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit" --conf "spark.driver.extraJavaOptions=-Djavax.net.ssl.trustStore=c:/src/testproject/rds-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit" com/target/testproject-0.0.1.jar
So the above spark-submit failed on an HTTPS certification problem (that was when Loggly was being contacted to send the message to the Loggly service) because rds-truststore.jks overrode the base certs without containing all of them. I changed this to use the cacerts store, and it then had all the certs it needed.
No more errors at the Loggly part when submitting this:
spark-submit --deploy-mode client --class com.TestClass --conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=c:/src/testproject/cacerts -Djavax.net.ssl.trustStorePassword=changeit" --conf "spark.driver.extraJavaOptions=-Djavax.net.ssl.trustStore=c:/src/testproject/cacerts -Djavax.net.ssl.trustStorePassword=changeit" com/target/testproject-0.0.1.jar
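If no custom certificates are needed at all, I assume an even simpler variant would be to drop the trustStore override entirely and let the JVM fall back to its own bundled cacerts, e.g.:
spark-submit --deploy-mode client --class com.TestClass com/target/testproject-0.0.1.jar
but I kept the explicit override pointing at a copy of cacerts as shown above.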
Answer 2
Score: 0
You have to use, in the Spark opts, -Dspark.executor.extraJavaOptions=-Dlogback.configurationFile=/spark/logback/logback.xml
In logback.xml you should have the settings for logback.
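Put together, and written with the --conf form used in the other answer, a full invocation along these lines would point both the driver and the executors at an external logback.xml (the jar name, main class and file path are just placeholder examples carried over from above, and the file has to exist at that path on every executor node):
spark-submit --class com.TestClass --conf "spark.driver.extraJavaOptions=-Dlogback.configurationFile=/spark/logback/logback.xml" --conf "spark.executor.extraJavaOptions=-Dlogback.configurationFile=/spark/logback/logback.xml" com/target/testproject-0.0.1.jar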