Configuring Apache Spark logging with Maven and logback, and finally sending messages to Loggly

Question

I'm having trouble getting my Spark Application to ignore Log4j, in order to use Logback. One of the reasons i'm trying to use logback, is for the loggly appender it supports.

I have the following dependencies and exclusions in my pom file (versions are managed in the dependencyManagement section of my main pom):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-core</artifactId>
</dependency>

<dependency>
    <groupId>org.logback-extensions</groupId>
    <artifactId>logback-ext-loggly</artifactId>
</dependency>

<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>log4j-over-slf4j</artifactId>
</dependency>
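
For reference, a minimal sketch of what the dependencyManagement block in the main pom might look like; the version numbers here are the ones that appear in the accepted answer below, used as assumptions rather than my actual values:

<!-- Hypothetical parent-pom dependencyManagement; versions are assumptions -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.2.3</version>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-core</artifactId>
            <version>1.2.3</version>
        </dependency>
        <dependency>
            <groupId>org.logback-extensions</groupId>
            <artifactId>logback-ext-loggly</artifactId>
            <version>0.1.5</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>log4j-over-slf4j</artifactId>
            <version>1.7.30</version>
        </dependency>
    </dependencies>
</dependencyManagement>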

I have referenced these two articles:

Separating application logs in Logback from Spark Logs in log4j
Configuring Apache Spark Logging with Scala and logback

I've tried using first using (when running spark-submit) :<br/>
--conf "spark.driver.userClassPathFirst=true" <br/>
--conf "spark.executor.userClassPathFirst=true"

but received this error:

    Exception in thread "main" java.lang.LinkageError: loader constraint violation: when resolving method "org.slf4j.impl.StaticLoggerBinder.getLoggerFactory()Lorg/slf4j/ILoggerFactory;" the class loader (instance of org/apache/spark/util/ChildFirstURLClassLoader) of the current class, org/slf4j/LoggerFactory, and the class loader (instance of sun/misc/Launcher$AppClassLoader) for the method's defining class, org/slf4j/impl/StaticLoggerBinder, have different Class objects for the type org/slf4j/ILoggerFactory used in the signature

I would like to get it working with the above, but I also looked at trying the below:

--conf "spark.driver.extraClassPath=$libs"
--conf "spark.executor.extraClassPath=$libs"

but since i'm passing my uber jar to spark submit locally AND (on a Amazon EMR cluster) i really can't be specifying a library file location that will be local to my machine. Since the uber jar contains the files, is there a way for it to use those files? Am i forced to copy these libraries to the master/nodes on the EMR cluster when the spark app finally runs from there?

The first approach, using userClassPathFirst, seems like the best route, though.

Answer 1 (score: 1)

So I solved the issue; I had several problems going on.

In order to get Spark to allow Logback to work, the solution that worked for me was a combination of items from the articles I posted above, plus fixing a cert file problem.

The cert file I was passing into spark-submit was incomplete and overrode the base truststore certs. This was causing a problem sending HTTPS messages to Loggly.
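
As an aside, one quick way to check what a truststore actually contains is the JDK's keytool; the store path and password here are the ones from my submit commands below:

keytool -list -keystore c:/src/testproject/rds-truststore.jks -storepass changeit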

Part 1 change:
Update Maven to shade org.slf4j (as stated in an answer by @matemaciek):

<dependencies>
    ...
    <dependency>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-classic</artifactId>
        <version>1.2.3</version>
    </dependency>

    <dependency>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-core</artifactId>
        <version>1.2.3</version>
    </dependency>

    <dependency>
        <groupId>org.logback-extensions</groupId>
        <artifactId>logback-ext-loggly</artifactId>
        <version>0.1.5</version>
        <scope>runtime</scope>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>log4j-over-slf4j</artifactId>
        <version>1.7.30</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.1</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <manifestEntries>
                            <Main-Class>com.TestClass</Main-Class>
                        </manifestEntries>
                    </transformer>
                </transformers>
                <relocations>
                    <relocation>
                        <pattern>org.slf4j</pattern>
                        <shadedPattern>com.shaded.slf4j</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </plugin>
    </plugins>
</build>
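
After rebuilding, you can sanity-check that the relocation actually happened by listing the uber jar's contents; the slf4j classes should now sit under the shaded package (jar path is a placeholder; on Windows, findstr works in place of grep):

jar tf com/target/testproject-0.0.1.jar | grep com/shaded/slf4j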

Part 1a: the logback.xml

<configuration debug="true">
    <appender name="logglyAppender" class="ch.qos.logback.ext.loggly.LogglyAppender">
        <endpointUrl>https://logs-01.loggly.com/bulk/TOKEN/tag/TAGS/</endpointUrl>
        <pattern>${hostName} %d{yyyy-MM-dd HH:mm:ss,SSS}{GMT} %p %t %c %M - %m%n</pattern>
    </appender>
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>${hostName} %d{yyyy-MM-dd HH:mm:ss,SSS}{GMT} %p %t %c %M - %m%n</pattern>
        </encoder>
    </appender>
    <root level="info">
        <appender-ref ref="logglyAppender" />
        <appender-ref ref="STDOUT" />
    </root>
</configuration>
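
One caveat with this file: ${hostName} is not a variable Logback defines by itself, so it has to be set somewhere. Logback does auto-define ${HOSTNAME}, so a minimal way to make the patterns above resolve (an assumption about the intended value) is a property line near the top of the configuration:

<property name="hostName" value="${HOSTNAME}" scope="context" />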

Part 2 change: The MainClass

import org.slf4j.*;

public class TestClass {

    static final Logger log = LoggerFactory.getLogger(TestClass.class);

    public static void main(String[] args) throws Exception {
        log.info("this is a test message");
    }
}

Part 3 change:
I was submitting the Spark application like this (example):

spark-submit --deploy-mode client --class com.TestClass --conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=c:/src/testproject/rds-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit" --conf "spark.driver.extraJavaOptions=-Djavax.net.ssl.trustStore=c:/src/testproject/rds-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit" com/target/testproject-0.0.1.jar

The above spark-submit failed on an HTTPS certification problem (when Loggly was being contacted to send the message to the Loggly service), because rds-truststore.jks replaced the base truststore without containing all of its certs. I changed this to use the cacerts store, which has all the certs it needs.
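
For reference, the cacerts store ships with the JDK (under jre/lib/security on Java 8); I took a copy, and extra certs can be imported into it with keytool if a private CA is involved. The cert file name below is hypothetical, and changeit is the default cacerts password:

copy "%JAVA_HOME%\jre\lib\security\cacerts" c:\src\testproject\cacerts
keytool -import -alias loggly -file loggly-cert.pem -keystore c:/src/testproject/cacerts -storepass changeit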

There is no more error at the Loggly step when submitting like this:

spark-submit --deploy-mode client --class com.TestClass --conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=c:/src/testproject/cacerts -Djavax.net.ssl.trustStorePassword=changeit" --conf "spark.driver.extraJavaOptions=-Djavax.net.ssl.trustStore=c:/src/testproject/cacerts -Djavax.net.ssl.trustStorePassword=changeit" com/target/testproject-0.0.1.jar

Answer 2 (score: 0)

You have to use, in the Spark opts: -Dspark.executor.extraJavaOptions=-Dlogback.configurationFile=/spark/logback/logback.xml

In logback.xml you should have your Logback settings.
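
A sketch of wiring that into spark-submit, assuming the class and jar names from answer 1; --files ships the config file into each executor's working directory, so the executor option can reference it by its bare name:

spark-submit --class com.TestClass \
    --files /spark/logback/logback.xml \
    --conf "spark.driver.extraJavaOptions=-Dlogback.configurationFile=/spark/logback/logback.xml" \
    --conf "spark.executor.extraJavaOptions=-Dlogback.configurationFile=logback.xml" \
    com/target/testproject-0.0.1.jar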
