Why is the .simpleString() method of a Spark schema truncating my output?

Question

I have a very long schema that I want to return as a string:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

...

SparkSession spark = SparkSession.builder()
    .config(new SparkConf().setAppName("YourApp").setMaster("local"))
    .getOrCreate();

Dataset<Row> parquetData = spark.read().parquet("/Users/demo/test.parquet");

String schemaString = parquetData.schema().simpleString();

The problem is that the resulting schema looks like this (see "... 10 more fields"):

struct<test:struct<countryConfidence:struct<value:double>,... 10 more fields> etc etc>

Using:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.12</artifactId>
  <version>3.2.4</version>
</dependency>

Is there some configuration option I can use so that .simpleString() does not truncate? I've tried parquetData.schema().toDDL(), but it doesn't print the format I need.


Answer 1

Score: 1

If you take a deeper look inside the simpleString method, you can see that Spark uses a truncatedString helper, where SQLConf.get.maxToStringFields is passed as the third argument.
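
Conceptually, that helper behaves roughly like the following simplified Java sketch (an illustration of the truncation behaviour only, not Spark's actual implementation):

import java.util.List;
import java.util.stream.Collectors;

// Simplified illustration: once the number of entries exceeds the limit,
// the tail is dropped and replaced by a "... N more fields" placeholder.
final class TruncatedStringSketch {
    static String truncatedString(List<String> entries, String sep, int maxFields) {
        if (entries.size() <= maxFields) {
            return String.join(sep, entries);
        }
        int shown = Math.max(0, maxFields - 1);
        int dropped = entries.size() - shown;
        return entries.stream().limit(shown).collect(Collectors.joining(sep))
                + sep + "... " + dropped + " more fields";
    }
}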

The definition of this configuration is:

val MAX_TO_STRING_FIELDS = buildConf("spark.sql.debug.maxToStringFields")
  .doc("Maximum number of fields of sequence-like entries can be converted to strings " +
    "in debug output. Any elements beyond the limit will be dropped and replaced by a" +
    """ "... N more fields" placeholder.""")
  .version("3.0.0")
  .intConf
  .createWithDefault(25)

Solution

spark.sql.debug.maxToStringFields调整为高于25的数字,比如50(任意值,但应根据您的用例确定),例如:

SparkSession spark = SparkSession.builder()
  .appName("Spark app name")
  .master("local[*]")
  // Raise the field-truncation limit (the default is 25)
  .config("spark.sql.debug.maxToStringFields", 50)
  .getOrCreate();

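If you need the complete schema regardless of this debug limit, the JSON renderings of StructType serialize every field and are not affected by spark.sql.debug.maxToStringFields (a sketch; whether either format suits you depends on what you need the string for):

// Full schema as compact or pretty-printed JSON -- never truncated.
String jsonSchema = parquetData.schema().json();
String prettyJsonSchema = parquetData.schema().prettyJson();

// treeString() (the rendering printSchema() uses) should also list every
// field, though its indented format differs from simpleString().
String treeSchema = parquetData.schema().treeString();
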
Good luck!

