Decimal precision exceeds max precision despite decimal having the correct size and precision


Question

I have this code:

from faker import Faker
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import DecimalType

spark = (
    SparkSession.builder
    .appName("pyspark-sandbox")
    .getOrCreate()
)
spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")  # type: ignore

faker = Faker()
value = faker.pydecimal(left_digits=28, right_digits=10)
print(value)  # 8824750032877062776842530687.8719544506

df = spark.createDataFrame([[value]], schema=['DecimalItem'])
df = df.withColumn('DecimalItem', col('DecimalItem').cast(DecimalType(38, 10)))
df.show()

But on show I get this error:

org.apache.spark.SparkArithmeticException: [DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION] Decimal precision 46 exceeds max precision 38.

The value 8824750032877062776842530687.8719544506 seems to fit into DecimalType, yet it fails. What is the problem?


Answer 1

Score: 1

After some investigation, I found out that if you pass a StructType schema when creating the dataframe, it works properly without issues.

from faker import Faker
from pyspark.sql import SparkSession
from pyspark.sql.types import DecimalType, StructField, StructType

spark = (
    SparkSession.builder
    .appName("pyspark-sandbox")
    .getOrCreate()
)
spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")  # type: ignore

faker = Faker()
value = faker.pydecimal(left_digits=28, right_digits=10)
print(value)  # 8824750032877062776842530687.8719544506

df = spark.createDataFrame([[value]], schema=StructType([StructField('DecimalItem', DecimalType(38, 10), True)]))
df.show()
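
As a side note, the same fix should also work with a DDL-style schema string instead of building the StructType by hand. This is a minimal sketch, assuming your Spark version accepts the DDL-string form of the schema argument:

# Reuses `spark` and `value` from the snippet above.
# Assumes createDataFrame accepts a DDL schema string; adjust if it does not.
df = spark.createDataFrame([[value]], schema='DecimalItem decimal(38,10)')
df.show()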

Not sure why there is an issue with the cast, though. I guess you can't cast to a different DecimalType once PySpark has inferred its own?

Because if I check the schema using df.printSchema() in my original approach:

DecimalItem: decimal(38,18) (nullable = true)

it went with (38, 18) by default.
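
That would explain the error, I think: decimal(38,18) keeps 18 digits for the fraction, leaving only 20 for the integer part, while the generated value has 28 integer digits, so storing it needs precision 28 + 18 = 46, which matches the 46 in the error message, and it fails before the cast to (38, 10) is ever applied. A minimal sketch of that digit arithmetic (my own reasoning, using only the standard decimal module):

from decimal import Decimal

# The value from the question, as a plain Python Decimal.
value = Decimal('8824750032877062776842530687.8719544506')

digits = len(value.as_tuple().digits)   # 38 significant digits in total
scale = -value.as_tuple().exponent      # 10 digits after the decimal point
integer_digits = digits - scale         # 28 digits before the decimal point

inferred_scale = 18                     # the scale PySpark inferred by default
print(integer_digits + inferred_scale)  # 46, which exceeds the maximum of 38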

