Decimal precision exceeds max precision despite decimal having the correct size and precision

Question
I have this code:

from faker import Faker
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import DecimalType

spark = (
    SparkSession.builder
    .appName("pyspark-sandbox")
    .getOrCreate()
)
spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")  # type: ignore

faker = Faker()
value = faker.pydecimal(left_digits=28, right_digits=10)
print(value)  # 8824750032877062776842530687.8719544506

df = spark.createDataFrame([[value]], schema=['DecimalItem'])
df = df.withColumn('DecimalItem', col('DecimalItem').cast(DecimalType(38, 10)))
df.show()
But on show I get this error:

org.apache.spark.SparkArithmeticException: [DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION] Decimal precision 46 exceeds max precision 38.

The value 8824750032877062776842530687.8719544506 seems to fit into DecimalType(38, 10), yet it fails. What is the problem?
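For reference, a quick check with Python's decimal module (not part of the original question) confirms that the generated value has 28 digits before the decimal point and 10 after it, 38 in total, which is why it looks like it should fit DecimalType(38, 10):

from decimal import Decimal

# The example value printed above: 28 integer digits + 10 fractional digits.
value = Decimal("8824750032877062776842530687.8719544506")
sign, digits, exponent = value.as_tuple()
print(len(digits))   # 38 -> total number of digits (precision)
print(-exponent)     # 10 -> digits after the decimal point (scale)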
Answer 1
Score: 1
After some investigation, I found out that if you pass a StructType schema when creating the dataframe, it works properly without issues:

from faker import Faker
from pyspark.sql import SparkSession
from pyspark.sql.types import DecimalType, StructField, StructType

spark = (
    SparkSession.builder
    .appName("pyspark-sandbox")
    .getOrCreate()
)
spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")  # type: ignore

faker = Faker()
value = faker.pydecimal(left_digits=28, right_digits=10)
print(value)  # 8824750032877062776842530687.8719544506

df = spark.createDataFrame([[value]], schema=StructType([StructField('DecimalItem', DecimalType(38, 10), True)]))
df.show()
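As a side note (not from the original answer), createDataFrame should also accept a DDL-formatted schema string in recent Spark versions, which gives the same explicit schema more compactly; a minimal sketch, continuing from the snippet above:

# Continuing from the snippet above (spark and value are already defined).
# Assumes createDataFrame accepts a DDL-formatted schema string.
df = spark.createDataFrame([[value]], schema="DecimalItem decimal(38,10)")
df.show()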
Not sure why there is an issue with the cast, though. I guess you can't cast to a different DecimalType once PySpark has come up with its own? Because if I check the schema using df.printSchema in my original approach, I get

DecimalItem: decimal(38,18) (nullable = true)

so it went with decimal(38,18) by default. That default cannot hold this value: with 18 digits reserved for the fractional part, a number with 28 integer digits needs 28 + 18 = 46 digits in total, which matches the precision 46 in the error message, and the value has to fit the inferred decimal(38,18) type before the cast to DecimalType(38, 10) is ever applied.
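Another possible workaround, sketched here as an assumption rather than taken from the answer: pass the value as a string so that no decimal type is inferred at all, and cast the column afterwards. At scale 10 the value needs only 38 digits of precision, so the cast should fit:

# Hypothetical variant, continuing from the question's snippet (spark and value defined).
# The column is inferred as a string, so the too-narrow decimal(38,18) default
# never comes into play; the explicit cast to decimal(38,10) then fits the value.
from pyspark.sql.functions import col
from pyspark.sql.types import DecimalType

df = spark.createDataFrame([[str(value)]], schema=['DecimalItem'])
df = df.withColumn('DecimalItem', col('DecimalItem').cast(DecimalType(38, 10)))
df.show()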
Comments