Spark reads zero decimal 0.0000000 as 0E-07, how to write this as 0.0000000 (without scientific notation)


Question

This seems to be default behaviour in Spark.

In the database the value is a decimal(18,8), for example:

0.00000000

When Spark reads a decimal value that is zero and has a scale greater than 6 (e.g. 0E-07), it automatically renders the value in scientific notation.

In this case, after reading 0.00000000, the value shows up as 0E-08 in the DataFrame.
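
The switch to scientific notation comes from how decimal values are converted to strings. As far as I know, Spark's decimals are rendered through java.math.BigDecimal.toString on the JVM, which uses an exponent once the value is that small, while toPlainString never does. Python's standard decimal module applies the same string convention, so it serves here as a stand-in to illustrate the threshold. This is a plain-Python sketch, not Spark code, and to_plain_string is a hypothetical helper name.

```python
from decimal import Decimal


def to_plain_string(value: Decimal) -> str:
    """Render a Decimal in fixed-point notation, keeping its scale."""
    return format(value, "f")


# Zero values with scale 6, 7 and 8: only the last two switch to an exponent.
for text in ("0.000000", "0.0000000", "0.00000000"):
    d = Decimal(text)
    print(f"{text} -> str: {d!s}, plain: {to_plain_string(d)}")
```

The default string form keeps 0.000000 as is but turns the scale-7 and scale-8 zeros into exponent form, whereas the fixed-point rendering preserves every digit.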

I want to write my DataFrame to CSV, but when writing, Spark emits 0E-08 rather than the decimal 0.00000000.

Is there a way to write the explicit decimal value to CSV, without scientific notation?

Notes:

  • The app is generic and takes any table as input, and simply writes this table to a CSV file.
  • Therefore the app does not know the schema of the data, nor which columns are decimals, etc.
  • Each decimal field may have a different precision and scale, so I cannot hardcode these.
  • Using Spark 2.4.8.

Answer 1

Score: 1

// Any decimal zero value with a scale of 7 or greater (i.e. 0.0000000 or a higher scale)
// gets auto-converted into scientific notation: 0E-10, 0E-08, etc.
// The function below keeps such decimal values in plain notation with their original scale.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, format_number, regexp_replace}
import org.apache.spark.sql.types.DecimalType

def ensureDecimalsDoNotConvertToSciNotation(df: DataFrame): DataFrame = {
  df.select(df.schema.fields.map { field =>
    field.dataType match {
      // Spark only switches to scientific notation when the scale exceeds 6
      case d: DecimalType if d.scale > 6 =>
        // format_number returns a string and adds thousands separators,
        // so the commas are stripped before the value reaches the CSV
        regexp_replace(format_number(col(field.name), d.scale), ",", "").alias(field.name)
      case _ => col(field.name)
    }
  }: _*)
}
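
One thing worth checking with format_number: Spark documents it as producing output like '#,###,###.##', so values of 1000 or more pick up thousands separators that would end up in the CSV. Python's format mini-language has an analogous grouping option, which makes the difference easy to see. This is a plain-Python sketch, not Spark code, and fixed_point is a hypothetical helper introduced only for illustration.

```python
from decimal import Decimal


def fixed_point(value: Decimal, scale: int, grouping: bool = False) -> str:
    """Format value with exactly `scale` fractional digits (hypothetical helper)."""
    spec = f",.{scale}f" if grouping else f".{scale}f"
    return format(value, spec)


amount = Decimal("1234.5")
print(fixed_point(amount, 8, grouping=True))  # grouped, similar in shape to format_number output
print(fixed_point(amount, 8))                 # ungrouped, safer for CSV
print(fixed_point(Decimal("0E-8"), 8))        # zero keeps all eight fractional digits
```

If the CSV consumer parses numbers strictly, the ungrouped form is presumably the one to aim for, either by stripping the commas or by formatting without grouping in the first place.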

Answer 2

Score: -1

Please try the following code. I am not sure whether this is what you want.

from pyspark.sql.functions import col

# Create a new column named 's' by casting 'value' to a decimal type,
# intended to convert the 0E-08 data to non-scientific notation.
# Decimal(0,8) is not a valid type (the scale cannot exceed the precision),
# so decimal(18,8) from the question is used here instead.
df = df.withColumn("s", col("value").cast("decimal(18,8)"))
# Save the DataFrame as CSV to the <Your Output> file location
df.write.format("csv").save("<Your Output>")

This is the documentation of DecimalType: https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.types.DecimalType.html

  • Posted by huangapple on June 1, 2023 at 09:15:17
  • Link to this post: https://go.coder-hub.com/76378109.html