June 1, 2023 09:15:17

Spark reads zero decimal 0.0000000 as 0E-07, how to write this as 0.0000000 (without scientific notation)

Question

This seems to be default behaviour in Spark.

In the database the value is a decimal(18,8), for example:

0.00000000

When Spark reads a decimal value that is zero and has a scale greater than 6 (e.g. 0E-07), it automatically renders the value in scientific notation.

In this case, after reading 0.00000000, the value shows up as 0E-08 in the DataFrame.

I want to write my DataFrame to CSV, but when writing, Spark writes 0E-08 to the CSV rather than the decimal 0.00000000.

Is there a way to write the explicit decimal value to CSV, without scientific notation?

Notes:

The app is generic and takes any table as input, and simply writes this table to a CSV file.
Therefore the app does not know the schema of the data, nor which columns are decimals, etc.
Each decimal field may have a different precision and scale, so I cannot hardcode these.
Using Spark 2.4.8.
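
For reference, the rendering described above can probably be reproduced without Spark. This is a minimal sketch, assuming Spark prints decimals through java.math.BigDecimal.toString (which the 0E-08 output suggests, but which is not confirmed here). The object and helper names are made up for illustration. Note that plain Java prints the exponent as 0E-8 rather than 0E-08.

```scala
import java.math.BigDecimal

object ZeroDecimalNotation {
  // Default rendering: switches to scientific notation for small adjusted exponents
  def rendered(s: String): String = new BigDecimal(s).toString

  // Plain rendering: never uses an exponent
  def plain(s: String): String = new BigDecimal(s).toPlainString

  def main(args: Array[String]): Unit = {
    println(rendered("0.000000"))   // 0.000000 (scale 6 stays plain)
    println(rendered("0.0000000"))  // 0E-7     (scale 7 switches to scientific)
    println(rendered("0.00000000")) // 0E-8
    println(plain("0.00000000"))    // 0.00000000
  }
}
```

If that assumption holds, it explains why only zero values with a scale of 7 or more are affected, and why the fix has to happen when the value is turned into a string.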

Answer 1

Score: 1

// Any zero decimal value with a scale of 7 or greater (i.e. 0.0000000 or a higher scale)
// gets auto-converted into scientific notation: 0E-10, 0E-08, etc.
// The function below renders such columns as strings that keep the original scale.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, format_number, regexp_replace}
import org.apache.spark.sql.types.DecimalType

def ensureDecimalsDoNotConvertToSciNotation(df: DataFrame): DataFrame = {
  df.select(df.schema.fields.map { field =>
    field.dataType match {
      // Spark only switches to scientific notation when the scale is greater than 6
      case d: DecimalType if d.scale > 6 =>
        // format_number returns a string with thousands separators (e.g. 1,234.50000000),
        // so the commas are stripped before the value is written to CSV
        regexp_replace(format_number(col(field.name), d.scale), ",", "").alias(field.name)
      // Every other column is passed through unchanged
      case _ => col(field.name)
    }
  }: _*)
}
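
One thing worth checking with format_number: it is documented as producing a format like '#,###,###.##', so values of 1000 or more pick up thousands separators, which matters in a comma-separated file. Below is a minimal sketch of that effect using java.text.DecimalFormat. It assumes format_number behaves like a grouped DecimalFormat; the pattern and the helper `fixed` are illustrative only, not Spark's actual implementation.

```scala
import java.math.BigDecimal
import java.text.{DecimalFormat, DecimalFormatSymbols}
import java.util.Locale

object GroupingCheck {
  // Hypothetical helper: fixed number of decimals, with or without grouping separators
  def fixed(value: String, scale: Int, grouping: Boolean): String = {
    val pattern = "#,##0." + ("0" * scale)
    val fmt = new DecimalFormat(pattern, new DecimalFormatSymbols(Locale.US))
    fmt.setGroupingUsed(grouping)
    fmt.format(new BigDecimal(value))
  }

  def main(args: Array[String]): Unit = {
    println(fixed("0.00000000", 8, grouping = true)) // 0.00000000
    println(fixed("1234.5", 8, grouping = true))     // 1,234.50000000
    println(fixed("1234.5", 8, grouping = false))    // 1234.50000000
  }
}
```

The zero case comes out as wanted either way. The grouped output for 1234.5 is the one that would need its commas removed (or the CSV field quoted) to stay in a single column.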

Answer 2

Score: -1

Please try the following code. I am not sure whether this is what you want.

from pyspark.sql.functions import col

# Create a new column named 's' by casting 'value' to a decimal type,
# intended to convert the 0E-08 data to non-scientific notation.
# Precision cannot be smaller than scale, so decimal(18,8) from the question is used here.
df = df.withColumn("s", col("value").cast("decimal(18,8)"))
# Save the DataFrame as CSV at the <Your Output> file location
df.write.format("csv").save("<Your Output>")

This is the documentation for DecimalType: https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.types.DecimalType.html
