How to convert an ISO 8601 duration to seconds

Question

I have a column named time_span in a Spark DataFrame whose values are ISO 8601 durations, for example P0Y0M0DT0H5M35S. I want to convert these values to seconds. Is there a function in Spark or Scala that can do this? I looked for one without success.

I tried java.time.Duration:

import java.time.Duration
Duration.parse("P0Y0M0DT0H5M35S")

This fails with:

java.time.format.DateTimeParseException: Text cannot be parsed to a Duration

Am I passing the value to the function incorrectly? I found this documentation:
https://docs.oracle.com/javase/8/docs/api/java/time/Duration.html

Even if this approach worked, I would still need additional logic to apply it to the whole DataFrame column.
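
A note on the cause: java.time.Duration.parse only accepts the day and time designators (the PnDTnHnMn.nS form), so the Y and M fields before the "T" are what make the string above unparseable. The minimal sketch below reproduces this in isolation; the value names rejected and accepted are only illustrative.

```scala
import java.time.Duration
import scala.util.Try

// Year (Y) and month (M) designators are not part of the format Duration.parse accepts.
val rejected = Try(Duration.parse("P0Y0M0DT0H5M35S"))

// The same amount of time written with days only parses fine.
val accepted = Try(Duration.parse("P0DT0H5M35S"))

println(rejected.isFailure)       // the parse threw a DateTimeParseException
println(accepted.get.getSeconds)  // total seconds of the parsed duration
```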

Answer 1

Score: 1

I hope the approach below helps.

import org.apache.spark.sql.functions.udf
import spark.implicits._ // for toDF and $"..." (already in scope in spark-shell)

// java.time.Duration cannot parse the year (Y) and month (M) designators,
// so keep only the time part after "T" and parse it as "PT...".
// This ignores the date part, so it is only correct when years, months and days are all zero.
val isoToSecondsUDF = udf((value: String) =>
  java.time.Duration.parse("PT" + value.split("T")(1)).getSeconds)

val df = Seq("P0Y0M0DT0H5M35S").toDF("value")

df.withColumn("seconds", isoToSecondsUDF($"value")).show()
/*
+---------------+-------+
|          value|seconds|
+---------------+-------+
|P0Y0M0DT0H5M35S|    335|
+---------------+-------+
*/
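
The UDF in this answer throws if a row is null, has no "T", or has a malformed time part. A defensive variant might look like the sketch below. The helper name safeTimePartSeconds is hypothetical and not part of the answer; like the answer, it only looks at the time part after "T".

```scala
import java.time.Duration
import scala.util.Try

// Hypothetical helper: seconds in the time part of an ISO 8601 duration,
// or None when the input is null, has no "T", or cannot be parsed.
def safeTimePartSeconds(value: String): Option[Long] =
  for {
    v <- Option(value)        // a null input becomes None
    idx = v.indexOf('T')
    if idx >= 0               // no time part: give up instead of throwing
    secs <- Try(Duration.parse("P" + v.substring(idx)).getSeconds).toOption
  } yield secs

println(safeTimePartSeconds("P0Y0M0DT0H5M35S"))
println(safeTimePartSeconds(null))
```

As far as I know, a Scala UDF that returns an Option yields null for None, so wrapping this helper with udf(safeTimePartSeconds _) should leave bad rows as null rather than failing the job.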

Answer 2

Score: 1

Updated solution to cover the case where months and days are present, for example P0Y0M2DT23H59M56S and P0Y1M2DT23H59M56S.

This needs the Time4J library: https://github.com/MenoData/Time4J

Here is the code:

import org.apache.spark.sql.functions.udf
import spark.implicits._ // for toDF and $"..." (already in scope in spark-shell)
import net.time4j.Duration

def getSeconds(value: String): String = {
  // Parse once with Time4J, which accepts the Y and M designators
  val period = Duration.parsePeriod(value).toTemporalAmount()
  val months = period.get(java.time.temporal.ChronoUnit.MONTHS)
  val days = period.get(java.time.temporal.ChronoUnit.DAYS)
  // Approximate every month as 30 days; years are not handled here
  val totalDays = months * 30 + days
  // Reuse the original time part, if there is one
  val timePart = if (value.contains("T")) "T" + value.split("T")(1) else ""
  java.time.Duration.parse("P" + totalDays + "D" + timePart).getSeconds.toString
}

val isoToSecondsUDF = udf((value: String) => getSeconds(value))
spark.udf.register("isoToSecondsUDF", isoToSecondsUDF)
val df = Seq("P0Y0M2DT23H59M56S").toDF("value")
df.withColumn("seconds", isoToSecondsUDF($"value")).show()

The function first reads the number of months, converts it to days (30 days per month), adds it to the existing number of days, and passes the result to java.time.Duration.parse.

Output:

+-----------------+-------+
|            value|seconds|
+-----------------+-------+
|P0Y0M2DT23H59M56S| 259196|
+-----------------+-------+

+-----------------+-------+
|            value|seconds|
+-----------------+-------+
|P0Y1M2DT23H59M56S|2851196|
+-----------------+-------+
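
If adding Time4J is not an option, the same idea can be sketched with the JDK alone: java.time.Period parses the date part and java.time.Duration parses the time part. This is an alternative to the answer's code, not what the answer does. The function name isoDurationToSeconds is hypothetical, and the fixed lengths (30 days per month, 365 days per year) are assumptions, since ISO 8601 does not define how long a month or year is in seconds.

```scala
import java.time.{Duration, Period}

// Assumed conventions: 1 month = 30 days, 1 year = 365 days.
def isoDurationToSeconds(value: String): Long = {
  val tIndex = value.indexOf('T')
  // Split "P<date>T<time>" into a Period string and a Duration string
  val (datePart, timePart) =
    if (tIndex >= 0) (value.substring(0, tIndex), "P" + value.substring(tIndex))
    else (value, "PT0S")
  // Period.parse rejects a bare "P", which is what remains for inputs like "PT5M"
  val period = if (datePart == "P") Period.ZERO else Period.parse(datePart)
  val days = period.getYears * 365L + period.getMonths * 30L + period.getDays
  days * 86400L + Duration.parse(timePart).getSeconds
}

println(isoDurationToSeconds("P0Y0M2DT23H59M56S"))
println(isoDurationToSeconds("P0Y1M2DT23H59M56S"))
```

With the 30-day convention this gives the same numbers as the output tables above. To use it on a DataFrame column, the function can be wrapped with udf(isoDurationToSeconds _) in the same way as getSeconds.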

Posted by huangapple on August 11, 2020, 23:56:08
Link: https://go.coder-hub.com/63361924.html