How to convert ISO 8601 duration to seconds
Question
I have a column named time_span in a Spark DataFrame. Its values are ISO 8601 durations, for example P0Y0M0DT0H5M35S. I want to convert these values to seconds. Is there a function in Spark or Scala that does this? I have looked for one without success.
I tried java.time.Duration:
import java.time.Duration
java.time.Duration.parse("P0Y0M0DT0H5M35S")
This throws an error:
java.time.format.DateTimeParseException: Text cannot be parsed to a Duration
Am I passing the value to the function incorrectly? I found this documentation:
https://docs.oracle.com/javase/8/docs/api/java/time/Duration.html
Even if this worked, I would still need additional logic to apply it to the whole DataFrame column.
Answer 1
Score: 1
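java.time.Duration.parse only accepts day and time fields (PnDTnHnMn.nS), so the year and month fields in the string make it fail. The approach below keeps only the part after T and parses that; any years, months, or days in the value are ignored.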
import org.apache.spark.sql.functions.udf
// toDF and $"..." need spark.implicits._ (already imported in spark-shell)
import spark.implicits._
// Keep only the time part after "T" and parse it as PT...; years, months and days are dropped
val isoToSecondsUDF = udf((value: String) =>
  java.time.Duration.parse("PT" + value.split("T")(1)).getSeconds)
val df = Seq("P0Y0M0DT0H5M35S").toDF("value")
df.withColumn("seconds", isoToSecondsUDF($"value")).show()
/*
+---------------+-------+
|          value|seconds|
+---------------+-------+
|P0Y0M0DT0H5M35S|    335|
+---------------+-------+
*/
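The parsing step inside the UDF can be tried on its own, without Spark. The following is a minimal sketch, not the answerer's code: the helper name `timePartToSeconds` is hypothetical, and it assumes the year, month, and day fields are always zero, because they are discarded. It wraps the parse in `Try` so that null or malformed values yield `None` instead of an exception.

```scala
import java.time.Duration
import scala.util.Try

// Hypothetical helper: keeps only the time part after 'T' and returns its length in seconds.
// Assumption: the year, month and day fields are zero; they are discarded here.
def timePartToSeconds(value: String): Option[Long] =
  Try {
    val timePart = value.substring(value.indexOf('T') + 1) // e.g. "0H5M35S"
    Duration.parse("PT" + timePart).getSeconds
  }.toOption

// Duration.parse rejects the year and month fields of the original string...
println(Try(Duration.parse("P0Y0M0DT0H5M35S")).isFailure) // true
// ...but accepts the time part on its own.
println(timePartToSeconds("P0Y0M0DT0H5M35S")) // Some(335)
```

Returning an `Option` is a design choice for the sketch: a single bad row then becomes a missing value rather than failing the whole job.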
Answer 2
Score: 1
Updated solution that also covers values with non-zero months and days, e.g. P0Y0M2DT23H59M56S and P0Y1M2DT23H59M56S.
It requires the Time4J library: https://github.com/MenoData/Time4J
Here is the code:
import org.apache.spark.sql.functions._
import net.time4j.Duration
def getSeconds(value: String): String = {
  // Parse once with Time4J, which accepts the year and month fields
  val period = Duration.parsePeriod(value).toTemporalAmount()
  val months = period.get(java.time.temporal.ChronoUnit.MONTHS)
  val days = period.get(java.time.temporal.ChronoUnit.DAYS)
  // Approximation: each month counts as 30 days; the years field is ignored
  val totalDays = (months * 30) + days
  // Rebuild a string java.time.Duration can parse: P<days>D plus the original time part, if any
  val timePart = if (value.contains("T")) "T" + value.split("T")(1) else ""
  java.time.Duration.parse("P" + totalDays + "D" + timePart).getSeconds.toString
}
val isoToSecondsUDF = udf((value: String) => getSeconds(value))
// Optional: also expose the function to Spark SQL under the same name
spark.udf.register("isoToSecondsUDF", isoToSecondsUDF)
val df = Seq("P0Y0M2DT23H59M56S").toDF("value")
df.withColumn("seconds", isoToSecondsUDF($"value")).show()
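The function first reads the number of months, converts it to days (counting each month as 30 days), adds the existing number of days, and passes the rebuilt string to java.time.Duration.parse. The years field is not included in the total.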
@sathya
Output:
+-----------------+-------+
|            value|seconds|
+-----------------+-------+
|P0Y0M2DT23H59M56S| 259196|
+-----------------+-------+
+-----------------+-------+
| value|seconds|
+-----------------+-------+
|P0Y1M2DT23H59M56S|2851196|
+-----------------+-------+