I want the result to be in dd hh:mm:ss using pyspark or pyspark.sql.

huangapple go评论76阅读模式
英文:

I have two dataframe I want the result to be in dd hh:mm:ss using pyspark or pyspark.sql

问题

Sure, here's the translated code part:

我有两个数据帧如下所示我想以 `dd hh:mm:ss:SSSS` 的形式获取差异

    slno  old_time                      new_time                    diff_time
    A     2019-01-09T01:25:00.000Z      2019-01-10T14:00:00.000Z    -1 HH:MM:ss:SSSS
    B     2019-01-12T02:18:00.000Z      2019-01-12T17:00:00.000Z     0 HH:MM:ss:SSSS

我目前正在使用以下查询仅返回日期差异

    from pyspark.sql.functions import datediff
    df = df.select("slno", datediff('new_time', 'old_time').alias("diff_time"))

I've translated the code part and omitted the non-translated content as requested.

英文:

I have two data frame as shown below I would like to get the difference in the form of dd hh:mm:ss:SSSS

ID  date_1                      date_2                      date_diff
A   2019-01-09T01:25:00.000Z    2019-01-10T14:00:00.000Z    -1
B   2019-01-12T02:18:00.000Z    2019-01-12T17:00:00.000Z     0

I am currently using this query that returns only date difference

from pyspark.sql.functions import datediff
df = df.select("slno",datediff('new_time','old_time').alias(diff_time) 

I want the final dataframe to be in

slno  old_time                      new_time                    diff_time
A     2019-01-09T01:25:00.000Z      2019-01-10T14:00:00.000Z    -1 HH:MM:ss:SSSS
B     2019-01-12T02:18:00.000Z      2019-01-12T17:00:00.000Z     0 HH:MM:ss:SSSS

how can i achieve this using pyspark or pyspark.sql

答案1

得分: 0

以下是翻译好的部分:

可以从两个时间戳列中相互减去。结果是一个间隔列,可以使用 regexp_extract 从该列中提取预期的输出:

from pyspark.sql import functions as F

df.withColumn('diff', F.col('date_2') - F.col('date_1')) \
  .withColumn('diff', F.regexp_extract('diff', "([0-9\s:.]{10,})",0)) \
  .show(truncate=False)

结果(测试数据稍作修改):

+---+----------------------+-------------------+-------------+
|ID |date_1                |date_2             |diff         |
+---+----------------------+-------------------+-------------+
|A  |2019-01-09 02:25:01.02|2019-01-10 15:00:00|1 12:34:58.98|
|B  |2019-01-12 03:18:00   |2019-01-12 18:00:00|0 14:42:00   |
|C  |2019-01-12 03:18:00   |2020-01-12 18:00:00|365 14:42:00 |
+---+----------------------+-------------------+-------------+
英文:

You can substract the two timestamp columns from each other. The result is an interval column and the expected output can be taken from this column using regexp_extract:

from pyspark.sql import functions as F

df.withColumn('diff', F.col('date_2') - F.col('date_1')) \
  .withColumn('diff', F.regexp_extract('diff', "([0-9\s:.]{10,})",0)) \
  .show(truncate=False)

Result (test data slightly changed):

+---+----------------------+-------------------+-------------+
|ID |date_1                |date_2             |diff         |
+---+----------------------+-------------------+-------------+
|A  |2019-01-09 02:25:01.02|2019-01-10 15:00:00|1 12:34:58.98|
|B  |2019-01-12 03:18:00   |2019-01-12 18:00:00|0 14:42:00   |
|C  |2019-01-12 03:18:00   |2020-01-12 18:00:00|365 14:42:00 |
+---+----------------------+-------------------+-------------+

答案2

得分: 0

为了使用Spark SQL计算两个日期之间的小时、分钟、秒和毫秒差异,您可以使用TIMESTAMPDIFF()、DATEDIFF()和CONCAT()函数。以下是查询:

SELECT sl_no,
   date_1 AS old_time,
   date_2 AS new_time,
   CONCAT(
           DATEDIFF(date_2, date_1),
           ' ',
           HOUR(TIMESTAMPDIFF(SECOND, date_1, date_2)),
           ':',
           MINUTE(TIMESTAMPDIFF(SECOND, date_1, date_2)),
           ':',
           SECOND(TIMESTAMPDIFF(SECOND, date_1, date_2)),
           '.',
           SUBSTRING(MICROSECOND(TIMESTAMPDIFF(SECOND, date_1, date_2)), 1, 3)
       )  AS diff_time
FROM df1;
英文:

To calculate the difference between two dates in hours, minutes, seconds, and milliseconds using Spark SQL, you can use the TIMESTAMPDIFF(), DATEDIFF(), and CONCAT() functions.
Here is the query:

    SELECT sl_no,
       date_1 AS old_time,
       date_2 AS new_time,
       CONCAT(
               DATEDIFF(date_2, date_1),
               ' ',
               HOUR(TIMESTAMPDIFF(SECOND, date_1, date_2)),
               ':',
               MINUTE(TIMESTAMPDIFF(SECOND, date_1, date_2)),
               ':',
               SECOND(TIMESTAMPDIFF(SECOND, date_1, date_2)),
               '.',
               SUBSTRING(MICROSECOND(TIMESTAMPDIFF(SECOND, date_1, date_2)), 1, 3)
           )  AS diff_time
FROM df1;

huangapple
  • 本文由 发表于 2023年4月17日 20:20:43
  • 转载请务必保留本文链接:https://go.coder-hub.com/76035109.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定