How can I get the difference between two timestamp columns as dd hh:mm:ss using pyspark or pyspark.sql?
Question
I have a DataFrame with two timestamp columns, as shown below, and I would like to get the difference between them in the form dd hh:mm:ss:SSSS.
    ID  date_1                      date_2                      date_diff
    A   2019-01-09T01:25:00.000Z    2019-01-10T14:00:00.000Z    -1
    B   2019-01-12T02:18:00.000Z    2019-01-12T17:00:00.000Z     0
I am currently using this query, which returns only the difference in days:
    from pyspark.sql.functions import datediff
    df = df.select("slno", datediff("new_time", "old_time").alias("diff_time"))
I want the final DataFrame to look like this:
    slno  old_time                      new_time                    diff_time
    A     2019-01-09T01:25:00.000Z      2019-01-10T14:00:00.000Z    -1 HH:MM:ss:SSSS
    B     2019-01-12T02:18:00.000Z      2019-01-12T17:00:00.000Z     0 HH:MM:ss:SSSS
How can I achieve this using pyspark or pyspark.sql?
Answer 1
Score: 0
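You can subtract the two timestamp columns from each other. The result is an interval column, and the expected output can be extracted from that column with regexp_extract:
    from pyspark.sql import functions as F
    # 1. Subtracting two timestamp columns yields an interval column.
    # 2. The regex keeps the first run of 10+ digits, spaces, colons and dots,
    #    i.e. the "d hh:mm:ss[.fff]" part of the interval's string form.
    df.withColumn('diff', F.col('date_2') - F.col('date_1')) \
      .withColumn('diff', F.regexp_extract('diff', r"([0-9\s:.]{10,})", 0)) \
      .show(truncate=False)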
Result (test data slightly changed):
+---+----------------------+-------------------+-------------+
|ID |date_1                |date_2             |diff         |
+---+----------------------+-------------------+-------------+
|A  |2019-01-09 02:25:01.02|2019-01-10 15:00:00|1 12:34:58.98|
|B  |2019-01-12 03:18:00   |2019-01-12 18:00:00|0 14:42:00   |
|C  |2019-01-12 03:18:00   |2020-01-12 18:00:00|365 14:42:00 |
+---+----------------------+-------------------+-------------+
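Because the regular expression depends on how Spark renders an interval as a string, which may differ between Spark versions, it can be useful to cross-check a few rows outside Spark. The following is a minimal sketch in plain Python, not part of the original answer: `format_diff` and `ISO_FORMAT` are hypothetical names, and the code assumes the inputs are UTC strings in the question's `2019-01-09T01:25:00.000Z` layout.

```python
from datetime import datetime, timedelta

# Assumed input layout, e.g. 2019-01-09T01:25:00.000Z (UTC, fractional seconds present)
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def format_diff(old_time: str, new_time: str) -> str:
    """Return new_time - old_time as 'd hh:mm:ss.SSS' (hypothetical helper)."""
    delta = datetime.strptime(new_time, ISO_FORMAT) - datetime.strptime(old_time, ISO_FORMAT)
    total_ms = delta // timedelta(milliseconds=1)  # difference in whole milliseconds
    sign = "-" if total_ms < 0 else ""             # keep the sign, format the magnitude
    total_ms = abs(total_ms)
    days, rem = divmod(total_ms, 86_400_000)       # 24 * 60 * 60 * 1000 ms per day
    hours, rem = divmod(rem, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{sign}{days} {hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


# Rows A and B from the question
print(format_diff("2019-01-09T01:25:00.000Z", "2019-01-10T14:00:00.000Z"))  # 1 12:35:00.000
print(format_diff("2019-01-12T02:18:00.000Z", "2019-01-12T17:00:00.000Z"))  # 0 14:42:00.000
```

The same day/hour/minute/second arithmetic can be applied to a millisecond difference computed inside Spark if the string form of the interval turns out to be unsuitable.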
Answer 2
Score: 0
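To format the difference between two timestamps as days, hours, minutes, seconds, and milliseconds in Spark SQL, you can combine the TIMESTAMPDIFF() and CONCAT() functions. TIMESTAMPDIFF() returns a single integer, so the individual components are derived from it with integer arithmetic; HOUR(), MINUTE(), and SECOND() expect a timestamp rather than a number of seconds. DATEDIFF() is not used for the day count because it counts calendar-date boundaries, not elapsed 24-hour periods.
Here is the query (it assumes date_2 is not earlier than date_1 and a Spark version that provides TIMESTAMPDIFF()):
    SELECT sl_no,
           date_1 AS old_time,
           date_2 AS new_time,
           CONCAT(
               CAST(diff_ms DIV 86400000 AS STRING), ' ',                             -- whole days
               LPAD(CAST((diff_ms % 86400000) DIV 3600000 AS STRING), 2, '0'), ':',   -- hours
               LPAD(CAST((diff_ms % 3600000) DIV 60000 AS STRING), 2, '0'), ':',      -- minutes
               LPAD(CAST((diff_ms % 60000) DIV 1000 AS STRING), 2, '0'), '.',         -- seconds
               LPAD(CAST(diff_ms % 1000 AS STRING), 3, '0')                           -- milliseconds
           ) AS diff_time
    FROM (
        -- total difference in milliseconds, computed once per row
        SELECT *, TIMESTAMPDIFF(MILLISECOND, date_1, date_2) AS diff_ms
        FROM df1
    ) AS t;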