英文:
I have two dataframe I want the result to be in dd hh:mm:ss using pyspark or pyspark.sql
问题
Sure, here's the translated code part:
我有两个数据帧,如下所示,我想以 `dd hh:mm:ss:SSSS` 的形式获取差异
slno old_time new_time diff_time
A 2019-01-09T01:25:00.000Z 2019-01-10T14:00:00.000Z -1 HH:MM:ss:SSSS
B 2019-01-12T02:18:00.000Z 2019-01-12T17:00:00.000Z 0 HH:MM:ss:SSSS
我目前正在使用以下查询仅返回日期差异
from pyspark.sql.functions import datediff
df = df.select("slno", datediff('new_time', 'old_time').alias("diff_time"))
I've translated the code part and omitted the non-translated content as requested.
英文:
I have two data frame as shown below I would like to get the difference in the form of dd hh:mm:ss:SSSS
ID date_1 date_2 date_diff
A 2019-01-09T01:25:00.000Z 2019-01-10T14:00:00.000Z -1
B 2019-01-12T02:18:00.000Z 2019-01-12T17:00:00.000Z 0
I am currently using this query that returns only date difference
from pyspark.sql.functions import datediff
df = df.select("slno",datediff('new_time','old_time').alias(diff_time)
I want the final dataframe to be in
slno old_time new_time diff_time
A 2019-01-09T01:25:00.000Z 2019-01-10T14:00:00.000Z -1 HH:MM:ss:SSSS
B 2019-01-12T02:18:00.000Z 2019-01-12T17:00:00.000Z 0 HH:MM:ss:SSSS
how can i achieve this using pyspark or pyspark.sql
答案1
得分: 0
以下是翻译好的部分:
可以从两个时间戳列中相互减去。结果是一个间隔列,可以使用 regexp_extract
从该列中提取预期的输出:
from pyspark.sql import functions as F
df.withColumn('diff', F.col('date_2') - F.col('date_1')) \
.withColumn('diff', F.regexp_extract('diff', "([0-9\s:.]{10,})",0)) \
.show(truncate=False)
结果(测试数据稍作修改):
+---+----------------------+-------------------+-------------+
|ID |date_1 |date_2 |diff |
+---+----------------------+-------------------+-------------+
|A |2019-01-09 02:25:01.02|2019-01-10 15:00:00|1 12:34:58.98|
|B |2019-01-12 03:18:00 |2019-01-12 18:00:00|0 14:42:00 |
|C |2019-01-12 03:18:00 |2020-01-12 18:00:00|365 14:42:00 |
+---+----------------------+-------------------+-------------+
英文:
You can substract the two timestamp columns from each other. The result is an interval column and the expected output can be taken from this column using regexp_extract
:
from pyspark.sql import functions as F
df.withColumn('diff', F.col('date_2') - F.col('date_1')) \
.withColumn('diff', F.regexp_extract('diff', "([0-9\s:.]{10,})",0)) \
.show(truncate=False)
Result (test data slightly changed):
+---+----------------------+-------------------+-------------+
|ID |date_1 |date_2 |diff |
+---+----------------------+-------------------+-------------+
|A |2019-01-09 02:25:01.02|2019-01-10 15:00:00|1 12:34:58.98|
|B |2019-01-12 03:18:00 |2019-01-12 18:00:00|0 14:42:00 |
|C |2019-01-12 03:18:00 |2020-01-12 18:00:00|365 14:42:00 |
+---+----------------------+-------------------+-------------+
答案2
得分: 0
为了使用Spark SQL计算两个日期之间的小时、分钟、秒和毫秒差异,您可以使用TIMESTAMPDIFF()、DATEDIFF()和CONCAT()函数。以下是查询:
SELECT sl_no,
date_1 AS old_time,
date_2 AS new_time,
CONCAT(
DATEDIFF(date_2, date_1),
' ',
HOUR(TIMESTAMPDIFF(SECOND, date_1, date_2)),
':',
MINUTE(TIMESTAMPDIFF(SECOND, date_1, date_2)),
':',
SECOND(TIMESTAMPDIFF(SECOND, date_1, date_2)),
'.',
SUBSTRING(MICROSECOND(TIMESTAMPDIFF(SECOND, date_1, date_2)), 1, 3)
) AS diff_time
FROM df1;
英文:
To calculate the difference between two dates in hours, minutes, seconds, and milliseconds using Spark SQL, you can use the TIMESTAMPDIFF(), DATEDIFF(), and CONCAT() functions.
Here is the query:
SELECT sl_no,
date_1 AS old_time,
date_2 AS new_time,
CONCAT(
DATEDIFF(date_2, date_1),
' ',
HOUR(TIMESTAMPDIFF(SECOND, date_1, date_2)),
':',
MINUTE(TIMESTAMPDIFF(SECOND, date_1, date_2)),
':',
SECOND(TIMESTAMPDIFF(SECOND, date_1, date_2)),
'.',
SUBSTRING(MICROSECOND(TIMESTAMPDIFF(SECOND, date_1, date_2)), 1, 3)
) AS diff_time
FROM df1;
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论