April 17, 2023, 20:20:43

I have two DataFrames and want the result in dd hh:mm:ss using PySpark or pyspark.sql

Question

I have two data frames as shown below, and I would like to get the difference in the form dd hh:mm:ss:SSSS.

    ID  date_1                      date_2                      date_diff
    A   2019-01-09T01:25:00.000Z    2019-01-10T14:00:00.000Z    -1
    B   2019-01-12T02:18:00.000Z    2019-01-12T17:00:00.000Z     0

I am currently using this query, which returns only the date difference:

    from pyspark.sql.functions import datediff

    df = df.select("slno", datediff("new_time", "old_time").alias("diff_time"))

I want the final DataFrame to look like this:

    slno  old_time                      new_time                    diff_time
    A     2019-01-09T01:25:00.000Z      2019-01-10T14:00:00.000Z    -1 HH:MM:ss:SSSS
    B     2019-01-12T02:18:00.000Z      2019-01-12T17:00:00.000Z     0 HH:MM:ss:SSSS

How can I achieve this using PySpark or pyspark.sql?

Answer 1

Score: 0

You can subtract the two timestamp columns from each other. The result is an interval column, and the expected output can be extracted from this column using regexp_extract:

    from pyspark.sql import functions as F

    # Subtracting two timestamp columns yields an interval column.
    df.withColumn('diff', F.col('date_2') - F.col('date_1')) \
      .withColumn('diff', F.regexp_extract('diff', r"([0-9\s:.]{10,})", 0)) \
      .show(truncate=False)

Result (test data slightly changed):

    +---+----------------------+-------------------+-------------+
    |ID |date_1                |date_2             |diff         |
    +---+----------------------+-------------------+-------------+
    |A  |2019-01-09 02:25:01.02|2019-01-10 15:00:00|1 12:34:58.98|
    |B  |2019-01-12 03:18:00   |2019-01-12 18:00:00|0 14:42:00   |
    |C  |2019-01-12 03:18:00   |2020-01-12 18:00:00|365 14:42:00 |
    +---+----------------------+-------------------+-------------+
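To see what the regular expression in this answer picks out, here is a minimal plain-Python sketch that runs the same pattern over sample interval strings. The `INTERVAL '...' DAY TO SECOND` text is an assumption about how the interval column is rendered as a string (this can differ between Spark versions), and `extract_diff` is a hypothetical helper name, not part of PySpark.

```python
import re

# Same pattern as in the answer: a run of at least 10 characters made up of
# digits, whitespace, colons, and dots.
PATTERN = re.compile(r"[0-9\s:.]{10,}")


def extract_diff(interval_text: str) -> str:
    """Return the first 'd hh:mm:ss[.fff]' run, or '' when nothing matches."""
    match = PATTERN.search(interval_text)
    return match.group(0) if match else ""


# Assumed string renderings of the interval column (not taken from the source).
samples = [
    "INTERVAL '1 12:34:58.98' DAY TO SECOND",
    "INTERVAL '0 14:42:00' DAY TO SECOND",
    "INTERVAL '365 14:42:00' DAY TO SECOND",
]

for text in samples:
    print(extract_diff(text))
```

The `{10,}` lower bound is what lets the shortest case, `0 14:42:00` (exactly 10 characters), match while the single spaces elsewhere in the string are skipped.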

Answer 2

Score: 0

To calculate the difference between two dates in hours, minutes, seconds, and milliseconds using Spark SQL, you can use the TIMESTAMPDIFF(), DATEDIFF(), and CONCAT() functions. Here is the query:

    SELECT sl_no,
           date_1 AS old_time,
           date_2 AS new_time,
           CONCAT(
               DATEDIFF(date_2, date_1),
               ' ',
               HOUR(TIMESTAMPDIFF(SECOND, date_1, date_2)),
               ':',
               MINUTE(TIMESTAMPDIFF(SECOND, date_1, date_2)),
               ':',
               SECOND(TIMESTAMPDIFF(SECOND, date_1, date_2)),
               '.',
               SUBSTRING(MICROSECOND(TIMESTAMPDIFF(SECOND, date_1, date_2)), 1, 3)
           ) AS diff_time
    FROM df1;
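Note that DATEDIFF counts calendar-date boundaries rather than elapsed 24-hour periods, so the day part and the time-of-day parts are easier to keep consistent when both come from a single elapsed-time value. The following is a minimal plain-Python sketch of that arithmetic, not Spark SQL. `format_diff` is a hypothetical helper, and the `.` before the milliseconds is an assumed separator (the question writes `:SSSS`).

```python
from datetime import datetime, timedelta

# Timestamp layout used in the question, e.g. 2019-01-09T01:25:00.000Z
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def format_diff(old_time: str, new_time: str) -> str:
    """Format new_time - old_time as 'd HH:MM:SS.mmm' (hypothetical helper)."""
    delta = datetime.strptime(new_time, TS_FORMAT) - datetime.strptime(old_time, TS_FORMAT)
    # Whole milliseconds; work on the absolute value and re-attach the sign.
    total_ms = delta // timedelta(milliseconds=1)
    sign = "-" if total_ms < 0 else ""
    total_ms = abs(total_ms)
    days, rest = divmod(total_ms, 86_400_000)
    hours, rest = divmod(rest, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{sign}{days} {hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


# Rows A and B from the question.
print(format_diff("2019-01-09T01:25:00.000Z", "2019-01-10T14:00:00.000Z"))  # 1 12:35:00.000
print(format_diff("2019-01-12T02:18:00.000Z", "2019-01-12T17:00:00.000Z"))  # 0 14:42:00.000
```

If the columns are strings in this layout, a function like this could be wrapped with `pyspark.sql.functions.udf`, although built-in column expressions are usually faster than Python UDFs.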
