How to create new columns based on the values of two specific columns in a Spark DataFrame?


Question

I have a DataFrame:

client  type
------  ----
89      id
56      id
34      id
13      id
67      phone
68      phone

I need to create two new columns based on the columns "client" and "type": where "type" == "id", put the client number into a column "id"; where "type" == "phone", put the client number into a column "phone".

I tried:

df.withColumn("id", when($"type" === "id", $"client")).withColumn("phone", when($"type" === "phone", $"client"))

and I got this result:

+------+----+----+-----+
|client|type|  id|phone|
+------+----+----+-----+
|    89|cuid|  89| null|
|    56|cuid|  56| null|
|    34|cuid|  34| null|
|    13|cuid|  13| null|
+------+----+----+-----+

but the expected result is:

+------+----+----+-----+
|client|type|  id|phone|
+------+----+----+-----+
|    89|cuid|  89| null|
|    56|cuid|  56| null|
|    34|cuid|  34| null|
|    13|cuid|  13| null|
|    67|cuid|null|   67|
|    68|cuid|null|   68|
+------+----+----+-----+

Answer 1

Score: -1


You can try something like this:

import pyspark.sql.functions as F

x = [(89, "id"), (56, "id"), (34, "id"), (13, "id"), (67, "phone"), (68, "phone")]

df = (
    spark.createDataFrame(x, schema=["client", "type"])
    # "id" column: the client number where type == "id", otherwise null
    .withColumn("id", F.when(F.col("type") == F.lit("id"), F.col("client")))
    # "phone" column: the client number where type == "phone", otherwise null
    .withColumn("phone", F.when(F.col("type") == F.lit("phone"), F.col("client")))
)
# Call show() separately: it returns None, so assigning the whole chain
# (including .show()) to df would leave df as None.
df.show()

output:

+------+-----+----+-----+
|client| type|  id|phone|
+------+-----+----+-----+
|    89|   id|  89| null|
|    56|   id|  56| null|
|    34|   id|  34| null|
|    13|   id|  13| null|
|    67|phone|null|   67|
|    68|phone|null|   68|
+------+-----+----+-----+

huangapple
  • Published on 2023-05-30 03:03:50
  • Please keep the original link when reposting: https://go.coder-hub.com/76359801.html