explode spark column containing list of dict in str format
Question
How to convert this streaming dataframe in pyspark,
+--------------------+------+----------------------------------------------+
| timestamp|offset|stringdecode(value, UTF-8) |
+--------------------+------+----------------------------------------------+
|2023-03-03 17:21:...| 10| "[{"num":55,"cor":32},{"num":14,"cor":54}]" |
+--------------------+------+----------------------------------------------+
|2023-03-03 17:35:...| 11| "[{"num":55,"cor":98},{"num":32,"cor":77}]" |
+--------------------+------+----------------------------------------------+
into this
+--------------------+------+---+---+
| timestamp|offset|num|cor|
+--------------------+------+---+---+
|2023-03-03 17:21:...| 10| 55| 32|
+--------------------+------+---+---+
|2023-03-03 17:21:...| 10| 14| 54|
+--------------------+------+---+---+
|2023-03-03 17:35:...| 11| 55| 98|
+--------------------+------+---+---+
|2023-03-03 17:35:...| 11| 32| 77|
+--------------------+------+---+---+
Stack Overflow is asking me to add text to post my question, but I don't see any need for it, hence this paragraph.
Answer 1
Score: 1
Just use from_json and expand the column
This would work:
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StructType, StructField, IntegerType

# Schema of the JSON string column: an array of objects with num and cor fields
sch = ArrayType(StructType([
    StructField("num", IntegerType()),
    StructField("cor", IntegerType())
]))

# Parse the JSON string, explode the array into one row per element,
# then flatten the struct fields into top-level columns
df1.withColumn("asArray", F.from_json("dict", sch)) \
   .withColumn("asStruct", F.explode("asArray")) \
   .select(*[col for col in df1.schema.names if col != "dict"], "asStruct.*") \
   .show()
Input:
+-------------------+------+-----------------------------------------+
|timestamp |offset|dict |
+-------------------+------+-----------------------------------------+
|2023-03-03 00:00:00|10 |[{"num":55,"cor":32},{"num":14,"cor":54}]|
+-------------------+------+-----------------------------------------+
Schema:
root
|-- timestamp: string (nullable = true)
|-- offset: string (nullable = true)
|-- dict: string (nullable = true)
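If you want to reproduce this input locally, a dataframe matching the schema above can be built like this (a minimal sketch; spark is assumed to be an existing SparkSession, and all three columns are plain strings):
# Minimal sketch to recreate the sample input for local testing;
# "spark" is assumed to be an existing SparkSession.
df1 = spark.createDataFrame(
    [("2023-03-03 00:00:00", "10", '[{"num":55,"cor":32},{"num":14,"cor":54}]')],
    ["timestamp", "offset", "dict"]
)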
Output:
+-------------------+------+---+---+
| timestamp|offset|num|cor|
+-------------------+------+---+---+
|2023-03-03 00:00:00| 10| 55| 32|
|2023-03-03 00:00:00| 10| 14| 54|
+-------------------+------+---+---+
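One caveat for the streaming dataframe from the question: show() only works on batch dataframes. The from_json/explode part is identical on a streaming dataframe, but the result has to be written out with writeStream. A minimal sketch, where streaming_df is a placeholder for your stream with the JSON column renamed to dict, and the console sink is used only for inspection:
# Same transformation applied to the streaming dataframe (sketch);
# the console sink is only for inspecting the result while developing.
exploded = (streaming_df
    .withColumn("asArray", F.from_json("dict", sch))
    .withColumn("asStruct", F.explode("asArray"))
    .select("timestamp", "offset", "asStruct.*"))

query = exploded.writeStream.format("console").outputMode("append").start()
query.awaitTermination()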
Let me know if you face any issues.
Comments