How to change a column value in the PySpark dataframe with a datatype of an array of structs
Question
How can I change a column value in a PySpark dataframe when the column's datatype is an array of structs? For example, I would like to divide long_value by 10.
root
|-- properties: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- key: string (nullable = true)
| | |-- value: struct (nullable = true)
| | | |-- string_value: string (nullable = true)
| | | |-- long_value: long (nullable = true)
I tried to achieve it by using the withColumn method, but this approach returns the same dataframe.
df.withColumn("properties.value.long_value", col("properties")[0]["value"]["long_value"] / 10 )
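As a plain-Python analogy (no Spark; the sample row here is hypothetical), this is roughly why the attempt above leaves the nested field untouched: withColumn treats the dotted string as the literal name of a new top-level column, and indexing with [0] only reads the first array element:

```python
# Hypothetical row mimicking the schema above as plain dicts (no Spark).
row = {"properties": [{"key": "k", "value": {"string_value": "s", "long_value": 100}}]}

# Rough equivalent of the withColumn call: a brand-new flat key is created;
# the nested long_value is never modified.
row["properties.value.long_value"] = row["properties"][0]["value"]["long_value"] / 10

assert row["properties"][0]["value"]["long_value"] == 100  # unchanged
assert row["properties.value.long_value"] == 10.0          # new flat key
```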
Answer 1

Score: 1
Using the transform function together with the column method withField:
from pyspark.sql import functions as F

# requires Spark 3.1+ (F.transform and Column.withField)
df1 = df.withColumn(
    "properties",
    F.transform(  # rewrite every array element
        "properties",
        lambda x: x.withField(
            "value",
            x["value"].withField("long_value", x["value"].getField("long_value") / 10)
        )
    )
)
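The same element-wise rewrite can be sketched in plain Python on an equivalent list-of-dicts structure (the sample data is hypothetical), which makes the shape of the transformation easy to see: each array element is rebuilt with every field copied and only value.long_value replaced:

```python
def divide_long_values(properties, divisor=10):
    """Plain-Python mirror of F.transform + withField: rebuild each array
    element, copying all fields and replacing only value.long_value."""
    return [
        {
            **elem,
            "value": {**elem["value"], "long_value": elem["value"]["long_value"] / divisor},
        }
        for elem in properties
    ]

props = [
    {"key": "a", "value": {"string_value": "x", "long_value": 100}},
    {"key": "b", "value": {"string_value": "y", "long_value": 30}},
]
print(divide_long_values(props))
# [{'key': 'a', 'value': {'string_value': 'x', 'long_value': 10.0}},
#  {'key': 'b', 'value': {'string_value': 'y', 'long_value': 3.0}}]
```

Like withField, the dict-rebuild leaves string_value and key intact; only the targeted nested field changes.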
Comments