pyspark - if statement inside select
Question
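The following code finds the maximum length of all columns in the DataFrame `df`.
Question: In the code below, how can I check the maximum length of only the string columns?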
```python
from pyspark.sql.functions import col, length, max  # note: shadows Python's built-in max

df = df.select([max(length(col(name))) for name in df.schema.names])
```
Answer 1
Score: 1
You can add a condition that tests the `dataType` of each field in `df.schema`. For example:
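```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, length, max
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [
        (1, '2', '1'),
        (1, '4', '2'),
        (1, '2', '3'),
    ],
    ['col1', 'col2', 'col3']
)

# Iterating over df.schema yields one StructField per column;
# keep only the fields whose dataType is StringType.
df.select([
    max(length(col(field.name))).alias(f'{field.name}_max_length')
    for field in df.schema
    if field.dataType == StringType()
]).show()
# +---------------+---------------+
# |col2_max_length|col3_max_length|
# +---------------+---------------+
# |              1|              1|
# +---------------+---------------+
```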
Answer 2
Score: 1
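Instead of using `schema.names`, you can use `schema.fields`, which returns a list of `StructField` objects; iterate through it to get the name and type of each field. Note that `typeName` is a method and must be called, otherwise the comparison is always `False`.
```python
from pyspark.sql.functions import col, length, max
df.select([max(length(col(field.name))) for field in df.schema.fields if field.dataType.typeName() == "string"])
```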
Answer 3
Score: -1
```python
from pyspark.sql.functions import col, length, max
df = df.select([max(length(col(name))) for (name, dtype) in df.dtypes if dtype == 'string'])
```