pyspark - if statement inside select

Question

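The following code finds the maximum length of every column in the DataFrame `df`.

Question: in the code below, how can we check the max length of only the string columns?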

```python
# Note: this `max` is Spark's aggregate function and shadows Python's built-in max
from pyspark.sql.functions import col, length, max

df = df.select([max(length(col(name))) for name in df.schema.names])
```

Answer 1

Score: 1

You can add a condition that tests the `dataType` of each field in `df.schema`. For example:

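```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, length, max
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [
        (1, '2', '1'),
        (1, '4', '2'),
        (1, '2', '3'),
    ],
    ['col1', 'col2', 'col3']
)

# Iterating df.schema yields StructField objects; keep only string-typed ones
df.select([
    max(length(col(field.name))).alias(f'{field.name}_max_length')
    for field in df.schema
    if field.dataType == StringType()
]).show()
```

```
+---------------+---------------+
|col2_max_length|col3_max_length|
+---------------+---------------+
|              1|              1|
+---------------+---------------+
```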

Answer 2

Score: 1

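Instead of `schema.names`, you can use `schema.fields`, which returns a list of `StructField` objects that you can iterate through to get the name and type of each field.

```python
# typeName must be called: typeName() returns "string" for StringType
df.select([max(length(col(field.name))) for field in df.schema.fields if field.dataType.typeName() == "string"])
```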


<details>
<summary>英文:</summary>

Instead of using `schema.names`, you can use `schema.fields` that returns list of StructField’s which you can iterate through and get name and type of each field.

df.select([max(length(col(field.name))) for field in df.schema.fields if field.dataType.typeName == "string"])


</details>



Answer 3

Score: -1

```python
# df.dtypes is a list of (column name, type string) pairs, e.g. ('col2', 'string')
df = df.select([max(length(col(name))) for (name, dtype) in df.dtypes if dtype == 'string'])
```

Posted on February 18, 2023, 00:43:07. Source: https://go.coder-hub.com/75487022.html