Are there alternative functions in PySpark for countAll and countAllDistinct in Azure Data Flow?
Question
The Aggregate transformation in Azure Data Flow gives me:
count: counts values, excluding nulls
countAll: counts values, including nulls
and likewise countDistinct and countAllDistinct.
PySpark, however, only has count() and countDistinct(). How can I achieve countAll() and countAllDistinct() in PySpark?
Answer 1
Score: 0
As you know, PySpark has no built-in countAll or countAllDistinct functions, but both are easy to work around.
To count all rows in a PySpark DataFrame, nulls included, you can call count() on the DataFrame directly (or aggregate with count(lit(1))); no when() expression is needed.
If you have a DataFrame df in PySpark:
from pyspark.sql.functions import col, count, countDistinct, lit
# count: non-null values in the column (nulls excluded)
df.select(count(col("column name"))).show()
# countAll: every row, nulls included; count(lit(1)) never skips a row
df.select(count(lit(1))).show()
# countDistinct: distinct non-null values (nulls excluded)
df.select(countDistinct(col("column name"))).show()
# countAllDistinct: distinct values, with null counted as one value
# (distinct() keeps a single null row, and DataFrame.count() counts it)
print(df.select("column name").distinct().count())
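If the DataFrame is registered as a temporary view (here sampleView, with a column named value), the null-inclusive counts can also be written in Spark SQL: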
-- count all rows, including nulls
SELECT COUNT(*) AS total_count FROM (SELECT value FROM sampleView);
-- count all distinct values, including null
SELECT COUNT(*) AS distinct_total_count FROM (SELECT DISTINCT value FROM sampleView);
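As a sanity check on what each of the four counts is expected to return, here is a small plain-Python model of the semantics described in the question, with None standing in for null. The function names and the sample list are hypothetical and are not part of PySpark or Azure Data Flow; this is only meant for comparing against the output of the Spark versions.

```python
def count(values):
    """Number of non-null values (nulls excluded)."""
    return sum(v is not None for v in values)


def count_all(values):
    """Number of values, nulls included."""
    return len(values)


def count_distinct(values):
    """Number of distinct non-null values."""
    return len({v for v in values if v is not None})


def count_all_distinct(values):
    """Number of distinct values, with null (None) counted as one value."""
    return len(set(values))


# Hypothetical sample column: three non-null values, two nulls.
sample = [1, None, 2, 2, None]

counts = {
    "count": count(sample),
    "count_all": count_all(sample),
    "count_distinct": count_distinct(sample),
    "count_all_distinct": count_all_distinct(sample),
}
print(counts)
```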