Pyspark: Adding row/column with single value of row counts

Question

I have a pyspark dataframe that I'd like to get the row count for. Once I get the row count, I'd like to add it to the top left corner of the data frame, as shown below.

I've tried creating the row first and doing a union on the empty row and the dataframe, but the empty row gets overwritten. I've also tried adding the count as a literal in a column, but I'm having trouble nulling out the remainder of that column as well as the rest of the row. Any advice?

dataframe:

```
col1    col2    col3    ...  col13
string  string  timest  ...  int
```

for a few rows.

desired output:

```
row_count  col1    col2    col3    ...  col13
numofrows
           string  string  timest  ...  int
```

So the row count would sit where an otherwise empty row and empty column meet.

Answer 1

Score: 0

Assuming `df` is your dataframe:

```python
from pyspark.sql import functions as F

# Total number of rows in the dataframe
cnt = df.count()

columns_list = df.columns

# Add an all-null row_count column so the schema matches the count row built below
df = df.withColumn("row_count", F.lit(None).cast("int"))
schema = df.schema

# Single-row dataframe: nulls for every original column, the count in row_count
cnt_line = spark.createDataFrame([[None for x in columns_list] + [cnt]], schema=schema)

df.unionAll(cnt_line).show()
```
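
The answer above adds `row_count` as the last column and appends the count row at the bottom. If you want the count literally in the top-left cell, as in the desired output, a small variation of the same idea should work. This is only a sketch of my own, not part of the original answer: the column reorder via `select` and unioning the count row first are my additions, and row order in a Spark DataFrame is only what `show()` happens to display for a union like this.

```python
from pyspark.sql import functions as F

# Hypothetical variation: row_count as the first column, count row on top
cnt = df.count()

# Add a null row_count column, then move it to the front of the column list
df2 = df.withColumn("row_count", F.lit(None).cast("long"))
df2 = df2.select(["row_count"] + df.columns)

# One-row dataframe: the count in row_count, nulls for every original column
cnt_line = spark.createDataFrame([[cnt] + [None for _ in df.columns]], schema=df2.schema)

# Union the count row first so it appears as the top row
cnt_line.unionByName(df2).show()
```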
