Pyspark: Adding row/column with single value of row counts

Question
I have a pyspark dataframe that I'd like to get the row count for. Once I get the row count, I'd like to add it to the top left corner of the data frame, as shown below.
I've tried creating the row first and doing a union on the empty row and the dataframe, but the empty row gets overwritten. I've tried adding it as a literal in a column, but having trouble nulling the remainder of the column as well as the row. Any advice?
dataframe:
col1 | col2 | col3 | ... | col13 |
---|---|---|---|---|
string | string | timest | ... | int |
for a few rows.
desired output:
row_count | col1 | col2 | col3 | ... | col13 |
---|---|---|---|---|---|
numofrows | | | | | |
string | string | timest | ... | int |
So the row count would sit where an otherwise empty row and empty column meet.
Answer 1

Score: 0

Assuming `df` is your dataframe:
```python
from pyspark.sql import functions as F

cnt = df.count()
columns_list = df.columns

# Append an all-null integer "row_count" column (withColumn adds it last)
df = df.withColumn("row_count", F.lit(None).cast("int"))

# One-row dataframe holding the count: a null per original column, then cnt
schema = df.schema
cnt_line = spark.createDataFrame([[None for x in columns_list] + [cnt]], schema=schema)

# unionAll is a deprecated alias of union; this places the count row after
# the data rows -- union in the other order (cnt_line.union(df)) to put it first
df.union(cnt_line).show()
```
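The shape of the summary row can be checked without a Spark session; a minimal plain-Python sketch, where the column names and count are made-up stand-ins for `df.columns` and `df.count()`:

```python
# Hypothetical stand-ins for df.columns and df.count()
columns_list = ["col1", "col2", "col3"]
cnt = 5

# One null per original column, then the count in the appended
# "row_count" slot -- the same list the answer passes to createDataFrame
summary_row = [None for _ in columns_list] + [cnt]
print(summary_row)  # [None, None, None, 5]
```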
Comments