Dropping a column based on the value of another column in PySpark
Question
I am trying to build a method that takes a DataFrame as input and returns another DataFrame as output after checking a condition.
If the exception_type column contains FILE_REJECT in any row, the method drops the file_name column; otherwise it returns the DataFrame unchanged.
Below are three sample inputs and the expected output for each. Please help me build the method. Thank you.
Input
+------------+---------------+-------------+----------------+---------+--------------+--------------+
| file_name|register_number|jacket_number|annual_return_dt|data_type|exception_code|exception_type|
+------------+---------------+-------------+----------------+---------+--------------+--------------+
|KKAR0523.ccn| xxxxxxxx| yyyyyyyy| 2001-12-22| SHR| CHAW0001| SUSPENSE|
|KKAR0523.ccn| xxxxxxxx| yyyyyyyy| 2001-12-22| SHR| CHAR0006| REC_REJECT|
|KKAR0523.ccn| xxxxxxxx| yyyyyyyy| 2001-12-22| SHR| | |
+------------+---------------+-------------+----------------+---------+--------------+--------------+
+------------+---------------+-------------+----------------+---------+--------------+--------------+
| file_name|register_number|jacket_number|annual_return_dt|data_type|exception_code|exception_type|
+------------+---------------+-------------+----------------+---------+--------------+--------------+
|KKAR0523.ccn| zzzzzzzz| yyyyyyyy| 2001-12-22| SHR| | |
|KKAR0523.ccn| zzzzzzzz| yyyyyyyy| 2001-12-22| SHR| CHAR0001| FILE_REJECT|
+------------+---------------+-------------+----------------+---------+--------------+--------------+
+------------+---------------+-------------+----------------+---------+--------------+--------------+
| file_name|register_number|jacket_number|annual_return_dt|data_type|exception_code|exception_type|
+------------+---------------+-------------+----------------+---------+--------------+--------------+
|KKAR0523.ccn| xxxxxxxx| yyyyyyyy| 2001-12-22| SHR| CHAR0002| FILE_REJECT|
|KKAR0523.ccn| xxxxxxxx| yyyyyyyy| 2001-12-22| SHR| CHAR0001| FILE_REJECT|
+------------+---------------+-------------+----------------+---------+--------------+--------------+
Output
+------------+---------------+-------------+----------------+---------+--------------+--------------+
| file_name|register_number|jacket_number|annual_return_dt|data_type|exception_code|exception_type|
+------------+---------------+-------------+----------------+---------+--------------+--------------+
|KKAR0523.ccn| xxxxxxxx| yyyyyyyy| 2001-12-22| SHR| CHAW0001| SUSPENSE|
|KKAR0523.ccn| xxxxxxxx| yyyyyyyy| 2001-12-22| SHR| CHAR0006| REC_REJECT|
|KKAR0523.ccn| xxxxxxxx| yyyyyyyy| 2001-12-22| SHR| | |
+------------+---------------+-------------+----------------+---------+--------------+--------------+
+---------------+-------------+----------------+---------+--------------+--------------+
|register_number|jacket_number|annual_return_dt|data_type|exception_code|exception_type|
+---------------+-------------+----------------+---------+--------------+--------------+
| zzzzzzzz| yyyyyyyy| 2001-12-22| SHR| | |
| zzzzzzzz| yyyyyyyy| 2001-12-22| SHR| CHAR0001| FILE_REJECT|
+---------------+-------------+----------------+---------+--------------+--------------+
+---------------+-------------+----------------+---------+--------------+--------------+
|register_number|jacket_number|annual_return_dt|data_type|exception_code|exception_type|
+---------------+-------------+----------------+---------+--------------+--------------+
| xxxxxxxx| yyyyyyyy| 2001-12-22| SHR| CHAR0002| FILE_REJECT|
| xxxxxxxx| yyyyyyyy| 2001-12-22| SHR| CHAR0001| FILE_REJECT|
+---------------+-------------+----------------+---------+--------------+--------------+
Thank you.
Answer 1
Score: 1
You can achieve this with a method like the following:
from pyspark.sql.functions import col

def process_dataframe(input_df):
    # Only act when both columns are present.
    if 'file_name' in input_df.columns and 'exception_type' in input_df.columns:
        # Drop file_name if at least one row has exception_type FILE_REJECT.
        if input_df.filter(col('exception_type') == 'FILE_REJECT').count() > 0:
            output_df = input_df.drop('file_name')
        else:
            output_df = input_df
    else:
        output_df = input_df
    return output_df
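A possible variation on the method above, offered as a sketch rather than a definitive implementation: since the check only needs to know whether at least one FILE_REJECT row exists, adding limit(1) before count() may let Spark avoid counting every matching row. The function name drop_file_name_if_rejected is hypothetical, and the filter is written as a SQL expression string (which DataFrame.filter accepts) instead of a col() expression.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only needed by type checkers, not at runtime
    from pyspark.sql import DataFrame


def drop_file_name_if_rejected(df: "DataFrame") -> "DataFrame":
    """Return df without file_name if any row has exception_type FILE_REJECT.

    Hypothetical helper name; df is assumed to be a pyspark.sql.DataFrame.
    """
    required = {"file_name", "exception_type"}
    if not required.issubset(df.columns):
        # Nothing to check or nothing to drop: return the input unchanged.
        return df

    # limit(1) keeps at most one matching row, so the count is 0 or 1
    # and Spark does not need to count every FILE_REJECT row.
    has_reject = df.filter("exception_type = 'FILE_REJECT'").limit(1).count() > 0
    return df.drop("file_name") if has_reject else df
```

Called on the three sample inputs from the question, this should leave the first DataFrame unchanged and return the second and third without the file_name column. Like the original method, it still triggers one Spark job per call to evaluate the condition.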
BR