Dropping a column based on the value of another column in PySpark

Question

I am trying to build a method that takes a DataFrame as input and returns another DataFrame as output after checking a certain condition.

If any row in the exception_type column contains FILE_REJECT, the method should drop the file_name column; otherwise it should return the DataFrame unchanged.

Below are several sample inputs and the output I expect for each. Please help me build the method. Thank you.

Input
+------------+---------------+-------------+----------------+---------+--------------+--------------+
|   file_name|register_number|jacket_number|annual_return_dt|data_type|exception_code|exception_type|
+------------+---------------+-------------+----------------+---------+--------------+--------------+
|KKAR0523.ccn|       xxxxxxxx|     yyyyyyyy|      2001-12-22|      SHR|      CHAW0001|      SUSPENSE|
|KKAR0523.ccn|       xxxxxxxx|     yyyyyyyy|      2001-12-22|      SHR|      CHAR0006|    REC_REJECT|
|KKAR0523.ccn|       xxxxxxxx|     yyyyyyyy|      2001-12-22|      SHR|              |              |
+------------+---------------+-------------+----------------+---------+--------------+--------------+

+------------+---------------+-------------+----------------+---------+--------------+--------------+
|   file_name|register_number|jacket_number|annual_return_dt|data_type|exception_code|exception_type|
+------------+---------------+-------------+----------------+---------+--------------+--------------+
|KKAR0523.ccn|       zzzzzzzz|     yyyyyyyy|      2001-12-22|      SHR|              |              |
|KKAR0523.ccn|       zzzzzzzz|     yyyyyyyy|      2001-12-22|      SHR|      CHAR0001|   FILE_REJECT|
+------------+---------------+-------------+----------------+---------+--------------+--------------+

+------------+---------------+-------------+----------------+---------+--------------+--------------+
|   file_name|register_number|jacket_number|annual_return_dt|data_type|exception_code|exception_type|
+------------+---------------+-------------+----------------+---------+--------------+--------------+
|KKAR0523.ccn|       xxxxxxxx|     yyyyyyyy|      2001-12-22|      SHR|      CHAR0002|   FILE_REJECT|
|KKAR0523.ccn|       xxxxxxxx|     yyyyyyyy|      2001-12-22|      SHR|      CHAR0001|   FILE_REJECT|
+------------+---------------+-------------+----------------+---------+--------------+--------------+
Output
+------------+---------------+-------------+----------------+---------+--------------+--------------+
|   file_name|register_number|jacket_number|annual_return_dt|data_type|exception_code|exception_type|
+------------+---------------+-------------+----------------+---------+--------------+--------------+
|KKAR0523.ccn|       xxxxxxxx|     yyyyyyyy|      2001-12-22|      SHR|      CHAW0001|      SUSPENSE|
|KKAR0523.ccn|       xxxxxxxx|     yyyyyyyy|      2001-12-22|      SHR|      CHAR0006|    REC_REJECT|
|KKAR0523.ccn|       xxxxxxxx|     yyyyyyyy|      2001-12-22|      SHR|              |              |
+------------+---------------+-------------+----------------+---------+--------------+--------------+

+---------------+-------------+----------------+---------+--------------+--------------+
|register_number|jacket_number|annual_return_dt|data_type|exception_code|exception_type|
+---------------+-------------+----------------+---------+--------------+--------------+
|       zzzzzzzz|     yyyyyyyy|      2001-12-22|      SHR|              |              |
|       zzzzzzzz|     yyyyyyyy|      2001-12-22|      SHR|      CHAR0001|   FILE_REJECT|
+---------------+-------------+----------------+---------+--------------+--------------+

+---------------+-------------+----------------+---------+--------------+--------------+
|register_number|jacket_number|annual_return_dt|data_type|exception_code|exception_type|
+---------------+-------------+----------------+---------+--------------+--------------+
|       xxxxxxxx|     yyyyyyyy|      2001-12-22|      SHR|      CHAR0002|   FILE_REJECT|
|       xxxxxxxx|     yyyyyyyy|      2001-12-22|      SHR|      CHAR0001|   FILE_REJECT|
+---------------+-------------+----------------+---------+--------------+--------------+

Thank you.

Answer 1

Score: 1

You can achieve this with a method like the following:

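from pyspark.sql.functions import col

def process_dataframe(input_df):
    # Leave the DataFrame untouched unless both columns are present.
    if 'file_name' not in input_df.columns or 'exception_type' not in input_df.columns:
        return input_df

    # limit(1) means at most one matching row has to be counted.
    has_file_reject = input_df.filter(col('exception_type') == 'FILE_REJECT').limit(1).count() > 0

    # Drop file_name only when at least one row is a FILE_REJECT.
    return input_df.drop('file_name') if has_file_reject else input_df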

BR
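
If the same check is needed for other columns or values, the answer's approach could be generalized roughly as below. This is only a sketch: the function name `drop_column_if_value_present` and its parameters are hypothetical and not part of the original answer. It relies on `df[name]` returning a Column, so it needs no extra import, and it assumes `df` is a PySpark DataFrame.

```python
def drop_column_if_value_present(df, check_col="exception_type",
                                 trigger_value="FILE_REJECT",
                                 drop_col="file_name"):
    """Return df without drop_col if any row has check_col == trigger_value.

    Hypothetical helper; the name and parameters are illustrative only.
    """
    # If either column is missing there is nothing to check or drop.
    if check_col not in df.columns or drop_col not in df.columns:
        return df

    # df[check_col] is a Column; comparing it to a value gives a filter condition.
    # limit(1) keeps the existence check to at most one matching row.
    has_match = df.filter(df[check_col] == trigger_value).limit(1).count() > 0

    # Rows where check_col is null never match, so they do not trigger the drop.
    return df.drop(drop_col) if has_match else df
```

Called with the defaults, `drop_column_if_value_present(df)` should behave like `process_dataframe(df)` on the sample inputs from the question. Note that the check still triggers a Spark job each time it runs, because Spark has to look at the data before it can decide which columns to return.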

Posted by huangapple on 2023-05-22 20:18:01
When reposting, please keep the link to this article: https://go.coder-hub.com/76306118.html