Remove rows if a certain condition occurs

huangapple

Question

I'm dealing with a massive dataset. Here is an example:

df <- data.frame(country = c("France", "France", "France", "France", "Italy", "Italy", "Italy", "Italy", "Spain", "Spain", "Spain", "Spain"), year = rep(c(2000, 2001, 2002, 2003), times = 3), X = 1:12)

I want to remove all the rows associated with a given country if (in this example) X > 7 in 2002. As a result, Spain should disappear.

Answer 1

Score: 1

You can use match() to keep only the countries whose X value in 2002 is less than or equal to 7. (The .by argument requires dplyr 1.1.0 or later.)

library(dplyr)

df %>% filter(X[match(2002, year)] <= 7, .by = country)

#  country year X
#1  France 2000 1
#2  France 2001 2
#3  France 2002 3
#4  France 2003 4
#5   Italy 2000 5
#6   Italy 2001 6
#7   Italy 2002 7
#8   Italy 2003 8
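
To make the `X[match(2002, year)]` lookup easier to follow, here is a minimal base R sketch of the same per-country logic without dplyr. The names `x_2002` and `keep_countries` are illustrative and not part of the original answer. One assumption to note: a country with no 2002 row yields `NA` from `match()`, and this sketch drops such a country.

```r
# Example data from the question
df <- data.frame(
  country = rep(c("France", "Italy", "Spain"), each = 4),
  year = rep(2000:2003, times = 3),
  X = 1:12
)

# For each country, look up X in the first row where year is 2002.
# match() returns NA when there is no 2002 row, so the lookup is NA too.
x_2002 <- sapply(split(df, df$country), function(g) g$X[match(2002, g$year)])

# Keep a country only when its 2002 value is known and at most 7.
keep_countries <- names(x_2002)[!is.na(x_2002) & x_2002 <= 7]

result <- df[df$country %in% keep_countries, ]
print(result)
```

With the example data, only France and Italy pass the test, so every Spain row is removed.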

Answer 2

Score: 1

# Create the data frame
df <- data.frame(
  country = c("France", "France", "France", "France", "Italy", "Italy", "Italy", "Italy", "Spain", "Spain", "Spain", "Spain"),
  year = rep(c(2000, 2001, 2002, 2003), times = 3),
  X = 1:12
)

# Drop only the rows where year is 2002 and X > 7
# (the other rows of a flagged country are kept)
df_filtered <- df[!(df$year == 2002 & df$X > 7), ]

# Print the filtered data frame
print(df_filtered)
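
The row-wise filter in this answer removes only the single 2002 row for Spain, while the question asks for the whole country to go. A possible extension in base R, offered as a sketch rather than the answerer's method, is to collect the flagged countries first and then exclude all of their rows. The name `flagged` is illustrative.

```r
# Example data from the question
df <- data.frame(
  country = rep(c("France", "Italy", "Spain"), each = 4),
  year = rep(2000:2003, times = 3),
  X = 1:12
)

# Countries that have at least one 2002 row with X > 7
flagged <- unique(df$country[df$year == 2002 & df$X > 7])

# Remove every row that belongs to a flagged country
df_filtered <- df[!df$country %in% flagged, ]
print(df_filtered)
```

Unlike the `match()` approach, this variant keeps a country that has no 2002 row at all, because such a country is never flagged.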

Answer 3

Score: 0

<details>
<summary>English:</summary>

    %let pgm =utl-select-groups-of-rows-having-a-compound-condition-and-further-subset-using-wps-r-python-sql;

    Select groups of rows having a compound condition and further subset using wps r python sql

    github
    https://github.com/rogerjdeangelis/utl-select-groups-of-rows-having-a-compound-condition-and-further-subset-using-wps-r-python-sql

      Solutions

         1 wps sql
         2 wps r sql
         3 wps python sql

    I could not get any of the posted R solutions to work.

    https://stackoverflow.com/questions/76876788/remove-rows-if-a-certain-condition-occur

    /*                   _
    (_)_ __  _ __  _   _| |_
    | | `_ \| `_ \| | | | __|
    | | | | | |_) | |_| | |_
    |_|_| |_| .__/ \__,_|\__|
            |_|
    */

    libname sd1 "d:/sd1";

    data sd1.have;informat
    COUNTRY $6.
    YEAR 8.
    X 8.
    ;input
    COUNTRY YEAR X;
    cards4;
    France 2000 1
    France 2001 2
    France 2002 3
    France 2003 4
    Italy 2000 5
    Italy 2001 6
    Italy 2002 7
    Italy 2003 8
    Spain 2000 9
    Spain 2001 10
    Spain 2002 11
    Spain 2003 12
    ;;;;
    run;quit;

    /**************************************************************************************************************************/
    /*                               |                                        |                                               */
    /*                               |                                        |                                               */
    /* SD1.HAVE total obs=12         |  PROCESS                               | OUTPUT                                        */
    /*                               |                                        |                                               */
    /* Obs   COUNTRY    YEAR     X   |                                        |   COUNTRY      YEAR         X                 */
    /*                               |                                        |   ---------------------------                 */
    /*  1    France     2000     1   |  After removing Spain                  |   France       2000         1                 */
    /*  2    France     2001     2   |                                        |   France       2001         2                 */
    /*  3    France     2002     3   |  select rows that are not              |   France       2002         3 Has 2002 x<=7   */
    /*  4    France     2003     4   |                                        |   France       2003         4                 */
    /*                               |  not (X > 7 and  year = 2002)          |                                               */
    /*  5    Italy      2000     5   |                                        |   Italy        2000         5                 */
    /*  6    Italy      2001     6   |                                        |   Italy        2001         6                 */
    /*  7    Italy      2002     7   |                                        |   Italy        2002         7  Has 2002 x<=7  */
    /*  8    Italy      2003     8   |                                        |   Italy        2003         8                 */
    /*                               |                                        |                                               */
    /*  9    Spain      2000     9   |  Remove SPAIN because it does not have |   Keep X=8 because NOT 2002 and x > 7.        */
    /* 10    Spain      2001    10   |                                        |   2003 is enough to decide                    */
    /* 11    Spain      2002    11   |  at least one ( year = 2002 and x<=7)  |                                               */
    /* 12    Spain      2003    12   |                                        |                                               */
    /*                               |                                        |                                               */
    /**************************************************************************************************************************/

    /*                                  _
    / | __      ___ __  ___   ___  __ _| |
    | | \ \ /\ / / `_ \/ __| / __|/ _` | |
    | |  \ V  V /| |_) \__ \ \__ \ (_| | |
    |_|   \_/\_/ | .__/|___/ |___/\__, |_|
                 |_|                 |_|
    */
    proc datasets lib=sd1 nolist nodetails;delete want; run;quit;

    %utl_submit_wps64x('

    libname sd1 "d:/sd1";

    options validvarname=any;

    proc sql;
      create
         table sd1.want as
      select
         l.country
        ,l.year
        ,l.x
      from
        sd1.have as l, (
          select
             country
          from
             sd1.have
          having
            ( year = 2002 and x<=7)
          ) as r
      where
              l.country = r.country
          and not ( l.year = 2002 and l.x > 7)
    ;quit;
    proc print data=sd1.want;
    run;quit;

    ');

    /*           _               _
      ___  _   _| |_ _ __  _   _| |_
     / _ \| | | | __| `_ \| | | | __|
    | (_) | |_| | |_| |_) | |_| | |_
     \___/ \__,_|\__| .__/ \__,_|\__|
                    |_|
    */

    /**************************************************************************************************************************/
    /*                               |                                                                                        */
    /* The WPS System                |  The inner select results in                                                           */
    /*                               |                                                                                        */
    /* Obs    COUNTRY    YEAR    X   |   COUNTRY                                                                              */
    /*                               |                                                                                        */
    /*  1     France     2003    4   |   France                                                                               */
    /*  2     France     2002    3   |   Italy                                                                                */
    /*  3     France     2001    2   |                                                                                        */
    /*  4     France     2000    1   |   Spain is dropped  because it does not have at least one                              */
    /*                               |                                                                                        */
    /*  5     Italy      2003    8   |   year = 2002 and x<=7                                                                 */
    /*  6     Italy      2002    7   |                                                                                        */
    /*  7     Italy      2001    6   |   The outer select just does the final filtering                                       */
    /*  8     Italy      2000    5   |                                                                                        */
    /*                               |                                                                                        */
    /**************************************************************************************************************************/

    /*___                                          _
    |___ \  __      ___ __  ___   _ __   ___  __ _| |
      __) | \ \ /\ / / `_ \/ __| | `__| / __|/ _` | |
     / __/   \ V  V /| |_) \__ \ | |    \__ \ (_| | |
    |_____|   \_/\_/ | .__/|___/ |_|    |___/\__, |_|
                     |_|                        |_|
    */

    proc datasets lib=sd1 nolist nodetails;delete want; run;quit;

    %utl_submit_wps64x('

    libname sd1 "d:/sd1";

    proc r;
    export data=sd1.have r=have;
    submit;
    library(sqldf);
    want <- sqldf("
      select
         l.country
        ,l.year
        ,l.x
      from
        have as l, (
          select
             max(country) as country
          from
             have
          group
             by country
          having
            max( year = 2002 and x<=7)
          ) as r
      where
              l.country = r.country
          and  ( l.year = 2002 and l.x > 7) = 0
      ");
    want;
    endsubmit;
    run;quit;
    ');


    /*____                                    _   _                             _
    |___ /  __      ___ __  ___   _ __  _   _| |_| |__   ___  _ __    ___  __ _| |
      |_ \  \ \ /\ / / `_ \/ __| | `_ \| | | | __| `_ \ / _ \| `_ \  / __|/ _` | |
     ___) |  \ V  V /| |_) \__ \ | |_) | |_| | |_| | | | (_) | | | | \__ \ (_| | |
    |____/    \_/\_/ | .__/|___/ | .__/ \__, |\__|_| |_|\___/|_| |_| |___/\__, |_|
                     |_|         |_|    |___/                                |_|
    */

    %utl_submit_wps64x('

    libname sd1 "d:/sd1";
    proc datasets lib=sd1 nolist nodetails;delete want; run;quit;

    proc python;
    export data=sd1.have python=have;
    submit;
     from os import path;
     import pandas as pd;
     import numpy as np;
     from pandasql import sqldf;
     from pandasql import PandaSQL;
     pdsql = PandaSQL(persist=True);
     sqlite3conn = next(pdsql.conn.gen).connection.connection;
     sqlite3conn.enable_load_extension(True);
     sqlite3conn.load_extension("c:/temp/libsqlitefunctions.dll");
     mysql = lambda q: sqldf(q, globals());
     want=pdsql("""
      select
         l.country
        ,l.year
        ,l.x
      from
        have as l, (
          select
             max(country) as country
          from
             have
          group
             by country
          having
            max( year = 2002 and x<=7) = 1
          ) as r
      where
              l.country = r.country
          and  ( l.year = 2002 and l.x > 7) = 0
     """);
    print(want);
    endsubmit;
    import data=sd1.want python=want;
    run;quit;
    proc print data=sd1.want;
    run;quit;
    ');

    /*              _
      ___ _ __   __| |
     / _ \ `_ \ / _` |
    |  __/ | | | (_| |
     \___|_| |_|\__,_|

    */


</details>
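
The SQL solutions in this answer follow a two-step pattern: an inner select finds the countries that have a 2002 row with x <= 7, and the outer query joins back to the full table and applies a final filter. As a rough illustration for readers without WPS or SAS, the same steps might look like this in plain base R. This is a sketch under the assumption that the data is already in a data frame called `have`; the name `qualifying` is hypothetical.

```r
# Same data as sd1.have in the answer
have <- data.frame(
  COUNTRY = rep(c("France", "Italy", "Spain"), each = 4),
  YEAR = rep(2000:2003, times = 3),
  X = 1:12
)

# Step 1 (inner select): countries with at least one 2002 row where X <= 7
qualifying <- unique(have[have$YEAR == 2002 & have$X <= 7, "COUNTRY", drop = FALSE])

# Step 2 (join): keep only the rows whose country qualified
want <- merge(have, qualifying, by = "COUNTRY")

# Step 3 (outer where): drop any remaining 2002 rows with X > 7
want <- want[!(want$YEAR == 2002 & want$X > 7), ]
want <- want[order(want$COUNTRY, want$YEAR), ]
print(want)
```

On this data, step 3 removes nothing, because each country has a single 2002 row and the qualifying ones already have X <= 7 there.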



huangapple
  • Posted on August 10, 2023 at 22:39:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76876788.html