Remove rows if a certain condition occurs
Question
I'm working with a very large dataset. Here is a small example:
df <- data.frame(country = c("France", "France", "France", "France", "Italy", "Italy", "Italy", "Italy", "Spain", "Spain", "Spain", "Spain"), year = c(replicate(3, c(2000, 2001, 2002, 2003))), X = c(seq(1:12)))
I want to remove all rows for a given country if that country has X > 7 in 2002. In this example, Spain should be dropped.
Answer 1
Score: 1
You can use match to keep only the countries whose value of X is less than or equal to 7 in the year 2002.
library(dplyr)
# .by (per-operation grouping) requires dplyr >= 1.1.0
df %>% filter(X[match(2002, year)] <= 7, .by = country)
# country year X
#1 France 2000 1
#2 France 2001 2
#3 France 2002 3
#4 France 2003 4
#5 Italy 2000 5
#6 Italy 2001 6
#7 Italy 2002 7
#8 Italy 2003 8
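To make the grouping logic explicit, here is a minimal plain-Python sketch of the same idea. It is not the answerer's code, and the names `rows` and `keep_countries_by_2002` are made up for illustration. It looks up each country's X for 2002 and keeps the country only when that value is at most 7. As with `match` returning NA in the dplyr version, a country that has no 2002 row at all ends up dropped.

```python
# Rebuild the example data as (country, year, X) tuples.
countries = ("France", "Italy", "Spain")
years = (2000, 2001, 2002, 2003)
pairs = [(c, y) for c in countries for y in years]
rows = [(c, y, x) for x, (c, y) in enumerate(pairs, start=1)]


def keep_countries_by_2002(rows, year=2002, limit=7):
    """Keep all rows of countries whose X in `year` is <= limit."""
    # First X seen for the reference year, per country (first hit wins).
    x_in_year = {}
    for country, yr, x in rows:
        if yr == year and country not in x_in_year:
            x_in_year[country] = x
    # Countries without a row for `year` are absent from x_in_year,
    # so all of their rows are dropped.
    return [r for r in rows if r[0] in x_in_year and x_in_year[r[0]] <= limit]


result = keep_countries_by_2002(rows)
for row in result:
    print(row)
```

With the example data this leaves the eight France and Italy rows.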
Answer 2
Score: 1
# Create the data frame
df <- data.frame(
country = c("France","France","France","France","Italy","Italy","Italy","Italy","Spain","Spain","Spain","Spain"),
year = c(replicate(3, c(2000, 2001, 2002, 2003))),
X = c(seq(1:12))
)
# Identify the countries where X > 7 in 2002, then remove all of their rows
drop_countries <- unique(df$country[df$year == 2002 & df$X > 7])
df_filtered <- df[!(df$country %in% drop_countries), ]
# Print the filtered data frame
print(df_filtered)
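Note that the question asks for a group-level filter: every row of a country should go when that country's 2002 value exceeds 7. Dropping only the rows where the year is 2002 and X > 7 would leave Spain's other three rows in place. The following plain-Python sketch (hypothetical variable names, not the answerer's code) contrasts the two. Unlike a lookup of the 2002 value, this variant keeps countries that have no 2002 row.

```python
# Rebuild the example data as (country, year, X) tuples.
countries = ("France", "Italy", "Spain")
years = (2000, 2001, 2002, 2003)
pairs = [(c, y) for c in countries for y in years]
rows = [(c, y, x) for x, (c, y) in enumerate(pairs, start=1)]

# Row-level filter: removes only the offending row (Spain, 2002, 11).
row_level = [r for r in rows if not (r[1] == 2002 and r[2] > 7)]

# Group-level filter: collect the offending countries first,
# then drop every row that belongs to one of them.
flagged = {c for c, y, x in rows if y == 2002 and x > 7}
group_level = [r for r in rows if r[0] not in flagged]

print(len(row_level), len(group_level))  # 11 8
```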
Answer 3
Score: 0
<details>
<summary>English:</summary>
%let pgm =utl-select-groups-of-rows-having-a-compound-condition-and-further-subset-using-wps-r-python-sql;
Select groups of rows having a compound condition and further subset using wps r python sql
github
https://github.com/rogerjdeangelis/utl-select-groups-of-rows-having-a-compound-condition-and-further-subset-using-wps-r-python-sql
Solutions
1 wps sql
2 wps r sql
3 wps python sql
I could not get any of the posted R solutions to work.
https://stackoverflow.com/questions/76876788/remove-rows-if-a-certain-condition-occur
/* ---------- input ---------- */
libname sd1 "d:/sd1";
data sd1.have;informat
COUNTRY $6.
YEAR 8.
X 8.
;input
COUNTRY YEAR X;
cards4;
France 2000 1
France 2001 2
France 2002 3
France 2003 4
Italy 2000 5
Italy 2001 6
Italy 2002 7
Italy 2003 8
Spain 2000 9
Spain 2001 10
Spain 2002 11
Spain 2003 12
;;;;
run;quit;
/**************************************************************************************************************/
/*                         |                                        |                                         */
/* SD1.HAVE total obs=12   | PROCESS                                | OUTPUT                                  */
/*                         |                                        |                                         */
/* Obs  COUNTRY  YEAR   X  |                                        | COUNTRY  YEAR   X                       */
/*                         |                                        | -----------------                       */
/*   1  France   2000   1  | After removing Spain                   | France   2000   1                       */
/*   2  France   2001   2  |                                        | France   2001   2                       */
/*   3  France   2002   3  | select rows that are not               | France   2002   3   Has 2002 x<=7       */
/*   4  France   2003   4  |                                        | France   2003   4                       */
/*                         | not (X > 7 and year = 2002)            |                                         */
/*   5  Italy    2000   5  |                                        | Italy    2000   5                       */
/*   6  Italy    2001   6  |                                        | Italy    2001   6                       */
/*   7  Italy    2002   7  |                                        | Italy    2002   7   Has 2002 x<=7       */
/*   8  Italy    2003   8  |                                        | Italy    2003   8                       */
/*                         |                                        |                                         */
/*   9  Spain    2000   9  | Remove SPAIN because it does not have  | Keep X=8 because NOT (2002 and x > 7).  */
/*  10  Spain    2001  10  |                                        | year 2003 is enough to decide           */
/*  11  Spain    2002  11  | at least one ( year = 2002 and x<=7)   |                                         */
/*  12  Spain    2003  12  |                                        |                                         */
/*                         |                                        |                                         */
/**************************************************************************************************************/
/* ---------- 1. wps sql ---------- */
proc datasets lib=sd1 nolist nodetails;delete want; run;quit;
%utl_submit_wps64x('
libname sd1 "d:/sd1";
options validvarname=any;
proc sql;
create
table sd1.want as
select
l.country
,l.year
,l.x
from
sd1.have as l, (
select
country
from
sd1.have
having
( year = 2002 and x<=7)
) as r
where
l.country = r.country
and not ( l.year = 2002 and l.x > 7)
;quit;
proc print data=sd1.want;
run;quit;
');
/* ---------- output ---------- */
/*************************************************************************************/
/*                         |                                                         */
/* The WPS System          | The inner select results in                             */
/*                         |                                                         */
/* Obs  COUNTRY  YEAR   X  | COUNTRY                                                 */
/*                         |                                                         */
/*   1  France   2003   4  | France                                                  */
/*   2  France   2002   3  | Italy                                                   */
/*   3  France   2001   2  |                                                         */
/*   4  France   2000   1  | Spain is dropped because it does not have at least one  */
/*                         |                                                         */
/*   5  Italy    2003   8  | year = 2002 and x<=7                                    */
/*   6  Italy    2002   7  |                                                         */
/*   7  Italy    2001   6  | The outer select just does the final filtering          */
/*   8  Italy    2000   5  |                                                         */
/*                         |                                                         */
/*************************************************************************************/
/* ---------- 2. wps r sql ---------- */
proc datasets lib=sd1 nolist nodetails;delete want; run;quit;
%utl_submit_wps64x('
libname sd1 "d:/sd1";
proc r;
export data=sd1.have r=have;
submit;
library(sqldf);
want <- sqldf("
select
l.country
,l.year
,l.x
from
have as l, (
select
max(country) as country
from
have
group
by country
having
max( year = 2002 and x<=7)
) as r
where
l.country = r.country
and ( l.year = 2002 and l.x > 7) = 0
");
want;
endsubmit;
run;quit;
');
/* ---------- 3. wps python sql ---------- */
%utl_submit_wps64x('
libname sd1 "d:/sd1";
proc datasets lib=sd1 nolist nodetails;delete want; run;quit;
proc python;
export data=sd1.have python=have;
submit;
from os import path;
import pandas as pd;
import numpy as np;
from pandasql import sqldf;
from pandasql import PandaSQL;
pdsql = PandaSQL(persist=True);
sqlite3conn = next(pdsql.conn.gen).connection.connection;
sqlite3conn.enable_load_extension(True);
sqlite3conn.load_extension("c:/temp/libsqlitefunctions.dll");
mysql = lambda q: sqldf(q, globals());
want=pdsql("""
select
l.country
,l.year
,l.x
from
have as l, (
select
max(country) as country
from
have
group
by country
having
max( year = 2002 and x<=7) = 1
) as r
where
l.country = r.country
and ( l.year = 2002 and l.x > 7) = 0
""");
print(want);
endsubmit;
import data=sd1.want python=want;
run;quit;
proc print data=sd1.want;
run;quit;
');
/* ---------- end ---------- */
</details>
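The SQL used in solutions 2 and 3 can be tried without WPS. Below is a minimal sketch using Python's built-in sqlite3 module and an in-memory table named have. The table setup and variable names are assumptions for illustration, and the query is lightly simplified: it uses an explicit join and selects country directly in the subquery. SQLite evaluates a comparison to 1 or 0, so max(year = 2002 and x <= 7) is 1 for any country with at least one qualifying row.

```python
import sqlite3

# In-memory stand-in for sd1.have (assumed layout: country, year, x).
conn = sqlite3.connect(":memory:")
conn.execute("create table have (country text, year integer, x integer)")
countries = ("France", "Italy", "Spain")
years = (2000, 2001, 2002, 2003)
pairs = [(c, y) for c in countries for y in years]
conn.executemany(
    "insert into have values (?, ?, ?)",
    [(c, y, x) for x, (c, y) in enumerate(pairs, start=1)],
)

query = """
select l.country, l.year, l.x
from have as l
join (
    -- countries with at least one row where year = 2002 and x <= 7
    select country
    from have
    group by country
    having max(year = 2002 and x <= 7) = 1
) as r on l.country = r.country
where (l.year = 2002 and l.x > 7) = 0
order by l.country, l.year
"""
want = conn.execute(query).fetchall()
conn.close()

for row in want:
    print(row)
```

The subquery decides which countries survive, and the outer where clause only applies the final row filter, as described in the answer's output notes.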