删除满足特定条件的行。

huangapple go评论122阅读模式
英文:

Remove rows if a certain condition occur

问题

我正在处理一个庞大的数据集。让我举个例子:

  1. df = data.frame(country = c("France", "France", "France", "France", "Italy", "Italy", "Italy", "Italy", "Spain", "Spain", "Spain", "Spain"), year = c(replicate(3, c(2000, 2001, 2002, 2003))), X = c(seq(1:12)))

如果(根据这个示例)2002年的X > 7,我将删除与特定国家相关的所有行。因此,根据这个条件,西班牙将消失。

英文:

I'm dealing with a massive dataset. Let me make an example

  1. df=data.frame(country = c("France","France","France","France","Italy","Italy","Italy","Italy","Spain","Spain","Spain","Spain"),year=c(replicate(3,c(2000,2001,2002,2003))),X=c(seq(1:12)))

I'd remove all the rows associated with a given country if (according to this example) X > 7 in 2002. As a result, Spain shall disappear

答案1

得分: 1

你可以使用 match 来筛选出那些在 2002 年时 X 值小于等于 7 的国家。

  1. library(dplyr)
  2. df %>% filter(X[match(2002, year)] <= 7, .by = country)
  3. # country year X
  4. #1 France 2000 1
  5. #2 France 2001 2
  6. #3 France 2002 3
  7. #4 France 2003 4
  8. #5 Italy 2000 5
  9. #6 Italy 2001 6
  10. #7 Italy 2002 7
  11. #8 Italy 2003 8
英文:

You may take help of match to keep those countries whose value of X is less than equal to 7 in the year 2002.

  1. library(dplyr)
  2. df %&gt;% filter(X[match(2002, year)] &lt;= 7, .by = country)
  3. # country year X
  4. #1 France 2000 1
  5. #2 France 2001 2
  6. #3 France 2002 3
  7. #4 France 2003 4
  8. #5 Italy 2000 5
  9. #6 Italy 2001 6
  10. #7 Italy 2002 7
  11. #8 Italy 2003 8

答案2

得分: 1

  1. # 创建数据框架
  2. df <- data.frame(
  3. country = c("法国","法国","法国","法国","意大利","意大利","意大利","意大利","西班牙","西班牙","西班牙","西班牙"),
  4. year = c(replicate(3, c(2000, 2001, 2002, 2003))),
  5. X = c(seq(1:12))
  6. )
  7. # 过滤数据框架,移除每个国家在2002年X大于7的行
  8. df_filtered <- df[!(df$year == 2002 & df$X > 7), ]
  9. # 打印过滤后的数据框架
  10. print(df_filtered)

请注意,我已经将代码中的国家名称从英文翻译成中文。

英文:
  1. # Create the data frame
  2. df &lt;- data.frame(
  3. country = c(&quot;France&quot;,&quot;France&quot;,&quot;France&quot;,&quot;France&quot;,&quot;Italy&quot;,&quot;Italy&quot;,&quot;Italy&quot;,&quot;Italy&quot;,&quot;Spain&quot;,&quot;Spain&quot;,&quot;Spain&quot;,&quot;Spain&quot;),
  4. year = c(replicate(3, c(2000, 2001, 2002, 2003))),
  5. X = c(seq(1:12))
  6. )
  7. # Filter the data frame to remove rows where X &gt; 7 in 2002 for each country
  8. df_filtered &lt;- df[!(df$year == 2002 &amp; df$X &gt; 7), ]
  9. # Print the filtered data frame
  10. print(df_filtered)

答案3

得分: 0

  1. %let pgm = utl-select-groups-of-rows-having-a-compound-condition-and-further-subset-using-wps-r-python-sql;
  2. 选择满足复合条件并进一步使用wps r python sql进行子集选择
  3. github
  4. https://github.com/rogerjdeangelis/utl-select-groups-of-rows-having-a-compound-condition-and-further-subset-using-wps-r-python-sql
  5. 解决方案
  6. 1. wps sql
  7. 2. wps r sql
  8. 3. wps python sql
  9. 我无法使发布的R解决方案中的任何一个起作用。
  10. https://stackoverflow.com/questions/76876788/remove-rows-if-a-certain-condition-occur
  11. libname sd1 "d:/sd1";
  12. data sd1.have;informat
  13. COUNTRY $6.
  14. YEAR 8.
  15. X 8.
  16. ;input
  17. COUNTRY YEAR X;
  18. cards4;
  19. France 2000 1
  20. France 2001 2
  21. France 2002 3
  22. France 2003 4
  23. Italy 2000 5
  24. Italy 2001 6
  25. Italy 2002 7
  26. Italy 2003 8
  27. Spain 2000 9
  28. Spain 2001 10
  29. Spain 2002 11
  30. Spain 2003 12
  31. ;;;;
  32. run;quit;
  33. /**************************************************************************************************************************/
  34. /* | | */
  35. /* | | */
  36. /* SD1.HAVE total obs=12 | PROCESS | OUTPUT */
  37. /* | | */
  38. /* bs COUNTRY YEAR X | | COUNTRY YEAR X */
  39. /* | | --------------------------- */
  40. /* 1 France 2000 1 | 删除西班牙后 | France 2000 1 */
  41. /* 2 France 2001 2 | | France 2001 2 */
  42. /* 3 France 2002 3 | 选择不满足条件的行 | France 2002 3 有 2002 x≤7 */
  43. /* 4 France 2003 4 | | France 2003 4 */
  44. /* | 不是 (X > 7 和 year = 2002) | */
  45. /* 5 Italy 2000 5 | | Italy 2000 5 */
  46. /* 6 Italy 2001 6 | | Italy 2001 6 */
  47. /* 7 Italy 2002 7 | | Italy 2002 7 有 2002 x≤7 */
  48. /* 8 Italy 2003 8 | | Italy 2003 8 */
  49. /* | | */
  50. /* 9 Spain 2000 9 | 移除西班牙,因为它没有 | 保留 X=8 因为不是 2002 且 x > 7. */
  51. /* 10 Spain 2001 10 | | 只需 2003 就可以决定 */
  52. /* 11 Spain 2002 11 | 至少有一个 ( year = 2002 且 x≤7) | */
  53. /* 12 Spain 2003 12 | | */
  54. /* | | */
  55. /**************************************************************************************************************************/
  56. /* _
  57. / | __ ___ __ ___ ___ __ _| |
  58. | | \ \ /\ / / `_ \/ __| / __|/ _` | |
  59. | | \ V V /| |_) \__ \ \__ \ (_| | |
  60. |_| \_/\_/ | .__/ \__,_|\__|
  61. |_|
  62. */
  63. proc datasets lib=sd1 nolist nodetails;delete want; run;quit;
  64. %utl_submit_wps64x('
  65. libname sd1 "d:/sd1";
  66. options validvarname=any;
  67. proc sql;
  68. create
  69. table sd1.want as
  70. select
  71. l.country
  72. ,l.year
  73. ,l.x
  74. from
  75. sd1.have as l, (
  76. select
  77. country
  78. from
  79. sd1.have
  80. having
  81. ( year = 2002 and x≤7)
  82. ) as r
  83. where
  84. l.country = r.country
  85. and not ( l.year = 2002 and l.x > 7)
  86. ;quit;
  87. proc print data=sd1.want;
  88. run;quit;
  89. ');
  90. /* _ _
  91. ___ _ _| |_ _ __ _ _| |_
  92. / _ \| | | | __| `_ \| | | | __|
  93. | (_) | |_| | |_| |_) | |_| | |_
  94. \___/ \__,_|\__| .__/ \__,_|\__|
  95. |_|
  96. */
  97. /**************************************************************************************************************************/
  98. /* | */
  99. /* The WPS System | The inn select results in */
  100. /* | */
  101. /* Obs COUNTRY YEAR X | COUNTRY */
  102. /* | */
  103. /* 1 France 2003 4 | France */
  104. /* 2 France 2002 3 | Italy */
  105. /* 3 France 2001 2 | */
  106. /* 4 France 2000 1 | Spain 被删除,因为它没有至少一个 */
  107. /* | */
  108. /* 5 Italy 2003 8 | year = 2002 且 x≤7 */
  109. /* 6 Italy 2002 7 | */
  110. /* 7 Italy 2001 6 | 外部选择仅进行最终过滤 */
  111. /* 8 Italy 2000 5 | */
  112. /* | */
  113. /**************************************************************************************************************************/
  114. /*___ _
  115. |___ \ __ ___ __ ___ _ __ ___ __ _| |
  116. __) | \ \ /\ / / `_ \/ __| | `__| / __|/ _` | |
  117. / __/ \ V V /| |_) \__ \ | | \__ \ (_| | |
  118. |_____| \_/\_/ | .__/|___/ |_| |___/\__, |_|
  119. |_| |_|
  120. */
  121. proc datasets lib=sd1 nolist nodetails;delete want; run;quit;
  122. %utl_submit_wps64x('
  123. libname sd1 "d:/sd1";
  124. proc datasets lib=sd1 nolist nodetails;delete want; run;quit;
  125. proc r;
  126. export data=sd1
  127. <details>
  128. <summary>英文:</summary>
  129. %let pgm =utl-select-groups-of-rows-having-a-compound-condition-and-further-subset-using-wps-r-python-sql;
  130. Select groups of rows having a compound condition and further subset using wps r python sql
  131. github
  132. https://github.com/rogerjdeangelis/utl-select-groups-of-rows-having-a-compound-condition-and-further-subset-using-wps-r-python-sql
  133. Solutions
  134. 1 wps sql
  135. 2 wps r sql
  136. 3 wps python sql
  137. I could not get any of the posted R solutions to work.
  138. https://stackoverflow.com/questions/76876788/remove-rows-if-a-certain-condition-occur
  139. /* _
  140. (_)_ __ _ __ _ _| |_
  141. | | `_ \| `_ \| | | | __|
  142. | | | | | |_) | |_| | |_
  143. |_|_| |_| .__/ \__,_|\__|
  144. |_|
  145. */
  146. libname sd1 &quot;d:/sd1&quot;;
  147. data sd1.have;informat
  148. COUNTRY $6.
  149. YEAR 8.
  150. X 8.
  151. ;input
  152. COUNTRY YEAR X;
  153. cards4;
  154. France 2000 1
  155. France 2001 2
  156. France 2002 3
  157. France 2003 4
  158. Italy 2000 5
  159. Italy 2001 6
  160. Italy 2002 7
  161. Italy 2003 8
  162. Spain 2000 9
  163. Spain 2001 10
  164. Spain 2002 11
  165. Spain 2003 12
  166. ;;;;
  167. run;quit;
  168. /**************************************************************************************************************************/
  169. /* | | */
  170. /* | | */
  171. /* SD1.HAVE total obs=12 | PROCESS | OUTPUT */
  172. /* | | */
  173. /* bs COUNTRY YEAR X | | COUNTRY YEAR X */
  174. /* | | --------------------------- */
  175. /* 1 France 2000 1 | After removing Spain | France 2000 1 */
  176. /* 2 France 2001 2 | | France 2001 2 */
  177. /* 3 France 2002 3 | select rows that are not | France 2002 3 Has 2002 x&lt;=7 */
  178. /* 4 France 2003 4 | | France 2003 4 */
  179. /* | not (X &gt; 7 and year = 2002) | */
  180. /* 5 Italy 2000 5 | | Italy 2000 5 */
  181. /* 6 Italy 2001 6 | | Italy 2001 6 */
  182. /* 7 Italy 2002 7 | | Italy 2002 7 Has 2002 x&lt;=7 */
  183. /* 8 Italy 2003 8 | | Italy 2003 8 */
  184. /* | | */
  185. /* 9 Spain 2000 9 | Remove SPAIN because it does not have | Keep X=8 because NOT 2002 and x &gt; 7. */
  186. /* 10 Spain 2001 10 | | 2003 is enough to decide */
  187. /* 11 Spain 2002 11 | zt lease one ( year = 2002 and x&lt;=7) | */
  188. /* 12 Spain 2003 12 | | */
  189. /* | | */
  190. /**************************************************************************************************************************/
  191. /* _
  192. / | __ ___ __ ___ ___ __ _| |
  193. | | \ \ /\ / / `_ \/ __| / __|/ _` | |
  194. | | \ V V /| |_) \__ \ \__ \ (_| | |
  195. |_| \_/\_/ | .__/|___/ |___/\__, |_|
  196. |_| |_|
  197. */
  198. proc datasets lib=sd1 nolist nodetails;delete want; run;quit;
  199. %utl_submit_wps64x(&#39;
  200. libname sd1 &quot;d:/sd1&quot;;
  201. options validvarname=any;
  202. proc sql;
  203. create
  204. table sd1.want as
  205. select
  206. l.country
  207. ,l.year
  208. ,l.x
  209. from
  210. sd1.have as l, (
  211. select
  212. country
  213. from
  214. sd1.have
  215. having
  216. ( year = 2002 and x&lt;=7)
  217. ) as r
  218. where
  219. l.country = r.country
  220. and not ( l.year = 2002 and l.x &gt; 7)
  221. ;quit;
  222. proc print data=sd1.want;
  223. run;quit;
  224. &#39;);
  225. /* _ _
  226. ___ _ _| |_ _ __ _ _| |_
  227. / _ \| | | | __| `_ \| | | | __|
  228. | (_) | |_| | |_| |_) | |_| | |_
  229. \___/ \__,_|\__| .__/ \__,_|\__|
  230. |_|
  231. */
  232. /**************************************************************************************************************************/
  233. /* | */
  234. /* The WPS System | The inn select results in */
  235. /* | */
  236. /* Obs COUNTRY YEAR X | COUNTRY */
  237. /* | */
  238. /* 1 France 2003 4 | France */
  239. /* 2 France 2002 3 | Italy */
  240. /* 3 France 2001 2 | */
  241. /* 4 France 2000 1 | Spain is dropped because it does not have at least one */
  242. /* | */
  243. /* 5 Italy 2003 8 | year = 2002 and x&lt;=7 */
  244. /* 6 Italy 2002 7 | */
  245. /* 7 Italy 2001 6 | The outer select just does the final filtering */
  246. /* 8 Italy 2000 5 | */
  247. /* | */
  248. /**************************************************************************************************************************/
  249. /*___ _
  250. |___ \ __ ___ __ ___ _ __ ___ __ _| |
  251. __) | \ \ /\ / / `_ \/ __| | `__| / __|/ _` | |
  252. / __/ \ V V /| |_) \__ \ | | \__ \ (_| | |
  253. |_____| \_/\_/ | .__/|___/ |_| |___/\__, |_|
  254. |_| |_|
  255. */
  256. proc datasets lib=sd1 nolist nodetails;delete want; run;quit;
  257. %utl_submit_wps64x(&#39;
  258. libname sd1 &quot;d:/sd1&quot;;
  259. proc r;
  260. export data=sd1.have r=have;
  261. submit;
  262. library(sqldf);
  263. want &lt;- sqldf(&quot;
  264. select
  265. l.country
  266. ,l.year
  267. ,l.x
  268. from
  269. have as l, (
  270. select
  271. max(country) as country
  272. from
  273. have
  274. group
  275. by country
  276. having
  277. max( year = 2002 and x&lt;=7)
  278. ) as r
  279. where
  280. l.country = r.country
  281. and ( l.year = 2002 and l.x &gt; 7) = 0
  282. &quot;);
  283. want;
  284. endsubmit;
  285. run;quit;
  286. &#39;);
  287. /*____ _ _ _
  288. |___ / __ ___ __ ___ _ __ _ _| |_| |__ ___ _ __ ___ __ _| |
  289. |_ \ \ \ /\ / / `_ \/ __| | `_ \| | | | __| `_ \ / _ \| `_ \ / __|/ _` | |
  290. ___) | \ V V /| |_) \__ \ | |_) | |_| | |_| | | | (_) | | | | \__ \ (_| | |
  291. |____/ \_/\_/ | .__/|___/ | .__/ \__, |\__|_| |_|\___/|_| |_| |___/\__, |_|
  292. |_| |_| |___/ |_|
  293. */
  294. %utl_submit_wps64x(&#39;
  295. libname sd1 &quot;d:/sd1&quot;;
  296. proc datasets lib=sd1 nolist nodetails;delete want; run;quit;
  297. proc python;
  298. export data=sd1.have python=have;
  299. submit;
  300. from os import path;
  301. import pandas as pd;
  302. import numpy as np;
  303. import pandas as pd;
  304. from pandasql import sqldf;
  305. mysql = lambda q: sqldf(q, globals());
  306. from pandasql import PandaSQL;
  307. pdsql = PandaSQL(persist=True);
  308. sqlite3conn = next(pdsql.conn.gen).connection.connection;
  309. sqlite3conn.enable_load_extension(True);
  310. sqlite3conn.load_extension(&quot;c:/temp/libsqlitefunctions.dll&quot;);
  311. mysql = lambda q: sqldf(q, globals());
  312. want=pdsql(&quot;&quot;&quot;
  313. select
  314. l.country
  315. ,l.year
  316. ,l.x
  317. from
  318. have as l, (
  319. select
  320. max(country) as country
  321. from
  322. have
  323. group
  324. by country
  325. having
  326. max( year = 2002 and x&lt;=7) = 1
  327. ) as r
  328. where
  329. l.country = r.country
  330. and ( l.year = 2002 and l.x &gt; 7) = 0
  331. &quot;&quot;&quot;);
  332. print(want);
  333. endsubmit;
  334. import data=sd1.want python=want;
  335. run;quit;
  336. proc print data=sd1.want;
  337. run;quit;
  338. &#39;);
  339. /* _
  340. ___ _ __ __| |
  341. / _ \ `_ \ / _` |
  342. | __/ | | | (_| |
  343. \___|_| |_|\__,_|
  344. */
  345. </details>

huangapple
  • 本文由 发表于 2023年8月10日 22:39:07
  • 转载请务必保留本文链接:https://go.coder-hub.com/76876788.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定