Trying to subset and/or filter (essentially separate) a data frame object which has already been tokenized

# Question


<details>
<summary>English:</summary>

I'm trying to subset and/or filter (essentially separate) a data frame that has already been tokenized, so that each row of the df begins with one of three things: a senator and what they said, a senator and the bill they're proposing, or a header from the Congressional Record.

I need to keep only the rows that identify a senator and their speech. When I apply `subset()` or `filter()` to my df, the output has no rows (see the comments in the script).


```R
library(tidytext)
library(dplyr)
library(stringr)

#this is the list of senator names with regex
## "^" to indicate that the names are at the beginning of the string
## "\\s" to indicate that there should be a white space after the ":"
## "(.*)" to indicate that it should include all following content
sen_names <- c(
  "^Mr. McCONNELL (R; Kentucky):\\s(.*)", "^Mr. JOHNSON (D; South Dakota):\\s(.*)", "^Mr. NICKLES (R; Oklahoma):\\s(.*)",
  "^Mr. LEAHY (D; Vermont):\\s(.*)", "^Mr. CONRAD (D; North Dakota):\\s(.*)", "^Mrs. BOXER (D; California):\\s(.*)",
  "^Mrs. BOXER (D; California):\\s(.*)", "^Mr. SHELBY (R:\\s(.*) Alabama):\\s(.*)", "^Ms. MURKOWSKI (R; Alaska):\\s(.*)",
  "^Mr. STEVENS (R; Alaska):\\s(.*)", "^Mr. KYL (R; Arizona):\\s(.*)", "^Mr. McCAIN (R; Arizona):\\s(.*)",
  "^Mr. PRYOR (D; Arkansas):\\s(.*)", "^Mrs. LINCOLN (D; Arkansas):\\s(.*)", "^Mrs. FEINSTEIN (D; California):\\s(.*)",
  "^Mr. ALLARD (R; Colorado):\\s(.*)", "^Mr. CAMPBELL (R; Colorado):\\s(.*)", "^Mr. DODD (D; Connecticut):\\s(.*)",
  "^Mr. LIEBERMAN (D; Connecticut):\\s(.*)", "^Mr. BIDEN (D; Delaware):\\s(.*)", "^Mr. CARPER (D; Delaware):\\s(.*)",
  "^Mr. L GRAHAM (R; South Carolina):\\s(.*)", "^Mr. B GRAHAM (D; Florida):\\s(.*)", "^Mr. C NELSON (D; Florida):\\s(.*)",
  "^Mr. B NELSON (D; Nebraska):\\s(.*)", "^Mr. CHAMBLISS (R; Georgia):\\s(.*)", "^Mr. MILLER (D; Georgia):\\s(.*)",
  "^Mr. AKAKA (D; Hawaii):\\s(.*)", "^Mr. INOUYE (D; Hawaii):\\s(.*)", "^Mr. CRAIG (R; Idaho):\\s(.*)",
  "^Mr. CRAPO (R; Idaho):\\s(.*)", "^Mr. DURBIN (D; Illinois):\\s(.*)", "^Mr. FITZGERALD (R; Illinois):\\s(.*)",
  "^Mr. BAYH (D; Indiana):\\s(.*)", "^Mr. LUGAR (R; Indiana):\\s(.*)", "^Mr. GRASSLEY (R; Iowa):\\s(.*)",
  "^Mr. HARKIN (D; Iowa):\\s(.*)", "^Mr. BROWNBACK (R; Kansas):\\s(.*)", "^Mr. ROBERTS (R; Kansas):\\s(.*)",
  "^Mr. BUNNING (R; Kentucky):\\s(.*)", "^Mr. BREAUX (D; Louisiana):\\s(.*)", "^Ms. LANDRIEU (D; Louisiana):\\s(.*)",
  "^Ms. COLLINS (R; Maine):\\s(.*)", "^Ms. SNOWE (R; Maine):\\s(.*)", "^Ms. MIKULSKI (D; Maryland):\\s(.*)",
  "^Mr. SARBANES (D; Maryland):\\s(.*)", "^Mr. KENNEDY (D; Massachusetts):\\s(.*)", "^Mr. KERRY (D; Massachusetts):\\s(.*)",
  "^Ms. STABENOW (D; Michigan):\\s(.*)", "^Mr. LEVIN (D; Michigan):\\s(.*)", "^Mr. DAYTON (D; Minnesota):\\s(.*)",
  "^Mr. COLEMAN (R; Minnesota):\\s(.*)", "^Mr. COCHRAN (R; Mississippi):\\s(.*)", "^Mr. COCHRAN (R; Mississippi):\\s(.*)",
  "^Mr. TALENT (R; Missouri):\\s(.*)", "^Mr. BOND (R; Missouri):\\s(.*)", "^Mr. BAUCUS (D; Montana):\\s(.*)",
  "^Mr. BURNS (R; Montana):\\s(.*)", "^Mr. HAGEL (R; Nebraska):\\s(.*)", "^Mr. ENSIGN (R; Nevada):\\s(.*)",
  "^Mr. REID (D; Nevada):\\s(.*)", "^Mr. GREGG (R; New Hampshire):\\s(.*)", "^Mr. SUNUNU (R; New Hampshire):\\s(.*)",
  "^Mr. CORZINE (D; New Jersey):\\s(.*)", "^Mr. LAUTENBERG (D; New Jersey):\\s(.*)", "^Mr. BINGAMAN (D; New Mexico):\\s(.*)",
  "^Mr. DOMENICI (R; New Mexico):\\s(.*)", "^Mrs. CLINTON (D; New York):\\s(.*)", "^Mr. SCHUMER (D; New York):\\s(.*)",
  "^Mr. SCHUMER (D; New York):\\s(.*)", "^Mr. SCHUMER (D; New York):\\s(.*)", "^Mr. DORGAN (D; North Dakota):\\s(.*)",
  "^Mr. DEWINE (R; Ohio):\\s(.*)", "^Mr. VOINOVICH (R; Ohio):\\s(.*)", "^Mr. INHOFE (R; Oklahoma):\\s(.*)",
  "^Mr. SMITH (R; Oregon):\\s(.*)", "^Mr. WYDEN (D; Oregon):\\s(.*)", "^Mr. SANTORUM (R; Pennsylvania):\\s(.*)",
  "^Mr. SPECTER (R; Pennsylvania):\\s(.*)", "^Mr. CHAFEE (R; Rhode Island):\\s(.*)", "^Mr. REED (D; Rhode Island):\\s(.*)",
  "^Mr. HOLLINGS (D; South Carolina):\\s(.*)", "^Mr. DASCHLE (D; South Dakota):\\s(.*)", "^Mr. FRIST (R; Tennessee):\\s(.*)",
  "^Mr. ALEXANDER (R; Tennessee):\\s(.*)", "^Mr. CORNYN (R; Texas):\\s(.*)", "^Mrs. HUTCHISON (R; Texas):\\s(.*)",
  "^Mr. BENNETT (R; Utah):\\s(.*)", "^Mr. HATCH (R; Utah):\\s(.*)", "^Mr. JEFFORDS (I; Vermont):\\s(.*)",
  "^Mr. ALLEN (R; Virginia):\\s(.*)", "^Mr. WARNER (R; Virginia):\\s(.*)", "^Ms. CANTWELL (D; Washington):\\s(.*)",
  "^Ms. CANTWELL (D; Washington):\\s(.*)", "^Mr. BYRD (D; West Virginia):\\s(.*)", "^Mr. ROCKEFELLER (D; West Virginia):\\s(.*)",
  "^Mr. FEINGOLD (D; Wisconsin):\\s(.*)", "^Mr. KOHL (D; Wisconsin):\\s(.*)", "^Mr. ENZI (R; Wyoming):\\s(.*)",
  "^Mr. THOMAS (R; Wyoming):\\s(.*)"
)

#this imports the column from a tokenized dataframe
##the tokenized df is a single congressional record
##and it has already been split, I just need the ones that...
#... start with the names and are speeches rather than descriptions...
#... of legislation
aug_01_2003_speeches <- data.frame(aug_01_2003_tkns$cleaned)
colnames(aug_01_2003_speeches) <- c("speeches")
View(aug_01_2003_speeches)

#tried running through subset()
##OUTPUT -> "<0 rows> (or 0-length row.names)"
#NOTE: original object names were aug_o1_speeches
#... changed here for clarity when calling the object
aug_01_2003_x <- subset(aug_01_2003_speeches, sen_names, select = speeches)
aug_01_2003_x <- subset(aug_01_2003_speeches, subset = sen_names, select = speeches)
aug_01_2003_x <- subset(aug_01_2003_speeches, speeches %in% sen_names)
aug_01_2003_x

#tried running through filter()
##OUTPUT -> "<0 rows> (or 0-length row.names)"
aug_01_2003_x <- filter(aug_01_2003_speeches, speeches %in% sen_names, .preserve = T)
aug_01_2003_x

#there are 100 names here, so running each one individually...
#... wouldn't be the best solution in the world
```


I have uploaded a test environment to Google Drive that includes the list/dictionary of senators and one of the tokenized dfs. It is already set up with the regex patterns (and maybe this is where I'm going wrong?). It is accessible here: [test environment](https://drive.google.com/file/d/1mTZuu281JPDrEIwZlo_rgaRtV7X5EOMu/view?usp=share_link)
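A minimal sketch of why the `%in%` attempts come back empty, assuming rows formatted like the speech lines of the record (e.g. `Mr. McCONNELL (R; Kentucky): Mr. President, ...`). The two-row `speeches` vector is hypothetical, and base R's `grepl(..., perl = TRUE)` stands in for `stringr::str_detect()` so the example runs without packages.

```R
# Hypothetical two-row example in the same format as the record text
speeches <- c(
  "Mr. McCONNELL (R; Kentucky): Mr. President, I listened with great interest",
  "Mr. FRIST (R; Tennessee): Mr. President, this morning"
)
pattern <- "^Mr. McCONNELL (R; Kentucky):\\s(.*)"

# %in% compares whole strings for exact equality; the pattern text never
# equals a row, so nothing is kept
in_result <- speeches %in% pattern

# As a regex, the unescaped "(" and ")" form a group instead of matching
# the literal parentheses, so the McCONNELL row still does not match
unescaped_result <- grepl(pattern, speeches, perl = TRUE)

# With "." and the parentheses escaped, the McCONNELL row matches
escaped <- "^Mr\\. McCONNELL \\(R; Kentucky\\):\\s"
escaped_result <- grepl(escaped, speeches, perl = TRUE)

print(in_result)
print(unescaped_result)
print(escaped_result)
```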




</details>


# Answer 1
**Score**: 1


This can be achieved using the dplyr and stringr packages:

```R
library(dplyr)
library(stringr)

# Clean state, affiliation, and trailing regex from sen_names
sen_names <- c(
  "^Mr. McCONNELL (R; Kentucky):\\s(.*)", "^Mr. JOHNSON (D; South Dakota):\\s(.*)", "^Mr. NICKLES (R; Oklahoma):\\s(.*)",
  "^Mr. LEAHY (D; Vermont):\\s(.*)", "^Mr. CONRAD (D; North Dakota):\\s(.*)", "^Mrs. BOXER (D; California):\\s(.*)",
  "^Mrs. BOXER (D; California):\\s(.*)", "^Mr. SHELBY (R:\\s(.*) Alabama):\\s(.*)", "^Ms. MURKOWSKI (R; Alaska):\\s(.*)",
  "^Mr. STEVENS (R; Alaska):\\s(.*)", "^Mr. KYL (R; Arizona):\\s(.*)", "^Mr. McCAIN (R; Arizona):\\s(.*)",
  "^Mr. PRYOR (D; Arkansas):\\s(.*)", "^Mrs. LINCOLN (D; Arkansas):\\s(.*)", "^Mrs. FEINSTEIN (D; California):\\s(.*)",
  "^Mr. ALLARD (R; Colorado):\\s(.*)", "^Mr. CAMPBELL (R; Colorado):\\s(.*)", "^Mr. DODD (D; Connecticut):\\s(.*)",
  "^Mr. LIEBERMAN (D; Connecticut):\\s(.*)", "^Mr. BIDEN (D; Delaware):\\s(.*)", "^Mr. CARPER (D; Delaware):\\s(.*)",
  "^Mr. L GRAHAM (R; South Carolina):\\s(.*)", "^Mr. B GRAHAM (D; Florida):\\s(.*)", "^Mr. C NELSON (D; Florida):\\s(.*)",
  "^Mr. B NELSON (D; Nebraska):\\s(.*)", "^Mr. CHAMBLISS (R; Georgia):\\s(.*)", "^Mr. MILLER (D; Georgia):\\s(.*)",
  "^Mr. AKAKA (D; Hawaii):\\s(.*)", "^Mr. INOUYE (D; Hawaii):\\s(.*)", "^Mr. CRAIG (R; Idaho):\\s(.*)",
  "^Mr. CRAPO (R; Idaho):\\s(.*)", "^Mr. DURBIN (D; Illinois):\\s(.*)", "^Mr. FITZGERALD (R; Illinois):\\s(.*)",
  "^Mr. BAYH (D; Indiana):\\s(.*)", "^Mr. LUGAR (R; Indiana):\\s(.*)", "^Mr. GRASSLEY (R; Iowa):\\s(.*)",
  "^Mr. HARKIN (D; Iowa):\\s(.*)", "^Mr. BROWNBACK (R; Kansas):\\s(.*)", "^Mr. ROBERTS (R; Kansas):\\s(.*)",
  "^Mr. BUNNING (R; Kentucky):\\s(.*)", "^Mr. BREAUX (D; Louisiana):\\s(.*)", "^Ms. LANDRIEU (D; Louisiana):\\s(.*)",
  "^Ms. COLLINS (R; Maine):\\s(.*)", "^Ms. SNOWE (R; Maine):\\s(.*)", "^Ms. MIKULSKI (D; Maryland):\\s(.*)",
  "^Mr. SARBANES (D; Maryland):\\s(.*)", "^Mr. KENNEDY (D; Massachusetts):\\s(.*)", "^Mr. KERRY (D; Massachusetts):\\s(.*)",
  "^Ms. STABENOW (D; Michigan):\\s(.*)", "^Mr. LEVIN (D; Michigan):\\s(.*)", "^Mr. DAYTON (D; Minnesota):\\s(.*)",
  "^Mr. COLEMAN (R; Minnesota):\\s(.*)", "^Mr. COCHRAN (R; Mississippi):\\s(.*)", "^Mr. COCHRAN (R; Mississippi):\\s(.*)",
  "^Mr. TALENT (R; Missouri):\\s(.*)", "^Mr. BOND (R; Missouri):\\s(.*)", "^Mr. BAUCUS (D; Montana):\\s(.*)",
  "^Mr. BURNS (R; Montana):\\s(.*)", "^Mr. HAGEL (R; Nebraska):\\s(.*)", "^Mr. ENSIGN (R; Nevada):\\s(.*)",
  "^Mr. REID (D; Nevada):\\s(.*)", "^Mr. GREGG (R; New Hampshire):\\s(.*)", "^Mr. SUNUNU (R; New Hampshire):\\s(.*)",
  "^Mr. CORZINE (D; New Jersey):\\s(.*)", "^Mr. LAUTENBERG (D; New Jersey):\\s(.*)", "^Mr. BINGAMAN (D; New Mexico):\\s(.*)",
  "^Mr. DOMENICI (R; New Mexico):\\s(.*)", "^Mrs. CLINTON (D; New York):\\s(.*)", "^Mr. SCHUMER (D; New York):\\s(.*)",
  "^Mr. SCHUMER (D; New York):\\s(.*)", "^Mr. SCHUMER (D; New York):\\s(.*)", "^Mr. DORGAN (D; North Dakota):\\s(.*)",
  "^Mr. DEWINE (R; Ohio):\\s(.*)", "^Mr. VOINOVICH (R; Ohio):\\s(.*)", "^Mr. INHOFE (R; Oklahoma):\\s(.*)",
  "^Mr. SMITH (R; Oregon):\\s(.*)", "^Mr. WYDEN (D; Oregon):\\s(.*)", "^Mr. SANTORUM (R; Pennsylvania):\\s(.*)",
  "^Mr. SPECTER (R; Pennsylvania):\\s(.*)", "^Mr. CHAFEE (R; Rhode Island):\\s(.*)", "^Mr. REED (D; Rhode Island):\\s(.*)",
  "^Mr. HOLLINGS (D; South Carolina):\\s(.*)", "^Mr. DASCHLE (D; South Dakota):\\s(.*)", "^Mr. FRIST (R; Tennessee):\\s(.*)",
  "^Mr. ALEXANDER (R; Tennessee):\\s(.*)", "^Mr. CORNYN (R; Texas):\\s(.*)", "^Mrs. HUTCHISON (R; Texas):\\s(.*)",
  "^Mr. BENNETT (R; Utah):\\s(.*)", "^Mr. HATCH (R; Utah):\\s(.*)", "^Mr. JEFFORDS (I; Vermont):\\s(.*)",
  "^Mr. ALLEN (R; Virginia):\\s(.*)", "^Mr. WARNER (R; Virginia):\\s(.*)", "^Ms. CANTWELL (D; Washington):\\s(.*)",
  "^Ms. CANTWELL (D; Washington):\\s(.*)", "^Mr. BYRD (D; West Virginia):\\s(.*)", "^Mr. ROCKEFELLER (D; West Virginia):\\s(.*)",
  "^Mr. FEINGOLD (D; Wisconsin):\\s(.*)", "^Mr. KOHL (D; Wisconsin):\\s(.*)", "^Mr. ENZI (R; Wyoming):\\s(.*)",
  "^Mr. THOMAS (R; Wyoming):\\s(.*)"
)
# Drop everything from the first "(" onward; "^Mr." keeps an unescaped ".",
# which matches any character (loose, but harmless for these names)
sen_names <- gsub("\\(.*", "", sen_names)
sen_names <- trimws(sen_names)

# Add the "or" operator | to sen_names
sen_names <- paste0("(", paste(sen_names, collapse = "|"), ")")

# Create new filtered object using sen_names
speeches_only <- aug_01_2003_speeches %>%
  filter(str_detect(speeches, sen_names))

# Result
data.frame(substr(speeches_only[1:10, 1], 1, 50))
```

```
                substr.speeches_only.1.10..1...1..50.
1  Mr. FRIST (R; Tennessee): Mr. President, this morn
2  Mr. FRIST (R; Tennessee): Mr. President, I ask una
3  Mr. FRIST (R; Tennessee): Mr. President, I take th
4  Mr. McCONNELL (R; Kentucky): Mr. President, I list
5  Mr. McCONNELL (R; Kentucky): Mr. President, on ano
6  Mr. BINGAMAN (D; New Mexico): Mr. President, let m
7  Mr. C NELSON (D; Florida): Mr. President, I ask un
8  Mr. C NELSON (D; Florida): Mr. President, we have 
9  Mr. C NELSON (D; Florida): Mr. President, I was lo
10 Mr. TALENT (R; Missouri): Mr. President, it is my 
```

In total, 134 speeches matching your criteria were found in the example dataset you provided.
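The cleanup steps in this answer can be checked on a few entries. A minimal sketch using three elements of `sen_names` and a hypothetical `rows` vector; base R's `grepl(..., perl = TRUE)` stands in for `str_detect()` so it runs without packages. Note that the malformed SHELBY entry is cleaned up too, because `gsub()` cuts everything after the first `(`.

```R
# Three entries copied from sen_names
sen_names <- c(
  "^Mr. McCONNELL (R; Kentucky):\\s(.*)",
  "^Mr. SHELBY (R:\\s(.*) Alabama):\\s(.*)",
  "^Mr. C NELSON (D; Florida):\\s(.*)"
)

# Drop everything from the first "(" onward, then the trailing space
short_names <- trimws(gsub("\\(.*", "", sen_names))

# Join into one alternation; every alternative keeps its own "^" anchor
combined <- paste0("(", paste(short_names, collapse = "|"), ")")

# Hypothetical rows in the record's "Mr. NAME (Party; State):" format
rows <- c(
  "Mr. C NELSON (D; Florida): Mr. President, I ask unanimous consent",
  "Mr. McCONNELL (R; Kentucky): Mr. President, on another subject",
  "Mr. FRIST (R; Tennessee): Mr. President, this morning"
)

# The unescaped "." in "Mr." matches any character, which is harmless here
matches <- grepl(combined, rows, perl = TRUE)
print(short_names)
print(combined)
print(matches)
```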

# Answer 2

**Score**: 1


The regexes for the sen_names were wrong by just a bit. You were almost there.

We have to escape the parentheses and literal dots, as these are special characters in a regex: unescaped, `(` and `)` open and close a group instead of matching literal parentheses, and `.` matches any single character. I also see no reason for `(.*)` at the end of the patterns.

After correcting the regexes, we can paste all `sen_names` into a single character scalar, separated by the OR special character (`|`), and then do the matching on this single element.
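A small base-R sketch of these two steps, using the two escaped patterns from this answer. The `speeches` vector is hypothetical, and `grepl(..., perl = TRUE)` stands in for `str_detect()`. It also shows why each alternative keeps its own `^`: anchoring only the start of the joined string would let a name that appears mid-row slip through.

```R
# The two patterns from this answer, with "(", ")" and "." escaped
sen_names <- c("^Mr\\. McCONNELL \\(R; Kentucky\\):\\s",
               "^Mr\\. JOHNSON \\(D; South Dakota\\):\\s")
consolidated_sen_names <- paste(sen_names, collapse = "|")

# Hypothetical rows standing in for aug_01_2003_speeches$speeches
speeches <- c(
  "Mr. McCONNELL (R; Kentucky): Mr. President, on another subject",
  "Mr. JOHNSON (D; South Dakota): Mr. President, I rise today",
  "Mr. FRIST (R; Tennessee): Mr. President, this morning",
  "Remarks by Mr. JOHNSON (D; South Dakota): see page 2"
)

# Every alternative is anchored, so only rows that START with a name match
matched <- grepl(consolidated_sen_names, speeches, perl = TRUE)

# For contrast: strip the per-pattern "^" and anchor only the whole string;
# "^A|B" means "(^A)|(B)", so the second name now matches anywhere
loosely_anchored <- paste0("^", paste(sub("^\\^", "", sen_names), collapse = "|"))
loose <- grepl(loosely_anchored, speeches, perl = TRUE)

print(matched)
print(loose)
```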

With the following code, using just two `sen_names`, I could filter the data frame down to two rows. I believe this two-name example is an adequate minimal reprex:

```R
sen_names <- c("^Mr\\. McCONNELL \\(R; Kentucky\\):\\s", "^Mr\\. JOHNSON \\(D; South Dakota\\):\\s")
consolidated_sen_names <- paste(sen_names, collapse = "|")

aug_01_2003_speeches |>
    filter(str_detect(speeches, consolidated_sen_names)) |>
    tibble()
```

```
# A tibble: 2 × 1
  speeches
  <chr>
1 Mr. McCONNELL (R; Kentucky): Mr. President, I listened with great interest to Senator Frist (R; Tennessee)'s comments about Bob Hope…
2 Mr. McCONNELL (R; Kentucky): Mr. President, on another subject, I commend Senator Frist (R; Tennessee), before he leaves the floor, …
```
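With all 100 senators, escaping every `.`, `(` and `)` by hand is error-prone. One option, sketched here as an assumption rather than part of this answer, is to keep the plain speaker labels and escape them programmatically. `escape_regex()` is a hypothetical helper written for this example, the `speeches` rows are made up, and base R's `startsWith()` is shown as a regex-free cross-check.

```R
# Plain speaker labels, written exactly as they appear in the record
# (a few entries from sen_names, minus the regex syntax)
labels <- c("Mr. McCONNELL (R; Kentucky):",
            "Mr. JOHNSON (D; South Dakota):",
            "Mr. C NELSON (D; Florida):")

# Hypothetical helper: put a backslash before every regex metacharacter
# so the label is matched literally
escape_regex <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

# Anchor each escaped label at the start and require whitespace after ":"
patterns <- paste0("^", escape_regex(labels), "\\s")
combined <- paste(patterns, collapse = "|")

# Hypothetical rows
speeches <- c(
  "Mr. C NELSON (D; Florida): Mr. President, I ask unanimous consent",
  "Mr. FRIST (R; Tennessee): Mr. President, this morning",
  "Mr. McCONNELL (R; Kentucky): Mr. President, on another subject"
)

keep <- grepl(combined, speeches, perl = TRUE)

# Regex-free cross-check: fixed-string prefix matching (requires a literal
# space after ":", whereas "\\s" accepts any whitespace character)
keep_fixed <- Reduce(`|`, lapply(paste0(labels, " "),
                                 function(l) startsWith(speeches, l)))

print(patterns[1])
print(keep)
```

The first generated pattern is identical to the hand-escaped McCONNELL pattern in this answer. With the real data, `combined` would take the place of `consolidated_sen_names` in the `filter(str_detect(...))` call; these simple escapes (`\.`, `\(`, `\)`, `\s`) should behave the same under stringr's ICU engine.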

</details>



huangapple
  • Posted on June 1, 2023 05:09:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76377320.html