June 4, 2023, 23:01:33

Replace specific text values in a row of a dataframe based on a true condition on the row before

Question

I am doing a text analysis of the Congressional Record, specifically of passages where senators speak about each other. In many instances a senator refers to another senator who has just finished speaking without naming them (e.g., "my colleague", "my friend"). I am trying to replace those references with the senator's name.

The speeches are split into rows. The senator who is speaking is listed by name at the start of the row.

I tried three different functions. The first attempt was an if / else if:

#function 1 (had error)
r_sen_from_states <- function(x){
  if(x == "the senator from Alabama" & lag(x)=="^Mr\\. SHELBY \\(R; Alabama\\):"){
    str_replace(x, "the senator from Alabama", "Senator Shelby \\(R; Alabama\\)")
  } else if (x == "the senator from Alabama" & lag(x)=="^Mr.\\. SESSIONS \\(R; Alabama\\)"){
    str_replace(x, "the senator from Alabama", "Senator Sessions \\(R; Alabama\\)")
  }
}
test_df_ran <- r_sen_from_states(test_df)
##output -> error, condition has length > 1 and only the first element will be used

Second attempt was ifelse:

#function 2 (does not replace values with new values but no error because ifelse vectorization)
r_sen_from_states <- function(x){
  ifelse(x %in% "the senator from Alabama" & lag(x)=="^Mr\\. SHELBY \\(R; Alabama\\):",
         str_replace(x, "the senator from Alabama","senator Shelby"), x)}
test_df_ran <- r_sen_from_states(test_df$speeches)
##output -> the dataframe but without replacing any values

Third attempt was for loop ifelse:

#function 3 (produced "NULL")
r_sen_from_states <- function(x){
  for (i in 1:nrow(x)) {
    ifelse(x == "the senator from Alabama" & lag(x, n = 1L) == "^Mr\\. SHELBY \\(R; Alabama\\):",
           str_replace(x, "the senator from Alabama", "senator Shelby"), x)
  }
}
test_df_ran <- r_sen_from_states(test_df)
##output -> "NULL"

If I can get the ifelse() call to work and replace the targeted values, I will build out the r_sen_from_states function using nested ifelse() calls, one for each state and senator combination.

e.g., ifelse(x=="the senator from Alabama" & lag(x)=="^Mr\\. SHELBY \\(R; Alabama\\):", str_replace(x,"the senator from Alabama","senator Shelby"), ifelse(x=="the senator from Alabama" & lag(x)=="^Mr\\. SESSIONS \\(R; Alabama\\):", str_replace(x, "the senator from Alabama", "senator Sessions"),...[etc. for each state and senator pairing])

Here's some sample data for replication/debugging purposes.

#environment data below
test_col <- c("Mr. SHELBY (R; Alabama): I acknowledge this is a test.",
              "Mrs. MURRAY (D; Washington): I say to my friend, the senator from Alabama, that they are wrong.",
              "Mr. SHELBY (R; Alabama): I do not agree with my colleague.",
              "Mr. FRIST (R; Tennessee): The senator from Alabama is correct, senator Murray.",
              "Mr. SHELBY (R; Alabama): I thank the majority leader for their support.",
              "Mr. SESSIONS (R; Alabama): I am proud of my junior, the senator from Alabama.",
              "Mr. SHELBY (R; Alabama): To my senior peer, the senator from Alabama, I say great things.")
test_df <- data.frame(test_col)
colnames(test_df) <- c("speeches")

Answer 1

Score: 0

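The comparison x == "the senator from Alabama" is TRUE only when x consists of exactly that text and nothing else. The same applies to lag(x) == "^Mr\\. SHELBY \\(R; Alabama\\):", because == compares literal strings and never interprets the right-hand side as a regular expression. Use str_detect instead. I swapped it into your second function (I haven't tried the others) and it worked: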

library(dplyr)    # provides lag() and %>%
library(stringr)  # provides str_detect() and str_replace()
r_sen_from_states <- function(x){
  # replace the phrase only where the previous row starts with Shelby's speaker tag
  ifelse(str_detect(x, "the senator from Alabama") & str_detect(lag(x), "^Mr\\. SHELBY \\(R; Alabama\\):"),
         str_replace(x, "the senator from Alabama","senator Shelby"), x)}
test_df_ran <- r_sen_from_states(test_df$speeches) %>% print()

[1] "Mr. SHELBY (R; Alabama): I acknowledge this is a test."
[2] "Mrs. MURRAY (D; Washington): I say to my friend, senator Shelby, that they are wrong."
[3] "Mr. SHELBY (R; Alabama): I do not agree with my colleague."
[4] "Mr. FRIST (R; Tennessee): The senator from Alabama is correct, senator Murray."
[5] "Mr. SHELBY (R; Alabama): I thank the majority leader for their support."
[6] "Mr. SESSIONS (R; Alabama): I am proud of my junior, senator Shelby."
[7] "Mr. SHELBY (R; Alabama): To my senior peer, the senator from Alabama, I say great things."
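
To see why the original comparison never matched, here is a minimal sketch. It assumes the stringr and dplyr packages are installed, and the two-element vector x is a made-up example shortened from the sample data, not the asker's actual data. dplyr::lag is written out in full because base R's stats package also has a function called lag that behaves differently.

```r
library(stringr)
library(dplyr)

# Hypothetical two-row example, shortened from the sample data
x <- c("Mr. SHELBY (R; Alabama): This is a test.",
       "Mrs. MURRAY (D; Washington): I say to the senator from Alabama, no.")

# `==` tests whole-string equality, so a longer sentence never matches
exact_match <- x == "the senator from Alabama"

# str_detect() tests whether the pattern occurs anywhere in the string
partial_match <- str_detect(x, "the senator from Alabama")

# dplyr::lag() shifts the vector down by one position; the first element
# has no predecessor, so it becomes NA and str_detect() returns NA for it
prev_is_shelby <- str_detect(dplyr::lag(x), "^Mr\\. SHELBY \\(R; Alabama\\):")
```

Note that the first row only comes through unchanged in the answer above because FALSE & NA evaluates to FALSE. If the first row happened to contain the phrase, the condition would be NA and ifelse() would return NA for that row.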

(BTW, I had never come across lag before. Thank you for bringing it to my attention!)
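
For the many state and senator pairings mentioned in the question, one possible alternative to deeply nested ifelse() calls is a lookup table that is applied in a loop. The following is only a sketch under stated assumptions: the data frame senators, its column names, and the function replace_references are hypothetical names introduced for illustration, and the speeches vector is a made-up two-row example.

```r
library(stringr)
library(dplyr)

# Hypothetical lookup table: one row per senator (extend as needed)
senators <- data.frame(
  speaker_pattern = c("^Mr\\. SHELBY \\(R; Alabama\\):",
                      "^Mr\\. SESSIONS \\(R; Alabama\\):"),
  phrase          = c("the senator from Alabama", "the senator from Alabama"),
  replacement     = c("senator Shelby", "senator Sessions"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each lookup row, replace the phrase in speeches
# whose preceding speech starts with that senator's speaker tag
replace_references <- function(x, lookup) {
  prev <- dplyr::lag(x)  # previous speech; NA for the first element
  for (i in seq_len(nrow(lookup))) {
    hit <- str_detect(x, fixed(lookup$phrase[i])) &
      str_detect(prev, lookup$speaker_pattern[i])
    hit[is.na(hit)] <- FALSE  # the first row has no predecessor
    if (any(hit)) {
      x[hit] <- str_replace(x[hit], fixed(lookup$phrase[i]),
                            lookup$replacement[i])
    }
  }
  x
}

# Made-up example: Shelby speaks right after Sessions
speeches <- c("Mr. SESSIONS (R; Alabama): I yield.",
              "Mr. SHELBY (R; Alabama): I thank the senator from Alabama.")
result <- replace_references(speeches, senators)
```

Like the answer above, this matching is case-sensitive, so a sentence beginning with "The senator from Alabama" would be left unchanged unless the lookup is extended to cover it.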
