Replace specific text values in a row of a dataframe based on a true condition on the row before

Question


I am doing a text analysis of the Congressional Record, specifically of passages where senators are speaking about each other. There are many instances where one senator refers to another who has just finished speaking without naming them (e.g., "my colleague", "my friend"). I am trying to replace those references with the senator's name.

The speeches are split into rows. The senator who is speaking is listed by name at the start of the row.

I tried three different functions. First attempt was an if elseif:

#function 1 (had error)
r_sen_from_states <- function(x){
  if(x == "the senator from Alabama" & lag(x)=="^Mr\\. SHELBY \\(R; Alabama\\):"){
    str_replace(x, "the senator from Alabama", "Senator Shelby \\(R; Alabama\\)")
  } else if (x == "the senator from Alabama" & lag(x)=="^Mr.\\. SESSIONS \\(R; Alabama\\)"){
    str_replace(x, "the senator from Alabama", "Senator Sessions \\(R; Alabama\\)")
  }
}
test_df_ran <- r_sen_from_states(test_df)
##output -> error: the condition has length > 1 and only the first element will be used

Second attempt was ifelse:

#function 2 (does not replace values with new values but no error because ifelse vectorization)
r_sen_from_states <- function(x){
  ifelse(x %in% "the senator from Alabama" & lag(x)=="^Mr\\. SHELBY \\(R; Alabama\\):",
         str_replace(x, "the senator from Alabama","senator Shelby"), x)}
test_df_ran <- r_sen_from_states(test_df$speeches)
##output -> the dataframe but without replacing any values

Third attempt was for loop ifelse:

#function 3 (produced "NULL")
r_sen_from_states <- function(x){
  for (i in 1:nrow(x)) {
    ifelse(x == "the senator from Alabama" & lag(x, n = 1L) == "^Mr\\. SHELBY \\(R; Alabama\\):",
           str_replace(x, "the senator from Alabama", "senator Shelby"), x)
  }
}
test_df_ran <- r_sen_from_states(test_df)
##output -> "NULL"

If I can get the ifelse() statement to apply and change the declared values, then I will construct the r_sen_from_states function using nested ifelse() statements for each state and senator possibility.

e.g., ifelse(x=="the senator from Alabama" & lag(x)=="^Mr\\. SHELBY \\(R; Alabama\\):", str_replace(x,"the senator from Alabama","senator Shelby"), ifelse(x=="the senator from Alabama" & lag(x)=="^Mr.\\. SESSIONS \\(R; Alabama\\):", str_replace(x, "the senator from Alabama", "senator Sessions"),...[etc. for each state and senator pairing])

Here's some sample data for replication/debugging purposes.

#environment data below
test_col <- c("Mr. SHELBY (R; Alabama): I acknowledge this is a test.",
              "Mrs. MURRAY (D; Washington): I say to my friend, the senator from Alabama, that they are wrong.",
              "Mr. SHELBY (R; Alabama): I do not agree with my colleague.",
              "Mr. FRIST (R; Tennessee): The senator from Alabama is correct, senator Murray.",
              "Mr. SHELBY (R; Alabama): I thank the majority leader for their support.",
              "Mr. SESSIONS (R; Alabama): I am proud of my junior, the senator from Alabama.",
              "Mr. SHELBY (R; Alabama): To my senior peer, the senator from Alabama, I say great things.")
test_df <- data.frame(test_col)
colnames(test_df) <- c("speeches")

Answer 1

Score: 0


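The comparison x == "the senator from Alabama" is TRUE only when x consists of exactly that text and nothing else. To test whether x contains the text, use str_detect instead. The same applies to the lag(x) comparison, because == does not interpret regular expressions. I swapped str_detect into your second function (I haven't tried the others) and it worked great: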

library(dplyr)    # lag() and %>%
library(stringr)  # str_detect() and str_replace()
r_sen_from_states <- function(x){
  ifelse(str_detect(x, "the senator from Alabama") & str_detect(lag(x), "^Mr\\. SHELBY \\(R; Alabama\\):"),
         str_replace(x, "the senator from Alabama","senator Shelby"), x)}
test_df_ran <- r_sen_from_states(test_df$speeches) %>% print()

[1] "Mr. SHELBY (R; Alabama): I acknowledge this is a test."
[2] "Mrs. MURRAY (D; Washington): I say to my friend, senator Shelby, that they are wrong."
[3] "Mr. SHELBY (R; Alabama): I do not agree with my colleague."
[4] "Mr. FRIST (R; Tennessee): The senator from Alabama is correct, senator Murray."
[5] "Mr. SHELBY (R; Alabama): I thank the majority leader for their support."
[6] "Mr. SESSIONS (R; Alabama): I am proud of my junior, senator Shelby."
[7] "Mr. SHELBY (R; Alabama): To my senior peer, the senator from Alabama, I say great things."
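
To see why the original comparisons never fired, the difference between == and str_detect can be checked on two lines of the sample data. This is only a small sketch; the variable names (speech, prev, and the four result names) are made up for illustration.

```r
library(stringr)

speech <- "Mrs. MURRAY (D; Washington): I say to my friend, the senator from Alabama, that they are wrong."
prev   <- "Mr. SHELBY (R; Alabama): I acknowledge this is a test."

# `==` tests whole-string equality, so a phrase inside a longer string never matches
equal_phrase  <- speech == "the senator from Alabama"             # FALSE
# str_detect() looks for the pattern anywhere in the string
detect_phrase <- str_detect(speech, "the senator from Alabama")   # TRUE

# `==` also treats the regular expression as literal text
equal_regex  <- prev == "^Mr\\. SHELBY \\(R; Alabama\\):"           # FALSE
detect_regex <- str_detect(prev, "^Mr\\. SHELBY \\(R; Alabama\\):") # TRUE

print(c(equal_phrase, detect_phrase, equal_regex, detect_regex))
```

Note also that lag(x) is NA for the first element, so if the very first speech contained the phrase, the condition would be NA and ifelse would return NA for that row.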

(BTW I've never discovered lag before, thank you for bringing it to my attention!)
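
As for the plan to nest one ifelse() per state and senator: a possible alternative is to read the previous speaker's surname and state out of the previous row's prefix, so no per-senator branches are needed. The sketch below is an assumption-laden illustration, not something from the question: the helper name replace_previous_speaker is made up, the prefix is assumed to always look like "Mr. NAME (P; State):", and the phrase is matched case-sensitively, as in the code above.

```r
library(dplyr)    # lag()
library(stringr)  # str_match(), str_detect(), str_replace(), str_to_title(), fixed()

# Hypothetical helper: replace "the senator from <State>" with the previous
# speaker's name when that speaker is from <State>.
replace_previous_speaker <- function(x) {
  # default = "" avoids an NA in the first position
  prev <- lag(x, default = "")
  # Columns: full match, surname, state; rows are NA when the prefix does not match
  parts <- str_match(prev, "^(?:Mr|Mrs|Ms)\\. ([A-Z][A-Za-z'-]*) \\([A-Z]+; ([^)]+)\\):")
  surname <- str_to_title(parts[, 2])
  state <- parts[, 3]
  target <- paste0("the senator from ", state)
  # Only rows whose previous speaker was parsed and whose text contains the phrase
  hit <- !is.na(state) & str_detect(x, fixed(target))
  ifelse(hit, str_replace(x, fixed(target), paste("senator", surname)), x)
}

test_col <- c("Mr. SHELBY (R; Alabama): I acknowledge this is a test.",
              "Mrs. MURRAY (D; Washington): I say to my friend, the senator from Alabama, that they are wrong.",
              "Mr. SHELBY (R; Alabama): I do not agree with my colleague.",
              "Mr. FRIST (R; Tennessee): The senator from Alabama is correct, senator Murray.",
              "Mr. SHELBY (R; Alabama): I thank the majority leader for their support.",
              "Mr. SESSIONS (R; Alabama): I am proud of my junior, the senator from Alabama.",
              "Mr. SHELBY (R; Alabama): To my senior peer, the senator from Alabama, I say great things.")

result <- replace_previous_speaker(test_col)
print(result)
```

On the sample data this should change rows 2 and 6 to "senator Shelby" and row 7 to "senator Sessions", leaving row 4 alone because of the capital "The". Surnames with unusual capitalisation would come out wrong from str_to_title, so a lookup table of display names may be safer for the full record.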

huangapple
  • Published on June 4, 2023
  • When reposting, please keep this link: https://go.coder-hub.com/76401017.html