Replace specific text values in a row of a dataframe based on a true condition on the row before
Question
I am doing a text analysis of the Congressional Record, specifically of passages where senators talk about each other. There are many instances where one senator refers to another who has just finished speaking without naming them (e.g., "my colleague", "my friend"). I am trying to replace those references with the senator's name.
The speeches are split into rows. The senator who is speaking is listed by name at the start of each row.
I tried three different functions. The first attempt used if / else if:
#function 1 (had error)
r_sen_from_states <- function(x){
if(x == "the senator from Alabama" & lag(x)=="^Mr\\. SHELBY \\(R; Alabama\\):"){
str_replace(x, "the senator from Alabama", "Senator Shelby \\(R; Alabama\\)")
} else if (x == "the senator from Alabama" & lag(x)=="^Mr.\\. SESSIONS \\(R; Alabama\\)"){
str_replace(x, "the senator from Alabama", "Senator Sessions \\(R; Alabama\\)")
}
}
test_df_ran <- r_sen_from_states(test_df)
##output -> error, the condition has length > 1 and only the first element will be used
The second attempt used ifelse:
#function 2 (does not replace values with new values but no error because ifelse vectorization)
r_sen_from_states <- function(x){
ifelse(x %in% "the senator from Alabama" & lag(x)=="^Mr\\. SHELBY \\(R; Alabama\\):",
str_replace(x, "the senator from Alabama","senator Shelby"), x)}
test_df_ran <- r_sen_from_states(test_df$speeches)
##output -> the dataframe but without replacing any values
The third attempt used ifelse inside a for loop:
#function 3 (produced "NULL")
r_sen_from_states <- function(x){
for (i in 1:nrow(x)) {
ifelse(x == "the senator from Alabama" & lag(x, n = 1L) == "^Mr\\. SHELBY \\(R; Alabama\\):",
str_replace(x, "the senator from Alabama", "senator Shelby"), x)
}
}
test_df_ran <- r_sen_from_states(test_df)
##output -> "NULL"
If I can get the ifelse() call to apply and change the matching values, I will then build the r_sen_from_states function out of nested ifelse() calls, one for each state and senator pairing.
e.g., ifelse(x=="the senator from Alabama" & lag(x)=="^Mr\\. SHELBY \\(R; Alabama\\):", str_replace(x,"the senator from Alabama","senator Shelby"), ifelse(x=="the senator from Alabama" & lag(x)=="^Mr\\. SESSIONS \\(R; Alabama\\):", str_replace(x, "the senator from Alabama", "senator Sessions"),...[etc. for each state and senator pairing]))
Here's some sample data for replication/debugging purposes.
#environment data below
test_col <- c("Mr. SHELBY (R; Alabama): I acknowledge this is a test.",
"Mrs. MURRAY (D; Washington): I say to my friend, the senator from Alabama, that they are wrong.",
"Mr. SHELBY (R; Alabama): I do not agree with my colleague.",
"Mr. FRIST (R; Tennessee): The senator from Alabama is correct, senator Murray.",
"Mr. SHELBY (R; Alabama): I thank the majority leader for their support.",
"Mr. SESSIONS (R; Alabama): I am proud of my junior, the senator from Alabama.",
"Mr. SHELBY (R; Alabama): To my senior peer, the senator from Alabama, I say great things.")
test_df <- data.frame(test_col)
colnames(test_df) <- c("speeches")
Answer 1
Score: 0
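The comparison x == "the senator from Alabama" is only TRUE when x is exactly that text and nothing else. Because each row holds a whole speech, you should use str_detect instead, which tests whether the text occurs anywhere in the string (and treats its pattern as a regular expression, which is what your "^Mr\\. ..." prefix needs). I swapped that into your second function (I haven't tried the others) and it worked: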
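library(stringr)  # str_detect(), str_replace()
library(dplyr)    # lag() and %>%
r_sen_from_states <- function(x){
  # replace only when the row contains the phrase and the previous row starts with Shelby's prefix
  ifelse(str_detect(x, "the senator from Alabama") & str_detect(lag(x), "^Mr\\. SHELBY \\(R; Alabama\\):"),
         str_replace(x, "the senator from Alabama", "senator Shelby"), x)
}
test_df_ran <- r_sen_from_states(test_df$speeches) %>% print()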
[1] "Mr. SHELBY (R; Alabama): I acknowledge this is a test."
[2] "Mrs. MURRAY (D; Washington): I say to my friend, senator Shelby, that they are wrong."
[3] "Mr. SHELBY (R; Alabama): I do not agree with my colleague."
[4] "Mr. FRIST (R; Tennessee): The senator from Alabama is correct, senator Murray."
[5] "Mr. SHELBY (R; Alabama): I thank the majority leader for their support."
[6] "Mr. SESSIONS (R; Alabama): I am proud of my junior, senator Shelby."
[7] "Mr. SHELBY (R; Alabama): To my senior peer, the senator from Alabama, I say great things."
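The question mentions nesting one ifelse() per state and senator pairing. As a rough sketch of an alternative, the same str_detect idea can be driven by a lookup vector so that each new senator is one extra entry rather than one extra level of nesting. The names prev_speaker and replace_sen_from_alabama are hypothetical, only the Shelby and Sessions entries come from the question, and the sketch assumes the column is a character vector and that at most one prefix matches any given previous row.

```r
library(stringr)
library(dplyr)  # for lag()

# Sample data copied from the question
test_col <- c("Mr. SHELBY (R; Alabama): I acknowledge this is a test.",
              "Mrs. MURRAY (D; Washington): I say to my friend, the senator from Alabama, that they are wrong.",
              "Mr. SHELBY (R; Alabama): I do not agree with my colleague.",
              "Mr. FRIST (R; Tennessee): The senator from Alabama is correct, senator Murray.",
              "Mr. SHELBY (R; Alabama): I thank the majority leader for their support.",
              "Mr. SESSIONS (R; Alabama): I am proud of my junior, the senator from Alabama.",
              "Mr. SHELBY (R; Alabama): To my senior peer, the senator from Alabama, I say great things.")

# Hypothetical lookup: regex for the previous speaker's prefix -> replacement text
prev_speaker <- c(
  "^Mr\\. SHELBY \\(R; Alabama\\):"   = "senator Shelby",
  "^Mr\\. SESSIONS \\(R; Alabama\\):" = "senator Sessions"
)

replace_sen_from_alabama <- function(x, lookup = prev_speaker,
                                     phrase = "the senator from Alabama") {
  prev <- dplyr::lag(x, default = "")          # previous row; "" for the first row
  has_phrase <- str_detect(x, fixed(phrase))   # literal, case-sensitive match
  for (pattern in names(lookup)) {
    hit <- has_phrase & str_detect(prev, pattern)
    if (any(hit)) {
      # replace the first occurrence of the phrase in each matching row
      x[hit] <- str_replace(x[hit], fixed(phrase), lookup[[pattern]])
    }
  }
  x
}

out <- replace_sen_from_alabama(test_col)
print(out)
```

With the sample data, rows 2 and 6 follow a Shelby row and row 7 follows a Sessions row, so those three rows change. Row 4 is left alone because the match is case-sensitive and that row says "The senator from Alabama" with a capital T.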
(BTW I'd never come across lag before, thank you for bringing it to my attention!)
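One thing worth knowing about dplyr's lag: the first element has no previous row, so it comes back as NA. That does no harm with the sample data, because the first row does not contain the phrase, but a first row that did contain it would be turned into NA by ifelse(). The two strings below are made up purely to illustrate this; supplying default = "" is one possible way around it, not the only one.

```r
library(stringr)
library(dplyr)

# Hypothetical two-row example where the FIRST row contains the phrase
x <- c("the senator from Alabama spoke first.",
       "Mr. SHELBY (R; Alabama): I agree.")

shifted <- dplyr::lag(x)          # NA, then the first row

# TRUE & NA is NA, so the condition for row 1 is NA
cond <- str_detect(x, "the senator from Alabama") &
  str_detect(shifted, "^Mr\\. SHELBY \\(R; Alabama\\):")
res_na <- ifelse(cond, "replaced", x)     # row 1 becomes NA

# With a default, the first row is compared against "" instead of NA
cond_safe <- str_detect(x, "the senator from Alabama") &
  str_detect(dplyr::lag(x, default = ""), "^Mr\\. SHELBY \\(R; Alabama\\):")
res_safe <- ifelse(cond_safe, "replaced", x)  # both rows left unchanged

print(res_na)
print(res_safe)
```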