Mutating a sentiment indicator using time conditions, rather than quarter/month (dplyr)

Question

I have a Reddit dataset where each row represents a single Reddit post, with a sentiment score for each post by a given username. I also have a variable capturing the average sentiment across all posts written by the same username.

I am trying to create a sentiment indicator relevant to the timeline of a minimum wage policy, where I would like to categorize sentiment per username based on three periods:

1- Before the policy's announcement, let's say before "2021-03-01"
2- After the policy's announcement but before its implementation, so from "2021-03-01" up to but not including "2021-09-01"
3- After the policy's implementation, so from "2021-09-01" onward

I have been able to compute sentiment for each username by month or quarter, as shown below, but I would like to compute sentiment per username based on the specific policy timeline above, and I am not sure how to do that.

Load packages

library(tidyverse)
library(lubridate)
library(zoo)

Print data example with specific columns

dput(df[1:5,c(3,4,21, 22, 23)])

output:

structure(list(date = structure(c(15149, 15150, 15150, 15150, 
15150), class = "Date"), username = c("ax", "aa", 
"cartman", "abc", "aff"
), quarter_yr = c("2011 Q2", "2011 Q2", "2011 Q2", "2011 Q2", 
"2011 Q2"), sentiment_score = c("0", "-1", "1", "-1", "-1"), 
    avg_sentiment = c(0.0666666666666667, -0.777777777777778, 
    1, -1, -1)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), groups = structure(list(username = c("ax", 
"cartman", "abc", "aff"), .rows = structure(list(5L, 4L, 1L, 2L, 3L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), .drop = TRUE))

Create a quarter/year variable

sentiment_df <- sentiment_df %>% 
  mutate(date = ymd(date),
         quarter_yr = paste(year(date), quarters(date)))

Compute an average sentiment score per username, based on the many observations/posts they have:

sentiment_df <-
df %>% group_by(username, quarter_yr) %>% summarise(avg_sentiment = mean(as.numeric(sentiment_score)))

Quarterly sentiment by username:

dput(sentiment_df[1:2,c(1,8)])

output

structure(list(username = c("cartman","aa"
), `2014 Q2` = c(NA_real_, NA_real_)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L), groups = structure(list(
    username = c("cartman","aa"), .rows = structure(list(
        1L, 2L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))

Answer 1

Score: 1

# Label each post with its policy phase; case_when() checks conditions in order
sentiment_df <- sentiment_df %>% 
  mutate(date = ymd(date),
         quarter_yr = paste(year(date), quarters(date)),
         phase = case_when(date < ymd(20210301) ~ "1 Before announcement",
                           date < ymd(20210901) ~ "2 Before implementation",
                           TRUE ~ "3 After implementation"))

# Average sentiment per username and phase (the phase column lives in sentiment_df)
sentiment_df <-
  sentiment_df %>% 
  group_by(username, phase) %>% 
  summarise(avg_sentiment = mean(as.numeric(sentiment_score)))
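
If you would rather have one column per phase, similar to the quarterly table in the question, the long result can be spread with tidyr's pivot_wider(). This is only a minimal sketch: the `posts` tibble, its values, and the phase labels are made up for illustration and are not taken from the question's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample posts (made up; not from the original dataset)
posts <- tibble(
  username = c("aa", "aa", "aa", "cartman", "cartman"),
  date = as.Date(c("2021-01-15", "2021-05-10", "2021-10-02",
                   "2021-02-20", "2021-09-01")),
  sentiment_score = c("-1", "0", "1", "1", "-1")
)

phase_wide <- posts %>%
  # Assign each post to a policy phase; conditions are evaluated in order
  mutate(phase = case_when(
    date < as.Date("2021-03-01") ~ "before_announcement",
    date < as.Date("2021-09-01") ~ "before_implementation",
    TRUE ~ "after_implementation"
  )) %>%
  # Average sentiment per username and phase (long format)
  group_by(username, phase) %>%
  summarise(avg_sentiment = mean(as.numeric(sentiment_score)),
            .groups = "drop") %>%
  # One row per username, one column per phase
  pivot_wider(names_from = phase, values_from = avg_sentiment)

print(phase_wide)
```

A username with no posts in a given phase gets NA in that column, as "cartman" does here for before_implementation.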

Answer 2

Score: 1

It seems you can simply create a new variable using mutate() and case_when(), and then group by that variable. Here is my attempt. Is this what you are after?

library(dplyr)
library(lubridate)
library(zoo)
sentiment_df <- structure(list(
    date = structure(c(15149, 15150, 15150, 15150, 15150), class = "Date"),
    username = c("ax", "aa", "cartman", "abc", "aff"),
    quarter_yr = c("2011 Q2", "2011 Q2", "2011 Q2", "2011 Q2", "2011 Q2"),
    sentiment_score = c("0", "-1", "1", "-1", "-1"),
    avg_sentiment = c(0.0666666666666667, -0.777777777777778, 1, -1, -1)),
  class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -5L),
  groups = structure(list(
      username = c("ax", "cartman", "abc", "aff"),
      .rows = structure(list(5L, 4L, 1L, 2L, 3L), ptype = integer(0),
                        class = c("vctrs_list_of", "vctrs_vctr", "list"))),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = c(NA, -5L), .drop = TRUE))
sentiment_df <- sentiment_df %>%
  mutate(date = ymd(date),
         quarter_yr = paste(year(date), quarters(date)),
         implementation_period = case_when(
           date < as.Date("2021-03-01") ~ "Before",
           date >= as.Date("2021-03-01") & date < as.Date("2021-09-01") ~ "Pre_Implementation",
           TRUE ~ "After"))

sentiment_df <-
  sentiment_df %>%
  group_by(username, implementation_period) %>%
  summarise(avg_sentiment = mean(as.numeric(sentiment_score)))

One quick note: the sample data you provided contains only "Before" dates, but I think this should work on the whole dataset.
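
Since the sample data cannot exercise the later periods, here is a small sketch that checks the boundaries with made-up dates on either side of both cut-offs. The helper name `classify_period` and the test dates are hypothetical. The helper wraps the same case_when() logic and relies on conditions being evaluated in order, so the second condition does not need to repeat the lower bound.

```r
library(dplyr)

# Hypothetical helper wrapping the case_when() logic used above
classify_period <- function(date) {
  case_when(
    date < as.Date("2021-03-01") ~ "Before",
    date < as.Date("2021-09-01") ~ "Pre_Implementation",
    TRUE ~ "After"
  )
}

# Made-up dates: the day before and the day of each cut-off
test_dates <- as.Date(c("2021-02-28", "2021-03-01", "2021-08-31", "2021-09-01"))
periods <- classify_period(test_dates)
print(periods)
# "Before" "Pre_Implementation" "Pre_Implementation" "After"
```

One thing to watch: with the TRUE fallback, a missing date is also labelled "After". If the full dataset may contain NA dates, putting is.na(date) ~ NA_character_ as the first condition would keep those rows as NA instead.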

huangapple
  • Posted on 2023-03-07 03:34:21
  • Please keep this link when reposting: https://go.coder-hub.com/75655086.html