Visualizing two categorical variables & filtering data with specific conditions
Question
I have a labor force survey and I am trying to visualize average wages by occupation and gender as follows:
```{r}
# Compute salary by gender and jobtitle
ggplot(job_posts, aes(fill=gender_preferences, y=monthly_income, x=jobtitle)) +
  geom_bar(position="dodge", stat="identity")
```
Output:
[![enter image description here][1]][1]
However, for certain jobs I lack wage data across both categorical variables, so I would like to restrict my graph to compare wages only where I have observations on both: 1) jobs and 2) gender_preferences.
I tried the code below, but I receive the error "Error in count(gender_preferences) : object 'gender_preferences' not found".
Ideally, I would like to compare income by gender and job where data exists for both variables.
```{r}
graph <-
job_posts %>% count(jobtitle) %>% filter(n>1) &
count(gender_preferences) %>% filter(n>1)
```
Here is a data example:
```{r}
# Print data example with specific columns
dput(job_posts[1:8,c(3,23,25)])
```
Output:
```
structure(list(jobtitle = c("باريستا و كاشير", "Hotel Manager",
"HR Officer", "school supervisor", "كاشير /خدمة عملاء",
"Store Leader", "محاسبة رواتب", "Senior Accountant"
), gender_preferences = c("indifferent", "indifferent", "indifferent",
"indifferent", "indifferent", "indifferent", "female", "indifferent"
), monthly_income = c(4000, 4000, 5000, 12000, 5000, 4500, 100,
NA)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"
))
```
I have reviewed the guidance [here][2] but can't find a similar case that filters by conditions on two categorical variables.
[1]: https://i.stack.imgur.com/An1Rw.png
[2]: https://r-graph-gallery.com/48-grouped-barplot-with-ggplot2
# Answer 1
**Score**: 1
> I have implemented the solution proposed by Mr. Spring as follows, and it worked perfectly:
> ```
> graph <- job_posts %>%
>   group_by(jobtitle) %>%
>   filter(n_distinct(gender_preferences) > 1) %>%
>   ungroup()
> ```
\- OP
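For context, the original attempt most likely fails because `%>%` binds more tightly than `&`, so the second `count(gender_preferences)` is called on its own, without a data frame, and R looks for an object named `gender_preferences`. The sketch below applies the `group_by()` / `filter(n_distinct(...))` pattern to a small made-up data frame. Note that `toy_posts`, its values, and `avg_income` are hypothetical names introduced for illustration, and that dropping `NA` wages first and averaging per group are assumptions about what the plot should show, not part of the accepted solution.

```r
library(dplyr)

# Hypothetical toy data standing in for job_posts (values are made up)
toy_posts <- data.frame(
  jobtitle = c("Cashier", "Cashier", "Cashier",
               "HR Officer", "Accountant", "Accountant"),
  gender_preferences = c("female", "indifferent", "indifferent",
                         "indifferent", "female", "male"),
  monthly_income = c(3000, 4000, 5000, 5000, NA, 6000)
)

avg_income <- toy_posts %>%
  # Assumption: rows without a wage should not count as an observation
  filter(!is.na(monthly_income)) %>%
  group_by(jobtitle) %>%
  # Keep only job titles observed with more than one gender category
  filter(n_distinct(gender_preferences) > 1) %>%
  # Regroup by both variables and average, so each bar is one mean value
  group_by(jobtitle, gender_preferences) %>%
  summarise(mean_income = mean(monthly_income), .groups = "drop")

print(avg_income)
```

With this toy data only "Cashier" survives the filter, since "Accountant" is left with a single gender category once the `NA` wage is dropped. A summarised table like `avg_income` could then be passed to `ggplot()` with `geom_col(position = "dodge")`, which draws one bar per job title and gender category rather than overlaying every raw row.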