Visualizing two categorical variables & filtering data with specific conditions
Question

I have a labor force survey and I am trying to visualize average wages by occupation and gender as follows:

```{r}
# Compute salary by gender and jobtitle
ggplot(job_posts, aes(fill=gender_preferences, y=monthly_income, x=jobtitle)) + 
    geom_bar(position="dodge", stat="identity")
```
output:
[![enter image description here][1]][1]

However, for certain jobs I lack wage data across both categorical variables, so I would like to restrict my graph to only compare wages where I have observations on both: 1) jobs and 2) gender_preferences.
I tried this, but I receive the error "Error in count(gender_preferences) : object 'gender_preferences' not found".
Ideally, I would like to compare income by gender and job where data exists for both variables.
```{r}
graph <-
job_posts %>% count(jobtitle)   %>% filter(n>1) &
  count(gender_preferences) %>%  filter(n>1)
```

Here is a data example:
```{r}
# Print data example with specific columns
dput(job_posts[1:8,c(3,23,25)])
```
output:
```
structure(list(jobtitle = c("باريستا و كاشير", "Hotel Manager", 
"HR Officer", "school supervisor", "كاشير /خدمة عملاء", 
"Store Leader", "محاسبة رواتب", "Senior Accountant"
), gender_preferences = c("indifferent", "indifferent", "indifferent", 
"indifferent", "indifferent", "indifferent", "female", "indifferent"
), monthly_income = c(4000, 4000, 5000, 12000, 5000, 4500, 100, 
NA)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"
))
```

I have reviewed the guidance [here][2] but can't find a similar case where we filter by conditions for two categorical variables.

  [1]: https://i.stack.imgur.com/An1Rw.png
  [2]: https://r-graph-gallery.com/48-grouped-barplot-with-ggplot2
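
A note on the error, offered as a sketch rather than a definitive diagnosis: `%>%` binds more tightly than `&`, so the right-hand side of `&` is a separate call, `count(gender_preferences)`, that never receives `job_posts` through the pipe. R then looks for a standalone object named `gender_preferences` and fails. Counting both columns in a single `count()` call is one way to see which job/gender combinations are present. The data frame `toy_posts` and the names `combo_counts`, `genders_per_job`, and `n_genders` below are made up for illustration and are not from the survey.

```r
library(dplyr)

# Hypothetical toy data (not from the survey), same column names as job_posts
toy_posts <- tibble(
  jobtitle = c("Cashier", "Cashier", "Accountant", "Accountant", "Barista"),
  gender_preferences = c("female", "indifferent", "female", "female", "indifferent"),
  monthly_income = c(4000, 4500, 5000, 5200, 3800)
)

# One count() call over both columns: one row per job/gender combination
combo_counts <- toy_posts %>%
  count(jobtitle, gender_preferences)

# Number of distinct gender categories observed for each job title
genders_per_job <- combo_counts %>%
  count(jobtitle, name = "n_genders")
```

In this toy data only "Cashier" appears with more than one gender category, so it is the only job title that would survive a filter on `n_genders > 1`.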


# Answer 1
**Score**: 1

> I have implemented the solution proposed by Mr. Spring as follows, and it worked perfectly:
> ```
> graph <- job_posts %>% 
>            group_by(jobtitle) %>% 
>            filter(n_distinct(gender_preferences) > 1) %>% 
>            ungroup()
> ```
\- OP
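
The grouped filter above can be tried end to end on a small example. This is a minimal sketch under stated assumptions: `toy_posts` is invented data that only reuses the column names from the question, rows with a missing `monthly_income` are dropped before filtering (the answer itself does not do this), and the mean per group is assumed to be the quantity worth plotting. The names `both_genders`, `avg_income`, and `mean_income` are illustrative.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data (not from the survey), same column names as job_posts
toy_posts <- tibble(
  jobtitle = c("Cashier", "Cashier", "Cashier", "Accountant", "Accountant", "Barista"),
  gender_preferences = c("female", "indifferent", "indifferent", "female", "female", "indifferent"),
  monthly_income = c(4000, 4500, 5500, 5000, NA, 3800)
)

# Keep only job titles observed with more than one gender category
both_genders <- toy_posts %>%
  filter(!is.na(monthly_income)) %>%   # assumption: rows without income are dropped first
  group_by(jobtitle) %>%
  filter(n_distinct(gender_preferences) > 1) %>%
  ungroup()

# Average income per job title and gender category
avg_income <- both_genders %>%
  group_by(jobtitle, gender_preferences) %>%
  summarise(mean_income = mean(monthly_income), .groups = "drop")

# Grouped bar chart of the averages, one bar per gender category within each job
p <- ggplot(avg_income, aes(x = jobtitle, y = mean_income, fill = gender_preferences)) +
  geom_col(position = "dodge")
```

Summarising before plotting gives exactly one bar per job/gender pair, whereas `geom_bar(stat = "identity")` on the raw rows draws one bar per observation. Printing `p` renders the chart.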



  • Published on March 21, 2023, 00:24:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75792859.html