Visualizing two categorical variables & filtering data with specific conditions

Question

I have a labor force survey, and I am trying to visualize average wages by occupation and gender as follows:

```r
# Compute salary by gender and jobtitle
library(ggplot2)
ggplot(job_posts, aes(fill = gender_preferences, y = monthly_income, x = jobtitle)) +
  geom_bar(position = "dodge", stat = "identity")
```

Output:
[![Dodged bar chart of monthly_income by jobtitle, filled by gender_preferences][1]][1]
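Note that `geom_bar(stat = "identity")` draws each row's raw `monthly_income` value rather than an average, so where a job title has several postings with the same gender preference the bars are not means. A minimal sketch that averages first, assuming only the three column names shown in the question; `average_income()` and `plot_average_income()` are hypothetical helper names:

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: mean monthly income per (job title, gender preference) pair.
# Rows with a missing monthly_income are dropped before averaging.
average_income <- function(df) {
  df %>%
    filter(!is.na(monthly_income)) %>%
    group_by(jobtitle, gender_preferences) %>%
    summarise(mean_income = mean(monthly_income), .groups = "drop")
}

# Hypothetical helper: one dodged bar per pair, with the bar height equal to the mean.
plot_average_income <- function(df) {
  ggplot(average_income(df),
         aes(x = jobtitle, y = mean_income, fill = gender_preferences)) +
    geom_col(position = "dodge")
}
```

Usage: `plot_average_income(job_posts)`. Aggregating before plotting leaves one row per job/gender pair, so `geom_col()` (shorthand for `geom_bar(stat = "identity")`) draws the mean directly.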

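However, because for certain jobs I lack wage data across both categorical variables, I would like to restrict the graph to compare wages only where I have observations on both: 1) `jobtitle` and 2) `gender_preferences`.
I tried the following, but I receive an error: `Error in count(gender_preferences) : object 'gender_preferences' not found`.
Ideally, I would like to compare income by gender and job wherever data exist for both variables.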

```r
graph <-
  job_posts %>% count(jobtitle) %>% filter(n > 1) &
  count(gender_preferences) %>% filter(n > 1)
```
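The error is a precedence issue rather than a filtering one: in R, `%>%` binds more tightly than `&`, so the second line runs as its own pipeline starting from the bare name `gender_preferences`, which does not exist outside `job_posts`. `count()` also collapses the data to one row per value, which would drop `monthly_income`. A minimal sketch of the literal intent (keep rows whose job title and whose gender preference each occur more than once), using `add_count()` so the original columns survive; `filter_repeated()` is a hypothetical name, and this is a different condition from the one in the answer below:

```r
library(dplyr)

# The attempt above parses as
#   (job_posts %>% count(jobtitle) %>% filter(n > 1)) &
#   (count(gender_preferences) %>% filter(n > 1))
# so the second operand has no data frame in which to find `gender_preferences`.

# Hypothetical helper: keep rows whose job title occurs more than once AND whose
# gender preference occurs more than once. add_count() appends the counts as new
# columns instead of collapsing the data, so monthly_income is still available.
filter_repeated <- function(df) {
  df %>%
    add_count(jobtitle, name = "n_job") %>%
    add_count(gender_preferences, name = "n_gender") %>%
    filter(n_job > 1, n_gender > 1) %>%
    select(-n_job, -n_gender)  # drop the helper count columns again
}
```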

Here is a data example:

```r
# Print data example with specific columns
dput(job_posts[1:8, c(3, 23, 25)])
```

Output:

```
structure(list(jobtitle = c("باريستا و كاشير", "Hotel Manager",
"HR Officer", "school supervisor", "كاشير /خدمة عملاء",
"Store Leader", "محاسبة رواتب", "Senior Accountant"
), gender_preferences = c("indifferent", "indifferent", "indifferent",
"indifferent", "indifferent", "indifferent", "female", "indifferent"
), monthly_income = c(4000, 4000, 5000, 12000, 5000, 4500, 100,
NA)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"
))
```
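In this eight-row sample every job title appears exactly once, so no job here has income data under more than one gender preference. One quick way to check which job/gender pairs actually have incomes is to count them; a sketch that rebuilds the sample from the `dput()` output (the name `coverage` is illustrative):

```r
library(dplyr)

# The sample from the question's dput() output
job_posts <- structure(list(
  jobtitle = c("باريستا و كاشير", "Hotel Manager", "HR Officer",
               "school supervisor", "كاشير /خدمة عملاء", "Store Leader",
               "محاسبة رواتب", "Senior Accountant"),
  gender_preferences = c("indifferent", "indifferent", "indifferent",
                         "indifferent", "indifferent", "indifferent",
                         "female", "indifferent"),
  monthly_income = c(4000, 4000, 5000, 12000, 5000, 4500, 100, NA)),
  row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"))

# Postings with a non-missing income per job title and gender preference;
# pairs with no such postings simply do not appear in the result.
coverage <- job_posts %>%
  filter(!is.na(monthly_income)) %>%
  count(jobtitle, gender_preferences)
```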
I have reviewed the guidance [here][2] but can't find a similar case that filters by conditions on two categorical variables.

[1]: https://i.stack.imgur.com/An1Rw.png
[2]: https://r-graph-gallery.com/48-grouped-barplot-with-ggplot2
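Answer 1

**Score**: 1

> I have implemented the solution proposed by Mr. Spring as follows, and it worked perfectly:

```r
library(dplyr)  # provides %>%, group_by(), filter(), n_distinct(), ungroup()

# Keep every row of the job titles that appear with more than one gender preference
graph <- job_posts %>%
  group_by(jobtitle) %>%
  filter(n_distinct(gender_preferences) > 1) %>%
  ungroup()
```

\- OP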
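`n_distinct(gender_preferences) > 1` is evaluated separately for each `jobtitle` group, so the filter keeps every row of a job title posted with at least two different gender preferences and drops job titles seen with only one. A hedged variant, assuming a job should only count as covered when each gender group has a recorded income (so `NA` incomes are dropped before grouping); `jobs_with_both()` is a hypothetical name:

```r
library(dplyr)

# Hypothetical helper: rows for job titles that have a recorded income under at
# least two distinct gender preferences (e.g. "female" and "indifferent").
jobs_with_both <- function(df) {
  df %>%
    filter(!is.na(monthly_income)) %>%  # assumption: ignore postings without a wage
    group_by(jobtitle) %>%
    filter(n_distinct(gender_preferences) > 1) %>%
    ungroup()                           # return an ungrouped tibble for plotting
}

# Usage with the question's data and plot:
# graph <- jobs_with_both(job_posts)
# ggplot(graph, aes(fill = gender_preferences, y = monthly_income, x = jobtitle)) +
#   geom_bar(position = "dodge", stat = "identity")
```

Without the `NA` step, a job whose only "female" posting lacks a wage would still pass the filter and show a single bar.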

huangapple
  • Posted on March 21, 2023, 00:24:59
  • Please keep this link when reposting: https://go.coder-hub.com/75792859.html