How to use geom_line with categorical data from 6 different lists (CSVs)

Question

I am starting with 6 different lists/CSVs that each contain one character column. This column shows the Home Owners' Loan Corporation (HOLC) neighborhood grades of census block groups, so the columns look something like those shown below. I am new to RStudio and am wondering whether the first step should be to combine the lists. Another option would be to add a new binary column to each list that marks a row as 1 if the grade is not NA and 0 if it is. Each list could then be condensed into the categories A, B, C, D, and NA, and the new binary column summed.

Ideally I am interested in using ggplot but I am open to other options. Thanks for your help! I appreciate it.

Example image of how I would like the results to look. In this example, each line represents a different list/CSV table:
[image]

houston_grade2020
NA
NA
B
A
NA
C
D
minneapolis_grade2020
A
NA
NA
B
C
C
D
houston_grade1990
B
B
B
A
A
C
D
minneapolis_grade1990
B
A
NA
A
NA
NA
D

etc.

(I started by working with one csv to try and visualize it but alas it did not work. In this example, I did not add the binary column.)

# Group by Grade
Houston_2020_group <- 
  data.frame(
    values = c(Houston_2020_sub$houston_grade2020),
    group = c(rep("Houston 2020", nrow(Houston_2020_sub)))
  )

ggplot(data = Houston_2020_group, aes(x = values, y = group, fill = group)) +
  geom_line()+
  labs(title="HOLC Grades")

Results:
[image]

In this example, I failed to sum the count of the appearances of each grade. For the final result I would like all lists/csvs to be represented in the graph.

Answer 1

Score: 1

Your biggest challenge here is rearranging your data into an appropriate format for plotting. Essentially, you should get all your data in a single data frame, with all the grades in a single column, and have a second column indicating which data set the grades came from. Then you can group the data according to this second column and count the number of each grade. This then allows easy plotting:

library(tidyverse)

list(Houston_2020 = Houston_2020_sub, 
     Minneapolis_2020 = Minneapolis_2020_sub,
     Houston_1990 = Houston_1990_sub, 
     Minneapolis_1990 = Minneapolis_1990_sub) %>%
  lapply(function(x) setNames(x, 'grade')) %>%
  {do.call(bind_rows, c(., .id = 'group'))} %>%
  mutate(grade = factor(grade)) %>%
  group_by(group) %>%
  count(grade, .drop = FALSE) %>%
  ggplot(aes(grade, n, colour = group, group = group)) +
  geom_line() +
  geom_point(color = 'black') +
  facet_grid(group~.)

[image]

If you want all the lines on the same panel, just remove the final facet_grid line (along with the trailing + on the line before it). At present the single-panel version looks messy because your numbers are so small.
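
As a rough illustration of the single-panel variant, here is a minimal sketch using two of the toy data sets from the question. It follows the same steps as the pipeline above with facet_grid() dropped. Calling bind_rows(.id = ...) directly on the list and storing the counts in a variable named counts are stylistic assumptions of this sketch, not part of the original answer.

```r
library(dplyr)
library(ggplot2)

# Toy data copied from the question (two of the data sets, for brevity)
Houston_2020_sub <- data.frame(houston_grade2020 = c(NA, NA, "B", "A", NA, "C", "D"))
Houston_1990_sub <- data.frame(houston_grade1990 = c("B", "B", "B", "A", "A", "C", "D"))

# Rename each grade column to 'grade', stack the data frames with a
# 'group' label taken from the list names, then count grades per group
counts <- list(Houston_2020 = Houston_2020_sub,
               Houston_1990 = Houston_1990_sub) %>%
  lapply(function(x) setNames(x, "grade")) %>%
  bind_rows(.id = "group") %>%
  mutate(grade = factor(grade)) %>%
  group_by(group) %>%
  count(grade, .drop = FALSE) %>%
  ungroup()

# Same aesthetics and layers as the faceted plot, but without facet_grid(),
# so every group is drawn as its own coloured line on one shared panel
p <- ggplot(counts, aes(grade, n, colour = group, group = group)) +
  geom_line() +
  geom_point(color = "black")

print(p)
```

Keeping the counts in their own object also makes it easy to check the numbers (for example with print(counts)) before plotting.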


Data in reproducible format, taken from question

Houston_2020_sub <- data.frame(houston_grade2020 = c(NA, NA, 'B', 'A', 
                                                     NA, 'C', 'D'))

Minneapolis_2020_sub <- data.frame(minneapolis_grade2020 = c('A', NA, NA, "B", 
                                                             "C", "C", "D"))

Houston_1990_sub <- data.frame(houston_grade1990 = c('B', 'B', 'B', 'A', 'A', 
                                                     'C', 'D'))

Minneapolis_1990_sub <- data.frame(minneapolis_grade1990 = c('B', 'A', NA, 'A',
                                                             NA, NA, 'D'))
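
Since the question starts from CSV files rather than data frames already in memory, the named list could also be built by reading the files from a folder. The sketch below is only an assumption about how the files might be laid out: the folder name holc_csvs and the file names are hypothetical, and two small CSVs are written to a temporary directory first so the example runs on its own.

```r
# Hypothetical setup: write two small CSVs to a temporary folder so the
# example is self-contained. In practice these would be the six real files.
csv_dir <- file.path(tempdir(), "holc_csvs")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(houston_grade2020 = c(NA, NA, "B", "A", NA, "C", "D")),
          file.path(csv_dir, "Houston_2020.csv"), row.names = FALSE)
write.csv(data.frame(houston_grade1990 = c("B", "B", "B", "A", "A", "C", "D")),
          file.path(csv_dir, "Houston_1990.csv"), row.names = FALSE)

# Read every CSV in the folder into a list of data frames
paths <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
grade_list <- lapply(paths, read.csv)

# Name each list element after its file, without the .csv extension;
# these names become the group labels when the list is stacked
names(grade_list) <- tools::file_path_sans_ext(basename(paths))

str(grade_list)
```

A list built this way can be passed into the pipeline in place of the hand-written list(...) call, so adding a seventh file would not require changing the plotting code.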

huangapple
  • Posted on January 6, 2023 at 10:44:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75026438.html