英文:
Cloropleth using ggplot
问题
我一直在尝试绘制一个色彩地图。我取得了一些进展,但现在卡住了。这个想法是在巴西地图上标记每个州的暴力故意死亡数量。
我正在使用以下数据集:
State <- c("SP", "SC", "DF", "MG", "RS", "MS", "PR", "AC", "PI", "TO", "MT", "RO", "GO", "RJ", "ES", "MA", "PB", "AL", "RN", "PA", "SE", "PE", "RR", "CE", "AM", "BA", "AP")
VID <- c(7.9, 10.1, 11.2, 11.4, 15.9, 20.7, 20.8, 21.2, 23.8, 24.3, 24.9, 25.0, 26.1, 27.2, 28.2, 28.3, 28.6, 31.8, 32.4, 32.8, 33.9, 34.8, 35.5, 37.0, 39.1, 44.9, 53.8)
VIDBrazil <- data.frame(State, VID)
我从这里获取巴西的geoJSON数据文件。
我正在尝试使用的代码基于这里找到的资源,特别是这里,这里,这里和这里。
我正在使用以下代码:
State <- c("SP", "SC", "DF", "MG", "RS", "MS", "PR", "AC", "PI", "TO", "MT", "RO", "GO", "RJ", "ES", "MA", "PB", "AL", "RN", "PA", "SE", "PE", "RR", "CE", "AM", "BA", "AP")
VID <- c(7.9, 10.1, 11.2, 11.4, 15.9, 20.7, 20.8, 21.2, 23.8, 24.3, 24.9, 25.0, 26.1, 27.2, 28.2, 28.3, 28.6, 31.8, 32.4, 32.8, 33.9, 34.8, 35.5, 37.0, 39.1, 44.9, 53.8)
VIDBrazil <- data.frame(State, VID)
spdf_Brasil <- geojson_read("C:/Users/Giovanni/Desktop/R/brazil_geo.json", what = "sp") #导入JSON文件
spdf_Brasil_A <- tidy(spdf_Brasil) #转换为数据框
ggplot() + #绘制地图
geom_polygon(data = spdf_Brasil_A, aes( x = long, y = lat, group = group), fill="#69b3a2", color="white") +
theme_void() +
coord_map()
spdf_Brasil_MVI <- spdf_Brasil_A %>% #添加用于地图颜色编码的死亡值
left_join(. , VIDBrazil, by=c("id"="State"))
ggplot() + #带颜色编码的绘图
geom_polygon(data = spdf_Brasil_MVI, aes( x = long, y = lat, group = group, fill= VID), color="green") +
theme_void() +
scale_fill_gradient(name = "Mortes por 100 mil habitantes", low = "#fdc0b2", high = "#9a1f03", limits = c(0, 60.0)) +
coord_map()
我缺少什么?我应该怎么做才能得到地图的颜色编码?
谢谢你的帮助!
英文:
I've been trying to plot a cloropleth map. I've made some progress, but I'm stuck now.
The idea is to plot the map of Brazil and color code each state bases on the number of violent intentional deaths in 2021.
I'm using the following dataset:
State <- c("SP", "SC", "DF", "MG", "RS", "MS", "PR", "AC", "PI","TO", "MT", "RO", "GO", "RJ", "ES", "MA", "PB", "AL", "RN", "PA", "SE", "PE", "RR", "CE", "AM", "BA", "AP")
VID <- c(7.9, 10.1, 11.2, 11.4, 15.9, 20.7, 20.8, 21.2, 23.8, 24.3, 24.9, 25.0, 26.1, 27.2, 28.2, 28.3, 28.6, 31.8, 32.4, 32.8, 33.9, 34.8, 35.5, 37.0, 39.1, 44.9, 53.8)
VIDBrazil <- data.frame(State, VID)
The geoJSON data file from Brazil I'm sourcing from this Brasil JSON file
The code I'm trying to use is based on resources found here, especially here, here, here and here
The code I'm using is the following:
State <- c("SP", "SC", "DF", "MG", "RS", "MS", "PR", "AC", "PI","TO", "MT", "RO", "GO", "RJ", "ES", "MA", "PB", "AL", "RN", "PA", "SE", "PE", "RR", "CE", "AM", "BA", "AP")
VID <- c(7.9, 10.1, 11.2, 11.4, 15.9, 20.7, 20.8, 21.2, 23.8, 24.3, 24.9, 25.0, 26.1, 27.2, 28.2, 28.3, 28.6, 31.8, 32.4, 32.8, 33.9, 34.8, 35.5, 37.0, 39.1, 44.9, 53.8)
VIDBrazil <- data.frame(State, VID)
spdf_Brasil <- geojson_read("C:/Users/Giovanni/Desktop/R/brazil_geo.json", what = "sp") #Import JSON file
spdf_Brasil_A<- tidy(spdf_Brasil) #Convert to data frame
ggplot() + #plot the map
geom_polygon(data = spdf_Brasil_A, aes( x = long, y = lat, group = group), fill="#69b3a2", color="white") +
theme_void() +
coord_map()
spdf_Brasil_MVI <- spdf_Brasil_A %>% #add the values of deaths to use for color coding the map
left_join(. , VIDBrazil, by=c("id"="State"))
ggplot() + #ploting with color coding
geom_polygon(data = spdf_Brasil_MVI, aes( x = long, y = lat, group = group, fill= VID), color="green") +
theme_void() +
scale_fill_gradient(name = "Mortes por 100 mil habitantes", low = "#fdc0b2", high = "#9a1f03", limits = c(0, 60.0)) +
coord_map()
What am I missing? What should I do to get the map color coded?
Thank you for your assistance!
答案1
得分: 0
在broom::tidy()
中,GeoJSON中的id
字段未被处理,因此在生成的数据框中没有State代码,联接将无法按预期工作。尽管名为id
的列仍然存在,因此left_join()
不会引发错误。如果检查spdf_Brasil_MVI
,您可能会注意到所有VID
值都是NA
。而id
列包含存储为字符的数值,而不是State代码:
glimpse(spdf_Brasil_MVI)
#> 行: 581,207
#> 列: 8
#> $ long <dbl> -73.33251, -73.27482, -72.87016, -72.69151, -72.65935, -72.18967, -72.17318, -…
#> $ lat <dbl> -7.324879, -7.350334, -7.528882, -7.610621, -7.624979, -7.720489, -7.723841, -…
#> ...
#> $ id <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"…
#> $ VID <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
使用sf
包处理空间数据,并使用ggplot的geom_sf()
图层绘制sf对象可以简化这个问题:
library(sf)
library(ggplot2)
library(dplyr)
VIDBrazil <- data.frame(
State = c("SP", "SC", "DF", "MG", "RS", "MS", "PR", "AC", "PI","TO", "MT", "RO", "GO", "RJ", "ES", "MA", "PB", "AL", "RN", "PA", "SE", "PE", "RR", "CE", "AM", "BA", "AP"),
VID = c(7.9, 10.1, 11.2, 11.4, 15.9, 20.7, 20.8, 21.2, 23.8, 24.3, 24.9, 25.0, 26.1, 27.2, 28.2, 28.3, 28.6, 31.8, 32.4, 32.8, 33.9, 34.8, 35.5, 37.0, 39.1, 44.9, 53.8))
sf_Brasil <- st_read("brazil_geo.json", as_tibble = TRUE, quiet = TRUE)
# sf对象,包含所有要素(id,name)+ geometry列和投影细节:
sf_Brasil
#> 简单特征集合,27个要素和2个字段
#> 几何类型: MULTIPOLYGON
#> 尺寸: XY
#> 边界框: xmin: -73.98971 ymin: -33.74708 xmax: -28.84694 ymax: 5.264878
#> 大地测量参考: WGS 84
#> # A tibble: 27 × 3
#> id name geometry
#> <chr> <chr> <MULTIPOLYGON [°]>
#> 1 AC Acre (((-73.33251 -7.324879, -73.27482 -7.350334, -72.8701…
#> 2 AL Alagoas (((-35.90153 -9.861805, -35.90153 -9.862083, -35.9012…
#> 3 AP Amapá (((-50.02403 0.859862, -50.02403 0.859306, -50.02347 …
# ...
p1 <- ggplot(sf_Brasil) +
geom_sf(fill="#69b3a2", color="white") +
theme_void()
sf_Brasil_MVI <- left_join(sf_Brasil, VIDBrazil, by = c("id" = "State"))
# join result, includes VID values:
sf_Brasil_MVI
#> 简单特征集合,27个要素和3个字段
#> 几何类型: MULTIPOLYGON
#> 尺寸: XY
#> 边界框: xmin: -73.98971 ymin: -33.74708 xmax: -28.84694 ymax: 5.264878
#> 大地测量参考: WGS 84
#> # A tibble: 27 × 4
#> id name geometry VID
#> <chr> <chr> <MULTIPOLYGON [°]> <dbl>
#> 1 AC Acre (((-73.33251 -7.324879, -73.27482 -7.350334, -7… 21.2
#> 2 AL Alagoas (((-35.90153 -9.861805, -35.90153 -9.862083, -3… 31.8
#> 3 AP Amapá (((-50.02403 0.859862, -50.02403 0.859306, -50.… 53.8
# ...
p2 <- ggplot(sf_Brasil_MVI) +
geom_sf(aes(fill = VID), color="green") +
scale_fill_gradient(name = "每十万人口的死亡人数", low = "#fdc0b2", high = "#9a1f03", limits = c(0, 60.0)) +
theme
<details>
<summary>英文:</summary>
`id` field in GeoJSON is not handled by `broom::tidy()` as-is, so there are no State codes in the resulting data frame and join will not work as intended. Though the column named `id` is still there, thus `left_join()` will not throw an error. If you check your `spdf_Brasil_MVI`, you may notice that all `VID` values are `NA`s. And instead of State codes, `id` column includes numeric values stored as characters:
``` r
glimpse(spdf_Brasil_MVI)
#> Rows: 581,207
#> Columns: 8
#> $ long <dbl> -73.33251, -73.27482, -72.87016, -72.69151, -72.65935, -72.18967, -72.17318, -…
#> $ lat <dbl> -7.324879, -7.350334, -7.528882, -7.610621, -7.624979, -7.720489, -7.723841, -…
#> ...
#> $ id <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"…
#> $ VID <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
Using sf
package for handling spatial data and ggplot's geom_sf()
layer for plotting sf objects simplifies this quite a bit:
library(sf)
library(ggplot2)
library(dplyr)
VIDBrazil <- data.frame(
State = c("SP", "SC", "DF", "MG", "RS", "MS", "PR", "AC", "PI","TO", "MT", "RO", "GO", "RJ", "ES", "MA", "PB", "AL", "RN", "PA", "SE", "PE", "RR", "CE", "AM", "BA", "AP"),
VID = c(7.9, 10.1, 11.2, 11.4, 15.9, 20.7, 20.8, 21.2, 23.8, 24.3, 24.9, 25.0, 26.1, 27.2, 28.2, 28.3, 28.6, 31.8, 32.4, 32.8, 33.9, 34.8, 35.5, 37.0, 39.1, 44.9, 53.8))
sf_Brasil <- st_read("brazil_geo.json", as_tibble = TRUE, quiet = TRUE)
# sf object, a data.frame / tibble with all features (id, name) + geometry column
# and projection details:
sf_Brasil
#> Simple feature collection with 27 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -73.98971 ymin: -33.74708 xmax: -28.84694 ymax: 5.264878
#> Geodetic CRS: WGS 84
#> # A tibble: 27 × 3
#> id name geometry
#> <chr> <chr> <MULTIPOLYGON [°]>
#> 1 AC Acre (((-73.33251 -7.324879, -73.27482 -7.350334, -72.8701…
#> 2 AL Alagoas (((-35.90153 -9.861805, -35.90153 -9.862083, -35.9012…
#> 3 AP Amapá (((-50.02403 0.859862, -50.02403 0.859306, -50.02347 …
# ...
p1 <- ggplot(sf_Brasil) +
geom_sf(fill="#69b3a2", color="white") +
theme_void()
sf_Brasil_MVI <- left_join(sf_Brasil, VIDBrazil, by = join_by(id == State))
# join result, includes VID values:
sf_Brasil_MVI
#> Simple feature collection with 27 features and 3 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -73.98971 ymin: -33.74708 xmax: -28.84694 ymax: 5.264878
#> Geodetic CRS: WGS 84
#> # A tibble: 27 × 4
#> id name geometry VID
#> <chr> <chr> <MULTIPOLYGON [°]> <dbl>
#> 1 AC Acre (((-73.33251 -7.324879, -73.27482 -7.350334, -7… 21.2
#> 2 AL Alagoas (((-35.90153 -9.861805, -35.90153 -9.862083, -3… 31.8
#> 3 AP Amapá (((-50.02403 0.859862, -50.02403 0.859306, -50.… 53.8
# ...
p2 <- ggplot(sf_Brasil_MVI) +
geom_sf(aes(fill = VID), color="green") +
scale_fill_gradient(name = "Mortes por 100 mil habitantes", low = "#fdc0b2", high = "#9a1f03", limits = c(0, 60.0)) +
theme_void() +
theme(legend.position = "bottom")
p1;p2
<!-- --><!-- -->
For labelling sf objects we could use geom_sf_label()
or geom_sf_text()
, but when things can get bit crowded and it's not so easy to fit all labels, geom_label_repel()
& geom_text_repel()
form ggrepel
package are worth considering.
library(ggrepel)
# add layer to existing p2 plot
# get coordinates for geom_label_repel from sf_coordinates stats
# name : column in p2 data (sf_Brasil_MVI)
# geometry : geometry column of p2 data (sf_Brasil_MVI)
p2 + geom_label_repel(aes(label = name, geometry = geometry),
stat = "sf_coordinates")
<!-- -->
<sup>Created on 2023-03-12 with reprex v2.0.2</sup>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论