How to keep only the continental U.S. from a shapefile at the ZIP code level?
Question
I have downloaded the large shapefile at the ZIP code level from the Census Bureau.
The link is here: cb_2017_us_zcta510_500k.shp (https://www2.census.gov/geo/tiger/TIGER_RD18/LAYER/ZCTA520/)
The problem is that reading it into geopandas shows that it includes Alaska and all the small islands around the mainland.
gg.head(1)
Out[709]:
ZCTA5CE20 GEOID20 CLASSFP20 MTFCC20 FUNCSTAT20 ALAND20 \
0 35592 35592 B5 G6350 S 298552385
AWATER20 INTPTLAT20 INTPTLON20 \
0 235989 +33.7427261 -088.0973903
geometry
0 POLYGON ((-88.24735 33.65390, -88.24713 33.65415, -88.24656 33.65454, -88.24658 33.65479, -88.24672 33.65497, -88.24672 33.65520, -88.24626 33.65559, -88.24601 33.65591, -88.24601 33.65630, -88.24...
I know there is an easy solution in R (that uses the area of a polygon, see https://stackoverflow.com/questions/50375619/how-to-remove-all-the-small-islands-from-the-census-shapefile-zip-code-level) but what can I do here in Python?
Thanks!
Answer 1
Score: 1
This can certainly be done using a CONUS shape definition file; however, the continental US has the convenient property of falling within a bounding box, while all non-CONUS geographies fall outside it. So the easiest way is to keep only the rows whose centroid lies inside that bounding box:
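# generous bounding box around the continental US, in degrees (lon/lat)
x1, y1, x2, y2 = (-130, 20, -50, 50)
# reproject to WGS84 so centroid coordinates are comparable to the box
gg_wgs84 = gg.to_crs('epsg:4326')
# compute the centroids once and reuse them for all four comparisons
centroids = gg_wgs84.centroid
gg_conus = gg[
    (centroids.x > x1)
    & (centroids.y > y1)
    & (centroids.x < x2)
    & (centroids.y < y2)
]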
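The bounding-box test itself does not depend on geopandas, so it can be checked in isolation. The sketch below applies the same box to the INTPTLAT20 / INTPTLON20 internal-point strings shown in the question's output instead of to computed centroids. This swaps in the Census internal point for the centroid, which is not what the answer does. The helper name `in_conus_bbox` and the Alaska and Hawaii coordinates are illustrative assumptions, not taken from the data.

```python
# Generous bounding box from the answer: (min lon, min lat, max lon, max lat).
CONUS_BBOX = (-130.0, 20.0, -50.0, 50.0)


def in_conus_bbox(intptlat: str, intptlon: str, bbox=CONUS_BBOX) -> bool:
    """Return True if the internal point lies strictly inside the box.

    `intptlat` / `intptlon` are strings in the format shown in the
    question, e.g. "+33.7427261" and "-088.0973903".
    Hypothetical helper, written only for illustration.
    """
    x1, y1, x2, y2 = bbox
    lat, lon = float(intptlat), float(intptlon)
    return x1 < lon < x2 and y1 < lat < y2


# Row 0 from the question's output (ZCTA 35592).
row0_inside = in_conus_bbox("+33.7427261", "-088.0973903")

# Assumed, approximate coordinates (not from the source data):
# a point near Anchorage, Alaska and a point near Honolulu, Hawaii.
alaska_inside = in_conus_bbox("+61.2000000", "-149.9000000")
hawaii_inside = in_conus_bbox("+21.3000000", "-157.8500000")
```

Assuming those two columns were read as strings, the same helper could probably be applied to the GeoDataFrame with a boolean list, for example `gg[[in_conus_bbox(a, b) for a, b in zip(gg.INTPTLAT20, gg.INTPTLON20)]]`.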