geopandas renaming columns when saving to file

Question

I am trying to save a shapefile locally with GeoPandas, preferably as a zipped file, though I have tried both compressed and uncompressed output. After saving the file and reading it back in, three column names have changed: most importantly, 'geom' has reverted to 'geometry', 'parcel_apn_2' is now 'parcel_a_1', and 'fips_county' is now 'fips_count'. Am I missing something that would cause this behavior?

Checking the column names prior to saving:

import geopandas as gpd

# shp_prior_to_writing is the original GeoDataFrame
shp_prior_to_writing.columns

returns...

Index(['xref_id', 'fips_state', 'fips_county', 'county', 'parcel_apn',
       'parcel_apn_2', 'address', 'city', 'state', 'zip', 'src_id', 'latitude',
       'longitude', 'geom'],
      dtype='object')

then writing the same file locally...

shp_prior_to_writing.to_file('test_shp.shp', driver='ESRI Shapefile')

and reading it back in...

same_shape_file = gpd.read_file('test_shp.shp')
same_shape_file.columns

returns...

Index(['xref_id', 'fips_state', 'fips_count', 'county', 'parcel_apn',
       'parcel_a_1', 'address', 'city', 'state', 'zip', 'src_id', 'latitude',
       'longitude', 'geometry'],
      dtype='object')

I've tried both zipped and uncompressed output. I've tried not setting a driver explicitly (I believe it defaults to ESRI Shapefile anyway), and I've tried restarting the Jupyter kernel in my notebook. I've also tried explicitly renaming those columns again before saving, but the result is always the same.

Answer 1

Score: 0

The shapefile format has a hard limit of 10 characters on column (field) names. That limit is baked into the format specification, which comes from ESRI and stores attributes in a dBASE (.dbf) table. It is not the fault of geopandas, or of Fiona, which provides access to the shapefile driver through GDAL.
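To see which of the columns from the question run into the limit, a quick check like the one below may help. This is only a sketch: the helper name `names_over_limit` and the constant are made up for illustration and are not part of geopandas, and the check counts characters, which matches the limit for plain ASCII names like these.

```python
# Hypothetical helper, not a geopandas API: flags names longer than the
# 10-character field-name limit of the shapefile's .dbf attribute table.
SHAPEFILE_FIELD_NAME_LIMIT = 10


def names_over_limit(columns, limit=SHAPEFILE_FIELD_NAME_LIMIT):
    """Return the column names that are longer than `limit` characters."""
    return [name for name in columns if len(name) > limit]


# Attribute columns from the question (the geometry column is not a .dbf field).
columns = ['xref_id', 'fips_state', 'fips_county', 'county', 'parcel_apn',
           'parcel_apn_2', 'address', 'city', 'state', 'zip', 'src_id',
           'latitude', 'longitude']

print(names_over_limit(columns))  # ['fips_county', 'parcel_apn_2']
```

Only the two flagged names are longer than 10 characters, and they are the same two attribute columns that come back renamed.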

See, for example, the discussion of the ESRI Shapefile standard’s limitations on Wikipedia, which lists the 10-character limit. Also see the GIS StackExchange question "Bypassing 10 character limit of field name in shapefiles?" for a discussion of options.

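Because of this 10-character limit, longer column names have to be shortened before they can be written, which is the name change you are seeing: 'fips_county' is cut to 'fips_count', and 'parcel_apn_2' would be cut to 'parcel_apn', which already exists, so it becomes 'parcel_a_1'. The change from 'geom' to 'geometry' is unrelated to the limit: gpd.read_file names the geometry column 'geometry' when it reads the file. If you want to keep these column names and have them round-trip to disk, you will need to use a different file format.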

Posted by huangapple on 2023-03-04 04:34:14