Unable to convert DataFrame columns to numbers when special characters appear
Question
I have a CSV file that contains latitudes and longitudes in degree-minute format, including the degree symbol. When I load this CSV file, a question mark (?) appears in place of the degree symbol. I tried adding encoding='utf-8' and encoding='latin1', but the question mark still appears.
After loading this file, I need to convert these latitude/longitude points to decimal degrees (without any symbol). What is the correct way to do this?
The input file is linked here.
Answer 1
Score: 1
I can read your file using encoding='latin1'. You can convert the values to decimal degrees in a vectorized way using:
# Assumes two-digit degrees followed by a one-character separator (e.g. 28°10).
df['LAT'] = df['LAT_DM'].str[:2].astype(int) + df['LAT_DM'].str[3:].astype(int).div(60)
df['LONG'] = df['LONG_DM'].str[:2].astype(int) + df['LONG_DM'].str[3:].astype(int).div(60)
Output:
>>> df
LAT_DM LONG_DM LAT LONG
0 28°10 97°05 28.166667 97.083333
1 28°21 97°18 28.350000 97.300000
2 28°30 97°25 28.500000 97.416667
3 28°42 97°35 28.700000 97.583333
4 28°55 97°45 28.916667 97.750000
5 28°54 97°45 28.900000 97.750000
6 28°74 97°12 29.233333 97.200000
7 28°50 97°40 28.833333 97.666667
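The slicing above depends on every value having exactly two degree digits and a single separator character. As a rough sketch of a more tolerant variant, the snippet below parses the two numbers with a regular expression, so it does not matter whether the separator was read as °, as ?, or as something else. The in-memory sample data and the helper name dm_to_decimal are made up for illustration and are not taken from the original file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file: two rows encoded as
# latin1, where the degree symbol is the single byte 0xB0.
raw = "LAT_DM,LONG_DM\n28°10,97°05\n8°30,102°15\n".encode("latin1")

df = pd.read_csv(io.BytesIO(raw), encoding="latin1")


def dm_to_decimal(col: pd.Series) -> pd.Series:
    """Convert 'degrees<sep>minutes' strings to decimal degrees."""
    # Capture the leading digits as degrees and the trailing digits as minutes;
    # any run of non-digits in between (°, ?, a space) counts as the separator.
    parts = col.str.extract(r"^\s*(\d+)\D+(\d+)\s*$").astype(float)
    return parts[0] + parts[1] / 60


df["LAT"] = dm_to_decimal(df["LAT_DM"])
df["LONG"] = dm_to_decimal(df["LONG_DM"])
```

Rows that do not match the pattern come out as NaN instead of raising an error, which makes malformed entries easy to find with df[df['LAT'].isna()]. This sketch does not handle signs or hemisphere letters (N/S/E/W); those would need extra handling if the real data contains them.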