Matching pincodes with latitude and longitude (India)
Question
I am trying to get the latitude and longitude corresponding to a given pincode in India.
For the pincodes, I have the following file:
https://data.gov.in/sites/default/files/all_india_PO_list_without_APS_offices_ver2_lat_long.csv
The data has 15 columns. I show only part of it here so that you can see what the data looks like.
   officename                pincode officeType Deliverystatus divisionname regionname
1: Achalapur B.O             504273  B.O        Delivery       Adilabad     Hyderabad
2: Ada B.O                   504293  B.O        Delivery       Adilabad     Hyderabad
3: Adegaon B.O               504307  B.O        Delivery       Adilabad     Hyderabad
4: Adilabad Collectorate S.O 504001  S.O        Non-Delivery   Adilabad     Hyderabad
5: Adilabad H.O              504001  H.O        Delivery       Adilabad     Hyderabad
This file has multiple lat-long pairs mapped to a single pincode.
For my use, I need one lat-long per pincode (I have two addresses, X and Y), and I then use the Haversine formula to calculate the distance between X and Y.
Possible options for me:
- Take the average of the lat-long values for each pincode, then map them. Calculate the Haversine distance between X and Y.
- Use geocode. I tried this, but I get the following error, mainly because I am behind an office firewall:
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: [maps.googleapis.com] Connection timed out after 10000 milliseconds
- Any other source on the net, or any other way, to get a 1:1 mapping between pincode and lat-long.
Any help is appreciated!
Answer 1
Score: 1
Here is what I tried for you. Your data is called `mydf` here. First, keep the rows that have values in both `longitude` and `latitude`. Then, for each group defined by `statename` and `pincode`, compute the average `longitude` and `latitude`. This creates `foo`.
library(dplyr)
library(tidyr)
library(purrr)
filter(mydf, complete.cases(latitude) & complete.cases(longitude)) %>%
  group_by(statename, pincode) %>%
  summarize(ave_long = mean(longitude),
            ave_lat = mean(latitude)) -> foo
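To try this averaging step without downloading the file, here is a minimal base R sketch on a toy data frame. The column names follow the code above, but the coordinate values are invented for illustration, and `foo_base` is a hypothetical name used so that it does not clash with `foo`.

```r
# Toy stand-in for the real file: column names follow the answer's code,
# but the coordinate values are invented for illustration only.
mydf <- data.frame(
  statename = c("KARNATAKA", "KARNATAKA", "KARNATAKA", "KARNATAKA"),
  pincode   = c(560001L, 560001L, 560003L, 560003L),
  longitude = c(77.60, 77.62, 77.57, NA),
  latitude  = c(12.97, 12.99, 13.00, 13.02),
  stringsAsFactors = FALSE
)

# The formula interface of aggregate() drops rows with an NA in any column
# it uses (na.action = na.omit by default), which mirrors the
# complete.cases() filter in the dplyr version.
foo_base <- aggregate(cbind(longitude, latitude) ~ statename + pincode,
                      data = mydf, FUN = mean)
names(foo_base)[3:4] <- c("ave_long", "ave_lat")

print(foo_base)
```

The fourth toy row has no longitude, so it is excluded and pincode 560003 keeps the coordinates of its single complete row. Note that a plain mean can be a poor representative point if the offices sharing a pincode are spread far apart.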
The next step is to arrange `foo` so that we can calculate the Haversine distance. I found a nice way to arrange this data; see the link below. Here we create all possible pairs of the data points.
# Arrange this data in a way that we can calculate Haversine.
# We basically create all possible combinations of rows.
# This post gave me a hand: https://community.rstudio.com/t/create-all-possible-combinations-of-a-data-frame/26848/4
myrows <- foo %>%
  group_by_all() %>%
  group_split()
out <- t(combn(x = 1:nrow(foo), m = 2)) %>%
  as_tibble() %>%
  mutate_all(~ map(., ~ pluck(myrows, .x))) %>%
  unnest() %>%
  setNames(nm = c("start_state", "start_pincode",
                  "start_long", "start_lat",
                  "dest_state", "dest_pincode",
                  "dest_long", "dest_lat"))
We could use `distHaversine()` or `distGeo()` from the geosphere package. But let's try something new. SymbolixAU wrote another function. Thank you, SymbolixAU!
# https://stackoverflow.com/questions/36817423/how-to-efficiently-calculate-distance-between-pair-of-coordinates-using-data-tab/42014364#42014364
dt.haversine <- function(lat_from, lon_from, lat_to, lon_to, r = 6378137){
  radians <- pi/180
  lat_to <- lat_to * radians
  lat_from <- lat_from * radians
  lon_to <- lon_to * radians
  lon_from <- lon_from * radians
  dLat <- (lat_to - lat_from)
  dLon <- (lon_to - lon_from)
  a <- (sin(dLat/2)^2) + (cos(lat_from) * cos(lat_to)) * (sin(dLon/2)^2)
  return(2 * atan2(sqrt(a), sqrt(1 - a)) * r)
}
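A quick sanity check of the function can be sketched as follows. The function is copied from above so that the block runs on its own; the test points are arbitrary choices for illustration. Because the default `r` is a radius in meters, the returned distances are in meters.

```r
# Copied from the answer above so this block runs on its own.
dt.haversine <- function(lat_from, lon_from, lat_to, lon_to, r = 6378137) {
  radians <- pi / 180
  lat_to <- lat_to * radians
  lat_from <- lat_from * radians
  lon_to <- lon_to * radians
  lon_from <- lon_from * radians
  dLat <- (lat_to - lat_from)
  dLon <- (lon_to - lon_from)
  a <- (sin(dLat / 2)^2) + (cos(lat_from) * cos(lat_to)) * (sin(dLon / 2)^2)
  return(2 * atan2(sqrt(a), sqrt(1 - a)) * r)
}

# One degree of longitude along the equator should span r * pi / 180 meters.
d_equator <- dt.haversine(lat_from = 0, lon_from = 0, lat_to = 0, lon_to = 1)
expected  <- 6378137 * pi / 180

# Identical points are zero meters apart, and swapping the endpoints
# should not change the result.
d_same <- dt.haversine(13, 77.6, 13, 77.6)
d_ab   <- dt.haversine(12.98, 77.61, 13.00, 77.57)
d_ba   <- dt.haversine(13.00, 77.57, 12.98, 77.61)

print(c(d_equator = d_equator, expected = expected))
print(isTRUE(all.equal(d_ab, d_ba)))
```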
The final step is to calculate distances.
mutate(out,
       distance = dt.haversine(lon_from = start_long, lat_from = start_lat,
                               lon_to = dest_long, lat_to = dest_lat)) -> result
# A tibble: 6,105 x 9
# start_state start_pincode start_long start_lat dest_state dest_pincode dest_long dest_lat distance
# <chr> <int> <dbl> <dbl> <chr> <int> <dbl> <dbl> <dbl>
# 1 KARNATAKA 560001 77.6 13.0 KARNATAKA 560003 77.6 13.0 3544.
# 2 KARNATAKA 560001 77.6 13.0 KARNATAKA 560004 77.6 12.9 4554.
# 3 KARNATAKA 560001 77.6 13.0 KARNATAKA 560005 77.6 13.0 3178.
# 4 KARNATAKA 560001 77.6 13.0 KARNATAKA 560008 77.6 13.0 4844.
# 5 KARNATAKA 560001 77.6 13.0 KARNATAKA 560010 77.6 13.0 4618.
# 6 KARNATAKA 560001 77.6 13.0 KARNATAKA 560011 77.6 12.9 5510.
# 7 KARNATAKA 560001 77.6 13.0 KARNATAKA 560013 77.6 13.1 9491.
# 8 KARNATAKA 560001 77.6 13.0 KARNATAKA 560014 77.5 13.1 12047.
# 9 KARNATAKA 560001 77.6 13.0 KARNATAKA 560017 77.7 13.0 6831.
#10 KARNATAKA 560001 77.6 13.0 KARNATAKA 560021 77.6 13.0 5148.
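Since the question only needs the distance between two given pincodes X and Y, the all-pairs table is not strictly required. Below is a minimal sketch of a direct lookup, assuming the per-pincode table has exactly one row per pincode. `pincode_distance` is a hypothetical helper and the toy values are invented.

```r
# Copied from the answer above so this block runs on its own.
dt.haversine <- function(lat_from, lon_from, lat_to, lon_to, r = 6378137) {
  radians <- pi / 180
  lat_to <- lat_to * radians
  lat_from <- lat_from * radians
  lon_to <- lon_to * radians
  lon_from <- lon_from * radians
  dLat <- (lat_to - lat_from)
  dLon <- (lon_to - lon_from)
  a <- (sin(dLat / 2)^2) + (cos(lat_from) * cos(lat_to)) * (sin(dLon / 2)^2)
  return(2 * atan2(sqrt(a), sqrt(1 - a)) * r)
}

# Toy per-pincode table shaped like `foo` (values invented for illustration).
foo <- data.frame(
  pincode  = c(560001L, 560003L),
  ave_long = c(77.61, 77.57),
  ave_lat  = c(12.98, 13.00)
)

# Hypothetical helper: look up both pincodes and return the distance in
# meters. An unknown pincode yields NA because match() returns NA for it.
pincode_distance <- function(lookup, pin_x, pin_y) {
  i <- match(pin_x, lookup$pincode)
  j <- match(pin_y, lookup$pincode)
  dt.haversine(lat_from = lookup$ave_lat[i], lon_from = lookup$ave_long[i],
               lat_to   = lookup$ave_lat[j], lon_to   = lookup$ave_long[j])
}

d_xy      <- pincode_distance(foo, 560001L, 560003L)
d_missing <- pincode_distance(foo, 560001L, 999999L)

print(d_xy)
print(d_missing)
```

If the same pincode appears under more than one `statename`, `match()` silently takes the first row, so it would be worth checking for duplicated pincodes before relying on this lookup.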
Answer 2
Score: -1
Lat/long-based distances will not match Google's distances: Google calculates the distance along a route, whereas any mathematical formula applied to lat/long values gives the straight-line distance (as the crow flies).