Matching pincodes with latitude and longitude (India)

Question

I am trying to get the latitude and longitude corresponding to a given pincode in India.

For pincodes I have the following file:

https://data.gov.in/sites/default/files/all_india_PO_list_without_APS_offices_ver2_lat_long.csv

The data has 15 columns. Here is a part of it to show what it looks like.

                   officename pincode officeType Deliverystatus divisionname   regionname
 1:             Achalapur B.O  504273        B.O       Delivery     Adilabad    Hyderabad
 2:                   Ada B.O  504293        B.O       Delivery     Adilabad    Hyderabad
 3:               Adegaon B.O  504307        B.O       Delivery     Adilabad    Hyderabad
 4: Adilabad Collectorate S.O  504001        S.O   Non-Delivery     Adilabad    Hyderabad
 5:              Adilabad H.O  504001        H.O       Delivery     Adilabad    Hyderabad

This file maps multiple lat-long pairs to a single pincode.

For my use, I need one lat-long per pincode. I have two addresses, X and Y, and I then use Haversine to calculate the distance between them.

Options I have considered:

  1. Take the average of the lat-long values for each pincode, map them, and calculate the Haversine distance between X and Y.
  2. Use geocode. I tried this, but I get the following error, mainly because I am behind an office firewall:

     Error in curl::curl_fetch_memory(url, handle = handle) :
       Timeout was reached: [maps.googleapis.com] Connection timed out after 10000 milliseconds

  3. Any other source on the net, or any other way, to get a 1:1 mapping between pincode and lat-long.

Any help is appreciated!


Answer 1

Score: 1

Here is what I tried for you. Your data is called `mydf` here. First, keep the rows that have values in both `longitude` and `latitude`. Then, for each group defined by `statename` and `pincode`, find the average `longitude` and `latitude`. This creates `foo`, with one row per pincode.

library(dplyr)
library(tidyr)
library(purrr)

# Keep rows with both coordinates, then average them per state and pincode.
filter(mydf, complete.cases(latitude) & complete.cases(longitude)) %>%
  group_by(statename, pincode) %>%
  summarize(ave_long = mean(longitude),
            ave_lat = mean(latitude)) -> foo
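If dplyr is not available, the same averaging step can be sketched in base R. This is only an illustration under assumptions: the `offices` data frame and its coordinates are made up, and only the column names (`statename`, `pincode`, `latitude`, `longitude`) are taken from the code above.

```r
# Toy rows standing in for the post-office file; the coordinates are made up.
offices <- data.frame(
  statename = c("KARNATAKA", "KARNATAKA", "KARNATAKA"),
  pincode   = c(560001L, 560001L, 560003L),
  latitude  = c(12.98, 13.00, 13.00),
  longitude = c(77.58, 77.62, 77.57)
)

# Drop rows with a missing coordinate, as the filter() call above does.
ok <- complete.cases(offices[, c("latitude", "longitude")])

# Average the coordinates within each (statename, pincode) group.
centroids <- aggregate(
  cbind(ave_long = longitude, ave_lat = latitude) ~ statename + pincode,
  data = offices[ok, ],
  FUN = mean
)
print(centroids)
```

The two offices that share pincode 560001 collapse into a single row, which gives the 1:1 mapping the question asks for. Keep in mind that a plain average is only a rough centre for the pincode area.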

The next step is to arrange `foo` so that we can calculate the Haversine distance. I found a nice way to arrange this data; see the link below. Here we create all possible pairs of the data points.

# Arrange this data in a way that we can calculate Haversine.
# We basically create all possible combinations of rows.
# This post gave me a hand: https://community.rstudio.com/t/create-all-possible-combinations-of-a-data-frame/26848/4
myrows <- foo %>%
  group_by_all() %>%
  group_split()

out <- t(combn(x = 1:nrow(foo), m = 2)) %>%
  as_tibble() %>%
  mutate_all(~ map(., ~ pluck(myrows, .x))) %>%
  unnest() %>%
  setNames(nm = c("start_state", "start_pincode",
                  "start_long", "start_lat",
                  "dest_state", "dest_pincode",
                  "dest_long", "dest_lat"))

We could use `distHaversine()` or `distGeo()` from the geosphere package. But let's try something new. SymbolixAU wrote another function. Thank you, SymbolixAU!

# https://stackoverflow.com/questions/36817423/how-to-efficiently-calculate-distance-between-pair-of-coordinates-using-data-tab/42014364#42014364
# Inputs are in degrees; r is the Earth radius in metres, so the result is in metres.
dt.haversine <- function(lat_from, lon_from, lat_to, lon_to, r = 6378137){
  radians <- pi/180
  lat_to <- lat_to * radians
  lat_from <- lat_from * radians
  lon_to <- lon_to * radians
  lon_from <- lon_from * radians
  dLat <- (lat_to - lat_from)
  dLon <- (lon_to - lon_from)
  a <- (sin(dLat/2)^2) + (cos(lat_from) * cos(lat_to)) * (sin(dLon/2)^2)
  return(2 * atan2(sqrt(a), sqrt(1 - a)) * r)
}
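For reference, the function above follows the usual haversine formula. The symbols below are my own notation rather than anything from the linked post: φ is latitude and λ is longitude, both in radians, subscript 1 is the start point, subscript 2 is the destination, and r is the sphere radius.

```latex
\begin{aligned}
\Delta\varphi &= \varphi_2 - \varphi_1, \qquad \Delta\lambda = \lambda_2 - \lambda_1 \\
a &= \sin^2\!\left(\tfrac{\Delta\varphi}{2}\right)
     + \cos\varphi_1 \cos\varphi_2 \,\sin^2\!\left(\tfrac{\Delta\lambda}{2}\right) \\
d &= 2r \,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1-a}\right)
\end{aligned}
```

Here `a` matches the variable of the same name in the code, and `d` is the returned distance. With the default r = 6378137 (the WGS84 equatorial radius in metres), `d` comes out in metres.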

The final step is to calculate the distances.

mutate(out,
       distance = dt.haversine(lon_from = start_long, lat_from = start_lat,
                               lon_to = dest_long, lat_to = dest_lat)) -> result

# A tibble: 6,105 x 9
#   start_state start_pincode start_long start_lat dest_state dest_pincode dest_long dest_lat distance
#   <chr>               <int>      <dbl>     <dbl> <chr>             <int>     <dbl>    <dbl>    <dbl>
# 1 KARNATAKA          560001       77.6      13.0 KARNATAKA        560003      77.6     13.0    3544.
# 2 KARNATAKA          560001       77.6      13.0 KARNATAKA        560004      77.6     12.9    4554.
# 3 KARNATAKA          560001       77.6      13.0 KARNATAKA        560005      77.6     13.0    3178.
# 4 KARNATAKA          560001       77.6      13.0 KARNATAKA        560008      77.6     13.0    4844.
# 5 KARNATAKA          560001       77.6      13.0 KARNATAKA        560010      77.6     13.0    4618.
# 6 KARNATAKA          560001       77.6      13.0 KARNATAKA        560011      77.6     12.9    5510.
# 7 KARNATAKA          560001       77.6      13.0 KARNATAKA        560013      77.6     13.1    9491.
# 8 KARNATAKA          560001       77.6      13.0 KARNATAKA        560014      77.5     13.1   12047.
# 9 KARNATAKA          560001       77.6      13.0 KARNATAKA        560017      77.7     13.0    6831.
#10 KARNATAKA          560001       77.6      13.0 KARNATAKA        560021      77.6     13.0    5148.
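A table of all pairs grows as n(n-1)/2: the 6,105 rows above correspond to 111 pincodes, and the full file would produce far more. Since the question only needs the distance between two addresses X and Y, a direct lookup may be enough. Below is a rough base R sketch of that idea. The `centroids` values are made up, and `haversine_m()` and `pincode_distance()` are hypothetical helper names, not part of any package.

```r
# Hypothetical per-pincode averages, shaped like `foo`; the values are made up.
centroids <- data.frame(
  pincode  = c(560001L, 560003L, 560004L),
  ave_long = c(77.60, 77.57, 77.57),
  ave_lat  = c(12.98, 13.00, 12.94)
)

# Haversine distance in metres for inputs in degrees (same formula as above).
haversine_m <- function(lat_from, lon_from, lat_to, lon_to, r = 6378137) {
  rad <- pi / 180
  d_lat <- (lat_to - lat_from) * rad
  d_lon <- (lon_to - lon_from) * rad
  a <- sin(d_lat / 2)^2 +
    cos(lat_from * rad) * cos(lat_to * rad) * sin(d_lon / 2)^2
  2 * r * atan2(sqrt(a), sqrt(1 - a))
}

# Look up the two pincodes and return the distance between their centroids.
pincode_distance <- function(pin_x, pin_y, lookup = centroids) {
  i <- match(c(pin_x, pin_y), lookup$pincode)
  if (anyNA(i)) stop("pincode not found in lookup table")
  haversine_m(lookup$ave_lat[i[1]], lookup$ave_long[i[1]],
              lookup$ave_lat[i[2]], lookup$ave_long[i[2]])
}

d_xy <- pincode_distance(560001L, 560003L)
print(d_xy)  # straight-line distance in metres
```

With the real data, `centroids` would be replaced by the averaged table built earlier, and the lookup cost stays constant per query instead of requiring the full pair table.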

Answer 2

Score: -1

Lat/long-based distances will never match Google distances: Google calculates the distance along a travel route, whereas any mathematical formula applied to two lat/long pairs gives the straight-line distance (as the crow flies).
