How to parse ISO6709 coordinates in golang?
Question
Some examples from Wikipedia on ISO 6709:
Atlantic Ocean +00-025/
France +46+002/
Paris +48.52+002.20/
Eiffel Tower +48.8577+002.295/
Mount Everest +27.5916+086.5640+8850CRSWGS_84/
North Pole +90+000/
Pacific Ocean +00-160/
South Pole -90+000+2800CRSWGS_84/
United States +38-097/
New York City +40.75-074.00/
Statue of Liberty +40.6894-074.0447/
How should this be parsed, given that there is no consistent delimiting character? With a regex? By reading and parsing it byte by byte?
To clarify: the desired output is a pair of float32 values, the latitude and the longitude. For example:
input: +40.6894-074.0447/
output: 40.6894 and -074.0447
Answer 1
Score: 4
I'm not sure which pieces you want to extract, but the following regex matches every coordinate string in your examples:
(\+|-)\d+\.?\d+(\+|-)\d+\.?[\d]+(\+|-)?[^/]*
It matches the string piece by piece and relies on the trailing / as a terminator. If the / is not guaranteed to be there, this variant works without it:
(\+|-)\d+\.?\d+(\+|-)\d+\.?\d+(\+|-)?[A-Z_\d]*
It does not rely on the / terminator.
A perfect answer would require knowing the context in which the coordinates appear.
Here is code that applies the second regex to a string given as input:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	toSearch := "Atlantic Ocean +00-025/\nFrance +46+002/\nParis +48.52+002.20/\nEiffel Tower +48.8577+002.295/\nMount Everest +27.5916+086.5640+8850CRSWGS_84/\nNorth Pole +90+000/\nPacific Ocean +00-160/\nSouth Pole -90+000+2800CRSWGS_84/\nUnited States +38-097/\nNew York City +40.75-074.00/\nStatue of Liberty +40.6894-074.0447/"
	// Signed latitude, signed longitude, then an optional signed altitude and CRS suffix.
	ISOCoord := regexp.MustCompile(`(\+|-)\d+\.?\d+(\+|-)\d+\.?\d+(\+|-)?[A-Z_\d]*`)
	// -1 returns every match instead of a hard-coded maximum of 11.
	result := ISOCoord.FindAllString(toSearch, -1)
	for _, v := range result {
		fmt.Println(v)
	}
}
It prints:
+00-025
+46+002
+48.52+002.20
+48.8577+002.295
+27.5916+086.5640+8850CRSWGS_84
+90+000
+00-160
-90+000+2800CRSWGS_84
+38-097
+40.75-074.00
+40.6894-074.0447
Given the clarification that you want the two separate coordinates, this approach works:
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

type coord struct {
	lat, long float64
}

func main() {
	toSearch := "Atlantic Ocean +00-025/\nFrance +46+002/\nParis +48.52+002.20/\nEiffel Tower +48.8577+002.295/\nMount Everest +27.5916+086.5640+8850CRSWGS_84/\nNorth Pole +90+000/\nPacific Ocean +00-160/\nSouth Pole -90+000+2800CRSWGS_84/\nUnited States +38-097/\nNew York City +40.75-074.00/\nStatue of Liberty +40.6894-074.0447/"
	// First pass: match a latitude/longitude pair, i.e. two signed numbers back to back.
	ISOCoord := regexp.MustCompile(`((\+|-)\d+\.?\d*){2}`)
	result := ISOCoord.FindAllString(toSearch, -1)
	// Second pass: split each pair into its two signed numbers.
	INDCoord := regexp.MustCompile(`(\+|-)\d+\.?\d*`)
	// Size the slice from the matches rather than hard-coding 11,
	// so inputs with more coordinates do not panic.
	answer := make([]coord, 0, len(result))
	for _, v := range result {
		temp := INDCoord.FindAllString(v, 2)
		// Conversion errors are ignored here for brevity.
		lat, _ := strconv.ParseFloat(temp[0], 64)
		lon, _ := strconv.ParseFloat(temp[1], 64)
		answer = append(answer, coord{lat, lon})
	}
	fmt.Println(answer)
}
The regex is applied in two passes, which makes it a little more robust, but a single pass would be faster if the input allows it.
The code should also check the errors returned by the conversions; that is left for you to add.
Also note that converting to float drops the padding zeros. If you need to preserve them, i.e. if 012.1 must be distinguished from 12.1, skip the conversion to float and work with the strings instead.
As floats, the code produces:
[{0 -25} {46 2} {48.52 2.2} {48.8577 2.295} {27.5916 86.564} {90 0} {0 -160} {-90 0} {38 -97} {40.75 -74} {40.6894 -74.0447}]
or, as strings:
[{+00 -025} {+46 +002} {+48.52 +002.20} {+48.8577 +002.295} {+27.5916 +086.5640} {+90 +000} {+00 -160} {-90 +000} {+38 -097} {+40.75 -074.00} {+40.6894 -074.0447}]
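The single-pass idea mentioned above could look roughly like the following. This is only a sketch and not part of the original answer: the helper name `parseISO6709` and the capture-group pattern are hypothetical choices, and it assumes decimal-degree input like the examples in the question (any altitude or CRS suffix is simply not matched). It returns float32 values, as the question asks, and reports conversion errors instead of discarding them.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// coord holds one latitude/longitude pair in decimal degrees.
type coord struct {
	lat, lon float32
}

// isoCoord captures latitude (group 1) and longitude (group 2) in one pass.
// Assumption: both are signed decimal degrees, as in the question's examples.
var isoCoord = regexp.MustCompile(`([+-]\d+(?:\.\d+)?)([+-]\d+(?:\.\d+)?)`)

// parseISO6709 (hypothetical helper) returns every coordinate pair found in s.
func parseISO6709(s string) ([]coord, error) {
	matches := isoCoord.FindAllStringSubmatch(s, -1)
	coords := make([]coord, 0, len(matches))
	for _, m := range matches {
		// bitSize 32 makes ParseFloat round to a value representable as float32.
		lat, err := strconv.ParseFloat(m[1], 32)
		if err != nil {
			return nil, fmt.Errorf("latitude %q: %w", m[1], err)
		}
		lon, err := strconv.ParseFloat(m[2], 32)
		if err != nil {
			return nil, fmt.Errorf("longitude %q: %w", m[2], err)
		}
		coords = append(coords, coord{float32(lat), float32(lon)})
	}
	return coords, nil
}

func main() {
	input := "Statue of Liberty +40.6894-074.0447/\nMount Everest +27.5916+086.5640+8850CRSWGS_84/"
	coords, err := parseISO6709(input)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, c := range coords {
		fmt.Println(c.lat, c.lon)
	}
}
```

Using capture groups avoids running a second regex over each match, at the cost of a pattern that is slightly harder to read. To keep the padded strings instead of floats, the same groups `m[1]` and `m[2]` can be stored directly.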