January 16, 2015 00:37:23 · go

Reading tabular data with fixed width and missing values

Question

I'm trying to read a table from disk in Go, with mixed integers and floats, where the width of each field is fixed (every field occupies a fixed number of places, preceded by blanks if too short) and where some values may be missing (and should default to zero).

The file is here: https://celestrak.org/SpaceData/sw20100101.txt

The Fortran format used to read it is written in the header:

FORMAT(I4,I3,I3,I5,I3,8I3,I4,8I4,I4,F4.1,I2,I4,F6.1,I2,5F6.1)

and the lines look like this (these are some of the last lines, including ones with blanks):

2014 12 29 2475  2 20 30 23 33 37 47 33 47 270   7  15   9  18  22  39  18  39  21 1.1 5  64 127.1 0 150.4 156.0 131.4 153.3 160.9
2014 12 30 2475  3 30 40 37 20 30 27 27 23 233  15  27  22   7  15  12  12   9  15 0.8 4  66 126.0 0 150.3 156.1 130.3 152.7 161.0
2014 12 31 2475  4 13 23 13 17 20 33 13 17 150   5   9   5   6   7  18   5   6   8 0.4 2  65 129.2 0 150.5 156.3 133.6 152.4 161.3
2015 01 01 2475  5 20 10 10 10 10 20 20 30 130   7   4   4   4   4   7   7  15   6       101 138.0 0 150.7 156.6 142.7 152.1 161.7
2015 01 02 2475  6 30 10 20 20 30 20 30 40 200  15   4   7   7  15   7  15  27  12       113 146.0 0 150.9 157.0 151.0 152.2 162.1
2015 01 03 2475  7 50 30 30 30 30 20 20 10 220  48  15  15  15  15   7   7   4  15       122 149.0 0 151.0 157.2 154.1 152.4 162.4

I have been trying a clever format string with Sscanf (like "%4d%3d%3d%5d..."), but it does not work when a field is blank or when a number is not right-aligned in its slot.

I'm looking for a way to read it as Fortran does, where:

Mixed field types (integers, floats, strings) are possible.
Each column has a fixed size in characters, padded with blanks if necessary, but different columns may have different sizes.
Numeric values may be preceded by zeros.
Values may be missing, in which case the field takes its zero value.
Values may sit anywhere in the slot, not necessarily right-aligned (this does not happen in the example, but it is possible).

Is there a clever method to read something like this, or should I split, trim, check and convert every field manually?

Answer 1

Score: 2

package main

import "fmt"
import "reflect"
import "strconv"
import "strings"

type scanner struct {
	len   int
	parts []int
}

func (ss *scanner) Scan(s string, args ...interface{}) (n int, err error) {
	if i := len(s); i != ss.len {
		return 0, fmt.Errorf("expected string of size %d, actual %d", ss.len, i)
	}
	if len(args) != len(ss.parts) {
		return 0, fmt.Errorf("expected %d args, actual %d", len(ss.parts), len(args))
	}
	n = 0
	start := 0
	for ; n < len(args); n++ {
		a := args[n]
		l := ss.parts[n]
		if err = scanOne(s[start:start+l], a); err != nil {
			return
		}
		start += l
	}
	return n, nil
}

func newScan(parts ...int) *scanner {
	// total is the expected line length: the sum of all field widths.
	total := 0
	for _, v := range parts {
		total += v
	}
	return &scanner{len: total, parts: parts}
}

func scanOne(s string, arg interface{}) (err error) {
	s = strings.TrimSpace(s)
	switch v := arg.(type) {
	case *int:
		if s == "" {
			*v = int(0)
		} else {
			*v, err = strconv.Atoi(s)
		}
	case *int32:
		if s == "" {
			*v = int32(0)
		} else {
			var val int64
			val, err = strconv.ParseInt(s, 10, 32)
			*v = int32(val)
		}
	case *int64:
		if s == "" {
			*v = int64(0)
		} else {
			*v, err = strconv.ParseInt(s, 10, 64)
		}
	case *float32:
		if s == "" {
			*v = float32(0)
		} else {
			var val float64
			val, err = strconv.ParseFloat(s, 32)
			*v = float32(val)
		}
	case *float64:
		if s == "" {
			*v = float64(0)
		} else {
			*v, err = strconv.ParseFloat(s, 64)
		}
	default:
		err = fmt.Errorf("can't scan type: %v", reflect.TypeOf(arg))
	}
	return
}

func main() {
	s := newScan(2, 4, 2)
	var a int
	var b float32
	var c int32

	s.Scan("12 2.2 1", &a, &b, &c)
	fmt.Printf("%d %f %d\n", a, b, c)

	s.Scan("1      2", &a, &b, &c)
	fmt.Printf("%d %f %d\n", a, b, c)

	s.Scan("        ", &a, &b, &c)
	fmt.Printf("%d %f %d\n", a, b, c)
}

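Output:

12 2.200000 1
1 0.000000 2
0 0.000000 0

Note that Scan returns n, the number of parsed arguments, and err. If a value is missing, the function sets it to 0. The implementation is mostly taken from fmt.Scanf.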
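To apply this idea to the file in the question, the repeat counts in the Fortran FORMAT first have to be expanded into one width per field. The following is a minimal sketch of that step, not a full FORMAT parser: the helper names `expandWidths` and `splitFixed` are made up for this illustration, only the `I` and `F` descriptors that appear in the header are understood, and the line is assumed to be plain ASCII.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// item matches one simple edit descriptor such as "I4", "8I3" or "5F6.1":
// an optional repeat count, a type letter, a width, and optional decimals.
var item = regexp.MustCompile(`^(\d*)([IF])(\d+)(?:\.\d+)?$`)

// expandWidths (hypothetical helper) turns a FORMAT body such as
// "I4,2I3,F4.1" into the widths [4 3 3 4]. Only I and F are understood.
func expandWidths(format string) ([]int, error) {
	var widths []int
	for _, f := range strings.Split(format, ",") {
		m := item.FindStringSubmatch(strings.TrimSpace(f))
		if m == nil {
			return nil, fmt.Errorf("unsupported descriptor %q", f)
		}
		repeat := 1
		if m[1] != "" {
			var err error
			if repeat, err = strconv.Atoi(m[1]); err != nil {
				return nil, err
			}
		}
		width, err := strconv.Atoi(m[3])
		if err != nil {
			return nil, err
		}
		for i := 0; i < repeat; i++ {
			widths = append(widths, width)
		}
	}
	return widths, nil
}

// splitFixed (hypothetical helper) cuts line into trimmed fields of the
// given widths. Blank slots, and slots past the end of a short line,
// come back as empty strings.
func splitFixed(line string, widths []int) []string {
	fields := make([]string, len(widths))
	start := 0
	for i, w := range widths {
		if start >= len(line) {
			break
		}
		end := start + w
		if end > len(line) {
			end = len(line)
		}
		fields[i] = strings.TrimSpace(line[start:end])
		start = end
	}
	return fields
}

func main() {
	widths, err := expandWidths("I4,I3,I3,I5,I3,8I3,I4,8I4,I4,F4.1,I2,I4,F6.1,I2,5F6.1")
	if err != nil {
		fmt.Println("format error:", err)
		return
	}
	// Row for 2015 01 01 from the question, built in pieces so the six
	// blank characters of the missing F4.1 and I2 fields stay visible.
	line := "2015 01 01 2475  5 20 10 10 10 10 20 20 30 130" +
		"   7   4   4   4   4   7   7  15   6" +
		strings.Repeat(" ", 6) +
		" 101 138.0 0 150.7 156.6 142.7 152.1 161.7"
	fields := splitFixed(line, widths)
	fmt.Printf("%d fields: %q\n", len(fields), fields[22:26])
}
```

Each field can then be converted with strconv, or passed to scanOne from the answer above, which already maps an empty string to zero.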

Answer 2

Score: 0

You can use the csv package with the delimiter set to a space. Something like this:

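package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
)

func main() {
	file, err := os.Open("/SpaceData/sw20100101.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	csvreader := csv.NewReader(file)
	csvreader.Comma = ' '             // split on spaces
	csvreader.FieldsPerRecord = 33    // one field per FORMAT item
	csvreader.TrimLeadingSpace = true // skip the padding before each value
	parsedout, err := csvreader.Read()
	fmt.Println(parsedout, err)
}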

Here is a working example: https://play.golang.org/p/Tsp72D4vsR
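One thing worth checking before relying on this: splitting on spaces leaves nothing behind for a blank slot, so a row with missing values should come back with fewer fields, and every later column shifts left. The sketch below feeds two rows from the question through a reader configured as above, except that FieldsPerRecord is set to -1 (an assumption made here so that short rows can be inspected rather than rejected). The helper name `fieldCounts` is made up for this illustration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// Two rows from the question: the first is complete, the second has
// blank F4.1 and I2 slots between "6" and "101".
const rows = "2014 12 29 2475  2 20 30 23 33 37 47 33 47 270   7  15   9  18  22  39  18  39  21 1.1 5  64 127.1 0 150.4 156.0 131.4 153.3 160.9\n" +
	"2015 01 01 2475  5 20 10 10 10 10 20 20 30 130   7   4   4   4   4   7   7  15   6       101 138.0 0 150.7 156.6 142.7 152.1 161.7\n"

// fieldCounts (hypothetical helper) returns the number of fields the
// space-delimited csv reader finds in each record of data.
func fieldCounts(data string) ([]int, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.Comma = ' '
	r.TrimLeadingSpace = true
	r.FieldsPerRecord = -1 // accept any count so short rows can be inspected
	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	counts := make([]int, len(records))
	for i, rec := range records {
		counts[i] = len(rec)
	}
	return counts, nil
}

func main() {
	counts, err := fieldCounts(rows)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Println(counts) // field count per row
}
```

The complete row yields 33 fields and the row with blanks yields 31, so with FieldsPerRecord set to 33 the second row would be reported as having the wrong number of fields. For data with missing values, slicing by column width keeps the fields aligned.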
