Reading tabular data with fixed width and missing values

Question

I'm trying to read a table from disk in Go, with mixed integers and floats, where the width of each field is fixed (every field occupies a fixed number of places, preceded by blanks if too short) and where some values may be missing (and should default to zero).

The file is here: https://celestrak.org/SpaceData/sw20100101.txt

The Fortran format used to read it is written in the header:

FORMAT(I4,I3,I3,I5,I3,8I3,I4,8I4,I4,F4.1,I2,I4,F6.1,I2,5F6.1)

and the lines look like this (these are some of the last lines, including ones with blank fields):

2014 12 29 2475  2 20 30 23 33 37 47 33 47 270   7  15   9  18  22  39  18  39  21 1.1 5  64 127.1 0 150.4 156.0 131.4 153.3 160.9
2014 12 30 2475  3 30 40 37 20 30 27 27 23 233  15  27  22   7  15  12  12   9  15 0.8 4  66 126.0 0 150.3 156.1 130.3 152.7 161.0
2014 12 31 2475  4 13 23 13 17 20 33 13 17 150   5   9   5   6   7  18   5   6   8 0.4 2  65 129.2 0 150.5 156.3 133.6 152.4 161.3
2015 01 01 2475  5 20 10 10 10 10 20 20 30 130   7   4   4   4   4   7   7  15   6       101 138.0 0 150.7 156.6 142.7 152.1 161.7
2015 01 02 2475  6 30 10 20 20 30 20 30 40 200  15   4   7   7  15   7  15  27  12       113 146.0 0 150.9 157.0 151.0 152.2 162.1
2015 01 03 2475  7 50 30 30 30 30 20 20 10 220  48  15  15  15  15   7   7   4  15       122 149.0 0 151.0 157.2 154.1 152.4 162.4

I have been trying to use a format string with Sscanf (like "%4d%3d%3d%5d..."), but it doesn't work when a field is blank or when a number is not right-aligned in its slot.

I'm looking for a way to read it the way Fortran does, where:

  • Mixed field types (integers, floats, strings) are possible.
  • Each column has a fixed size in characters, with the slot padded with blanks if necessary, but different columns may have different sizes.
  • Numeric values may be preceded by zeros.
  • Values may be missing, in which case the field takes its zero value.
  • Values may be in any position in the slot, not necessarily right-aligned (this is not the case in the example, but it could happen).

Is there a clever method to read something like this, or should I split, trim, check, and convert every field manually?

Answer 1

Score: 2

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// scanner splits a fixed-width record into fields of the given widths.
type scanner struct {
	len   int   // total record width (sum of parts)
	parts []int // width of each field, in order
}

// Scan parses s into args, one pointer per field. It returns the number of
// fields parsed before an error, if any, occurred.
func (ss *scanner) Scan(s string, args ...interface{}) (n int, err error) {
	if i := len(s); i != ss.len {
		return 0, fmt.Errorf("expected string of size %d, actual %d", ss.len, i)
	}
	if len(args) != len(ss.parts) {
		return 0, fmt.Errorf("expected %d args, actual %d", len(ss.parts), len(args))
	}
	start := 0
	for ; n < len(args); n++ {
		l := ss.parts[n]
		if err = scanOne(s[start:start+l], args[n]); err != nil {
			return
		}
		start += l
	}
	return n, nil
}

// newScan builds a scanner from the field widths.
func newScan(parts ...int) *scanner {
	total := 0
	for _, v := range parts {
		total += v
	}
	return &scanner{total, parts}
}

// scanOne trims a single slot and stores it in arg. A blank slot stores zero.
func scanOne(s string, arg interface{}) (err error) {
	s = strings.TrimSpace(s)
	switch v := arg.(type) {
	case *int:
		if s == "" {
			*v = 0
		} else {
			*v, err = strconv.Atoi(s)
		}
	case *int32:
		if s == "" {
			*v = 0
		} else {
			var val int64
			val, err = strconv.ParseInt(s, 10, 32)
			*v = int32(val)
		}
	case *int64:
		if s == "" {
			*v = 0
		} else {
			*v, err = strconv.ParseInt(s, 10, 64)
		}
	case *float32:
		if s == "" {
			*v = 0
		} else {
			var val float64
			val, err = strconv.ParseFloat(s, 32)
			*v = float32(val)
		}
	case *float64:
		if s == "" {
			*v = 0
		} else {
			*v, err = strconv.ParseFloat(s, 64)
		}
	default:
		// %T also copes with a nil argument, unlike reflect.ValueOf(v).Type().
		err = fmt.Errorf("can't scan type: %T", arg)
	}
	return
}

func main() {
	s := newScan(2, 4, 2)
	var a int
	var b float32
	var c int32

	for _, line := range []string{"12 2.2 1", "1      2", "        "} {
		if _, err := s.Scan(line, &a, &b, &c); err != nil {
			fmt.Println("scan error:", err)
			continue
		}
		fmt.Printf("%d %f %d\n", a, b, c)
	}
}

Output:

12 2.200000 1
1 0.000000 2
0 0.000000 0

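Note that Scan returns n, the number of arguments parsed, and err. If a value is missing (its slot is blank), it is set to 0. The implementation is mostly taken from fmt.Scanf.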

Answer 2

Score: 0

You can use the encoding/csv package with the delimiter set to a space. Something like this:

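package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
)

func main() {
	file, err := os.Open("/SpaceData/sw20100101.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	csvreader := csv.NewReader(file)
	csvreader.Comma = ' '             // split on spaces
	csvreader.FieldsPerRecord = 33    // every record must have 33 fields
	csvreader.TrimLeadingSpace = true // drop the padding before each value
	parsedout, err := csvreader.Read()
	fmt.Println(parsedout, err)
}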

Here is a working example: https://play.golang.org/p/Tsp72D4vsR


huangapple
  • Posted on 2015-01-16 00:37:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/27968385.html