Golang file reading only reading last line

Question

I took some publicly available data that looks like this.

This is the file:

http://expirebox.com/download/b149b744768fb11aee9c5e26ad409bcc.html

,,,% of Total Expenditure,,,
Function Code,Type of Activity,Expenditure,Dollars/Student (ADA),"This District (ADA 49,497)",All Unified School Districts,Statewide Average
1000-1999ÊÊ,INSTRUCTIONÊÊ,"$249,397,226","$5,039",42%,62%,62%
1000,Instruction,"$247,472,790ÊÊ","$5,000",42%,48%,49%
1110,Special Education: Separate Classes,"$1,004,074",$20,N/A,N/A,N/A
1120,Special Education: Resource Specialist Instruction,"$781,629",$16,N/A,N/A,N/A
1130,Special Education: Supplemental Aids & Services in Regular Classrooms,"$46,747",$1,N/A,N/A,N/A
1180,Special Education:  Nonpublic Agencies/Schools (NPA/S),N/A,N/A,N/A,N/A,N/A
1190,Special Education:  Other Specialized Instructional Services,"$91,985",$2,N/A,N/A,N/A
1100-1199,Instruction - Special Education,"$1,924,436ÊÊ",$39,0%,14%,13%
"Subtotal, INSTRUCTION",,"$249,397,226","$5,039",42%,62%,62%
2000-2999ÊÊ,INSTRUCTION-RELATED SERVICESÊÊ,"$132,783,414","$2,683",22%,12%,12%
2100,Instructional Supervision and Administration,"$89,551,041","$1,809",N/A,N/A,N/A
2110,Instructional Supervision,N/A,N/A,N/A,N/A,N/A
2120,Instructional Research,N/A,N/A,N/A,N/A,N/A
2130,Curriculum Development,"$348,369",$7,N/A,N/A,N/A
2140,In-house Instructional Staff Development,"$19,855",$0,N/A,N/A,N/A
2150,Instructional Administration of Special Projects,N/A,N/A,N/A,N/A,N/A
2100-2199,Instructional Supervision and Administration,"$89,919,265ÊÊ","$1,817",15%,4%,4%
2200,Administrative Unit (AU) of a Multidistrict SELPA,$0,$0,0%,0%,0%
2420,"Instructional Library, Media, and Technology","$8,295,033ÊÊ",$168,1%,1%,1%
2490,Other Instructional Resources,"$538,734",$11,N/A,N/A,N/A
2495,Parent Participation,"$97,830",$2,N/A,N/A,N/A
2490-2495,Other Instructional Resources,"$636,565ÊÊ",$13,0%,1%,0%
2700,School Administration,"$33,932,551ÊÊ",$686,6%,7%,7%
"Subtotal, INSTRUCTION-RELATED SERVICES",,"$132,783,414","$2,683",22%,12%,12%
3000-3999ÊÊ,PUPIL SERVICESÊÊ,"$45,325,938",$916,8%,8%,8%
4000-4999ÊÊ,ANCILLARY SERVICESÊÊ,"$2,207,263",$45,0%,1%,1%
5000-5999ÊÊ,COMMUNITY SERVICESÊÊ,$0,$0,0%,0%,0%
6000-6999ÊÊ,ENTERPRISEÊÊ,"$4,264",$0,0%,0%,0%
7000-7999ÊÊ,GENERAL ADMINISTRATIONÊÊ,"$27,916,858",$564,5%,5%,6%
8000-8999ÊÊ,PLANT SERVICESÊÊ,"$55,172,247","$1,115",9%,11%,10%
9000-9999ÊÊ,OTHER OUTGOÊÊ,"$81,981,716",N/A,14%,2%,2%
"Total Expenditures, All Activities",,"$594,788,926","$12,017",100%,100%,100%

It's a CSV file.

I have tried this code:

file, err := os.Open("expenses.csv")
if err != nil {
    log.Fatal(err)
}
defer file.Close()

scanner := bufio.NewScanner(file)
for scanner.Scan() {
    fmt.Println(scanner.Text())
}

if err := scanner.Err(); err != nil {
    log.Fatal(err)
}

and this:

content, err := ioutil.ReadFile("expenses.csv")
check(err) // check the error before using content

lines := strings.Split(string(content), "\n")

fmt.Println(lines)

dat, err := os.Open("expenses.csv")
check(err)

defer dat.Close()

reader := csv.NewReader(dat)
reader.LazyQuotes = true

reader.FieldsPerRecord = -1

rawCSVData, err := reader.ReadAll()

check(err)
fmt.Println(rawCSVData)

for _, each := range rawCSVData {
	fmt.Println(each)
}

where check is:

func check(e error) {
    if e != nil {
        panic(e)
    }
}

In both cases I get this result:

"Total Expenditures, All Activities",,"$594,788,926","$12,017",100%,100%,100%,1%15%,4%,4%AA,N/A,N/Anified School Districts,Statewide Average

rather than all the lines.

Why am I only reading the last line?

Answer 1

Score: 1

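The basic problem is that this file has \r line endings. It also isn't valid UTF-8. Together, those are going to cause Scanner a lot of trouble. Because a lone \r is not treated as a line break, the whole file is read as a single line, and when that line is printed the terminal returns to the start of the row at every \r. Each record overwrites the one before it, so only the last record (plus leftovers from longer earlier ones) stays visible.

First, we can see exactly what's in the file using xxd: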

00000000: 2c2c 2c25 206f 6620 546f 7461 6c20 4578  ,,,% of Total Ex
00000010: 7065 6e64 6974 7572 652c 2c2c 0d46 756e  penditure,,,.Fun
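
If xxd is not at hand, the same inspection can be sketched in Go. This is only an illustration: countLineEndings is a hypothetical helper name, and the sample bytes are copied from the dump above rather than read from the real file.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

// countLineEndings reports how many CRLF pairs, lone CRs, and lone LFs
// appear in data. (Hypothetical helper, not part of the original post.)
func countLineEndings(data []byte) (crlf, cr, lf int) {
	crlf = bytes.Count(data, []byte("\r\n"))
	cr = bytes.Count(data, []byte("\r")) - crlf
	lf = bytes.Count(data, []byte("\n")) - crlf
	return crlf, cr, lf
}

func main() {
	// Stand-in for the first bytes of expenses.csv, taken from the xxd dump.
	sample := []byte(",,,% of Total Expenditure,,,\rFun")

	// hex.Dump produces an xxd-like view of the bytes.
	fmt.Print(hex.Dump(sample))

	crlf, cr, lf := countLineEndings(sample)
	fmt.Printf("CRLF=%d lone CR=%d lone LF=%d\n", crlf, cr, lf)
}
```

A file where only the lone CR count is non-zero has the same problem as the one in the question.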

If you look, you'll see the line ending is 0d, which is \r (carriage return). Scanner's default split function, bufio.ScanLines, only recognizes \n or \r\n as a line ending.
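
If the file cannot be changed, one possible workaround is to give Scanner a custom split function. The sketch below is not from the original answer: scanAnyLine and splitLines are hypothetical names, and the function assumes that \n, \r\n, and a lone \r should all end a line.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanAnyLine is a bufio.SplitFunc that ends a line at \n, \r\n, or a lone \r.
// (Hypothetical helper, written for this illustration.)
func scanAnyLine(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexAny(data, "\r\n"); i >= 0 {
		if data[i] == '\n' {
			return i + 1, data[:i], nil
		}
		// data[i] is '\r': check whether a '\n' follows it.
		if i+1 < len(data) {
			if data[i+1] == '\n' {
				return i + 2, data[:i], nil
			}
			return i + 1, data[:i], nil
		}
		// '\r' is the last buffered byte. At EOF it ends the line by itself;
		// otherwise request more data to see whether '\n' follows.
		if atEOF {
			return i + 1, data[:i], nil
		}
		return 0, nil, nil
	}
	if atEOF {
		// Final line without a terminator.
		return len(data), data, nil
	}
	return 0, nil, nil // request more data
}

// splitLines collects every line of s using scanAnyLine.
func splitLines(s string) ([]string, error) {
	var lines []string
	sc := bufio.NewScanner(strings.NewReader(s))
	sc.Split(scanAnyLine)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines, sc.Err()
}

func main() {
	lines, err := splitLines("a,b\rc,d\r\ne,f\n")
	if err != nil {
		panic(err)
	}
	for _, line := range lines {
		fmt.Printf("%q\n", line) // prints "a,b", "c,d", "e,f" on separate lines
	}
}
```

With a real file, the os.Open handle from the question would take the place of strings.NewReader.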

Next, you may run into trouble because it isn't UTF-8. All those Ê in there are really the single byte 0xCA, which on its own is not a valid UTF-8 encoding (in UTF-8 it would have to be followed by a continuation byte). We can see that in xxd again:

000000b0: 3939 39ca ca2c 494e 5354 5255 4354 494f  999..,INSTRUCTIO
000000c0: 4eca ca2c 2224 3234 392c 3339 372c 3232  N..,"$249,397,22

Go will probably just pass those bytes through untouched (which is how you end up seeing Ê), as a lot of editors try to do, but it's likely to cause trouble.

If possible, convert this file to UTF-8 with either Unix (\n) or Windows (\r\n) line endings.
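
When re-saving the file is not an option, the line endings could also be normalized in memory before parsing. The following is a minimal sketch under that assumption, not the answer author's method: normalizeNewlines and parseCSV are hypothetical helpers, and the sample rows are shortened from the question's data. Note that it would also rewrite a \r inside a quoted field, which may not be what you want.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// normalizeNewlines rewrites \r\n and lone \r line endings as \n.
// (Hypothetical helper, written for this illustration.)
func normalizeNewlines(data []byte) []byte {
	data = bytes.ReplaceAll(data, []byte("\r\n"), []byte("\n"))
	return bytes.ReplaceAll(data, []byte("\r"), []byte("\n"))
}

// parseCSV normalizes line endings and then parses the bytes as CSV.
func parseCSV(data []byte) ([][]string, error) {
	r := csv.NewReader(bytes.NewReader(normalizeNewlines(data)))
	r.FieldsPerRecord = -1 // as in the question: allow a variable field count
	return r.ReadAll()
}

func main() {
	// Shortened stand-in for expenses.csv, with \r line endings.
	sample := []byte(",,,% of Total Expenditure,,,\r" +
		"\"Subtotal, INSTRUCTION\",,\"$249,397,226\"\r" +
		"2200,Administrative Unit,$0\r")

	records, err := parseCSV(sample)
	if err != nil {
		panic(err)
	}
	for _, rec := range records {
		fmt.Printf("%q\n", rec) // one record per output line
	}
}
```

In the question's program, the bytes returned by ioutil.ReadFile could be passed to parseCSV in place of the sample.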

huangapple
  • Posted on June 4, 2015 at 08:36:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/30633115.html