Golang file reading only reading last line
Question
I took some publicly available data that looks like the listing below. This is the file:
http://expirebox.com/download/b149b744768fb11aee9c5e26ad409bcc.html
,,,% of Total Expenditure,,,
Function Code,Type of Activity,Expenditure,Dollars/Student (ADA),"This District (ADA 49,497)",All Unified School Districts,Statewide Average
1000-1999ÊÊ,INSTRUCTIONÊÊ,"$249,397,226","$5,039",42%,62%,62%
1000,Instruction,"$247,472,790ÊÊ","$5,000",42%,48%,49%
1110,Special Education: Separate Classes,"$1,004,074",$20,N/A,N/A,N/A
1120,Special Education: Resource Specialist Instruction,"$781,629",$16,N/A,N/A,N/A
1130,Special Education: Supplemental Aids & Services in Regular Classrooms,"$46,747",$1,N/A,N/A,N/A
1180,Special Education: Nonpublic Agencies/Schools (NPA/S),N/A,N/A,N/A,N/A,N/A
1190,Special Education: Other Specialized Instructional Services,"$91,985",$2,N/A,N/A,N/A
1100-1199,Instruction - Special Education,"$1,924,436ÊÊ",$39,0%,14%,13%
"Subtotal, INSTRUCTION",,"$249,397,226","$5,039",42%,62%,62%
2000-2999ÊÊ,INSTRUCTION-RELATED SERVICESÊÊ,"$132,783,414","$2,683",22%,12%,12%
2100,Instructional Supervision and Administration,"$89,551,041","$1,809",N/A,N/A,N/A
2110,Instructional Supervision,N/A,N/A,N/A,N/A,N/A
2120,Instructional Research,N/A,N/A,N/A,N/A,N/A
2130,Curriculum Development,"$348,369",$7,N/A,N/A,N/A
2140,In-house Instructional Staff Development,"$19,855",$0,N/A,N/A,N/A
2150,Instructional Administration of Special Projects,N/A,N/A,N/A,N/A,N/A
2100-2199,Instructional Supervision and Administration,"$89,919,265ÊÊ","$1,817",15%,4%,4%
2200,Administrative Unit (AU) of a Multidistrict SELPA,$0,$0,0%,0%,0%
2420,"Instructional Library, Media, and Technology","$8,295,033ÊÊ",$168,1%,1%,1%
2490,Other Instructional Resources,"$538,734",$11,N/A,N/A,N/A
2495,Parent Participation,"$97,830",$2,N/A,N/A,N/A
2490-2495,Other Instructional Resources,"$636,565ÊÊ",$13,0%,1%,0%
2700,School Administration,"$33,932,551ÊÊ",$686,6%,7%,7%
"Subtotal, INSTRUCTION-RELATED SERVICES",,"$132,783,414","$2,683",22%,12%,12%
3000-3999ÊÊ,PUPIL SERVICESÊÊ,"$45,325,938",$916,8%,8%,8%
4000-4999ÊÊ,ANCILLARY SERVICESÊÊ,"$2,207,263",$45,0%,1%,1%
5000-5999ÊÊ,COMMUNITY SERVICESÊÊ,$0,$0,0%,0%,0%
6000-6999ÊÊ,ENTERPRISEÊÊ,"$4,264",$0,0%,0%,0%
7000-7999ÊÊ,GENERAL ADMINISTRATIONÊÊ,"$27,916,858",$564,5%,5%,6%
8000-8999ÊÊ,PLANT SERVICESÊÊ,"$55,172,247","$1,115",9%,11%,10%
9000-9999ÊÊ,OTHER OUTGOÊÊ,"$81,981,716",N/A,14%,2%,2%
"Total Expenditures, All Activities",,"$594,788,926","$12,017",100%,100%,100%
It's a CSV file.
I have tried this code:
file, err := os.Open("expenses.csv")
if err != nil {
	log.Fatal(err)
}
defer file.Close()
scanner := bufio.NewScanner(file)
for scanner.Scan() {
	fmt.Println(scanner.Text())
}
if err := scanner.Err(); err != nil {
	log.Fatal(err)
}
and this:
content, err := ioutil.ReadFile("expenses.csv")
lines := strings.Split(string(content), "\n")
fmt.Println(lines)
check(err)
dat, err := os.Open("expenses.csv")
check(err)
defer dat.Close()
reader := csv.NewReader(dat)
reader.LazyQuotes = true
reader.FieldsPerRecord = -1
rawCSVData, err := reader.ReadAll()
check(err)
fmt.Println(rawCSVData)
for _, each := range rawCSVData {
	fmt.Println(each)
}
where check is:
func check(e error) {
	if e != nil {
		panic(e)
	}
}
In both cases I get this result:
"Total Expenditures, All Activities",,"$594,788,926","$12,017",100%,100%,100%,1%15%,4%,4%AA,N/A,N/Anified School Districts,Statewide Average
rather than all of the lines.
Why am I only reading the last line?
Answer 1
Score: 1
The basic problem is that this file has \r line endings. It also isn't valid UTF-8. Together, those are going to cause Scanner a lot of trouble.
First, we can see exactly what's in the file using xxd:
00000000: 2c2c 2c25 206f 6620 546f 7461 6c20 4578 ,,,% of Total Ex
00000010: 7065 6e64 6974 7572 652c 2c2c 0d46 756e penditure,,,.Fun
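If you look, you'll see the line ending is 0d, which is \r. Scanner's default split function (bufio.ScanLines) needs it to be either \r\n or \n. Since it never sees a \n, the whole file comes back as one line. When that line is printed to a terminal, each \r moves the cursor back to the start of the row, so every record overwrites the one before it. That is why you see only the last record, followed by leftovers from the longer records printed earlier.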
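If the file can't be changed, one option is to give Scanner a split function that also accepts a bare \r. This is only a minimal sketch: scanAnyLine and splitLines are names I made up for illustration, and the sample input is invented rather than taken from the real file.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// scanAnyLine is a bufio.SplitFunc that ends a line at "\n", "\r\n", or a bare "\r".
// (Hypothetical helper; bufio.ScanLines itself only splits on "\n".)
func scanAnyLine(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	i := bytes.IndexAny(data, "\r\n")
	switch {
	case i < 0 && atEOF:
		// Final line without a terminator.
		return len(data), data, nil
	case i < 0:
		// No terminator yet: ask the Scanner for more data.
		return 0, nil, nil
	case data[i] == '\n':
		return i + 1, data[:i], nil
	case i+1 < len(data):
		// "\r" followed by another byte: swallow a following "\n" if present.
		if data[i+1] == '\n' {
			return i + 2, data[:i], nil
		}
		return i + 1, data[:i], nil
	case atEOF:
		// "\r" is the very last byte of the input.
		return i + 1, data[:i], nil
	default:
		// "\r" is the last byte read so far; wait to see whether "\n" follows.
		return 0, nil, nil
	}
}

// splitLines returns every line in data, whatever its line-ending style.
func splitLines(data []byte) []string {
	var lines []string
	scanner := bufio.NewScanner(bytes.NewReader(data))
	scanner.Split(scanAnyLine)
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	return lines
}

func main() {
	// Made-up sample mixing all three line-ending styles.
	sample := []byte("a,b\rc,d\r\ne,f\ng,h")
	for n, line := range splitLines(sample) {
		fmt.Printf("%d: %q\n", n, line)
	}
}
```

To use it on the real file, pass the opened os.File to bufio.NewScanner instead of the in-memory reader, call scanner.Split(scanAnyLine), and check scanner.Err() after the loop.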
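Next, you may run into trouble because it isn't UTF-8. All those Ê in there are really the single byte 0xCA. In UTF-8, 0xCA can only start a two-byte sequence and must be followed by a continuation byte; here it is followed by another 0xCA or a comma, so it is not a valid encoding. We can see that in xxd again: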
000000b0: 3939 39ca ca2c 494e 5354 5255 4354 494f 999..,INSTRUCTIO
000000c0: 4eca ca2c 2224 3234 392c 3339 372c 3232 N..,"$249,397,22
Go will probably just pass those bytes through unchanged (a viewer that assumes Latin-1 then shows them as Ê), which is what a lot of editors try to do, but it's likely to cause trouble.
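If you'd rather confirm this from Go than from xxd, something along these lines should work. It is a rough sketch: invalidByteOffsets is a hypothetical helper, and the sample bytes are copied from the xxd dump above rather than read from the file.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// invalidByteOffsets returns the offset of every byte that cannot be decoded
// as part of a valid UTF-8 sequence. (Hypothetical helper for illustration.)
func invalidByteOffsets(data []byte) []int {
	var offsets []int
	for i := 0; i < len(data); {
		r, size := utf8.DecodeRune(data[i:])
		// DecodeRune reports an undecodable byte as (RuneError, 1).
		if r == utf8.RuneError && size == 1 {
			offsets = append(offsets, i)
		}
		i += size
	}
	return offsets
}

func main() {
	// "999" + 0xCA 0xCA + ",INSTRUCTIO", as seen at offset 0xb0 in the dump.
	sample := []byte("999\xca\xca,INSTRUCTIO")
	fmt.Println("valid UTF-8:", utf8.Valid(sample))
	fmt.Println("invalid byte offsets:", invalidByteOffsets(sample))
}
```

On the real file you would pass in the result of os.ReadFile; any offsets reported point at bytes that need to be re-encoded or removed.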
If possible, reformat this file to use either Unix or Windows line endings in UTF-8.
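If reformatting the file by hand isn't practical, the same cleanup can be done in memory before parsing. The sketch below makes two assumptions you should check against your data: that no quoted field contains a meaningful \r, and that the 0xCA bytes are padding that can simply be dropped (if they carry meaning, decode the file from its real encoding instead). parseMessyCSV is a made-up name, and the sample rows are abbreviated from the question.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"log"
)

// parseMessyCSV normalizes line endings, drops invalid UTF-8 bytes, and then
// parses the result as CSV. (Hypothetical helper for illustration.)
func parseMessyCSV(raw []byte) ([][]string, error) {
	// Normalize line endings: "\r\n" first, then any remaining bare "\r".
	raw = bytes.ReplaceAll(raw, []byte("\r\n"), []byte("\n"))
	raw = bytes.ReplaceAll(raw, []byte("\r"), []byte("\n"))
	// Assumption: bytes that are not valid UTF-8 are padding and can be removed.
	raw = bytes.ToValidUTF8(raw, nil)

	reader := csv.NewReader(bytes.NewReader(raw))
	reader.FieldsPerRecord = -1 // allow rows with differing field counts
	return reader.ReadAll()
}

func main() {
	// Two rows in the style of the question, with "\r" endings and 0xCA bytes.
	sample := []byte(",,,% of Total Expenditure,,,\r" +
		"1000-1999\xca\xca,INSTRUCTION\xca\xca,\"$249,397,226\",\"$5,039\",42%,62%,62%\r")

	records, err := parseMessyCSV(sample)
	if err != nil {
		log.Fatal(err)
	}
	for _, rec := range records {
		fmt.Printf("%q\n", rec)
	}
}
```

For the real file, read it with os.ReadFile("expenses.csv") and pass the bytes to parseMessyCSV. Printing with %q keeps any stray control characters visible instead of letting the terminal act on them.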