Quotation Marks Getting Added To Data Upon CSV-DAT Conversion

Question

I am attempting to create a module (or two) to convert a file from dat to csv and back again. The issue I am running into is that the conversion adds a number of quotation marks to each "cell" of data.

I am currently using the following code to do this:

with open(file_dat_new, 'r') as dat_file:
    with open(file_csv_new, 'w', newline='') as csv_file:
        csv_writer = csv.writer(csv_file)
        for row in dat_file:
            row = [value.strip() for value in row.split(',')]
            csv_writer.writerow(row)

Here is an example of the first line of input:

"TOA5","STA332","CR6","10318","CR6.Std.12.02 CR6-WIFI.05.03","CPU:Sta-332_2022-10-03.cr6","3367","FSDATA"

and the output I am getting:

"""""""TOA5""""""","""""""STA332""""""","""""""CR6""""""","""""""10318""""""","""""""CR6.Std.12.02 CR6-WIFI.05.03""""""","""""""CPU:Sta-332_2022-10-03.cr6""""""","""""""3367""""""","""""""FSDATA"""""""

So my question is this: why are the extra quotation marks being added, and how can I avoid them during conversion?

Answer 1

Score: 1

When I run your program as-is I get:

"""TOA5""","""STA332""","""CR6""","""10318""","""CR6.Std.12.02 CR6-WIFI.05.03""","""CPU:Sta-332_2022-10-03.cr6""","""3367""","""FSDATA"""

Which looks less extreme than the output you shared:

"""""""TOA5""""""","""""""STA332""""""","""""""CR6""""""","""""""10318""""""","""""""CR6.Std.12.02 CR6-WIFI.05.03""""""","""""""CPU:Sta-332_2022-10-03.cr6""""""","""""""3367""""""","""""""FSDATA"""""""

When I treat the DAT file as CSV:

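import csv

# Read the DAT file with csv.reader so the surrounding quotes are parsed, not kept as data.
with open("input.dat", newline="") as f:
    reader = csv.reader(f)
    rows = list(reader)

# Write the parsed rows; csv.writer only quotes the fields that need it.
with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)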

then I get:

TOA5,STA332,CR6,10318,CR6.Std.12.02 CR6-WIFI.05.03,CPU:Sta-332_2022-10-03.cr6,3367,FSDATA

Your sample DAT file is a CSV with quoted fields. Usually the outside quotes are there to protect a comma in the field data, or another double quote in the field data. Some programs will write the double quotes even if they aren't needed (like your sample data).

When you tried to parse the DAT file yourself by splitting on the comma, the quotes stayed in each value as literal characters. csv.writer then treated them as data and escaped them when writing each field.
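To see the difference concretely, here is a small sketch using the first three fields of the sample line. The variable names are only illustrative. Splitting on the comma keeps the quote characters inside each value, while csv.reader consumes them as field delimiters.

```python
import csv
import io

line = '"TOA5","STA332","CR6"\n'

# Manual split: the quote characters stay inside each value.
split_values = [value.strip() for value in line.split(',')]
print(split_values)   # ['"TOA5"', '"STA332"', '"CR6"']

# csv.reader: the quotes are treated as field delimiters and dropped.
parsed_values = next(csv.reader(io.StringIO(line, newline='')))
print(parsed_values)  # ['TOA5', 'STA332', 'CR6']

# Writing the manually split values escapes the embedded quotes.
buffer = io.StringIO(newline='')
csv.writer(buffer).writerow(split_values)
print(buffer.getvalue().rstrip('\r\n'))  # """TOA5""","""STA332""","""CR6"""
```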

For me, if the input looks remotely like CSV, I treat it as CSV and use csv.reader.
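For the return trip from CSV to DAT, one option is to ask csv.writer to quote every field, which reproduces the style of the sample header line. This is only a sketch: it assumes every field in the original DAT file was quoted, and the function name and file paths are hypothetical. If some rows contain unquoted fields, csv.QUOTE_ALL will not reproduce them exactly.

```python
import csv

def csv_to_dat(csv_path, dat_path):
    """Copy rows from a CSV file to a DAT file, quoting every field."""
    with open(csv_path, newline='') as src, open(dat_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        # QUOTE_ALL wraps each field in double quotes, needed or not.
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
        writer.writerows(reader)
```

Note that csv.writer ends each row with \r\n by default; pass lineterminator='\n' if the DAT file should use plain newlines.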

If I send the output of your program back in as the input, then I get the more extreme quoting you shared:

"""""""TOA5""""""","""""""STA332""""""","""""""CR6""""""","""""""10318""""""","""""""CR6.Std.12.02 CR6-WIFI.05.03""""""","""""""CPU:Sta-332_2022-10-03.cr6""""""","""""""3367""""""","""""""FSDATA"""""""

Quoting turns double-quotes-as-data, like:

['"Foo, Bar"', 'Baz']

into this CSV:

"""Foo, Bar""",Baz

A set of double quotes marks the field as being quoted, then each double-quote-as-data (") becomes "".

So, "TOA5" becomes """TOA5""" (1 set of double quotes on the outside, then each of the 2 double-quotes-as-data gets doubled). Run that through again and we get """""""TOA5""""""" (1 set of double quotes on the outside, then each of the six double-quotes-as-data gets doubled).
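The counting can be checked with a short sketch that pushes a single field through csv.writer twice. The helper write_field is a name made up for this illustration.

```python
import csv
import io

def write_field(value):
    """Return the CSV text csv.writer produces for a single field."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer).writerow([value])
    return buffer.getvalue().rstrip('\r\n')

field = '"TOA5"'  # quotes kept as data, as after a manual split
for pass_number in (1, 2):
    field = write_field(field)
    # Print the pass number, the total count of quote characters, and the text.
    print(pass_number, field.count('"'), field)
# 1 6 """TOA5"""
# 2 14 """""""TOA5"""""""
```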

huangapple
  • Posted on July 28, 2023, 01:16:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76782092.html