Pandas reading CSV with ^G as separator

Question

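The CSV file uses **^G** as its delimiter. I am using pandas, and the separator I currently pass is a comma. I have a new requirement to read a ^G-separated CSV. Does any library support this? In addition, every column is enclosed in quotes.

Sample CSV data:

    "2198"^G"data"^G"x"
    "2199"^G"data2"^G"y"
    "2198"^G"data3"^G"z"

Based on a suggestion, I tried the command below:

    df = pd.read_csv(f, engine="python", sep=r"\^G", header=None, names=columns, quoting=csv.QUOTE_NONE)

I get the output below:

    {"col1":"\"2198\"","col2":"\"data\"","col3":"\"x\""}

How do I remove the quote marks and backslashes from the data in the final output?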


Answer 1

Score: 1


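Use [`read_csv`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) with `engine='python'` and escape `^`, because it is a special regex character:

```python
import pandas as pd

# A separator longer than one character is interpreted as a regex,
# so the caret is escaped to match a literal "^G".
# `file` is the path (or file object) of the ^G-delimited CSV.
df = pd.read_csv(file, sep=r"\^G", engine='python')
```

EDIT: You can pass a converter that uses `strip` to remove the `"`:

```python
columns = list('abc')

df = pd.read_csv(file,
                 engine="python",
                 sep=r"\^G",
                 header=None,
                 names=columns,
                 # strip the surrounding double quotes from every column
                 converters=dict.fromkeys(columns, lambda x: x.strip('"')))

print(df)
```

Result:

```
      a      b  c
0  2198   data  x
1  2199  data2  y
2  2198  data3  z
```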
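
A note in case it applies: many editors and pagers display the BEL control character (0x07) as `^G`. If the file actually contains that single byte rather than a literal caret followed by `G`, the separator is one character, so no regex is needed and the default parser handles the quotes on its own. The sketch below assumes that case. The in-memory `raw_bel` string and the column names `a`, `b`, `c` are stand-ins for the real file and headers, and `dtype=str` is only there to keep every value as text.

```python
import io

import pandas as pd

# Assumption: the delimiter is the single BEL byte (0x07), shown as ^G by many tools.
# The sample rows from the question are rebuilt in memory for illustration.
raw_bel = (
    '"2198"\x07"data"\x07"x"\n'
    '"2199"\x07"data2"\x07"y"\n'
    '"2198"\x07"data3"\x07"z"\n'
)

df_bel = pd.read_csv(
    io.StringIO(raw_bel),
    sep="\x07",          # one-character separator, so it is not treated as a regex
    header=None,
    names=list("abc"),   # hypothetical column names
    dtype=str,           # keep all values as strings
)

print(df_bel)
```

If this version splits the real file into the expected columns, the quotes should already be gone without any converter.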

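
For the literal `^G` case, another option is to strip the quotes after loading instead of using converters. Regex separators tend to ignore quoting (the pandas documentation warns about this), which is why the quote characters end up inside the values. A minimal sketch, assuming every column should stay a string; `raw_caret` and the column names are made up for illustration:

```python
import io

import pandas as pd

# Two sample rows with a literal caret + G between the quoted fields.
raw_caret = '"2198"^G"data"^G"x"\n"2199"^G"data2"^G"y"\n'

df_caret = pd.read_csv(
    io.StringIO(raw_caret),
    sep=r"\^G",          # escaped caret: match the two characters "^G"
    engine="python",     # regex separators require the Python engine
    header=None,
    names=list("abc"),   # hypothetical column names
    dtype=str,
)

# Remove leading and trailing double quotes from every column.
df_caret = df_caret.apply(lambda col: col.str.strip('"'))

print(df_caret)
```

This does the same job as the converter approach; it is mostly a matter of taste whether the cleanup happens during or after parsing.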



huangapple
  • Posted on May 10, 2023, 13:43:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76215223.html