Pandas reading CSV with ^G as separator

Question

The CSV file uses ^G as its delimiter. I am using pandas, where the current separator is a comma, and I have a new requirement to read this ^G-separated file. Is there a supported library or option for this? Also, every column is enclosed in quotes.

Sample CSV data

```
"2198"^G"data"^G"x"
"2199"^G"data2"^G"y"
"2198"^G"data3"^G"z"
```

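Based on a suggestion, I tried the command below:

```python
import csv

import pandas as pd

# f is the input file and columns the list of column names
df = pd.read_csv(f, engine="python", sep=r"\^G", header=None, names=columns, quoting=csv.QUOTE_NONE)
```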

I get the output below:

```
{"col1":"\"2198\"","col2":"\"data\"","col3":"\"x\""}
```

How do I remove the quote marks and backslashes from the data in the final output?
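The backslashes in that output are not part of the data. With `quoting=csv.QUOTE_NONE`, pandas keeps the `"` characters as ordinary text in each value, and JSON serialisation then escapes every embedded quote as `\"`. The following is a minimal standard-library illustration that uses the first value of the sample data. It assumes the output shown above is JSON text.

```python
import json

# What pandas stores for the first field when quoting=csv.QUOTE_NONE:
# the surrounding quote characters are kept as part of the string.
value = '"2198"'
print(len(value))  # 6: two quote characters plus four digits

# Serialising to JSON escapes each embedded quote as \" ; the backslashes
# exist only in the JSON text, not in the value itself.
encoded = json.dumps({"col1": value}, separators=(",", ":"))
print(encoded)  # {"col1":"\"2198\""}

# Decoding gives back the original string: quotes included, no backslashes.
print(json.loads(encoded)["col1"] == value)  # True
```

Removing the quote characters from the values therefore also makes the backslashes disappear from the serialised output.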

Answer 1

Score: 1


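Use [`read_csv`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) with `engine='python'` and escape `^`, because it is a special regex character:

```python
import pandas as pd

# A multi-character sep is treated as a regular expression, so ^ must be escaped
df = pd.read_csv(file, sep=r"\^G", engine='python')
```
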
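pandas treats a separator longer than one character as a regular expression, and this also forces the python engine. In a regex, an unescaped `^` anchors to the start of the string, so `sep="^G"` would not split these lines at all. The quick check below uses the standard `re` module and assumes that the file contains the literal two characters `^` and `G`.

```python
import re

line = '"2198"^G"data"^G"x"'

# Unescaped, "^" is the start-of-string anchor, so "^G" could only match a
# "G" at position 0; the line starts with '"', so nothing is split.
unescaped = re.split("^G", line)
print(unescaped)  # ['"2198"^G"data"^G"x"']

# Escaped, r"\^G" matches the literal two characters "^" and "G".
escaped = re.split(r"\^G", line)
print(escaped)  # ['"2198"', '"data"', '"x"']
```

The pieces still carry their quote characters. Splitting on a regex does not apply CSV quoting rules, and the pandas documentation notes that regex delimiters are prone to ignoring quoted data. That is why the quotes survive and have to be stripped separately.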
EDIT: You can use `converters` with `strip` to remove the `"` characters:

```python
import pandas as pd

columns = list('abc')
df = pd.read_csv(file,
                 engine="python",
                 sep=r"\^G",
                 header=None,
                 names=columns,
                 # Apply the same converter to every column: strip the surrounding quotes
                 converters=dict.fromkeys(columns, lambda x: x.strip('"')))
print(df)
```

Result:

```
      a      b  c
0  2198   data  x
1  2199  data2  y
2  2198  data3  z
```
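Here is an alternative sketch. It assumes that the file contains the literal two-character sequence `^G` and that the BEL control character (`\x07`) never appears in the data. The approach swaps `^G` for a one-character delimiter so that pandas' default C engine applies its normal quote handling. The parser then removes the quotes, and the first column is inferred as integers instead of being left as strings. `^G` is also the caret notation for ASCII BEL (0x07). If the file actually contains that control character, the `replace` step is unnecessary and `sep="\x07"` can be passed directly.

```python
import io

import pandas as pd

# Stand-in for the file contents; in practice, read the file's text instead.
raw = '"2198"^G"data"^G"x"\n"2199"^G"data2"^G"y"\n"2198"^G"data3"^G"z"\n'

# Assumption: "\x07" (BEL) never occurs in the data, so it is safe as a delimiter.
# A single-character sep keeps the default C engine and its quotechar='"' handling.
buf = io.StringIO(raw.replace("^G", "\x07"))
df = pd.read_csv(buf, sep="\x07", header=None, names=["a", "b", "c"])

print(df)
print(df.dtypes)
```

One design trade-off is that the `replace` step holds the whole file in memory as a single string. For very large files, the converter-based call shown earlier may be the better fit.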

huangapple
  • Published on 10 May 2023, 13:43:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76215223.html