May 10, 2023, 13:43:43

Pandas reading CSV with ^G as separator

Question

The CSV file uses ^G as its delimiter. I am using pandas, and my current separator is a comma. I have a new requirement to read a ^G-separated CSV. Is there library support for this? Also, every column is enclosed in quotes.

Sample CSV data:

    "2198"^G"data"^G"x"
    "2199"^G"data2"^G"y"
    "2198"^G"data3"^G"z"

Based on a suggestion, I tried the command below:

    df = pd.read_csv(f, engine="python", sep=r"\^G", header=None, names=columns, quoting=csv.QUOTE_NONE)

I get the output below:

    {"col1":"\"2198\"","col2":"\"data\"","col3":"\"x\""}

How do I remove the quote marks and backslashes from the data in the final output?

Answer 1

Score: 1

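Use [`read_csv`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) with `engine='python'` and escape the `^`, because it is a special regex character. A separator longer than one character is interpreted as a regular expression, which is why the Python engine is needed:

```python
import pandas as pd

# `file` is the path to the ^G-delimited CSV
df = pd.read_csv(file, sep=r"\^G", engine='python')
```

EDIT: You can use `converters` with `strip` to remove the `"` characters:

```python
columns = list('abc')
df = pd.read_csv(file,
                 engine="python",
                 sep=r"\^G",
                 header=None,
                 names=columns,
                 # apply the same quote-stripping converter to every column
                 converters=dict.fromkeys(columns, lambda x: x.strip('"')))

print(df)
```

Result:

```
      a      b  c
0  2198   data  x
1  2199  data2  y
2  2198  data3  z
```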
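
To see why the caret needs escaping, and why the quotes survive the split, it can help to apply the same pattern to one sample line with Python's `re` module. This is only an illustration, assuming the file contains the two literal characters `^` and `G`; it does not claim to show how pandas works internally, although the pandas documentation does warn that regex delimiters tend to ignore quoted data.

```python
import re

# One line from the sample data, assuming a literal caret followed by "G".
line = '"2198"^G"data"^G"x"'

# Unescaped, "^" is an anchor meaning "start of string", so the pattern looks
# for a "G" at position 0, finds none, and the line is left in one piece.
unescaped = re.split("^G", line)

# Escaped, "\^" matches a literal caret, so the line splits into three fields.
escaped = re.split(r"\^G", line)

# A plain regex split knows nothing about CSV quoting, so the quotes remain
# in each field until they are stripped explicitly.
stripped = [field.strip('"') for field in escaped]

print(unescaped)
print(escaped)
print(stripped)
```

The last step mirrors what the `converters` argument does per column: the split leaves the quotes in place, and `strip('"')` removes them afterwards.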


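
For a version that runs without a file on disk, the sketch below feeds the sample rows from the question through the same `read_csv` call using `io.StringIO`. The column names `col1` to `col3` are taken from the question's output. The `to_json` call is an assumption about how that output was produced, since the question does not show the export step. Under that assumption, the backslashes in the question were JSON escaping of the leftover quote characters, so they should disappear once the quotes are stripped.

```python
import io

import pandas as pd

# Sample rows from the question, assuming "^G" is a literal caret plus "G".
raw = (
    '"2198"^G"data"^G"x"\n'
    '"2199"^G"data2"^G"y"\n'
    '"2198"^G"data3"^G"z"\n'
)

columns = ["col1", "col2", "col3"]


def strip_quotes(value):
    """Remove the surrounding double quotes from a single field."""
    return value.strip('"')


df = pd.read_csv(
    io.StringIO(raw),
    engine="python",   # required for a regex separator
    sep=r"\^G",        # escaped caret, then "G"
    header=None,
    names=columns,
    converters=dict.fromkeys(columns, strip_quotes),
)

# Converters return strings, so every column holds text, including col1.
print(df)

# Hypothetical export step (not shown in the question): one JSON object per row.
print(df.to_json(orient="records", lines=True))
```

Because the converters return strings, a numeric column such as `col1` would need an explicit conversion afterwards, for example `df["col1"].astype(int)`, if numbers are required.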
