July 31, 2023, 23:01:45

Access the first line via pandas.read_csv when header is set to 1

Question


Consider a CSV file whose first row holds units and whose second row holds the column headers. I want the second row imported as the header so that I can conveniently use the parse_dates argument of pd.read_csv, which means setting header=1. Unfortunately, pandas then seems to drop the first row. Is there a way to use both header and parse_dates while keeping the first row with the units, without reading the .csv file twice?

Code example:

import pandas as pd

# The dict keys are the units; the first entry of each list is the column name
data = {"-": ["A", "2022-01-27 12:15:32.005", "2022-01-27 12:15:33.005", "2022-01-27 12:15:34.005"], "s": ["B", 3, 2, 1], "W": ["C", 2, 3, 1]}
pd.DataFrame(data).to_csv("test.csv")  # to_csv returns None when given a path
df = pd.read_csv("test.csv", header=1, parse_dates=["A"])

Answer 1

Score: 1


We can read several rows as a multilevel header by passing a list of the row numbers where the header lines are located.

Here's a minimal example:

import pandas as pd
from io import StringIO

some_file = StringIO()    # in-memory buffer, just to avoid writing to disk

data = {
    "-": ["A", "2022-01-27 12:15:32.005", "2022-01-27 12:15:33.005", "2022-01-27 12:15:34.005"],
    "s": ["B", 3, 2, 1],
    "W": ["C", 2, 3, 1],
}

# here I drop the index with index=False,
# otherwise I'd pass index_col=0 to pd.read_csv
pd.DataFrame(data).to_csv(some_file, index=False)

some_file.seek(0)    # rewind so that we read the buffer from the start

# with header=[0, 1] we read the first two rows as a two-level header
# to parse the dates we could also use parse_dates=[0]
df = pd.read_csv(some_file, header=[0, 1], parse_dates=[("-", "A")])
print(df, df.dtypes, sep="\n\n")

Output:

                        -  s  W
                        A  B  C
0 2022-01-27 12:15:32.005  3  2
1 2022-01-27 12:15:33.005  2  3
2 2022-01-27 12:15:34.005  1  1

-  A    datetime64[ns]
s  B             int64
W  C             int64
dtype: object
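
Once the two header rows are loaded as a two-level header, the units can be pulled out of the column index and the columns flattened back to plain names. The following is only a sketch under assumptions: the in-memory CSV text and the names csv_text and units are illustrative, and the dates are converted with pd.to_datetime after reading rather than through parse_dates, which is a swapped-in approach and not the one used in the answer above.

```python
import pandas as pd
from io import StringIO

# Hypothetical CSV content: units in the first row, column names in the second.
csv_text = (
    "-,s,W\n"
    "A,B,C\n"
    "2022-01-27 12:15:32.005,3,2\n"
    "2022-01-27 12:15:33.005,2,3\n"
    "2022-01-27 12:15:34.005,1,1\n"
)

# Read both header rows as a two-level column index.
df = pd.read_csv(StringIO(csv_text), header=[0, 1])

# Level 0 holds the units, level 1 holds the column names.
units = {name: unit for unit, name in df.columns}

# Keep only the names as column labels, then convert the date column.
df.columns = df.columns.get_level_values(1)
df["A"] = pd.to_datetime(df["A"])

print(units)
print(df.dtypes)
```

Keeping the units in a separate dict means the frame itself has ordinary single-level columns, so later code can use df["A"] instead of df[("-", "A")].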

Answer 2

Score: 1


I'm not sure, but I think this code could be helpful for you:

df = pd.read_csv("test.csv", skiprows=[1], names=["A", "B", "C"], parse_dates=["A"])

Output:

                         A  B  C
0                        -  s  W
1  2022-01-27 12:15:32.005  3  2
2  2022-01-27 12:15:33.005  2  3
3  2022-01-27 12:15:34.005  1  1
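
In the output above the units row becomes the first data row, so column A still contains the string "-" and appears to stay as text instead of being parsed as dates. One possible alternative, sketched here under assumptions, is to read the units line from an open file handle and then pass the same handle to pd.read_csv, so the file is opened only once. The StringIO buffer stands in for open("test.csv"), the names csv_text, buf and units are illustrative, and the plain split(",") assumes the units contain no quoted commas.

```python
import pandas as pd
from io import StringIO

# Hypothetical CSV content: units in the first row, column names in the second.
csv_text = (
    "-,s,W\n"
    "A,B,C\n"
    "2022-01-27 12:15:32.005,3,2\n"
    "2022-01-27 12:15:33.005,2,3\n"
    "2022-01-27 12:15:34.005,1,1\n"
)

buf = StringIO(csv_text)  # stands in for an opened file such as open("test.csv")

# Consume the units line ourselves; the handle now points at the header row.
units = buf.readline().strip().split(",")

# read_csv continues from the current position, so "A,B,C" becomes the header.
df = pd.read_csv(buf, parse_dates=["A"])

print(units)
print(df.dtypes)
```

Because the units never enter the frame, every column keeps a clean dtype, at the cost of carrying the units in a separate list.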
