Access the first line via pandas.read_csv when header is set to 1

Question

Consider a CSV file whose first row holds units and whose second row holds the column headers. I want to import the second row as the header so that I can conveniently use the parse_dates option of pd.read_csv, which means setting header=1. Unfortunately, pandas then seems to drop the first row. Is there a way to use both header and parse_dates while keeping the first row with the units, without reading the .csv file twice?

Code example:

import pandas as pd
# first list element is the column header, the dict key is the unit
data = {"-": ["A", "2022-01-27 12:15:32.005", "2022-01-27 12:15:33.005", "2022-01-27 12:15:34.005"], "s": ["B", 3, 2, 1], "W": ["C", 2, 3, 1]}
pd.DataFrame(data).to_csv("test.csv")
df = pd.read_csv("test.csv", header=1, parse_dates=["A"])

Answer 1

Score: 1

We can read several rows as a multilevel header by passing a list of the row numbers where the header lines are located.

Here's a minimal example:

import pandas as pd
from io import StringIO

some_file = StringIO()    # in-memory buffer, so nothing is written to disk

data = {
    "-": ["A", "2022-01-27 12:15:32.005", "2022-01-27 12:15:33.005", "2022-01-27 12:15:34.005"],
    "s": ["B", 3, 2, 1],
    "W": ["C", 2, 3, 1],
}

# index=False keeps the row index out of the CSV;
# otherwise I'd pass index_col=0 to pd.read_csv
pd.DataFrame(data).to_csv(some_file, index=False)

some_file.seek(0)    # rewind so read_csv starts at the beginning

# header=[0, 1] reads the first two rows as a two-level header
# parse_dates=[0] could also be used to parse the dates
df = pd.read_csv(some_file, header=[0, 1], parse_dates=[("-", "A")])
print(df, df.dtypes, sep="\n\n")

Output:

                        -  s  W
                        A  B  C
0 2022-01-27 12:15:32.005  3  2
1 2022-01-27 12:15:33.005  2  3
2 2022-01-27 12:15:34.005  1  1

-  A    datetime64[ns]
s  B             int64
W  C             int64
dtype: object
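
If the two-level column labels are awkward for later work, the units can be pulled out into a separate mapping and the header flattened. The following is only a sketch under assumptions: the `csv_text` string stands in for the file from the question (written with index=False), and the names `units` and `csv_text` are made up for illustration.

```python
import pandas as pd
from io import StringIO

# Stand-in for the CSV file: units in the first row, headers in the second
csv_text = (
    "-,s,W\n"
    "A,B,C\n"
    "2022-01-27 12:15:32.005,3,2\n"
    "2022-01-27 12:15:33.005,2,3\n"
    "2022-01-27 12:15:34.005,1,1\n"
)

# Single read: both header rows become a two-level column index
df = pd.read_csv(StringIO(csv_text), header=[0, 1], parse_dates=[("-", "A")])

# Each column label is a (unit, name) tuple; keep the units in a dict
units = {name: unit for unit, name in df.columns}

# Drop the unit level so the columns are simply A, B, C
df.columns = df.columns.droplevel(0)

print(units)
print(df.dtypes)
```

This keeps the unit information available (for example `units["B"]`) while letting the columns be addressed by their plain names.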

Answer 2

Score: 1

I'm not sure, but I think this code could be helpful:

df = pd.read_csv("test.csv", skiprows=[1], names=["A", "B", "C"], parse_dates=["A"])

Output:

                         A  B  C
0                        -  s  W
1  2022-01-27 12:15:32.005  3  2
2  2022-01-27 12:15:33.005  2  3
3  2022-01-27 12:15:34.005  1  1
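
In a result like this the units sit in an ordinary data row, so the columns are most likely left as plain strings rather than dates and integers. A possible follow-up, sketched under the assumption that the file was written with index=False (the `csv_text` string below stands in for it, and `units` and `values` are illustrative names), is to split the units row off and convert the types afterwards:

```python
import pandas as pd
from io import StringIO

# Stand-in for test.csv written with index=False
csv_text = (
    "-,s,W\n"
    "A,B,C\n"
    "2022-01-27 12:15:32.005,3,2\n"
    "2022-01-27 12:15:33.005,2,3\n"
    "2022-01-27 12:15:34.005,1,1\n"
)

# Skip the header row (line 1) and supply the names by hand;
# the units row is then read as the first data row
df = pd.read_csv(StringIO(csv_text), skiprows=[1], names=["A", "B", "C"])

# First row holds the units
units = df.iloc[0].to_dict()

# Remaining rows hold the values, still as strings at this point
values = df.iloc[1:].reset_index(drop=True)
values["A"] = pd.to_datetime(values["A"])
values = values.astype({"B": "int64", "C": "int64"})

print(units)
print(values.dtypes)
```

The file is still read only once; the cost is an explicit type conversion after the units row has been removed.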

huangapple
  • Published on July 31, 2023, 23:01:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76804883.html