Access the first line via pandas.read_csv when header is set to 1
Question
Consider a CSV file whose first row contains units and whose second row contains the column headers. I want to import the second row as the header so that I can conveniently use the parse_dates option of pd.read_csv, so I need to set header=1. Unfortunately, pandas then drops the first row. Is there a way to use both header and parse_dates while keeping the first row with the units, without reading the .csv file twice?
Code Example:
import pandas as pd
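# named "data" rather than "dict" to avoid shadowing the built-in
data = {"-": ["A", "2022-01-27 12:15:32.005", "2022-01-27 12:15:33.005", "2022-01-27 12:15:34.005"], "s": ["B", 3, 2, 1], "W": ["C", 2, 3, 1]}
pd.DataFrame(data).to_csv("test.csv")  # to_csv returns None when given a path, so nothing is assigned
df = pd.read_csv("test.csv", header=1, parse_dates=["A"])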
Answer 1
Score: 1
We can read several rows as a multi-level header by passing a list of the row numbers where the header lines are located.
Here's an example:
import pandas as pd
from io import StringIO
some_file = StringIO()  # in-memory buffer, just to avoid writing to disk
data = {
    "-": ["A", "2022-01-27 12:15:32.005", "2022-01-27 12:15:33.005", "2022-01-27 12:15:34.005"],
    "s": ["B", 3, 2, 1],
    "W": ["C", 2, 3, 1],
}
# index=False keeps the row index out of the file;
# otherwise I'd pass index_col=0 to pd.read_csv
pd.DataFrame(data).to_csv(some_file, index=False)
some_file.seek(0)  # rewind so that we read the file from the start
# header=[0, 1] reads the first two rows as a two-level header
# parse_dates=[0] could also be used to parse the dates
df = pd.read_csv(some_file, header=[0, 1], parse_dates=[("-", "A")])
print(df, df.dtypes, sep='\n\n')
Output:
                        -  s  W
                        A  B  C
0 2022-01-27 12:15:32.005  3  2
1 2022-01-27 12:15:33.005  2  3
2 2022-01-27 12:15:34.005  1  1

-  A    datetime64[ns]
s  B             int64
W  C             int64
dtype: object
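Once the two header rows are read as a multi-level header, the units can be pulled out of the column index and the columns flattened back to plain names. The following is a minimal sketch of that idea, not part of the original answer. The names csv_text and units are made up for illustration, and the dates are converted afterwards with pd.to_datetime so the sketch does not depend on how parse_dates treats multi-level columns.

```python
from io import StringIO

import pandas as pd

# Hypothetical in-memory copy of the file: units row, header row, then data.
csv_text = (
    "-,s,W\n"
    "A,B,C\n"
    "2022-01-27 12:15:32.005,3,2\n"
    "2022-01-27 12:15:33.005,2,3\n"
    "2022-01-27 12:15:34.005,1,1\n"
)

# Read both header rows in a single pass.
df = pd.read_csv(StringIO(csv_text), header=[0, 1])

# Level 0 holds the units row, level 1 the column names.
units = dict(zip(df.columns.get_level_values(1), df.columns.get_level_values(0)))

# Flatten to single-level columns and convert the date column.
df.columns = df.columns.get_level_values(1)
df["A"] = pd.to_datetime(df["A"])

print(units)
print(df.dtypes)
```

This keeps the units available in a separate mapping while df behaves like an ordinary single-header frame, and the file is still read only once.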
Answer 2
Score: 1
I'm not sure, but I think this code could be helpful:
df = pd.read_csv("test.csv", skiprows=[1], names=["A", "B", "C"], parse_dates=["A"])
Output:
                         A  B  C
0                        -  s  W
1  2022-01-27 12:15:32.005  3  2
2  2022-01-27 12:15:33.005  2  3
3  2022-01-27 12:15:34.005  1  1
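In the output above the units row ends up as data row 0, so every column mixes a unit string with the actual values. One possible follow-up, sketched here as an assumption rather than as part of the original answer, is to split that row off after the single read and convert the remaining columns explicitly. The names csv_text, raw, units and data are hypothetical, and the sketch assumes the file was written without an index column.

```python
from io import StringIO

import pandas as pd

# Hypothetical in-memory copy of the file: units row, header row, then data.
csv_text = (
    "-,s,W\n"
    "A,B,C\n"
    "2022-01-27 12:15:32.005,3,2\n"
    "2022-01-27 12:15:33.005,2,3\n"
    "2022-01-27 12:15:34.005,1,1\n"
)

# Skip the header row (row 1) and supply the column names by hand,
# so the units row (row 0) is read as the first data row.
raw = pd.read_csv(StringIO(csv_text), skiprows=[1], names=["A", "B", "C"])

# Split the units row off from the actual data.
units = raw.iloc[0].to_dict()
data = raw.iloc[1:].reset_index(drop=True)

# The columns were read as strings because of the units row, so convert them.
data["A"] = pd.to_datetime(data["A"])
for col in ["B", "C"]:
    data[col] = pd.to_numeric(data[col])

print(units)
print(data.dtypes)
```

The trade-off compared with the multi-level header approach is that the column names have to be typed out by hand and the type conversion is done after reading.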