Pandas can not construct index
Question
I have an Excel file from which I need to extract data at given intervals of time and angles. It would be an easy task with SQL, but I need to do it with pandas now. I can't figure out if I can attach the actual file, but here are the names of the columns from the CSV file:
time;ms;interface;packet;GEN_Src;GEN_Dest;GEN_Class;GEN_Type;GEN_Length;GEN_cycle_cnt;lrt_data_type;corr_src;corr_acq_cnt;corr_fragment;corr_fragment_cnt;corr_acq_time;;;;;;;;;;;;;;;;;;;
There are actually dozens of columns without a name at the end, in which lie the data I need to extract. Slicing should work for this.
However, I need to separate them according to intervals of time (column 'time') and angles (column 'corr_src').
When I run:
df = pd.read_csv(r'path\EM_Cell_A-Sample1.csv', index_col='time')
I get:
pandas.errors.ParserError: Could not construct index. Requested to use 1 number of columns, but 1088 left to parse.
Previous attempts, based on the solutions found in "How to check if a column exists in Pandas", indicate that the 'time' column does not exist.
Answer 1
Score: 0
Update
It seems the real problem is reading a CSV file that uses ';' as the field separator. Half the world uses ',' as the decimal separator, which means another character, typically ';', is used as the field separator:
df = pd.read_csv(path, index_col='time', sep=';', decimal=',')
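As a minimal sketch of the call above, the following reads a small in-memory sample instead of the real file. The sample rows, the chosen time interval, and the angle value are all made up for illustration; only the column names 'time', 'ms', and 'corr_src' come from the question.

```python
import io
import pandas as pd

# Hypothetical sample: ';' separates fields, ',' is the decimal mark,
# and the last two columns have no header name (as in the question).
raw = (
    "time;ms;corr_src;;\n"
    "1;120;45;1,5;2,5\n"
    "2;240;90;3,25;4,75\n"
)

df = pd.read_csv(io.StringIO(raw), index_col="time", sep=";", decimal=",")

# The unnamed trailing columns can be picked by position.
data = df.iloc[:, 2:]

# Assumed example filter: time between 1 and 2 (inclusive) and angle 45.
mask = (df.index >= 1) & (df.index <= 2) & (df["corr_src"] == 45)
selected = df.loc[mask].iloc[:, 2:]
print(selected)
```

With the separator and decimal mark set, the unnamed columns are parsed as floats, so they can be sliced and filtered directly.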
The question is unclear. There's no reason to create a copy of the original dataframe, and reset_index() won't create a new index. The warning itself may not be relevant in Pandas 2.0.0.
It seems that the real question is how to read a text file with comma-separated values, using the time column as the index. This can be done by specifying the index column in read_csv:
df = pd.read_csv(path, index_col='time')
CSV isn't an Excel format; it's a text format containing comma-separated values. Excel files (xlsx) are ZIP packages containing XML files. You can't read a text file with read_excel or an Excel file with read_csv.
Indexes are series, not rows. When no index is specified, the row number is used instead.
To check whether a value exists in a Python container, use the in operator. That includes indexes and df.columns. Note that for a Series, in checks the index labels, not the values:
if 'time' in df.columns:
...
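The membership check also explains why 'time' appeared to be missing in the question. A small sketch, using a made-up three-column sample: when a ';'-separated file is read with the default ',' separator, the whole header line becomes a single column name.

```python
import io
import pandas as pd

# Hypothetical ';'-separated sample with no commas in it.
raw = "time;ms;corr_src\n1;120;45\n2;240;90\n"

# Default separator (','): the header is parsed as one column.
wrong = pd.read_csv(io.StringIO(raw))
print(list(wrong.columns))       # ['time;ms;corr_src']
print("time" in wrong.columns)   # False

# Correct separator: 'time' is a real column.
right = pd.read_csv(io.StringIO(raw), sep=";")
print("time" in right.columns)   # True
```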
To change the index to a different one, use set_index:
df.set_index(['interface', 'time'])
reset_index does the opposite: it moves the index back into regular columns and restores the default row-number index.
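A short round trip may make the relationship between set_index and reset_index clearer. The dataframe below is a made-up example that only borrows the column names 'interface', 'time', and 'corr_src' from the question.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "interface": ["A", "A", "B"],
    "time": [1, 2, 1],
    "corr_src": [45, 90, 45],
})

# set_index returns a new DataFrame; df itself is left unchanged.
indexed = df.set_index(["interface", "time"])
print(indexed.index.names)

# reset_index turns the index levels back into ordinary columns.
restored = indexed.reset_index()
print(restored.equals(df))
```

Because both methods return new objects by default, the result has to be assigned to a name to be kept.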