Pandas cannot construct index

Question

I have an Excel file from which I need to extract data at given intervals of time and angles. This would be an easy task with SQL, but I need to do it with pandas now. I can't figure out whether I can attach the actual file, but here are the column names from the CSV file:

time;ms;interface;packet;GEN_Src;GEN_Dest;GEN_Class;GEN_Type;GEN_Length;GEN_cycle_cnt;lrt_data_type;corr_src;corr_acq_cnt;corr_fragment;corr_fragment_cnt;corr_acq_time;;;;;;;;;;;;;;;;;;;

There are actually dozens of columns without a name at the end, in which lie the data I need to extract. Slicing should work for this.

However, I need to separate them according to intervals of time (column 'time') and angles (column 'corr_src').

When I run:

df = pd.read_csv(r'path\EM_Cell_A-Sample1.csv', index_col='time')

I get:

pandas.errors.ParserError: Could not construct index. Requested to use 1 number of columns, but 1088 left to parse.

Previous attempts, driven by the solutions found in "How to check if a column exists in Pandas", indicate that the 'time' column does not exist.

Answer 1

Score: 0

Update

It seems the real problem is reading a CSV file that uses ; as the field separator. Half the world uses , as the decimal separator, which means another character, typically ;, is used as the field separator:

df = pd.read_csv(path, index_col='time', sep=';', decimal=',')
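As a minimal sketch of the call above, the snippet below reads a small in-memory stand-in for the file. The sample rows and values are invented for illustration and are not taken from the asker's data; only the column names time, ms and corr_src come from the question. It also shows that the unnamed trailing columns can be reached by position with iloc.

```python
import io

import pandas as pd

# Hypothetical miniature of the file: semicolon-delimited, decimal commas,
# and two unnamed trailing columns (the header ends with empty fields).
raw = (
    "time;ms;corr_src;;\n"
    "10:00:01;1,5;0;3,25;4,5\n"
    "10:00:02;2,5;90;5,75;6,0\n"
)

# sep=';' splits the fields, decimal=',' parses "1,5" as the float 1.5,
# and index_col='time' now works because the 'time' column is recognized.
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",", index_col="time")

# The unnamed trailing columns are selected by position.
payload = df.iloc[:, 2:]

print(df.dtypes)
print(payload)
```

With a real file, the io.StringIO object would be replaced by the file path.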

The question is unclear. There's no reason to create a copy of the original dataframe, and reset_index() won't create a new index. The warning itself may not be relevant in Pandas 2.0.0.

It seems that the real question is how to read a text file with comma-separated values, using the time column as the index. This can be done by specifying the index column in read_csv:

df = pd.read_csv(path, index_col='time')

CSV isn't an Excel format; it's a text format containing comma-separated values. Excel files (xlsx) are ZIP packages containing XML files. You can't read a text file with read_excel or an Excel file with read_csv.
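One way to tell the two kinds of file apart, sketched here with the standard library only, is to test whether the bytes form a ZIP archive. The helper name looks_like_zip and the in-memory archive are illustrative assumptions; the archive is a stand-in for an xlsx file, not a valid workbook.

```python
import io
import zipfile

# Plain text, as a CSV export would be (invented sample content).
csv_bytes = b"time;ms;corr_src\n1;10;0\n2;20;90\n"

# Build a tiny ZIP archive in memory as a stand-in for an .xlsx file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/workbook.xml", "<workbook/>")
zip_bytes = buf.getvalue()


def looks_like_zip(data: bytes) -> bool:
    """Return True if the bytes form a ZIP archive (as xlsx files do)."""
    return zipfile.is_zipfile(io.BytesIO(data))


print(looks_like_zip(csv_bytes))
print(looks_like_zip(zip_bytes))
```

A file that passes this check is a candidate for read_excel; a plain text file belongs with read_csv.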

An index is a series-like set of labels, not a row. When no index is specified, the row number is used instead.

To check whether a value exists in a Python container, use the in operator. That includes indexes and df.columns; for a Series, in tests the index labels rather than the values:

if 'time' in df.columns:
    ...
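The membership test also explains why the 'time' column appeared to be missing. A minimal sketch, using invented sample rows: with the default comma separator, the whole semicolon-delimited header is read as a single column name.

```python
import io

import pandas as pd

# Invented sample with the same delimiter as the asker's file.
raw = "time;ms;corr_src\n1;10;0\n2;20;90\n"

wrong = pd.read_csv(io.StringIO(raw))           # default sep=','
right = pd.read_csv(io.StringIO(raw), sep=";")  # matches the file

# With the wrong separator the header is one long column name.
print(wrong.columns.tolist())
print("time" in wrong.columns)
print("time" in right.columns)

# For a Series, `in` looks at the index labels, not the values.
s = pd.Series([10, 20])
print(10 in s)         # checks the labels 0 and 1
print(10 in s.values)  # checks the stored values
```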

To change the index to a different one, use set_index:

df.set_index(['interface', 'time'])

reset_index does the opposite: it replaces the index with the default integer row numbers and moves the old index levels back into ordinary columns.
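A small round trip, sketched on an invented three-column frame, shows both calls. Note that set_index and reset_index each return a new dataframe and leave the original unchanged unless the result is assigned.

```python
import pandas as pd

# Invented sample data; only the column names come from the question.
df = pd.DataFrame(
    {
        "time": ["10:00:01", "10:00:02"],
        "interface": ["A", "B"],
        "corr_src": [0, 90],
    }
)

# Build a two-level index from existing columns (returns a new frame).
indexed = df.set_index(["interface", "time"])

# Move the index levels back into columns and restore row numbers.
restored = indexed.reset_index()

print(indexed.index.names)
print(restored.columns.tolist())
```

After the round trip the former index levels come first in the column order, so restored has the same data as df but with interface and time leading.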

huangapple
  • Posted on July 17, 2023 at 15:13:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76702206.html