July 17, 2023, 15:13:04

Pandas cannot construct index

Question

I have an Excel file from which I need to extract data at given intervals of time and angles. This is an easy task with SQL, but I need to do it with pandas now. I can't figure out if I can attach the actual file, but here are the names of the columns from the CSV file:

time;ms;interface;packet;GEN_Src;GEN_Dest;GEN_Class;GEN_Type;GEN_Length;GEN_cycle_cnt;lrt_data_type;corr_src;corr_acq_cnt;corr_fragment;corr_fragment_cnt;corr_acq_time;;;;;;;;;;;;;;;;;;;

There are actually dozens of columns without a name at the end, in which lie the data I need to extract. Slicing should work for this.

However, I need to separate them according to intervals of time (column 'time') and angles (column 'corr_src').

When I run:

df = pd.read_csv(r'path\EM_Cell_A-Sample1.csv', index_col='time')

I get:

pandas.errors.ParserError: Could not construct index. Requested to use 1 number of columns, but 1088 left to parse.

Previous attempts, driven by the solutions found under "How to check if a column exists in Pandas", indicate that the 'time' column does not exist.

Answer 1

Score: 0

Update

It seems the real problem is reading a CSV file that uses ; as the field separator. Many locales use , as the decimal separator, which means another character, typically ;, is used as the field separator. Both can be passed to read_csv through the sep and decimal parameters:

df = pd.read_csv(path, index_col='time', sep=';', decimal=',')
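As a minimal sketch of the call above, the following reads a small in-memory sample instead of the real file. The sample rows and values are invented for illustration and only mimic the described layout: ; between fields, , as the decimal mark, and trailing columns without a header name.

```python
import io

import pandas as pd

# Hypothetical miniature of the file: three named columns followed by
# two unnamed data columns. The values are made up for this example.
raw = (
    "time;ms;corr_src;;\n"
    "15:13:04;100;1;0,5;1,25\n"
    "15:13:05;200;2;0,75;2,5\n"
)

# sep=';' splits the fields, decimal=',' turns "0,5" into the float 0.5,
# and index_col='time' uses the 'time' column as the row index.
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",", index_col="time")

print(df.index.name)
print(df.dtypes)

# The unnamed trailing columns can be selected by position.
data = df.iloc[:, 2:]
print(data)
```

pandas assigns placeholder names to the header-less columns, so selecting them by position with iloc avoids depending on those names.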

The question is unclear. There's no reason to create a copy of the original dataframe, and reset_index() won't create a new index. The warning itself may not be relevant in pandas 2.0.0.

It seems that the real question is how to read a text file with comma-separated values, using the time column as the index. This can be done by specifying the index column in read_csv:

df = pd.read_csv(path, index_col='time')

CSV isn't an Excel format, it's a text format containing Values Separated by Commas. Excel files (xlsx) are ZIP packages containing XML files. You can't read a text file with read_excel or an Excel file with read_csv.

An index is a sequence of labels, one per row, not a row itself. When no index is specified, the row number is used instead.

To check whether a value exists in a Python container, use the in operator. That includes Series, indexes and df.columns:

if 'time' in df.columns:
...
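The membership check also explains why the 'time' column appeared to be missing. The sketch below, which uses an invented three-column sample rather than the real file, reads the same text once with the default comma separator and once with sep=';'.

```python
import io

import pandas as pd

# Hypothetical sample; only the separator matters here.
raw = "time;ms;corr_src\n1;100;7\n2;200;8\n"

# With the default sep=',' the whole header line becomes one column name.
wrong = pd.read_csv(io.StringIO(raw))
# With sep=';' each field becomes its own column.
right = pd.read_csv(io.StringIO(raw), sep=";")

print(list(wrong.columns))
print("time" in wrong.columns)
print("time" in right.columns)
```

With the wrong separator there is a single column whose name is the entire header line, so no column called 'time' exists until the separator is set.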

To change the index to a different one, use set_index, which returns a new dataframe:

df.set_index(['interface', 'time'])

reset_index does the opposite: it moves the current index back into ordinary columns and replaces it with the default integer index.
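A small round trip may make the relationship between the two methods clearer. The frame below is invented for illustration and only borrows the 'interface' and 'time' column names from the question. The 'value' column is an assumption.

```python
import pandas as pd

# Hypothetical data; 'value' stands in for the measurement columns.
df = pd.DataFrame(
    {
        "interface": ["a", "a", "b"],
        "time": [1, 2, 1],
        "value": [0.1, 0.2, 0.3],
    }
)

# set_index returns a new frame whose index is built from the two columns.
indexed = df.set_index(["interface", "time"])
print(indexed.index.names)
print(list(indexed.columns))

# reset_index moves the index levels back into columns and restores
# the default integer index.
restored = indexed.reset_index()
print(list(restored.columns))
print(restored.equals(df))
```

Neither call modifies df in place, so the result has to be assigned to a name to be kept.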
