2023-08-09 03:37:02

Pandas to Numpy: why is the last line of CSV file column missing?

Question

Very simple problem. I am reading in CSV files organized in a specific way. There's no header and the file shape is a rectangle; there are no missing or corrupt entries. I read in the CSV file using pandas and convert it to a NumPy array.

The problem is, when I print the first column, the last entry is missing. The printed output ends at the second-to-last value.

import glob
import pandas as pd
import numpy as np
filenames=glob.glob(r'\my\filepath\*csv')
def data(filename):
    out = pd.read_csv(r'{}'.format(filenames[0]),sep=',',header=None).to_numpy()
    return out
alldata = data(filenames[0])
column1 = alldata[0:-1,0]
print(column1)

I expect the print command to print the entire column, but the output ends at the second-to-last value. I have the CSV file open in Excel, and the printed column is clearly missing the last value. However, if I do

print(alldata)

I can see the expected last value of column1 in the printed table. What's happening? The 0:-1 should span the entire column, correct?

Answer 1

Score: 0

Mate, the problem is caused by the slicing: alldata[0:-1, 0] selects from the first row (inclusive) up to the last row (exclusive). A slice's stop index is never included, and -1 refers to the last row, so the last row is dropped. Try this:

import glob
import pandas as pd

filenames = glob.glob(r'\my\filepath\*csv')

def data(filename):
    # Read a headerless CSV file into a NumPy array
    return pd.read_csv(filename, sep=',', header=None).to_numpy()

alldata = data(filenames[0])
column1 = alldata[:, 0]  # Select all rows in the first column
print(column1)
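
To see the difference without a CSV file, here is a minimal sketch using a small made-up array in place of the real data (the 4x3 array is an assumption for illustration only; alldata in the question would come from pd.read_csv):

```python
import numpy as np

# Hypothetical 4x3 array standing in for the data loaded from the CSV file
alldata = np.arange(12).reshape(4, 3)

# The stop index of a slice is exclusive, so 0:-1 stops before the last row
without_last = alldata[0:-1, 0]

# Omitting the stop index selects every row
full_column = alldata[:, 0]
also_full = alldata[0:, 0]

# As a plain index (not a slice bound), -1 does select the last row
last_value = alldata[-1, 0]

print(without_last)  # [0 3 6]
print(full_column)   # [0 3 6 9]
print(last_value)    # 9
```

The same rule applies to plain Python lists: [0, 3, 6, 9][0:-1] gives [0, 3, 6].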
