Pandas to Numpy: why is the last line of CSV file column missing?
Question
A very simple problem. I am reading CSV files organized in a specific way. There is no header and the data is rectangular; there are no missing or corrupt entries. I read each CSV file with pandas and convert it to a NumPy array.
The problem is that when I print the first column, the last entry is missing. The printed output ends at the second-to-last value.
import glob
import pandas as pd
import numpy as np

filenames = glob.glob(r'\my\filepath\*csv')

def data(filename):
    # Read a headerless CSV and return its contents as a NumPy array
    out = pd.read_csv(filename, sep=',', header=None).to_numpy()
    return out

alldata = data(filenames[0])
column1 = alldata[0:-1, 0]
print(column1)
I expect the print command to print the entire column, but the output ends at the second-to-last value. I have the CSV file open in Excel, and the printed output is clearly missing the last value. However, if I do
print(alldata)
I can see the expected last value of column1 in the printed table. What's happening? The 0:-1 should span the entire column, correct?
Answer 1
Score: 0
Mate, the problem is caused by the slicing: alldata[0:-1, 0] selects from the first row (inclusive) up to the last row (exclusive), so the last row is left out. Try this:
import glob
import pandas as pd

filenames = glob.glob(r'\my\filepath\*csv')

def data(filename):
    # Read a headerless CSV and return its contents as a NumPy array
    out = pd.read_csv(filename, sep=',', header=None).to_numpy()
    return out

alldata = data(filenames[0])
column1 = alldata[:, 0]  # Select all rows in the first column
print(column1)
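To see the difference without touching the CSV, here is a minimal sketch using a small made-up array (the values are hypothetical and stand in for the asker's data). It only illustrates that the stop index of a slice is exclusive, so a stop of -1 drops the last row, while leaving the stop empty runs through the end.

```python
import numpy as np

# Hypothetical 4x2 array standing in for the data read from the CSV file.
alldata = np.array([[1, 10],
                    [2, 20],
                    [3, 30],
                    [4, 40]])

# The stop index of a slice is exclusive, so -1 stops before the last row.
without_last = alldata[0:-1, 0]

# Leaving the stop empty runs through the last row.
full_column = alldata[:, 0]

print(without_last)  # [1 2 3]
print(full_column)   # [1 2 3 4]
```

If only the last value is needed, alldata[-1, 0] indexes it directly; as an index (rather than a slice stop), -1 refers to the last row.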