Pandas to Numpy: why is the last line of CSV file column missing?

Question

Very simple problem. I am reading in CSV files organized in a specific way. There is no header and each file is rectangular; there are no missing or corrupt entries. I read the CSV file with pandas and convert it to a numpy array.

The problem is, when I print the first column, the last entry is missing. The printed output ends at the second-to-last value.

import glob
import pandas as pd
import numpy as np

filenames=glob.glob(r'\my\filepath\*csv')
def data(filename):
    out = pd.read_csv(r'{}'.format(filenames[0]),sep=',',header=None).to_numpy()
    return out

alldata = data(filenames[0])
column1 = alldata[0:-1,0]
print(column1)

I expect the print command to print the entire column, but the output ends at the second-to-last value. I have the CSV file open in Excel, and the printed output is clearly missing the last value. However, if I do

print(alldata)

I can see the expected last value of column1 in the printed table. What's happening? The 0:-1 should span the entire column, correct?

Answer 1

Score: 0

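Mate, the problem is the slice: alldata[0:-1, 0] selects rows from the first (included) up to, but not including, the last. A stop index of -1 refers to the last row, and the stop of a slice is always exclusive, so the last row is dropped. Try this: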

import glob
import pandas as pd

filenames = glob.glob(r'\my\filepath\*csv')

def data(filename):
    # Read the CSV given by the filename argument (no header row)
    out = pd.read_csv(filename, sep=',', header=None).to_numpy()
    return out

alldata = data(filenames[0])
column1 = alldata[:, 0]  # Select all rows in the first column
print(column1)

That should solve the problem.
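To see the difference without touching the real files, here is a minimal sketch that reads a small in-memory CSV instead. The 4x3 contents of csv_text and the names truncated, full and also_full are made up for illustration; only the slicing expressions come from the question and answer.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical 4x3 CSV content standing in for the real file (no header).
csv_text = "1,2,3\n4,5,6\n7,8,9\n10,11,12\n"

alldata = pd.read_csv(io.StringIO(csv_text), sep=",", header=None).to_numpy()

truncated = alldata[0:-1, 0]            # stop -1 is exclusive: last row dropped
full = alldata[:, 0]                    # every row of the first column
also_full = alldata[0:len(alldata), 0]  # explicit stop equal to the row count

print(truncated.tolist())  # [1, 4, 7]
print(full.tolist())       # [1, 4, 7, 10]
```

The same rule applies to plain Python lists: a slice stops just before its stop index, so 0:-1 always leaves out the final element.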

huangapple
  • Posted on August 9, 2023 at 03:37:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76862736.html