Pandas to Numpy: why is the last line of CSV file column missing?

Question

Very simple problem. I am reading in CSV files organized in a specific way. There is no header and the data is rectangular; there are no missing or corrupt entries. I read in the CSV file using pandas and convert it to a NumPy array.

The problem is that when I print the first column, the last entry is missing. The printed output ends at the second-to-last value.

    import glob
    import pandas as pd
    import numpy as np

    filenames=glob.glob(r'\my\filepath\*csv')

    def data(filename):
        out = pd.read_csv(r'{}'.format(filenames[0]),sep=',',header=None).to_numpy()
        return out

    alldata = data(filenames[0])
    column1 = alldata[0:-1,0]
    print(column1)

I expect the print command to print the entire column, but the output ends at the second-to-last value. I have the CSV file open in Excel, and the printed output is clearly missing the last value. However, if I do

    print(alldata)

I can see the expected last value of column1 in the printed table. What's happening? The 0:-1 should span the entire column, correct?

Answer 1

Score: 0

Mate, the problem is caused by the slicing: alldata[0:-1, 0] selects from the first row (included) up to the last row (not included), because the stop index of a slice is exclusive. Try this:

    import glob
    import pandas as pd

    filenames = glob.glob(r'\my\filepath\*csv')

    def data(filename):
        # Read a headerless CSV and return its contents as a NumPy array
        out = pd.read_csv(filename, sep=',', header=None).to_numpy()
        return out

    alldata = data(filenames[0])
    column1 = alldata[:, 0]  # Select all rows in the first column
    print(column1)
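If it helps to see the difference without the CSV files, here is a minimal sketch on a made-up 4x3 array (the values are placeholders, not the asker's data). It only illustrates that the stop index of a slice is exclusive, so `0:-1` stops just before the last row, while a bare `:` takes every row.

```python
import numpy as np

# Made-up 4x3 array standing in for the CSV contents (an assumption for illustration).
alldata = np.arange(12).reshape(4, 3)

# The stop index is exclusive, so 0:-1 ends just before the last row.
truncated = alldata[0:-1, 0]
print(truncated)        # [0 3 6]

# A bare colon selects every row of the first column.
full = alldata[:, 0]
print(full)             # [0 3 6 9]

# The last row on its own is index -1.
print(alldata[-1, 0])   # 9
```

So `-1` as a stop value means "up to, but not including, the last element", which is exactly the one value that went missing in the question.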

huangapple
  • Posted on August 9, 2023, 03:37:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76862736.html