The requested array has an inhomogeneous shape after 1 dimensions when converting list to numpy array

Question

I am trying to load training and test data using a function named load_data_new which reads data from topomaps/ folder and labels from labels/ folder. They both contain .npy files.

Specifically, the topomaps/ folder contains:

[Image: listing of the topomaps/ folder]

For example, s01_trial03.npy contains 128 topomaps while s01_trial12.npy contains 2944 topomaps, so the arrays differ in shape along the first axis.

The labels/ folder contains:

[Image: listing of the labels/ folder]

Moreover, the training data must contain only topomaps whose label is 0, while the test data can contain topomaps whose label is 0, 1 or 2. This is my code:

    import os

    import numpy as np
    from sklearn.model_selection import train_test_split


    def load_data_new(topomap_folder: str, labels_folder: str, test_size: float = 0.2) -> tuple:
        """
        Load and pair topomap data and corresponding label data from separate folders

        :param topomap_folder: (str) The path to the folder containing topomaps .npy files
        :param labels_folder: (str) The path to the folder containing labels .npy files
        :param test_size: (float) The proportion of data to be allocated to the testing set (default is 0.2)
        :return: (tuple) Two tuples, each containing a topomap ndarray and its corresponding label 1D-array.

        Note:
            The function assumes that the filenames of the topomaps and labels are in the same order.
            It also assumes that there is a one-to-one correspondence between the topomap files and the label files.
            If the shapes of a topomap file and its label file are inconsistent, it raises a ValueError.

        Example:
            topomap_folder = "topomaps"
            labels_folder = "labels"
            (x_train, y_train), (x_test, y_test) = load_data_new(topomap_folder, labels_folder, test_size=0.2)
        """
        topomap_files = os.listdir(topomap_folder)
        labels_files = os.listdir(labels_folder)

        # Sort the files to ensure the order is consistent
        topomap_files.sort()
        labels_files.sort()

        labels = []
        topomaps = []
        for topomap_file, label_file in zip(topomap_files, labels_files):
            if topomap_file.endswith(".npy") and label_file.endswith(".npy"):
                topomap_path = os.path.join(topomap_folder, topomap_file)
                label_path = os.path.join(labels_folder, label_file)
                topomap_data = np.load(topomap_path)
                label_data = np.load(label_path)
                if topomap_data.shape[0] != label_data.shape[0]:
                    raise ValueError(f"Warning: Inconsistent shapes for {topomap_file} and {label_file}")
                topomaps.append(topomap_data)
                labels.append(label_data)

        # This is the line that raises the ValueError shown below
        x = np.array(topomaps)
        y = np.array(labels)

        # Training set only contains images whose label is 0 for anomaly detection
        train_indices = np.where(y == 0)[0]
        x_train = x[train_indices]
        y_train = y[train_indices]

        # Split the remaining data into testing sets
        remaining_indices = np.where(y != 0)[0]
        x_remaining = x[remaining_indices]
        y_remaining = y[remaining_indices]
        _, x_test, _, y_test = train_test_split(x_remaining, y_remaining, test_size=test_size)
        return (x_train, y_train), (x_test, y_test)


    (x_train, y_train), (x_test, y_test) = load_data_new("topomaps", "labels")

But unfortunately I am getting this error:

    Traceback (most recent call last):
      File "/Users/alex/PycharmProjects/VAE-EEG-XAI/vae.py", line 574, in <module>
        (x_train, y_train), (x_test, y_test) = load_data_new("topomaps", "labels")
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/alex/PycharmProjects/VAE-EEG-XAI/vae.py", line 60, in load_data_new
        x = np.array(topomaps)
            ^^^^^^^^^^^^^^^^^^
    ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (851,) + inhomogeneous part.

This indicates that the elements of the topomaps list have different shapes: NumPy arrays require elements of consistent shape, so converting the list fails.
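To make the failure concrete, here is a minimal sketch that reproduces it without the real files. The names trial_a and trial_b and the 8x8 map size are made up for illustration; only the counts 128 and 2944 come from the description above. On recent NumPy releases np.array should reject a list like this with the same kind of ValueError; older releases may behave differently (for example, warn and build an object array), which is why the exception is caught rather than assumed.

```python
import numpy as np

# Hypothetical stand-ins for two loaded .npy files: same map size (8x8 is an
# arbitrary choice), different number of maps per file.
trial_a = np.zeros((128, 8, 8))
trial_b = np.zeros((2944, 8, 8))
topomaps = [trial_a, trial_b]

# More than one distinct shape means the list is ragged.
distinct_shapes = {arr.shape for arr in topomaps}
print(distinct_shapes)

try:
    np.array(topomaps)
    message = None  # some older NumPy versions may not raise here
except ValueError as exc:
    message = str(exc)

print(message)
```

Checking the set of shapes before calling np.array is a cheap way to confirm that ragged input is the cause.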

How can I fix this?

Answer 1

Score: 0

I solved the issue by appending each topomap and its label individually, instead of appending whole per-file arrays:

    import os

    import numpy as np
    from sklearn.model_selection import train_test_split
    from tqdm import tqdm


    def load_data(topomaps_folder: str, labels_folder: str, test_size=0.2) -> tuple:
        x, y = _create_dataset(topomaps_folder, labels_folder)

        # Training set only contains images whose label is 0 for anomaly detection
        train_indices = np.where(y == 0)[0]
        x_train = x[train_indices]
        y_train = y[train_indices]

        # Split the remaining data into testing sets
        remaining_indices = np.where(y != 0)[0]
        x_remaining = x[remaining_indices]
        y_remaining = y[remaining_indices]
        _, x_test, _, y_test = train_test_split(x_remaining, y_remaining, test_size=test_size)
        return (x_train, y_train), (x_test, y_test)


    def _create_dataset(topomaps_folder, labels_folder):
        topomaps_files = os.listdir(topomaps_folder)
        labels_files = os.listdir(labels_folder)
        # Sort both listings so topomap and label files pair up by name
        topomaps_files.sort()
        labels_files.sort()

        x = []
        y = []
        n_files = len(topomaps_files)
        for topomaps_file, labels_file in tqdm(zip(topomaps_files, labels_files), total=n_files, desc="Loading data set"):
            topomaps_array = np.load(f"{topomaps_folder}/{topomaps_file}")
            labels_array = np.load(f"{labels_folder}/{labels_file}")
            if topomaps_array.shape[0] != labels_array.shape[0]:
                raise Exception("Shapes must be equal")
            # Append one topomap and one label at a time, so x and y become flat
            # lists of single items rather than lists of per-file arrays
            # (this assumes every topomap has the same shape)
            for i in range(topomaps_array.shape[0]):
                x.append(topomaps_array[i])
                y.append(labels_array[i])

        x = np.array(x)
        y = np.array(y)
        return x, y
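As an alternative to the per-item loop, the per-file arrays could be joined with np.concatenate along the first axis. This is a sketch on synthetic data, not the code the answer uses: the names topomaps_per_file and labels_per_file and the 4x4 map size are hypothetical, and it assumes every file stores maps of the same size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the arrays loaded from two files:
# 3 maps in the first file, 5 in the second, each map 4x4 (arbitrary size).
topomaps_per_file = [rng.random((3, 4, 4)), rng.random((5, 4, 4))]
labels_per_file = [np.array([0, 1, 0]), np.array([2, 0, 0, 1, 0])]

# Join along axis 0: only the first dimension is allowed to differ.
x = np.concatenate(topomaps_per_file, axis=0)
y = np.concatenate(labels_per_file, axis=0)

# Row-by-row construction, mirroring the loop in the answer, for comparison.
x_rows = np.array([topomap for arr in topomaps_per_file for topomap in arr])
y_rows = np.array([label for arr in labels_per_file for label in arr])

print(x.shape, y.shape)
print(np.array_equal(x, x_rows), np.array_equal(y, y_rows))
```

Both constructions should yield the same arrays; concatenation simply avoids building a long Python list of individual maps first.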

huangapple
  • Posted on July 27, 2023, 17:55:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76778558.html