ValueError: Data must be 1-dimensional, got ndarray of shape (6, 1) instead

Question

I want to create a new DataFrame from the index, X, and y data below.

    import numpy as np
    import pandas as pd

    # Index values, one per row (nested, so the array shape is (6, 1))
    df_idx1 = [[3], [4], [5], [6], [7], [8]]

    # Six samples, each with three values (array shape (6, 3, 1))
    X1 = [[[10], [20], [30]],
          [[20], [30], [40]],
          [[30], [40], [50]],
          [[40], [50], [60]],
          [[50], [60], [70]],
          [[60], [70], [80]]]

    # Six targets, one value each (array shape (6, 1, 1))
    y1 = [[[40]], [[50]], [[60]], [[70]], [[80]], [[90]]]

    print("Length index, X, Y: ", len(df_idx1), len(X1), len(y1))
    print("df_idx1", df_idx1)
    print("X1", X1)
    print("y1", y1)

    # This is the call that fails (see the present output below)
    exdf1 = pd.DataFrame(data={"X": np.array(X1), "y": np.array(y1)}, index=df_idx1)

Present output:

    Length of index, X, Y: 6 6 6
    ValueError: Data must be 1-dimensional, got ndarray of shape (6, 1) instead

Expected output:

    exdf1 =
                      X       y
    3  [[10],[20],[30]]  [[40]]
    4  [[20],[30],[40]]  [[50]]
    5  ....
    6
    7
    8  [[60],[70],[80]]  [[90]]
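
A quick way to see what pandas is objecting to is to check the shape of each input. This is only a diagnostic sketch, assuming the data are nested lists like the ones above; the names `idx_arr`, `X_arr` and `y_arr` are illustrative and not from the original post.

```python
import numpy as np

df_idx1 = [[3], [4], [5], [6], [7], [8]]
X1 = [[[10], [20], [30]], [[20], [30], [40]], [[30], [40], [50]],
      [[40], [50], [60]], [[50], [60], [70]], [[60], [70], [80]]]
y1 = [[[40]], [[50]], [[60]], [[70]], [[80]], [[90]]]

# Convert each nested list to an array so its shape can be inspected
idx_arr = np.array(df_idx1)
X_arr = np.array(X1)
y_arr = np.array(y1)

print(idx_arr.shape)  # (6, 1)
print(X_arr.shape)    # (6, 3, 1)
print(y_arr.shape)    # (6, 1, 1)
```

None of the three is 1-dimensional. The (6, 1) in the error message matches the shape of the index array, which suggests the index is at least one of the inputs being rejected.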

Answer 1

Score: 3

The question is not entirely clear, but I think you just want to get rid of the error, which means making the data 1-dimensional so that it can be turned into a DataFrame.

Try this:

    idx = np.array(df_idx1).reshape(-1)                  # index: (6, 1) -> (6,)
    df_X1 = pd.DataFrame(np.array(X1).reshape(6, -1))    # X: (6, 3, 1) -> (6, 3)
    df_Y1 = pd.DataFrame(np.array(y1).reshape(-1))       # y: (6, 1, 1) -> (6,)
    df_combined = pd.concat([df_X1, df_Y1], axis=1)      # place X and y side by side
    df_combined.set_index(idx, inplace=True)

The result would look like this:
[picture of the resulting DataFrame]
(You may want to reset the column labels afterwards.)

Note:
The data must first be in a valid nested-list format so that they can be converted to an ndarray.
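
The steps in this answer can be put together as a self-contained sketch using the data from the question. The column labels `X0`, `X1`, `X2` and `y` are hypothetical names chosen here for illustration; the answer itself leaves the default integer labels in place.

```python
import numpy as np
import pandas as pd

df_idx1 = [[3], [4], [5], [6], [7], [8]]
X1 = [[[10], [20], [30]], [[20], [30], [40]], [[30], [40], [50]],
      [[40], [50], [60]], [[50], [60], [70]], [[60], [70], [80]]]
y1 = [[[40]], [[50]], [[60]], [[70]], [[80]], [[90]]]

# Flatten the index to 1-D and the data to 2-D / 1-D
idx = np.array(df_idx1).reshape(-1)                       # shape (6,)
df_X1 = pd.DataFrame(np.array(X1).reshape(len(idx), -1),  # shape (6, 3)
                     columns=["X0", "X1", "X2"])          # hypothetical labels
df_Y1 = pd.DataFrame(np.array(y1).reshape(-1),            # shape (6,)
                     columns=["y"])                       # hypothetical label

# Combine column-wise, then attach the flattened index
df_combined = pd.concat([df_X1, df_Y1], axis=1)
df_combined = df_combined.set_index(idx)
print(df_combined)
```

Note that this gives one scalar per cell (three X columns plus one y column), which is a different layout from the nested arrays in the question's expected output.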

Answer 2

Score: 3

With arrays like yours:

    In [90]: idx=np.arange(3,9).reshape(6,1)
    In [91]: X = np.arange(10,28).reshape(6,3,1); Y = 10*np.arange(4,10).reshape(6,1,1)

Making a frame with idx produces your error:

    In [92]: df=pd.DataFrame( index=idx, columns=['X','Y'])
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    /tmp/ipykernel_7567/2068536625.py in <module>
    ----> 1 df=pd.DataFrame( index=idx, columns=['X','Y'])

    ~/.local/lib/python3.10/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
        662         elif isinstance(data, dict):
        663             # GH#38939 de facto copy defaults to False only in non-dict cases
    --> 664             mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
        665         elif isinstance(data, ma.MaskedArray):
        666             import numpy.ma.mrecords as mrecords

    ~/.local/lib/python3.10/site-packages/pandas/core/internals/construction.py in dict_to_mgr(data, index, columns, dtype, typ, copy)
        448             index = _extract_index(arrays[~missing])
        449         else:
    --> 450             index = ensure_index(index)
        451
        452         # no obvious "empty" int column

    ~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py in ensure_index(index_like, copy)
       7331             return Index._with_infer(index_like, copy=copy, tupleize_cols=False)
       7332     else:
    -> 7333         return Index._with_infer(index_like, copy=copy)
       7334
       7335

    ~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py in _with_infer(cls, *args, **kwargs)
        714         with warnings.catch_warnings():
        715             warnings.filterwarnings("ignore", ".*the Index constructor", FutureWarning)
    --> 716             result = cls(*args, **kwargs)
        717
        718         if result.dtype == _dtype_obj and not result._is_multi:

    ~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py in __new__(cls, data, dtype, copy, name, tupleize_cols, **kwargs)
        538
        539             klass = cls._dtype_to_subclass(arr.dtype)
    --> 540             arr = klass._ensure_array(arr, dtype, copy)
        541             disallow_kwargs(kwargs)
        542             return klass._simple_new(arr, name)

    ~/.local/lib/python3.10/site-packages/pandas/core/indexes/numeric.py in _ensure_array(cls, data, dtype, copy)
        174         if subarr.ndim > 1:
        175             # GH#13601, GH#20285, GH#27125
    --> 176             raise ValueError("Index data must be 1-dimensional")
        177
        178         subarr = np.asarray(subarr)

    ValueError: Index data must be 1-dimensional

Raveling idx to one dimension fixes that:

    In [93]: df=pd.DataFrame( index=idx.ravel(), columns=['X','Y'])
    In [94]: df
    Out[94]:
         X    Y
    3  NaN  NaN
    4  NaN  NaN
    5  NaN  NaN
    6  NaN  NaN
    7  NaN  NaN
    8  NaN  NaN
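
For reference, `ravel()` is not the only way to get a 1-D index. The following is a small sketch, assuming the same `idx` as above, that compares it with `reshape(-1)` and `flatten()`; the names `flat_view`, `flat_same` and `flat_copy` are illustrative.

```python
import numpy as np
import pandas as pd

idx = np.arange(3, 9).reshape(6, 1)

flat_view = idx.ravel()       # 1-D; a view when possible (it is here)
flat_same = idx.reshape(-1)   # equivalent result for this array
flat_copy = idx.flatten()     # 1-D; always a copy

print(flat_view.shape, flat_same.shape, flat_copy.shape)  # (6,) (6,) (6,)
print(np.shares_memory(idx, flat_view))  # True
print(np.shares_memory(idx, flat_copy))  # False

# Any of the three is accepted as an index because it is 1-dimensional
index = pd.Index(flat_view)
print(index.tolist())  # [3, 4, 5, 6, 7, 8]
```

Any of the three should work for building the frame; the difference only matters if you later modify the index array in place.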

Now assign the two columns:

    In [95]: df['X']=list(X)
    In [96]: df['Y']=list(Y)
    In [97]: df
    Out[97]:
                        X       Y
    3  [[10], [11], [12]]  [[40]]
    4  [[13], [14], [15]]  [[50]]
    5  [[16], [17], [18]]  [[60]]
    6  [[19], [20], [21]]  [[70]]
    7  [[22], [23], [24]]  [[80]]
    8  [[25], [26], [27]]  [[90]]

I tried various things with the data=... parameter, but kept getting errors, mainly a conflict between the columns implied by the shapes of X and Y and the two columns actually wanted. The list() call is also needed: each Series is object dtype and holds 6 separate arrays.

Edit
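
For comparison, here is a sketch of building the frame in a single call with the data from the question. It assumes that pandas stores a list of arrays as an object-dtype column, one array per row, when the list is passed inside a dict; this is not shown in the session above, so it is worth checking on your own pandas version.

```python
import numpy as np
import pandas as pd

df_idx1 = [[3], [4], [5], [6], [7], [8]]
X1 = [[[10], [20], [30]], [[20], [30], [40]], [[30], [40], [50]],
      [[40], [50], [60]], [[50], [60], [70]], [[60], [70], [80]]]
y1 = [[[40]], [[50]], [[60]], [[70]], [[80]], [[90]]]

index = np.array(df_idx1).ravel()      # 1-D index, shape (6,)

exdf1 = pd.DataFrame(
    {
        "X": list(np.array(X1)),       # list of 6 arrays, each of shape (3, 1)
        "y": list(np.array(y1)),       # list of 6 arrays, each of shape (1, 1)
    },
    index=index,
)
print(exdf1)
print(exdf1.dtypes)
```

As in the step-by-step version, the `list()` wrapper is what keeps each row's array intact as a single cell.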

The array extracted from the dataframe may help you understand better what is actually being stored.

    In [111]: df.to_numpy()
    Out[111]:
    array([[array([[10],
                   [11],
                   [12]]), array([[40]])],
           [array([[13],
                   [14],
                   [15]]), array([[50]])],
           [array([[16],
                   [17],
                   [18]]), array([[60]])],
           [array([[19],
                   [20],
                   [21]]), array([[70]])],
           [array([[22],
                   [23],
                   [24]]), array([[80]])],
           [array([[25],
                   [26],
                   [27]]), array([[90]])]], dtype=object)

    In [112]: df['X'].to_numpy()
    Out[112]:
    array([array([[10],
                  [11],
                  [12]]), array([[13],
                                 [14],
                                 [15]]), array([[16],
                                                [17],
                                                [18]]), array([[19],
                                                               [20],
                                                               [21]]),
           array([[22],
                  [23],
                  [24]]), array([[25],
                                 [26],
                                 [27]])], dtype=object)
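
Because the column is a 1-D object array of separate arrays, numeric work on it usually means stacking the cells back together first. A minimal sketch, assuming the same `idx`, `X` and `Y` as above (the names `col` and `X_back` are illustrative):

```python
import numpy as np
import pandas as pd

idx = np.arange(3, 9).reshape(6, 1)
X = np.arange(10, 28).reshape(6, 3, 1)
Y = 10 * np.arange(4, 10).reshape(6, 1, 1)

# Build the frame the same way as in the session above
df = pd.DataFrame(index=idx.ravel(), columns=["X", "Y"])
df["X"] = list(X)
df["Y"] = list(Y)

# The column itself is a 1-D object array holding 6 arrays
col = df["X"].to_numpy()
print(col.dtype, col.shape)       # object (6,)

# Stack the per-row arrays back into one numeric array
X_back = np.stack(df["X"].tolist())
print(X_back.shape)               # (6, 3, 1)
print(np.array_equal(X_back, X))  # True
```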

huangapple
  • Published on June 8, 2023, 09:23:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76428038.html