ValueError: Data must be 1-dimensional, got ndarray of shape (6, 1) instead

Question

I want to create a new DataFrame from the index, X, and y data.

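import numpy as np
import pandas as pd

# Inputs as printed (the comma-free output suggests NumPy arrays)
df_idx1 = np.array([[3], [4], [5], [6], [7], [8]])           # shape (6, 1)
X1 = np.array([[[10], [20], [30]],
               [[20], [30], [40]],
               [[30], [40], [50]],
               [[40], [50], [60]],
               [[50], [60], [70]],
               [[60], [70], [80]]])                           # shape (6, 3, 1)
y1 = np.array([[[40]], [[50]], [[60]], [[70]], [[80]], [[90]]])  # shape (6, 1, 1)

print("Length of index, X, Y: ", len(df_idx1), len(X1), len(y1))
print("df_idx1", df_idx1)
print("X1", X1)
print("y1", y1)
# This line raises the ValueError shown below
exdf1 = pd.DataFrame(data={"X": np.array(X1), "y": np.array(y1)}, index=df_idx1)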

Current output:

Length of index, X, Y:  6 6 6

ValueError: Data must be 1-dimensional, got ndarray of shape (6, 1) instead

Expected output:

exdf1=
             X                y
3    [[10],[20],[30]]       [[40]]
4    [[20],[30],[40]]       [[50]]
5     ....
6
7
8    [[60],[70],[80]]       [[90]]
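To see which argument triggers the error, it helps to check the shapes. The shape in the message, (6, 1), matches only the index: X1 is (6, 3, 1) and y1 is (6, 1, 1). Below is a minimal sketch, assuming the inputs are NumPy arrays with the printed values; `accepts_as_index` is a hypothetical helper, not a pandas function. Depending on the pandas version, the index error may instead read "Index data must be 1-dimensional".

```python
import numpy as np
import pandas as pd

# Inputs with the shapes from the question (assumed to be NumPy arrays)
df_idx1 = np.arange(3, 9).reshape(6, 1)                                # (6, 1)
X1 = np.array([[[v], [v + 10], [v + 20]] for v in range(10, 70, 10)])  # (6, 3, 1)
y1 = np.arange(40, 100, 10).reshape(6, 1, 1)                           # (6, 1, 1)

for name, arr in [("index", df_idx1), ("X", X1), ("y", y1)]:
    print(f"{name}: shape={arr.shape}, ndim={arr.ndim}")


def accepts_as_index(arr):
    """Hypothetical helper: True if pandas can build an Index from arr."""
    try:
        pd.Index(arr)
    except ValueError:
        return False
    return True


print(accepts_as_index(df_idx1))          # False: 2-D data is rejected
print(accepts_as_index(df_idx1.ravel()))  # True: 1-D data is accepted
```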

Answer 1

Score: 3

The question is not entirely clear, but I think you just want to get past the error, which means making the data 1-dimensional so it can be turned into a DataFrame.

Try this:

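import numpy as np
import pandas as pd

# Flatten the (6, 1) index to 1-D
idx = np.array(df_idx1).reshape(-1)

# (6, 3, 1) -> (6, 3): one column per element of each X window
df_X1 = pd.DataFrame(np.array(X1).reshape(6, -1))
# (6, 1, 1) -> (6,): a single y column
df_Y1 = pd.DataFrame(np.array(y1).reshape(-1))

df_combined = pd.concat([df_X1, df_Y1], axis=1)
df_combined.set_index(idx, inplace=True)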

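The result is a 6-row DataFrame indexed 3 to 8, with the three X values and the y value of each row in separate columns (you may want to rename the columns, since the default labels are 0, 1, 2, 0).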

Note:
the data must first be in a properly nested list format before it can be converted to an ndarray.
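A self-contained sketch of the same approach, assuming the question's data are NumPy arrays. It derives the row count from the data instead of hard-coding 6, and gives the columns names; X_0, X_1, X_2 and y are hypothetical labels chosen for illustration.

```python
import numpy as np
import pandas as pd

# Question's data, assumed to be NumPy arrays
df_idx1 = np.array([[3], [4], [5], [6], [7], [8]])
X1 = np.array([[[10], [20], [30]], [[20], [30], [40]], [[30], [40], [50]],
               [[40], [50], [60]], [[50], [60], [70]], [[60], [70], [80]]])
y1 = np.array([[[40]], [[50]], [[60]], [[70]], [[80]], [[90]]])

idx = df_idx1.reshape(-1)  # (6, 1) -> (6,)
n = len(X1)                # number of rows instead of a hard-coded 6

# Hypothetical column names, replacing the default labels 0, 1, 2, 0
df_X1 = pd.DataFrame(X1.reshape(n, -1), columns=["X_0", "X_1", "X_2"])
df_Y1 = pd.DataFrame({"y": y1.reshape(-1)})

# Both frames share the default 0..5 index, so concat lines the rows up
df_combined = pd.concat([df_X1, df_Y1], axis=1).set_index(idx)
print(df_combined)
```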

Answer 2

Score: 3

With arrays like yours:

In [90]: idx=np.arange(3,9).reshape(6,1)    
In [91]: X = np.arange(10,28).reshape(6,3,1); Y = 10*np.arange(4,10).reshape(6,1,1)

Making a frame with idx produces a similar error:

In [92]: df=pd.DataFrame( index=idx, columns=['X','Y'])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_7567/2068536625.py in <module>
----> 1 df=pd.DataFrame( index=idx, columns=['X','Y'])

~/.local/lib/python3.10/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    662         elif isinstance(data, dict):
    663             # GH#38939 de facto copy defaults to False only in non-dict cases
--> 664             mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
    665         elif isinstance(data, ma.MaskedArray):
    666             import numpy.ma.mrecords as mrecords

~/.local/lib/python3.10/site-packages/pandas/core/internals/construction.py in dict_to_mgr(data, index, columns, dtype, typ, copy)
    448             index = _extract_index(arrays[~missing])
    449         else:
--> 450             index = ensure_index(index)
    451 
    452         # no obvious "empty" int column

~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py in ensure_index(index_like, copy)
   7331             return Index._with_infer(index_like, copy=copy, tupleize_cols=False)
   7332     else:
-> 7333         return Index._with_infer(index_like, copy=copy)
   7334 
   7335 

~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py in _with_infer(cls, *args, **kwargs)
    714         with warnings.catch_warnings():
    715             warnings.filterwarnings("ignore", ".*the Index constructor", FutureWarning)
--> 716             result = cls(*args, **kwargs)
    717 
    718         if result.dtype == _dtype_obj and not result._is_multi:

~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py in __new__(cls, data, dtype, copy, name, tupleize_cols, **kwargs)
    538 
    539             klass = cls._dtype_to_subclass(arr.dtype)
--> 540             arr = klass._ensure_array(arr, dtype, copy)
    541             disallow_kwargs(kwargs)
    542             return klass._simple_new(arr, name)

~/.local/lib/python3.10/site-packages/pandas/core/indexes/numeric.py in _ensure_array(cls, data, dtype, copy)
    174         if subarr.ndim > 1:
    175             # GH#13601, GH#20285, GH#27125
--> 176             raise ValueError("Index data must be 1-dimensional")
    177 
    178         subarr = np.asarray(subarr)

ValueError: Index data must be 1-dimensional

Raveling idx to 1-D avoids the error:

In [93]: df=pd.DataFrame( index=idx.ravel(), columns=['X','Y'])

In [94]: df
Out[94]: 
     X    Y
3  NaN  NaN
4  NaN  NaN
5  NaN  NaN
6  NaN  NaN
7  NaN  NaN
8  NaN  NaN
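`ravel()` is not the only way to flatten the index. A short sketch comparing it with `reshape(-1)` (used in the first answer) and `squeeze(axis=1)`; the last one refuses to drop an axis whose length is not 1, which can guard against flattening the wrong data by accident.

```python
import numpy as np

idx = np.arange(3, 9).reshape(6, 1)

flat_ravel = idx.ravel()            # returns a view when it can
flat_reshape = idx.reshape(-1)      # the spelling used in the first answer
flat_squeeze = idx.squeeze(axis=1)  # drops only axis 1, which must have length 1

for flat in (flat_ravel, flat_reshape, flat_squeeze):
    print(flat.shape, flat.tolist())

# squeeze(axis=1) raises if axis 1 is not of length 1
squeeze_refused = False
try:
    np.arange(12).reshape(6, 2).squeeze(axis=1)
except ValueError as exc:
    squeeze_refused = True
    print("squeeze refused:", exc)
```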

Now assign the two columns:

In [95]: df['X']=list(X)    
In [96]: df['Y']=list(Y)

In [97]: df
Out[97]: 
                    X       Y
3  [[10], [11], [12]]  [[40]]
4  [[13], [14], [15]]  [[50]]
5  [[16], [17], [18]]  [[60]]
6  [[19], [20], [21]]  [[70]]
7  [[22], [23], [24]]  [[80]]
8  [[25], [26], [27]]  [[90]]

I tried various things with the data=... parameter but kept getting errors, mainly a conflict between the columns implied by X and Y and the two I wanted. The list() calls are also needed: each column is an object-dtype Series holding 6 separate arrays.
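Since `list()` is what turns each column into a 1-D object Series, passing those lists through `data=` should also work in a single constructor call. This is a sketch checked against recent pandas only; passing the raw 3-D arrays instead is what leads to the errors mentioned above.

```python
import numpy as np
import pandas as pd

idx = np.arange(3, 9).reshape(6, 1)
X = np.arange(10, 28).reshape(6, 3, 1)
Y = 10 * np.arange(4, 10).reshape(6, 1, 1)

# list(X) splits X along axis 0 into 6 arrays of shape (3, 1), so pandas builds
# a 1-D object-dtype column instead of rejecting a 3-D array.
df = pd.DataFrame({"X": list(X), "Y": list(Y)}, index=idx.ravel())
print(df)
print(df["X"].dtype)  # object
```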

Edit:

The array extracted from the DataFrame may help you better understand what is actually being stored.

In [111]: df.to_numpy()
Out[111]: 
array([[array([[10],
               [11],
               [12]]), array([[40]])],
       [array([[13],
               [14],
               [15]]), array([[50]])],
       [array([[16],
               [17],
               [18]]), array([[60]])],
       [array([[19],
               [20],
               [21]]), array([[70]])],
       [array([[22],
               [23],
               [24]]), array([[80]])],
       [array([[25],
               [26],
               [27]]), array([[90]])]], dtype=object)

In [112]: df['X'].to_numpy()
Out[112]: 
array([array([[10],
              [11],
              [12]]), array([[13],
                             [14],
                             [15]]), array([[16],
                                            [17],
                                            [18]]), array([[19],
                                                           [20],
                                                           [21]]),
       array([[22],
              [23],
              [24]]), array([[25],
                             [26],
                             [27]])], dtype=object)
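If you later need the windows back as one regular numeric array (for example as model input), `np.stack` can rebuild it from the object column. A minimal sketch, using the same `X` as in the session above:

```python
import numpy as np
import pandas as pd

X = np.arange(10, 28).reshape(6, 3, 1)
df = pd.DataFrame(index=np.arange(3, 9), columns=["X"])
df["X"] = list(X)

# The column holds 6 separate (3, 1) arrays inside an object array of shape (6,)
cells = df["X"].to_numpy()
print(cells.shape, cells.dtype)  # (6,) object

# np.stack joins them along a new first axis, giving back a regular 3-D array
X_back = np.stack(cells)
print(X_back.shape)  # (6, 3, 1)
```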

huangapple
  • Posted on June 8, 2023 09:23:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76428038.html