ValueError: Data must be 1-dimensional, got ndarray of shape (6, 1) instead
Question
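I want to create a new DataFrame from index, X, and y data.
import numpy as np
import pandas as pd

# Arrays as printed in the question, with the missing commas restored.
df_idx1 = np.array([[3], [4], [5], [6], [7], [8]])           # shape (6, 1)
X1 = np.array([[[10], [20], [30]],
               [[20], [30], [40]],
               [[30], [40], [50]],
               [[40], [50], [60]],
               [[50], [60], [70]],
               [[60], [70], [80]]])                          # shape (6, 3, 1)
y1 = np.array([[[40]], [[50]], [[60]], [[70]], [[80]], [[90]]])  # shape (6, 1, 1)
print("Length index, X, Y: ", len(df_idx1), len(X1), len(y1))
print("df_idx1", df_idx1)
print("X1", X1)
print("y1", y1)
# Fails: the index and both columns are multi-dimensional arrays.
exdf1 = pd.DataFrame(data={"X": np.array(X1), "y": np.array(y1)}, index=df_idx1)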
Current output:
Length of index, X, Y:  6 6 6
ValueError: Data must be 1-dimensional, got ndarray of shape (6, 1) instead
Expected output:
exdf1=
             X                y
3    [[10],[20],[30]]       [[40]]
4    [[20],[30],[40]]       [[50]]
5     ....
6
7
8    [[60],[70],[80]]       [[90]]
Answer 1
Score: 3
The question is not entirely clear, but I think you just want to get rid of the error, which means flattening the data so that the index is 1-dimensional and the values fit into a regular DataFrame.
Try this:
idx = np.array(df_idx1).reshape(-1)                 # index: (6, 1) -> (6,)
df_X1 = pd.DataFrame(np.array(X1).reshape(6, -1))   # X: (6, 3, 1) -> (6, 3)
df_Y1 = pd.DataFrame(np.array(y1).reshape(-1))      # y: (6, 1, 1) -> (6,)
df_combined = pd.concat([df_X1, df_Y1], axis=1)     # was concat([x, y]), which are undefined names
df_combined.set_index(idx, inplace=True)
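The result is a flat table indexed by idx, with one column per X value followed by one y column.
(You may want to rename the columns, since the concatenated frames both start their column labels at 0.)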
Note:
The data must first be valid nested lists (with commas between the elements) so that they can be converted to NumPy arrays.
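The flattening approach above can be sketched end to end as follows. This is a minimal sketch, assuming the inputs are NumPy arrays with the shapes from the question; the column names `X0`, `X1`, `X2` and the variable `flat_df` are hypothetical and only there to avoid duplicate column labels.

```python
import numpy as np
import pandas as pd

# Data from the question, written as NumPy arrays.
df_idx1 = np.array([[3], [4], [5], [6], [7], [8]])                 # (6, 1)
X1 = np.array([[[10], [20], [30]], [[20], [30], [40]], [[30], [40], [50]],
               [[40], [50], [60]], [[50], [60], [70]], [[60], [70], [80]]])  # (6, 3, 1)
y1 = np.array([[[40]], [[50]], [[60]], [[70]], [[80]], [[90]]])    # (6, 1, 1)

n_rows = len(X1)
X_flat = X1.reshape(n_rows, -1)                      # (6, 3): one row per sample
x_cols = [f"X{i}" for i in range(X_flat.shape[1])]   # hypothetical column names

# The index must be 1-D, so ravel() the (6, 1) column vector.
flat_df = pd.DataFrame(X_flat, columns=x_cols, index=df_idx1.ravel())
flat_df["y"] = y1.ravel()                            # (6, 1, 1) -> (6,)
print(flat_df)
```

Note that this produces one scalar per cell, which differs from the expected output in the question, where each cell holds a nested array.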
Answer 2
Score: 3
With arrays like yours:
In [90]: idx=np.arange(3,9).reshape(6,1)    
In [91]: X = np.arange(10,28).reshape(6,3,1); Y = 10*np.arange(4,10).reshape(6,1,1)
Making a frame with idx produces your error:
In [92]: df=pd.DataFrame( index=idx, columns=['X','Y'])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_7567/2068536625.py in <module>
----> 1 df=pd.DataFrame( index=idx, columns=['X','Y'])
~/.local/lib/python3.10/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    662         elif isinstance(data, dict):
    663             # GH#38939 de facto copy defaults to False only in non-dict cases
--> 664             mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
    665         elif isinstance(data, ma.MaskedArray):
    666             import numpy.ma.mrecords as mrecords
~/.local/lib/python3.10/site-packages/pandas/core/internals/construction.py in dict_to_mgr(data, index, columns, dtype, typ, copy)
    448             index = _extract_index(arrays[~missing])
    449         else:
--> 450             index = ensure_index(index)
    451 
    452         # no obvious "empty" int column
~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py in ensure_index(index_like, copy)
   7331             return Index._with_infer(index_like, copy=copy, tupleize_cols=False)
   7332     else:
-> 7333         return Index._with_infer(index_like, copy=copy)
   7334 
   7335 
~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py in _with_infer(cls, *args, **kwargs)
    714         with warnings.catch_warnings():
    715             warnings.filterwarnings("ignore", ".*the Index constructor", FutureWarning)
--> 716             result = cls(*args, **kwargs)
    717 
    718         if result.dtype == _dtype_obj and not result._is_multi:
~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py in __new__(cls, data, dtype, copy, name, tupleize_cols, **kwargs)
    538 
    539             klass = cls._dtype_to_subclass(arr.dtype)
--> 540             arr = klass._ensure_array(arr, dtype, copy)
    541             disallow_kwargs(kwargs)
    542             return klass._simple_new(arr, name)
~/.local/lib/python3.10/site-packages/pandas/core/indexes/numeric.py in _ensure_array(cls, data, dtype, copy)
    174         if subarr.ndim > 1:
    175             # GH#13601, GH#20285, GH#27125
--> 176             raise ValueError("Index data must be 1-dimensional")
    177 
    178         subarr = np.asarray(subarr)
ValueError: Index data must be 1-dimensional
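The cause of this traceback can be checked in isolation. The sketch below uses the same `idx` as In [90] and only inspects the index; `index_error` and `flat_idx` are names introduced here for illustration, and the exact error message may vary between pandas versions.

```python
import numpy as np
import pandas as pd

idx = np.arange(3, 9).reshape(6, 1)
print(idx.shape, idx.ndim)          # a column vector has 2 dimensions

# Building an Index from a 2-D array is rejected.
try:
    pd.Index(idx)
    index_error = None
except ValueError as exc:
    index_error = str(exc)
print(index_error)

# ravel() returns a 1-D view of the same values, which an Index accepts.
flat_idx = idx.ravel()
print(flat_idx.shape, flat_idx.ndim)
print(pd.Index(flat_idx))
```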
Ravel idx so that it is 1-dimensional:
In [93]: df=pd.DataFrame( index=idx.ravel(), columns=['X','Y'])
In [94]: df
Out[94]: 
     X    Y
3  NaN  NaN
4  NaN  NaN
5  NaN  NaN
6  NaN  NaN
7  NaN  NaN
8  NaN  NaN
Now assign the two columns:
In [95]: df['X']=list(X)    
In [96]: df['Y']=list(Y)
In [97]: df
Out[97]: 
                    X       Y
3  [[10], [11], [12]]  [[40]]
4  [[13], [14], [15]]  [[50]]
5  [[16], [17], [18]]  [[60]]
6  [[19], [20], [21]]  [[70]]
7  [[22], [23], [24]]  [[80]]
8  [[25], [26], [27]]  [[90]]
I tried several variations using the data=... parameter but kept getting errors, mainly a conflict between the columns implied by the shapes of X and Y and the two columns I wanted. The list() call was also needed: each resulting Series has object dtype and holds 6 separate arrays.
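One way to pass such columns through the data= parameter is to build each column as a 1-D object array first, so pandas sees six opaque elements rather than a multi-dimensional block. This is only a sketch and not part of the approach tested above; `as_object_column` is a hypothetical helper, not a pandas or NumPy function.

```python
import numpy as np
import pandas as pd

def as_object_column(arr):
    """Return a 1-D object array whose i-th element is the sub-array arr[i]."""
    out = np.empty(len(arr), dtype=object)
    for i, item in enumerate(arr):
        out[i] = item               # store each sub-array as a single object
    return out

idx = np.arange(3, 9).reshape(6, 1)
X = np.arange(10, 28).reshape(6, 3, 1)
Y = 10 * np.arange(4, 10).reshape(6, 1, 1)

df = pd.DataFrame(
    {"X": as_object_column(X), "Y": as_object_column(Y)},
    index=idx.ravel(),              # the index still has to be 1-D
)
print(df)
print(df.dtypes)
```

Filling the object array element by element keeps NumPy from interpreting the nested arrays as extra dimensions.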
Edit
The array extracted from the dataframe may help you understand better what is actually being stored.
In [111]: df.to_numpy()
Out[111]: 
array([[array([[10],
               [11],
               [12]]), array([[40]])],
       [array([[13],
               [14],
               [15]]), array([[50]])],
       [array([[16],
               [17],
               [18]]), array([[60]])],
       [array([[19],
               [20],
               [21]]), array([[70]])],
       [array([[22],
               [23],
               [24]]), array([[80]])],
       [array([[25],
               [26],
               [27]]), array([[90]])]], dtype=object)
In [112]: df['X'].to_numpy()
Out[112]: 
array([array([[10],
              [11],
              [12]]), array([[13],
                             [14],
                             [15]]), array([[16],
                                            [17],
                                            [18]]), array([[19],
                                                           [20],
                                                           [21]]),
       array([[22],
              [23],
              [24]]), array([[25],
                             [26],
                             [27]])], dtype=object)
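Because each cell holds its own small array, the column is a 1-D object array rather than a 3-D numeric one. If the original (6, 3, 1) block is needed again, stacking the cells is one way to rebuild it. A minimal sketch using the same arrays as above; `X_back` is a name introduced here.

```python
import numpy as np
import pandas as pd

idx = np.arange(3, 9).reshape(6, 1)
X = np.arange(10, 28).reshape(6, 3, 1)
Y = 10 * np.arange(4, 10).reshape(6, 1, 1)

# Same construction as In [93], In [95] and In [96].
df = pd.DataFrame(index=idx.ravel(), columns=["X", "Y"])
df["X"] = list(X)
df["Y"] = list(Y)

col = df["X"].to_numpy()
print(col.shape, col.dtype)         # 1-D, object dtype: six separate arrays

# np.stack joins the six (3, 1) arrays along a new first axis.
X_back = np.stack(col)
print(X_back.shape, X_back.dtype)
```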