ValueError: Data must be 1-dimensional, got ndarray of shape (6, 1) instead
Question
I want to create a new DataFrame from index, X, and y data.
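import numpy as np
import pandas as pd

# Index values, one per row (nested one level deeper than pandas expects)
df_idx1 = [[3], [4], [5], [6], [7], [8]]
# Six samples, each a (3, 1) block of X values
X1 = [[[10], [20], [30]],
      [[20], [30], [40]],
      [[30], [40], [50]],
      [[40], [50], [60]],
      [[50], [60], [70]],
      [[60], [70], [80]]]
# Six targets, each a (1, 1) block
y1 = [[[40]], [[50]], [[60]], [[70]], [[80]], [[90]]]
print("Length of index, X, Y:", len(df_idx1), len(X1), len(y1))
print("df_idx1", df_idx1)
print("X1", X1)
print("y1", y1)
# This line raises the ValueError shown below
exdf1 = pd.DataFrame(data={"X": np.array(X1), "y": np.array(y1)}, index=df_idx1)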
Current output:
Length of index, X, Y: 6 6 6
ValueError: Data must be 1-dimensional, got ndarray of shape (6, 1) instead
Expected output:
exdf1 =
                  X       y
3  [[10],[20],[30]]  [[40]]
4  [[20],[30],[40]]  [[50]]
5  ....
6
7
8  [[60],[70],[80]]  [[90]]
Answer 1
Score: 3
The question is not entirely clear, but I think you just want to resolve the error, which means making the data 1-dimensional so that it can be turned into a DataFrame.
Try this:
idx = np.array(df_idx1).reshape(-1)                 # index as a flat array, shape (6,)
df_X1 = pd.DataFrame(np.array(X1).reshape(6, -1))   # one row per sample, shape (6, 3)
df_Y1 = pd.DataFrame(np.array(y1).reshape(-1))      # one value per sample, shape (6,)
df_combined = pd.concat([df_X1, df_Y1], axis=1)     # place the X and y columns side by side
df_combined.set_index(idx, inplace=True)
The result is a DataFrame indexed 3 to 8, with the three X values and the y value of each row in separate columns.
(You may want to rename the columns, since concat leaves the duplicate labels 0, 1, 2, 0.)
Note: the data must first be valid nested lists so that they can be converted to NumPy arrays.
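As a self-contained illustration of this flattening approach, here is a minimal sketch using the question's data. The column names X0, X1, X2 and the variable names X_flat, y_flat and df_flat are assumptions made for the example; they do not appear in the original answer.

```python
import numpy as np
import pandas as pd

df_idx1 = [[3], [4], [5], [6], [7], [8]]
X1 = [[[10], [20], [30]],
      [[20], [30], [40]],
      [[30], [40], [50]],
      [[40], [50], [60]],
      [[50], [60], [70]],
      [[60], [70], [80]]]
y1 = [[[40]], [[50]], [[60]], [[70]], [[80]], [[90]]]

# Flatten everything so each DataFrame cell holds a plain scalar.
idx = np.array(df_idx1).reshape(-1)           # shape (6,)
X_flat = np.array(X1).reshape(len(X1), -1)    # shape (6, 3)
y_flat = np.array(y1).reshape(-1)             # shape (6,)

# Hypothetical column names, chosen to avoid duplicate labels.
df_flat = pd.DataFrame(X_flat, index=idx, columns=["X0", "X1", "X2"])
df_flat["y"] = y_flat
print(df_flat)
```

This layout gives up the nested [[10],[20],[30]] cells from the expected output in exchange for ordinary numeric columns, which are usually easier to filter and aggregate.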
Answer 2
Score: 3
With arrays like yours:
In [90]: idx=np.arange(3,9).reshape(6,1)
In [91]: X = np.arange(10,28).reshape(6,3,1); Y = 10*np.arange(4,10).reshape(6,1,1)
Making a frame with idx produces your error:
In [92]: df=pd.DataFrame( index=idx, columns=['X','Y'])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_7567/2068536625.py in <module>
----> 1 df=pd.DataFrame( index=idx, columns=['X','Y'])
~/.local/lib/python3.10/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
662 elif isinstance(data, dict):
663 # GH#38939 de facto copy defaults to False only in non-dict cases
--> 664 mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
665 elif isinstance(data, ma.MaskedArray):
666 import numpy.ma.mrecords as mrecords
~/.local/lib/python3.10/site-packages/pandas/core/internals/construction.py in dict_to_mgr(data, index, columns, dtype, typ, copy)
448 index = _extract_index(arrays[~missing])
449 else:
--> 450 index = ensure_index(index)
451
452 # no obvious "empty" int column
~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py in ensure_index(index_like, copy)
7331 return Index._with_infer(index_like, copy=copy, tupleize_cols=False)
7332 else:
-> 7333 return Index._with_infer(index_like, copy=copy)
7334
7335
~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py in _with_infer(cls, *args, **kwargs)
714 with warnings.catch_warnings():
715 warnings.filterwarnings("ignore", ".*the Index constructor", FutureWarning)
--> 716 result = cls(*args, **kwargs)
717
718 if result.dtype == _dtype_obj and not result._is_multi:
~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py in __new__(cls, data, dtype, copy, name, tupleize_cols, **kwargs)
538
539 klass = cls._dtype_to_subclass(arr.dtype)
--> 540 arr = klass._ensure_array(arr, dtype, copy)
541 disallow_kwargs(kwargs)
542 return klass._simple_new(arr, name)
~/.local/lib/python3.10/site-packages/pandas/core/indexes/numeric.py in _ensure_array(cls, data, dtype, copy)
174 if subarr.ndim > 1:
175 # GH#13601, GH#20285, GH#27125
--> 176 raise ValueError("Index data must be 1-dimensional")
177
178 subarr = np.asarray(subarr)
ValueError: Index data must be 1-dimensional
Raveling idx to 1-D avoids the error:
In [93]: df=pd.DataFrame( index=idx.ravel(), columns=['X','Y'])
In [94]: df
Out[94]:
X Y
3 NaN NaN
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
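The difference between the failing and the working call comes down to the shape of the index array. The following sketch, which assumes a pandas version that rejects 2-D index arrays with a ValueError as in the traceback above, makes the two shapes explicit. The names index_error and flat_idx are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

idx = np.arange(3, 9).reshape(6, 1)    # 2-D column vector
print(idx.shape)                       # (6, 1)

# A 2-D array is rejected as an index; capture the message instead of crashing.
try:
    pd.DataFrame(index=idx, columns=["X", "Y"])
    index_error = None
except ValueError as exc:
    index_error = str(exc)
print(index_error)

# ravel() returns a 1-D array over the same values, which is accepted.
flat_idx = idx.ravel()
print(flat_idx.shape)                  # (6,)
df = pd.DataFrame(index=flat_idx, columns=["X", "Y"])
```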
Now assign the 2 series:
In [95]: df['X']=list(X)
In [96]: df['Y']=list(Y)
In [97]: df
Out[97]:
X Y
3 [[10], [11], [12]] [[40]]
4 [[13], [14], [15]] [[50]]
5 [[16], [17], [18]] [[60]]
6 [[19], [20], [21]] [[70]]
7 [[22], [23], [24]] [[80]]
8 [[25], [26], [27]] [[90]]
I tried various things using the data=... parameter, but kept getting errors, mainly a conflict between the implied columns of X and Y and the desired two. The list() call was also needed: each Series is object dtype, holding 6 separate arrays.
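The same steps can be applied to the question's own data. This is a hedged sketch rather than part of the original session: it follows the pattern shown above (empty frame with a raveled index, then list() assignment) and uses the question's df_idx1, X1 and y1 values.

```python
import numpy as np
import pandas as pd

df_idx1 = [[3], [4], [5], [6], [7], [8]]
X1 = [[[10], [20], [30]],
      [[20], [30], [40]],
      [[30], [40], [50]],
      [[40], [50], [60]],
      [[50], [60], [70]],
      [[60], [70], [80]]]
y1 = [[[40]], [[50]], [[60]], [[70]], [[80]], [[90]]]

# Start from an empty frame with a 1-D index.
exdf1 = pd.DataFrame(index=np.array(df_idx1).ravel(), columns=["X", "y"])

# list() splits each 3-D array along its first axis into 6 separate
# sub-arrays, so every cell receives one (3, 1) or (1, 1) array.
exdf1["X"] = list(np.array(X1))
exdf1["y"] = list(np.array(y1))

print(exdf1)
print(exdf1["X"].dtype)            # object: the column stores array objects
print(exdf1["X"].iloc[0].shape)    # (3, 1)
```

Because the columns are object dtype, vectorized arithmetic on them is limited; this layout is mainly useful for keeping each sample's array together with its index label.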
Edit
The array extracted from the dataframe may help you understand better what is actually being stored.
In [111]: df.to_numpy()
Out[111]:
array([[array([[10],
[11],
[12]]), array([[40]])],
[array([[13],
[14],
[15]]), array([[50]])],
[array([[16],
[17],
[18]]), array([[60]])],
[array([[19],
[20],
[21]]), array([[70]])],
[array([[22],
[23],
[24]]), array([[80]])],
[array([[25],
[26],
[27]]), array([[90]])]], dtype=object)
In [112]: df['X'].to_numpy()
Out[112]:
array([array([[10],
[11],
[12]]), array([[13],
[14],
[15]]), array([[16],
[17],
[18]]), array([[19],
[20],
[21]]),
array([[22],
[23],
[24]]), array([[25],
[26],
[27]])], dtype=object)
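Since each cell holds its own array, getting back a regular numeric array takes an explicit step. One possible way, sketched here with np.stack, is shown below; the variable name X_back is an assumption for illustration, and the frame is rebuilt with the same arrays as above so the block runs on its own.

```python
import numpy as np
import pandas as pd

idx = np.arange(3, 9).reshape(6, 1)
X = np.arange(10, 28).reshape(6, 3, 1)
Y = 10 * np.arange(4, 10).reshape(6, 1, 1)

df = pd.DataFrame(index=idx.ravel(), columns=["X", "Y"])
df["X"] = list(X)
df["Y"] = list(Y)

# tolist() yields a plain Python list of the 6 stored (3, 1) arrays;
# np.stack joins them along a new first axis into one (6, 3, 1) array.
X_back = np.stack(df["X"].tolist())
print(X_back.shape)    # (6, 3, 1)
print(X_back.dtype)    # an integer dtype, no longer object
```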