Pandas change in df operation in major version?
Question
I have a piece of code that worked fine in Pandas v1.4.2, but throws an error in 2.0.3.
I have a DataFrame `order_lines` of order lines. Each line is identified by a unique EAN code (GTIN) and holds the number of bottles with that GTIN (`qty`) and how many bottles fit in a box (`items_per_box`):
```
 lineitemid |     gtin      | items_per_box | box_qty | qty
------------+---------------+---------------+---------+------
          1 | 8002793000504 |             6 |     100 |  600
          2 | 8002793000597 |             6 |     200 | 1200
```
I also have a simple list of stamps, `stamp_lines`, with one row per unique stamp and the GTIN it belongs to:
```
     gtin      |        datamatrix_stamp
---------------+------------------------------------
 8002793000504 | 010406770001455921fx9p;Fq[GS]N1
 8002793000504 | 010406770001455921Cpp;_e![GS]JX
 8002793000504 | 010406770001455921_Y"<anI[GS]9W
 ...
```
I had a loop that worked out, for each stamp, which box it goes in, that box's range, and the stamp's number (`stamp_nr_in_box`), producing this:
```
     gtin      | box_number | box_range | stamp_nr_in_box | datamatrix_stamp
---------------+------------+-----------+-----------------+---------------------------------
 8002793000504 |          1 | 1-6       |               1 | 010406770001455921fx9p;Fq[GS]N1
 8002793000504 |          1 | 1-6       |               2 | 010406770001455921Cpp;_e![GS]JX
 8002793000504 |          1 | 1-6       |               3 | 010406770001455921_Y"<anI[GS]9W
 8002793000504 |          1 | 1-6       |               4 | ...
 8002793000504 |          1 | 1-6       |               5 | ...
 8002793000504 |          1 | 1-6       |               6 | ...
 8002793000504 |          2 | 7-12      |               7 | ...
 8002793000504 |          2 | 7-12      |               8 | ...
 8002793000504 |          2 | 7-12      |               9 | ...
 8002793000504 |          2 | 7-12      |              10 | ...
```
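To make the target numbering explicit: for the k-th stamp of a GTIN (counting from 0) and p bottles per box, the box number is k // p + 1, and that box covers stamps (box - 1) * p + 1 up to box * p, capped at the order quantity for the last, partly filled box. A minimal pure-Python sketch of that arithmetic; `box_layout` is a hypothetical helper, not part of the original code, and its third element mirrors `stamp_nr_in_box` as shown in the table (the running number within the GTIN, so 7, 8, ... in box 2):

```python
from __future__ import annotations


def box_layout(qty: int, items_per_box: int) -> list[tuple[int, str, int]]:
    """Return (box_number, box_range, stamp_nr) for stamps 1..qty of one GTIN."""
    if qty <= 0 or items_per_box <= 0:
        raise ValueError("qty and items_per_box must be positive")
    rows = []
    for k in range(qty):  # k is the 0-based stamp number within the GTIN
        box = k // items_per_box + 1  # boxes are numbered from 1
        start = (box - 1) * items_per_box + 1
        end = min(box * items_per_box, qty)  # the last box may be partly filled
        rows.append((box, f"{start}-{end}", k + 1))
    return rows


layout = box_layout(qty=14, items_per_box=6)
print(layout[6])   # (2, '7-12', 7)
print(layout[-1])  # (3, '13-14', 14)
```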
The code to do it was like this:
```python
stamp_lines['gtin_num'] = stamp_lines.groupby(['gtin']).cumcount()  # number all stamps within each GTIN, starting at 0
for i, row in self.order_lines.iterrows():  # go through order lines
    step = int(row['items_per_box'])  # for each order line see how many bottles fit in a box
    bottles = int(row['qty'])  # total number of bottles for that order line
    box = 1  # start numbering with 1 for human readouts
    if bottles <= 0 or step <= 0:
        raise ValueError(f"Missing quantity of bottles in line {i-1}.")
    for n in range(0, bottles, step):  # do this for each box of bottles
        if n + step > bottles:  # if the last box is not full
            step = bottles - n
        # fill in the box_number, stamp_nr_in_box and box_range fields of the
        # matching stamp_lines rows with the respective numbers and ranges:
        stamp_lines.loc[(stamp_lines['gtin'] == row['gtin'])
                        & (stamp_lines['gtin_num'].isin(range(n, n + step))),
                        ['box_number', 'stamp_nr_in_box', 'box_range']] = box, range(n + 1, n + step + 1), '{}-{}'.format(n + 1, n + step)
        box += 1
```
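The loop above can also be sketched without iterating over `order_lines`, which sidesteps the multi-column `.loc` assignment entirely. This is a swapped-in vectorized technique (a merge plus integer division), not the author's code; it assumes one order line per GTIN and exactly `qty` stamps per GTIN, and the toy data is hypothetical:

```python
import pandas as pd

# Hypothetical toy data shaped like the question's tables: one order line, 8 stamps.
order_lines = pd.DataFrame({"gtin": ["8002793000504"], "items_per_box": [6], "qty": [8]})
stamp_lines = pd.DataFrame({"gtin": ["8002793000504"] * 8,
                            "datamatrix_stamp": [f"stamp{i}" for i in range(8)]})

# 0-based running number of each stamp within its GTIN, as in the question.
stamp_lines["gtin_num"] = stamp_lines.groupby("gtin").cumcount()

# Attach items_per_box and qty to every stamp row (assumes one order line per GTIN).
merged = stamp_lines.merge(order_lines[["gtin", "items_per_box", "qty"]], on="gtin", how="left")

per_box = merged["items_per_box"]
merged["box_number"] = merged["gtin_num"] // per_box + 1  # boxes numbered from 1
start = (merged["box_number"] - 1) * per_box + 1
raw_end = merged["box_number"] * per_box
end = raw_end.where(raw_end <= merged["qty"], merged["qty"])  # last box may be partly filled
merged["box_range"] = start.astype(str) + "-" + end.astype(str)
merged["stamp_nr_in_box"] = merged["gtin_num"] + 1  # same running number the loop writes
print(merged[["gtin", "box_number", "box_range", "stamp_nr_in_box", "datamatrix_stamp"]])
```

With 8 stamps and 6 per box, the first six rows land in box 1 (`1-6`) and the last two in box 2 (`7-8`). Stamps beyond `qty` would get out-of-range boxes here, whereas the loop simply leaves them unset.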
This worked fine in 1.4.2.
After upgrading to pandas 2, however, the `stamp_lines.loc[...] = box, range(...), '...'` assignment now throws an error:
```
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.
```
I looked into this but can't work out what the problem is. Can I no longer mix range() and scalars when assigning through .loc[]?
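The wording of this error is NumPy's, not pandas'. Since NumPy 1.24, building an array from a ragged sequence raises ValueError unless `dtype=object` is passed (older versions emitted a VisibleDeprecationWarning and built an object array). The right-hand side `box, range(...), '...'` is such a sequence, and the reported shape `(3,)` matches its three elements, so a plausible culprit is the newer NumPy installed alongside pandas 2 rather than `.loc` itself. A minimal sketch reproducing the NumPy-level behaviour, with illustrative values:

```python
import numpy as np

# The shape of the question's right-hand side: a scalar, a range and a string.
value = (1, range(1, 7), "1-6")

try:
    np.array(value)  # ragged input: raises ValueError on NumPy >= 1.24
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# With an explicit object dtype, NumPy keeps each of the three elements as one object.
as_objects = np.array(value, dtype=object)
print(as_objects.shape)  # (3,)
```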
Answer 1
Score: 0
It turns out I needed to turn the right-hand side into a DataFrame first.
So, instead of this:
```python
stamp_lines.loc[(stamp_lines['gtin'] == row['gtin'])
                & (stamp_lines['gtin_num'].isin(range(n, n + step))),
                ['box_number', 'stamp_nr_in_box', 'box_range']] = (
    box, range(n + 1, n + step + 1), '{}-{}'.format(n + 1, n + step))
```
I had to do:
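```python
mask = (stamp_lines['gtin'] == row['gtin']) & stamp_lines['gtin_num'].isin(range(n, n + step))
# .loc aligns a DataFrame value on index and columns, so both must match the target rows/columns
fill_df = pd.DataFrame({'box_number': box,
                        'stamp_nr_in_box': range(n + 1, n + step + 1),
                        'box_range': '{}-{}'.format(n + 1, n + step)},
                       index=stamp_lines.index[mask])
stamp_lines.loc[mask, ['box_number', 'stamp_nr_in_box', 'box_range']] = fill_df
```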
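Another way around the error that keeps the original loop is to split the tuple assignment into one `.loc` assignment per column, so no mixed (scalar, range, string) tuple ever has to become an array. A minimal sketch; the toy data and the loop values `gtin`, `n`, `step` and `box` are hypothetical stand-ins for the question's loop variables:

```python
import pandas as pd

# Toy stand-in for stamp_lines: 8 stamps of one GTIN (hypothetical data).
stamp_lines = pd.DataFrame({"gtin": ["8002793000504"] * 8})
stamp_lines["gtin_num"] = stamp_lines.groupby("gtin").cumcount()  # 0..7

# Create the target columns up front (object dtype) so the partial
# assignments below never have to change a column's dtype.
for col in ["box_number", "stamp_nr_in_box", "box_range"]:
    stamp_lines[col] = None

# Hypothetical loop state for the second, partly filled box of qty=8, items_per_box=6.
gtin, n, step, box = "8002793000504", 6, 2, 2
mask = (stamp_lines["gtin"] == gtin) & stamp_lines["gtin_num"].isin(range(n, n + step))

# One .loc assignment per column: a scalar is broadcast to the selected rows,
# and a list must have exactly as many elements as there are selected rows.
stamp_lines.loc[mask, "box_number"] = box
stamp_lines.loc[mask, "stamp_nr_in_box"] = list(range(n + 1, n + step + 1))
stamp_lines.loc[mask, "box_range"] = "{}-{}".format(n + 1, n + step)
print(stamp_lines.loc[mask])
```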