Did pandas change DataFrame .loc assignment between major versions?
Question
I have a piece of code that worked fine in pandas v1.4.2 but raises an error in 2.0.3.
I have a DataFrame order_lines holding order lines. Each line is identified by a unique EAN code (GTIN) and records the number of bottles with that GTIN and how many bottles fit in a box.
 lineitemid |     gtin      | items_per_box | box_qty | qty
------------+---------------+---------------+---------+------
          1 | 8002793000504 |             6 |     100 |  600
          2 | 8002793000597 |             6 |     200 | 1200
And a simple list of stamps, stamp_lines, pairing each GTIN with its unique stamps, like this:
    gtin      |             datamatrix_stamp             
--------------+------------------------------------
8002793000504 | 010406770001455921fx9p;Fq[GS]N1
8002793000504 | 010406770001455921Cpp;_e![GS]JX
8002793000504 | 010406770001455921_Y"<anI[GS]9W
...
I had a loop that worked out, for each stamp, which box it is in, the range of stamp numbers in that box, and the stamp's own number, like this:
     gtin      | box_number | box_range | stamp_nr_in_box | datamatrix_stamp
---------------+------------+-----------+-----------------+-------------
 8002793000504 |          1 | 1-6       |               1 | 010406770001455921fx9p;Fq[GS]N1
 8002793000504 |          1 | 1-6       |               2 | 010406770001455921Cpp;_e![GS]JX
 8002793000504 |          1 | 1-6       |               3 | 010406770001455921_Y"<anI[GS]9W
 8002793000504 |          1 | 1-6       |               4 | ...
 8002793000504 |          1 | 1-6       |               5 | ...
 8002793000504 |          1 | 1-6       |               6 | ...
 8002793000504 |          2 | 7-12      |               7 | ...
 8002793000504 |          2 | 7-12      |               8 | ...
 8002793000504 |          2 | 7-12      |               9 | ...
 8002793000504 |          2 | 7-12      |              10 | ...
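For reference, the mapping in a table like this comes down to integer division of each stamp's position within its GTIN by items_per_box. The sketch below is a loop-free illustration on made-up toy data, not the code from the question. The function name `number_boxes` and the placeholder stamps are hypothetical, and it assumes one order line per GTIN and exactly qty stamps for each GTIN.

```python
import pandas as pd

# Hypothetical toy data: 7 bottles of one GTIN, 3 bottles per box.
order_lines = pd.DataFrame(
    {"gtin": ["8002793000504"], "items_per_box": [3], "qty": [7]}
)
stamp_lines = pd.DataFrame(
    {
        "gtin": ["8002793000504"] * 7,
        "datamatrix_stamp": [f"stamp-{k}" for k in range(7)],  # placeholders
    }
)


def number_boxes(order_lines, stamp_lines):
    """Return a copy of stamp_lines with box_number, box_range, stamp_nr_in_box."""
    out = stamp_lines.copy()
    per_gtin = order_lines.set_index("gtin")  # assumes one order line per GTIN
    per_box = out["gtin"].map(per_gtin["items_per_box"])
    qty = out["gtin"].map(per_gtin["qty"])
    pos = out.groupby("gtin").cumcount()  # 0-based position within the GTIN

    out["box_number"] = pos // per_box + 1
    first = (out["box_number"] - 1) * per_box + 1  # first stamp number in the box
    last = first + per_box - 1
    last = last.where(last <= qty, qty)  # cap the range of a partial last box
    out["box_range"] = first.astype(str) + "-" + last.astype(str)
    out["stamp_nr_in_box"] = pos + 1  # running number within the GTIN
    return out


numbered = number_boxes(order_lines, stamp_lines)
print(numbered[["box_number", "box_range", "stamp_nr_in_box"]])
```

With 3 bottles per box and 7 bottles in total, the last box holds a single bottle, so its range starts and ends on the same number.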
The code to do it was like this:
	stamp_lines['gtin_num']=stamp_lines.groupby(['gtin']).cumcount() # list and number all stamps grouped by the GTIN 
	for i,row in self.order_lines.iterrows(): # go through order lines
		step=int(row['items_per_box'])  # bottles per box for this order line
		bottles=int(row['qty']) # total number of bottles for this order line
		box=1 # start numbering with 1 for human readouts
		if bottles <=0 or step <=0: 
			raise ValueError(f"Missing quantity of bottles in line {i-1}.")
		for n in range(0,bottles, step):   # do this for each box of bottles
			if n+step > bottles: # if last box is not full
				step=bottles-n
			# fill in the box_number, stamp_nr_in_box and box_range fields of the
			# matching stamp_lines rows with the respective numbers and ranges:
			stamp_lines.loc[( stamp_lines['gtin']==row['gtin'] ) 
						& (stamp_lines['gtin_num'].isin(range(n,n+step))),
							['box_number', 'stamp_nr_in_box','box_range']] = box, range(n+1, n+step+1), '{}-{}'.format(n+1,n+step)
			box+=1
This worked fine in 1.4.2.
After upgrading to pandas 2, however, the statement starting with stamp_lines.loc[( stamp_lines['gtin']==row['gtin'] )
now raises an error:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.
I looked into this but can't work out what the problem is. Can I no longer mix range() and scalars on the right-hand side of a .loc[] assignment?
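One observation that may help narrow this down: the wording of the error is the one NumPy uses when it is asked to build an array from a ragged sequence. NumPy 1.24 and later raise this ValueError where earlier releases only issued a deprecation warning and fell back to an object array. So the change might come from the NumPy version installed alongside pandas 2 rather than from .loc itself. That is a guess, not something confirmed in this thread. The helper `ragged_outcome` below is a hypothetical name used only for this check.

```python
import warnings

import numpy as np


def ragged_outcome(values):
    """Report what NumPy does when asked to build an array from `values`."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # older NumPy only warns for ragged input
        try:
            arr = np.array(values)
        except ValueError as exc:
            return "ValueError: " + str(exc)
    return f"array with dtype {arr.dtype}"


# Same kind of right-hand side as in the question: a scalar, a range, a string.
outcome = ragged_outcome((1, range(1, 7), "1-6"))
print(np.__version__, "->", outcome)
```

If this prints a ValueError on the machine where the loop fails, the tuple on the right-hand side is the part that can no longer be converted in one go.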
Answer 1
Score: 0
It turns out I needed to build the right-hand side as a DataFrame first.
So, instead of this:
stamp_lines.loc[( stamp_lines['gtin']==row['gtin'] ) 
                        & (stamp_lines['gtin_num'].isin(range(n,n+step))),
                            ['box_number', 'stamp_nr_in_box','box_range']] = box, range(n+1, n+step+1), '{}-{}'.format(n+1,n+step)
I had to do:
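mask = (stamp_lines['gtin']==row['gtin']) & stamp_lines['gtin_num'].isin(range(n,n+step))
# one column per target field, indexed like the selected rows so .loc can align on index and columns
fill_df = pd.DataFrame({'box_number': box,
                        'stamp_nr_in_box': range(n+1, n+step+1),
                        'box_range': '{}-{}'.format(n+1,n+step)},
                       index=stamp_lines.index[mask])
stamp_lines.loc[mask, ['box_number', 'stamp_nr_in_box','box_range']] = fill_df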
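Another way to avoid a mixed right-hand side altogether is to assign each column on its own, so every assignment gets either a single scalar or one aligned Series. The sketch below reworks the loop from the question along those lines on made-up toy data. The function name `assign_boxes`, the placeholder GTINs "A" and "B", and the up-front creation of the three target columns are assumptions for illustration, not part of the original code.

```python
import pandas as pd


def assign_boxes(order_lines, stamp_lines):
    """Loop from the question, with one .loc assignment per target column."""
    out = stamp_lines.copy()
    out["gtin_num"] = out.groupby("gtin").cumcount()  # 0-based number within GTIN
    # Create the target columns up front with dtypes that fit their values.
    out["box_number"] = pd.Series(pd.NA, index=out.index, dtype="Int64")
    out["stamp_nr_in_box"] = pd.Series(pd.NA, index=out.index, dtype="Int64")
    out["box_range"] = pd.Series(None, index=out.index, dtype="object")

    for i, row in order_lines.iterrows():
        step = int(row["items_per_box"])  # bottles per box
        bottles = int(row["qty"])  # total bottles on this order line
        if bottles <= 0 or step <= 0:
            raise ValueError(f"Missing quantity of bottles in order line {i}.")
        for box, n in enumerate(range(0, bottles, step), start=1):
            size = min(step, bottles - n)  # the last box may be partial
            mask = (out["gtin"] == row["gtin"]) & out["gtin_num"].between(
                n, n + size - 1
            )
            out.loc[mask, "box_number"] = box  # scalar, broadcast to the rows
            out.loc[mask, "stamp_nr_in_box"] = out.loc[mask, "gtin_num"] + 1
            out.loc[mask, "box_range"] = f"{n + 1}-{n + size}"  # scalar string
    return out


# Hypothetical toy data: two order lines with different box sizes.
order_lines = pd.DataFrame(
    {"gtin": ["A", "B"], "items_per_box": [3, 2], "qty": [7, 4]}
)
stamp_lines = pd.DataFrame(
    {
        "gtin": ["A"] * 7 + ["B"] * 4,
        "datamatrix_stamp": [f"stamp-{k}" for k in range(11)],  # placeholders
    }
)
result = assign_boxes(order_lines, stamp_lines)
print(result)
```

Deriving stamp_nr_in_box from the selected rows' own gtin_num, rather than from a separate range, also means the assignment cannot fail on a length mismatch when a GTIN has fewer stamps than qty.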