Pandas: change in DataFrame operation between major versions?

Question

I have a piece of code that worked fine in Pandas v1.4.2, but throws an error in 2.0.3.

So I have a DataFrame order_lines with order lines. Each line is identified by a unique EAN code (GTIN) and holds the number of bottles with that GTIN and how many bottles fit in a box.

     lineitemid |     gtin      | items_per_box | box_qty | qty
    ------------+---------------+---------------+---------+------
              1 | 8002793000504 |             6 |     100 |  600
              2 | 8002793000597 |             6 |     200 | 1200

And a simple list of stamps, stamp_lines, with one GTIN and one unique stamp per row, like this:

        gtin      |         datamatrix_stamp
    --------------+---------------------------------
    8002793000504 | 010406770001455921fx9p;Fq[GS]N1
    8002793000504 | 010406770001455921Cpp;_e![GS]JX
    8002793000504 | 010406770001455921_Y"<anI[GS]9W
    ...

And I had a loop that worked out, for each stamp, which box it goes in, what that box's range is, and the stamp's running number, like this:

         gtin      | box_number | box_range | stamp_nr_in_box | datamatrix_stamp
    ---------------+------------+-----------+-----------------+---------------------------------
     8002793000504 |          1 | 1-6       |               1 | 010406770001455921fx9p;Fq[GS]N1
     8002793000504 |          1 | 1-6       |               2 | 010406770001455921Cpp;_e![GS]JX
     8002793000504 |          1 | 1-6       |               3 | 010406770001455921_Y"<anI[GS]9W
     8002793000504 |          1 | 1-6       |               4 | ...
     8002793000504 |          1 | 1-6       |               5 | ...
     8002793000504 |          1 | 1-6       |               6 | ...
     8002793000504 |          2 | 7-12      |               7 | ...
     8002793000504 |          2 | 7-12      |               8 | ...
     8002793000504 |          2 | 7-12      |               9 | ...
     8002793000504 |          2 | 7-12      |              10 | ...

The code to do it was like this:

    stamp_lines['gtin_num'] = stamp_lines.groupby(['gtin']).cumcount()  # number all stamps within each GTIN (0-based)
    for i, row in self.order_lines.iterrows():  # go through order lines
        step = int(row['items_per_box'])  # how many bottles fit in one box for this order line
        bottles = int(row['qty'])  # total number of bottles for this order line
        box = 1  # start numbering at 1 for human readouts
        if bottles <= 0 or step <= 0:
            raise ValueError(f"Missing quantity of bottles in line {i-1}.")
        for n in range(0, bottles, step):  # do this for each box of bottles
            if n + step > bottles:  # if the last box is not full
                step = bottles - n
            # fill in the box_number, stamp_nr_in_box and box_range fields of the
            # matching stamp_lines rows with the respective numbers and ranges:
            stamp_lines.loc[(stamp_lines['gtin'] == row['gtin'])
                            & (stamp_lines['gtin_num'].isin(range(n, n + step))),
                            ['box_number', 'stamp_nr_in_box', 'box_range']] = box, range(n + 1, n + step + 1), '{}-{}'.format(n + 1, n + step)
            box += 1

This worked fine in 1.4.2.

After upgrading to pandas 2, however, the assignment that starts with stamp_lines.loc[(stamp_lines['gtin']==row['gtin']) now throws an error:

    ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.

I looked into this but can't figure out what the problem is. Can I no longer use range() together with scalars to fill in .loc[]?
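On the narrower question: range() and scalars should still work with .loc when each column is assigned on its own, so that pandas never has to turn a mixed tuple into one array. Below is a minimal sketch on made-up data (one GTIN, 8 stamps, 6 per box) that reuses the question's column names; gtin, step and bottles stand in for the values of a single order line, and pre-creating the target columns as object dtype is an added assumption to keep ints and strings from clashing.

```python
import pandas as pd

# Made-up stand-in for stamp_lines: one GTIN with 8 stamps
stamp_lines = pd.DataFrame({'gtin': ['8002793000504'] * 8})
stamp_lines['gtin_num'] = stamp_lines.groupby('gtin').cumcount()
# Assumption: pre-create the target columns as object dtype (all None)
for col in ['box_number', 'stamp_nr_in_box', 'box_range']:
    stamp_lines[col] = None

# Values that would come from one order line (assumed: 6 per box, 8 bottles)
gtin, step, bottles = '8002793000504', 6, 8
for box, n in enumerate(range(0, bottles, step), start=1):
    size = min(step, bottles - n)  # the last box may be partial
    mask = (stamp_lines['gtin'] == gtin) & stamp_lines['gtin_num'].isin(range(n, n + size))
    # One column per assignment: a scalar broadcasts, a list must match the row count
    stamp_lines.loc[mask, 'box_number'] = box
    stamp_lines.loc[mask, 'stamp_nr_in_box'] = list(range(n + 1, n + size + 1))
    stamp_lines.loc[mask, 'box_range'] = '{}-{}'.format(n + 1, n + size)

print(stamp_lines)
```

This costs three .loc calls per box instead of one, which is unlikely to matter at the order sizes shown here.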

Answer 1

Score: 0

It turns out I needed to turn the right-hand side into a DataFrame first.

So, instead of this:

    stamp_lines.loc[(stamp_lines['gtin'] == row['gtin'])
                    & (stamp_lines['gtin_num'].isin(range(n, n + step))),
                    ['box_number', 'stamp_nr_in_box', 'box_range']] = \
        box, range(n + 1, n + step + 1), '{}-{}'.format(n + 1, n + step)

I had to do:

    mask = ((stamp_lines['gtin'] == row['gtin'])
            & (stamp_lines['gtin_num'].isin(range(n, n + step))))
    # pandas aligns a DataFrame value on index AND columns, so the index must be the
    # target rows and the column names must match the target columns
    fill_df = pd.DataFrame({'box_number': box,
                            'stamp_nr_in_box': stamp_lines.loc[mask, 'gtin_num'] + 1,
                            'box_range': '{}-{}'.format(n + 1, n + step)})

    stamp_lines.loc[mask, ['box_number', 'stamp_nr_in_box', 'box_range']] = fill_df
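As an alternative to both versions, the whole loop can probably be replaced by a merge plus integer arithmetic: a stamp's 0-based box index is gtin_num // items_per_box, and its box range runs from box_index * items_per_box + 1 to the smaller of (box_index + 1) * items_per_box and qty. This is a sketch under stated assumptions, not the poster's method: it assumes one order line per GTIN, items_per_box > 0, and stamps already listed in packing order. assign_boxes is a hypothetical name, and stamps beyond qty are left empty, as the loop would leave them.

```python
import pandas as pd


def assign_boxes(stamp_lines: pd.DataFrame, order_lines: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical vectorized version of the box-numbering loop (a sketch).

    Assumes one order line per GTIN, items_per_box > 0, and stamps already
    listed in packing order within each GTIN.
    """
    out = stamp_lines.copy()
    # 0-based position of each stamp within its GTIN, like gtin_num in the question
    out['gtin_num'] = out.groupby('gtin').cumcount()
    # Attach items_per_box and qty of the matching order line to every stamp
    out = out.merge(order_lines[['gtin', 'items_per_box', 'qty']], on='gtin', how='left')

    per_box = out['items_per_box']
    box_idx = out['gtin_num'] // per_box                     # 0-based box index
    first = box_idx * per_box + 1                            # first stamp number in the box
    last = ((box_idx + 1) * per_box).clip(upper=out['qty'])  # the last box may be partial
    in_order = out['gtin_num'] < out['qty']                  # stamps beyond qty get no box

    out['box_number'] = (box_idx + 1).where(in_order)
    out['stamp_nr_in_box'] = (out['gtin_num'] + 1).where(in_order)
    out['box_range'] = (first.astype(str) + '-' + last.astype(str)).where(in_order)
    return out.drop(columns=['items_per_box', 'qty'])
```

With the question's order_lines (6 bottles per box), the seventh stamp of GTIN 8002793000504 would land in box 2 with the range 7-12. Because of the NaN placeholders, box_number and stamp_nr_in_box come back as floats whenever some stamps are left unassigned.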

huangapple
  • Posted on 13 July 2023 at 21:38:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76680032.html