Pandas change in df operation in major version?

Question


I have a piece of code that worked fine in Pandas v1.4.2, but throws an error in 2.0.3.

I have a DataFrame order_lines with order lines; each line is identified by a unique EAN code (GTIN) and holds the number of bottles with that GTIN and how many bottles fit in a box.

    lineitemid | gtin          | items_per_box | box_qty | qty
    -----------+---------------+---------------+---------+------
    1          | 8002793000504 | 6             | 100     | 600
    2          | 8002793000597 | 6             | 200     | 1200

And a simple list of stamps stamp_lines: GTIN - unique stamp[,...] like this:

    gtin          | datamatrix_stamp
    --------------+---------------------------------
    8002793000504 | 010406770001455921fx9p;Fq[GS]N1
    8002793000504 | 010406770001455921Cpp;_e![GS]JX
    8002793000504 | 010406770001455921_Y"<anI[GS]9W
    ...

And I had a loop that generated for each stamp, which box it was in, what was the box range, and which stamp was which in the box, like this:

    gtin          | box_number | box_range | stamp_nr_in_box | datamatrix_stamp
    --------------+------------+-----------+-----------------+---------------------------------
    8002793000504 | 1          | 1-6       | 1               | 010406770001455921fx9p;Fq[GS]N1
    8002793000504 | 1          | 1-6       | 2               | 010406770001455921Cpp;_e![GS]JX
    8002793000504 | 1          | 1-6       | 3               | 010406770001455921_Y"<anI[GS]9W
    8002793000504 | 1          | 1-6       | 4               | ...
    8002793000504 | 1          | 1-6       | 5               | ...
    8002793000504 | 1          | 1-6       | 6               | ...
    8002793000504 | 2          | 7-12      | 7               | ...
    8002793000504 | 2          | 7-12      | 8               | ...
    8002793000504 | 2          | 7-12      | 9               | ...
    8002793000504 | 2          | 7-12      | 10              | ...
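
As a side note, the per-stamp arithmetic behind a table like this can be sketched in plain Python, independent of pandas. The function name `box_fields` and its parameters are hypothetical; the sketch assumes stamps are counted from 0 within a GTIN and that the last box may be only partly filled.

```python
def box_fields(k, items_per_box, qty):
    """Return (box_number, box_range, stamp_nr_in_box) for the k-th stamp (0-based) of a GTIN."""
    if items_per_box <= 0 or not 0 <= k < qty:
        raise ValueError("stamp index or box size out of range")
    box_number = k // items_per_box + 1               # boxes are numbered from 1
    first = (box_number - 1) * items_per_box + 1      # first stamp number in this box
    last = min(first + items_per_box - 1, qty)        # the last box may hold fewer stamps
    return box_number, f"{first}-{last}", k + 1


# Print the fields for a few stamp positions of an order line with 6 per box and 600 bottles.
for k in (0, 5, 6, 599):
    print(k, box_fields(k, 6, 600))
```

Note that stamp_nr_in_box here follows the table: it is the running number within the GTIN, not a counter that restarts in every box.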

The code to do it was like this:

    stamp_lines['gtin_num'] = stamp_lines.groupby(['gtin']).cumcount()  # number the stamps within each GTIN, starting at 0
    for i, row in self.order_lines.iterrows():  # go through order lines
        step = int(row['items_per_box'])  # for each order line see how many bottles fit in a box
        bottles = int(row['qty'])  # total number of bottles for that order line
        box = 1  # start numbering with 1 for human readouts
        if bottles <= 0 or step <= 0:
            raise ValueError(f"Missing quantity of bottles in line {i-1}.")
        for n in range(0, bottles, step):  # do this for each box of bottles
            if n+step > bottles:  # if last box is not full
                step = bottles-n
            # fill in box_number, stamp_nr_in_box and box_range fields for
            # the matching stamp_lines rows with respective numbers and ranges:
            stamp_lines.loc[(stamp_lines['gtin'] == row['gtin'])
                            & (stamp_lines['gtin_num'].isin(range(n, n+step))),
                            ['box_number', 'stamp_nr_in_box', 'box_range']] = box, range(n+1, n+step+1), '{}-{}'.format(n+1, n+step)
            box += 1

This worked fine in 1.4.2.

After I upgraded to Pandas 2, however, the statement starting with stamp_lines.loc[(stamp_lines['gtin'] == row['gtin'])

now throws an error:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.

I looked into this but can't work out what the problem is. Can I no longer use range() and scalars together when assigning through .loc[]?
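
The error text matches the message that recent NumPy releases (1.24 and later, to the best of my knowledge) raise when asked to build an array from a ragged sequence. A plausible reading, not confirmed anywhere in this thread, is that the right-hand tuple of a scalar, a range and a string gets converted to an array along the way. The sketch below reproduces only that conversion, outside pandas; the tuple values are the ones the loop would produce for the first box.

```python
import warnings

import numpy as np

# Right-hand side for the first box: a scalar, a range and a string.
rhs = (1, range(1, 7), "1-6")

try:
    with warnings.catch_warnings():
        # Older NumPy only warned about ragged input; treat that warning as an error too.
        warnings.simplefilter("error")
        np.array(rhs)
    ragged_accepted = True
except (ValueError, Warning) as exc:
    ragged_accepted = False
    print(type(exc).__name__, exc)

# Requesting dtype=object explicitly gives a 1-D array holding the three Python objects.
as_objects = np.array(rhs, dtype=object)
print(as_objects.shape)
```

If this reading is right, the change may have come from the NumPy version installed alongside pandas 2 rather than from `.loc` itself, which would explain why the same assignment ran without complaint in the older environment.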

Answer 1

Score: 0


It turns out I needed to turn the right-hand side into a DataFrame first.

So, instead of this:

    stamp_lines.loc[(stamp_lines['gtin'] == row['gtin'])
                    & (stamp_lines['gtin_num'].isin(range(n, n+step))),
                    ['box_number', 'stamp_nr_in_box', 'box_range']] = \
        box, range(n+1, n+step+1), '{}-{}'.format(n+1, n+step)

I had to do:

    mask = (stamp_lines['gtin'] == row['gtin']) & stamp_lines['gtin_num'].isin(range(n, n+step))
    # Named columns plus the target rows' index let the assignment align
    # (a second positional argument to pd.DataFrame would be taken as the index).
    fill_df = pd.DataFrame({'box_number': box,
                            'stamp_nr_in_box': range(n+1, n+step+1),
                            'box_range': '{}-{}'.format(n+1, n+step)},
                           index=stamp_lines.index[mask])
    stamp_lines.loc[mask, ['box_number', 'stamp_nr_in_box', 'box_range']] = fill_df
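
For comparison, the same three columns can probably be computed without the per-box loop. The following is a minimal sketch on made-up toy data, not the poster's method. It assumes that each GTIN appears once in order_lines, that stamp_lines has a default RangeIndex, and that there are no more stamps per GTIN than qty.

```python
import pandas as pd

# Toy stand-ins for the frames in the question (all values are made up).
order_lines = pd.DataFrame({"gtin": ["A", "B"], "items_per_box": [3, 2], "qty": [7, 4]})
stamp_lines = pd.DataFrame({
    "gtin": ["A"] * 7 + ["B"] * 4,
    "datamatrix_stamp": [f"stamp{k}" for k in range(11)],
})

# 0-based position of each stamp within its GTIN, as in the question.
stamp_lines["gtin_num"] = stamp_lines.groupby("gtin").cumcount()

# Bring items_per_box and qty onto every stamp row; a left merge keeps the stamp order.
merged = stamp_lines.merge(order_lines[["gtin", "items_per_box", "qty"]], on="gtin", how="left")

step = merged["items_per_box"]
box_index = merged["gtin_num"] // step                 # 0-based box index per stamp
first = box_index * step + 1                           # first stamp number in the box
last = (first + step - 1).clip(upper=merged["qty"])    # the last box may be partly filled

# Assign positionally, relying on the preserved row order.
stamp_lines["box_number"] = (box_index + 1).to_numpy()
stamp_lines["stamp_nr_in_box"] = (merged["gtin_num"] + 1).to_numpy()
stamp_lines["box_range"] = (first.astype(str) + "-" + last.astype(str)).to_numpy()

print(stamp_lines)
```

Because every value is derived from gtin_num with integer arithmetic, no mixed tuple of scalars and ranges is ever assigned through `.loc`, so the shape problem from the question does not come up.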

huangapple
  • Published on 2023-07-13 21:38:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76680032.html