July 13, 2023 21:38:49

Pandas change in df operation in major version?

Question

I have a piece of code that worked fine in Pandas v1.4.2, but throws an error in 2.0.3.

I have a DataFrame order_lines with one row per order line. Each line is identified by a unique EAN code (GTIN) and records the number of bottles with that GTIN (qty) and how many bottles fit in a box (items_per_box):

     lineitemid |     gtin      | items_per_box | box_qty | qty
    ------------+---------------+---------------+---------+------
              1 | 8002793000504 |             6 |     100 |  600
              2 | 8002793000597 |             6 |     200 | 1200

I also have a simple list of stamps, stamp_lines, with one row per unique stamp and its GTIN:

        gtin      |             datamatrix_stamp
    --------------+------------------------------------
    8002793000504 | 010406770001455921fx9p;Fq[GS]N1
    8002793000504 | 010406770001455921Cpp;_e![GS]JX
    8002793000504 | 010406770001455921_Y"<anI[GS]9W
    ...

I had a loop that determined, for each stamp, which box it goes in, the range of stamp numbers in that box, and the stamp's running number, like this:

         gtin      | box_number | box_range | stamp_nr_in_box | datamatrix_stamp
    ---------------+------------+-----------+-----------------+-------------
     8002793000504 |          1 | 1-6       |               1 | 010406770001455921fx9p;Fq[GS]N1
     8002793000504 |          1 | 1-6       |               2 | 010406770001455921Cpp;_e![GS]JX
     8002793000504 |          1 | 1-6       |               3 | 010406770001455921_Y"<anI[GS]9W
     8002793000504 |          1 | 1-6       |               4 | ...
     8002793000504 |          1 | 1-6       |               5 | ...
     8002793000504 |          1 | 1-6       |               6 | ...
     8002793000504 |          2 | 7-12      |               7 | ...
     8002793000504 |          2 | 7-12      |               8 | ...
     8002793000504 |          2 | 7-12      |               9 | ...
     8002793000504 |          2 | 7-12      |              10 | ...
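Each value in that table depends only on the stamp's 0-based position k within its GTIN (what cumcount() returns), items_per_box and the order line's qty. A minimal sketch of that arithmetic in plain Python, assuming stamp numbers run on across boxes as in the table; box_fields is a hypothetical helper, not part of the original code:

```python
def box_fields(k: int, items_per_box: int, qty: int) -> tuple[int, str, int]:
    """Return (box_number, box_range, stamp_nr_in_box) for the k-th stamp (0-based) of a GTIN."""
    if not 0 <= k < qty:
        raise ValueError("stamp position outside the ordered quantity")
    box_number = k // items_per_box + 1           # boxes are numbered from 1
    first = (box_number - 1) * items_per_box + 1  # first stamp number in this box
    last = min(box_number * items_per_box, qty)   # the last box may be only partly filled
    return box_number, f"{first}-{last}", k + 1   # stamp numbers keep counting across boxes


for k in (0, 5, 6, 9):
    print(k, box_fields(k, items_per_box=6, qty=600))
```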

The code to do it was like this:

    stamp_lines['gtin_num'] = stamp_lines.groupby(['gtin']).cumcount()  # list and number all stamps grouped by the GTIN
    for i, row in self.order_lines.iterrows():  # go through order lines
        step = int(row['items_per_box'])  # for each order line see how many bottles fit in a box
        bottles = int(row['qty'])  # total number of bottles for that order line
        box = 1  # start numbering with 1 for human readouts
        if bottles <= 0 or step <= 0:
            raise ValueError(f"Missing quantity of bottles in line {i-1}.")
        for n in range(0, bottles, step):  # do this for each box of bottles
            if n + step > bottles:  # if the last box is not full
                step = bottles - n
            # fill in the box_number, stamp_nr_in_box and box_range fields of the
            # matching stamp_lines rows with the respective numbers and ranges:
            stamp_lines.loc[(stamp_lines['gtin'] == row['gtin'])
                            & (stamp_lines['gtin_num'].isin(range(n, n + step))),
                            ['box_number', 'stamp_nr_in_box', 'box_range']] = box, range(n + 1, n + step + 1), '{}-{}'.format(n + 1, n + step)
            box += 1
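An alternative that avoids the per-box .loc assignment altogether (not taken from the original thread): every value the loop writes is a function of gtin_num, items_per_box and qty, so the whole table can be built with one merge and integer division. A minimal sketch, assuming exactly one order line per GTIN and that stamps are listed in the order they should be boxed; the toy data and the assign_boxes name are illustrative only:

```python
import pandas as pd

# Toy data in the shape of the question's tables (values made up for illustration)
order_lines = pd.DataFrame({"gtin": ["8002793000504", "8002793000597"],
                            "items_per_box": [6, 4],
                            "qty": [8, 5]})
stamp_lines = pd.DataFrame({"gtin": ["8002793000504"] * 8 + ["8002793000597"] * 5,
                            "datamatrix_stamp": [f"stamp{i}" for i in range(13)]})


def assign_boxes(stamps: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    out = stamps.copy()
    out["gtin_num"] = out.groupby("gtin").cumcount()  # 0-based position within each GTIN
    # Copy items_per_box and qty onto every stamp row; validate fails on duplicate GTINs
    out = out.merge(orders[["gtin", "items_per_box", "qty"]],
                    on="gtin", how="left", validate="many_to_one")
    size = out["items_per_box"]
    out["box_number"] = out["gtin_num"] // size + 1
    first = (out["box_number"] - 1) * size + 1
    last = (out["box_number"] * size).clip(upper=out["qty"])  # partial last box
    out["box_range"] = first.astype(str) + "-" + last.astype(str)
    out["stamp_nr_in_box"] = out["gtin_num"] + 1  # running number, as in the table
    return out.drop(columns=["items_per_box", "qty"])


print(assign_boxes(stamp_lines, order_lines))
```

A left merge keeps the stamp rows in their original order, which is what makes gtin_num line up with the boxing order.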

This worked fine in 1.4.2.

After upgrading to pandas 2.0.3, however, the .loc assignment that starts with stamp_lines.loc[( stamp_lines['gtin']==row['gtin'] ) now throws an error:

    ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.
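The wording of this message comes from NumPy rather than pandas. pandas 2 appears to turn the three-element right-hand side into a single NumPy array before spreading it over the three columns, and since NumPy 1.24 building an array from ragged elements (a scalar, a range of length step and a string) without dtype=object raises exactly this ValueError; "(3,)" is the length of the tuple. A minimal reproduction at the NumPy level, assuming NumPy 1.24 or later (older versions only warn):

```python
import numpy as np

# Right-hand side of the failing .loc assignment for the first box (n = 0, step = 6):
# box, range(n + 1, n + step + 1), '{}-{}'.format(n + 1, n + step)
value = (1, range(1, 7), "1-6")

try:
    np.array(value)  # a plain conversion expects elements of one common shape
    ragged_error = None
except ValueError as exc:  # NumPy >= 1.24 refuses to build ragged arrays implicitly
    ragged_error = str(exc)

print(ragged_error)

# Requesting dtype=object explicitly yields a 1-D array holding the three objects
as_object = np.array(value, dtype=object)
print(as_object.shape)
```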

I looked into this but can't work out what the problem is. Can I no longer use range() together with scalars to fill several columns with .loc[]?
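One way around the error, as a minimal sketch on made-up data: keep the scalar and the range, but give each .loc assignment a single column. A scalar is broadcast over the selected rows, and an array must have exactly as many elements as there are selected rows. fill_box is a hypothetical helper, and the target columns are pre-created with placeholder values (an assumption) so that later writes don't have to change their dtypes:

```python
import numpy as np
import pandas as pd

stamp_lines = pd.DataFrame({"gtin": ["8002793000504"] * 8,
                            "datamatrix_stamp": [f"stamp{i}" for i in range(8)]})
stamp_lines["gtin_num"] = stamp_lines.groupby("gtin").cumcount()
# Pre-create the target columns with suitable dtypes
stamp_lines["box_number"] = 0
stamp_lines["stamp_nr_in_box"] = 0
stamp_lines["box_range"] = ""


def fill_box(df: pd.DataFrame, gtin: str, n: int, step: int, box: int) -> None:
    """Write one box's values; each column gets a scalar or an array of matching length."""
    mask = (df["gtin"] == gtin) & df["gtin_num"].isin(range(n, n + step))
    df.loc[mask, "box_number"] = box                                  # scalar, broadcast
    df.loc[mask, "stamp_nr_in_box"] = np.arange(n + 1, n + step + 1)  # one value per row
    df.loc[mask, "box_range"] = f"{n + 1}-{n + step}"                 # scalar, broadcast


fill_box(stamp_lines, "8002793000504", n=0, step=6, box=1)
fill_box(stamp_lines, "8002793000504", n=6, step=2, box=2)  # partly filled last box
print(stamp_lines)
```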

Answer 1

Score: 0

It turns out I needed to turn the values into a DataFrame first.

So, instead of this:

    stamp_lines.loc[(stamp_lines['gtin'] == row['gtin'])
                    & (stamp_lines['gtin_num'].isin(range(n, n + step))),
                    ['box_number', 'stamp_nr_in_box', 'box_range']] = (
        box, range(n + 1, n + step + 1), '{}-{}'.format(n + 1, n + step))

I had to do:

    mask = (stamp_lines['gtin'] == row['gtin']) & (stamp_lines['gtin_num'].isin(range(n, n + step)))
    # .loc aligns a DataFrame on labels: give it the target column names and the index of the rows being filled
    fill_df = pd.DataFrame({'box_number': box,
                            'stamp_nr_in_box': range(n + 1, n + step + 1),
                            'box_range': '{}-{}'.format(n + 1, n + step)},
                           index=stamp_lines.index[mask])
    stamp_lines.loc[mask, ['box_number', 'stamp_nr_in_box', 'box_range']] = fill_df
