February 24, 2023, 02:06:52

How to add a new column to a DataFrame and fill its values based on a condition in Python

Question

I have a table with company names and the value of each order they placed:

Order Id	Company Id	Company Name	Date	Order Value
3455	80EYLOKP9E762WKG	Chimera-Chasing	18-02-2017	2345
4875	TLEXR1HZWTUTBHPB	Mellow Ezra	30-07-2015	3245
8425	839FKFW2LLX4LMBB	Chimera-Chasing	27-05-2016	4566
4837	97OX39BGVMHODLJM	Worst Mali	27-09-2018	5674
3434	5T4LGH4XGBWOD49Z	Indonesian Grigory	14-01-2016	7654

I need to add a new column containing each company's segment, based on the total value of its orders.

I decided to divide the companies into 4 segments (Prime, Platinum, Gold, Silver).

My approach was to first aggregate this table into a new table holding the total order value for each company, with this code:

seg = orders.loc[:, ['Company Name', 'Order Value']].groupby('Company Name').sum()

Outcome:

Company Name	Order Value
'48 Wills	65325
10-Day Causes	85473
10-Hour Leak	83021
Youngish Mark'S	120343
10-Year-Old Alba	97968
...	...

Then I used conditions to create a new column with segments based on the total order value, and added this column to the aggregated dataframe "seg", with this code:

conditions = [
    (seg['Order Value'] >= 124485),
    (seg['Order Value'] >= 105503) & (seg['Order Value'] < 124485),
    (seg['Order Value'] >= 88174) & (seg['Order Value'] < 105503),
    (seg['Order Value'] < 88174)
]
values = ['Prime', 'Platinum', 'Gold', 'Silver']
seg['Segment'] = np.select(conditions, values)

Now I need to add this Segment column to the original dataframe (orders), matching the company name in seg to the company name in orders, but I don't know how to do that.

Answer 1

Score: 0

I believe what you want is DataFrame.merge (see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html):

orders = orders.merge(seg, on=['Company Name'], how='left')

Note that both dataframes contain an 'Order Value' column, so after the merge you would end up with two of them (pandas renames them 'Order Value_x' and 'Order Value_y'). To avoid this, I would include the following line before the merge code:

seg = seg.rename(columns={'Order Value': 'Total Order Value'})
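
To see why the rename helps, here is a minimal sketch of what the merge does when both frames share a non-key column. The three-row `orders` frame is a cut-down illustration, and the `'_total'` suffix is just a name picked for this example; `suffixes` is a standard `merge` parameter and an alternative to renaming up front.

```python
import pandas as pd

# Cut-down illustration of the orders table (only the columns that matter here).
orders = pd.DataFrame({
    'Company Name': ['Chimera-Chasing', 'Mellow Ezra', 'Chimera-Chasing'],
    'Order Value': [2345, 3245, 4566],
})

# Aggregated totals, indexed by 'Company Name' as in the question.
seg = orders.groupby('Company Name')[['Order Value']].sum()

# Without a rename, the overlapping column gets the default _x/_y suffixes.
merged = orders.merge(seg, on='Company Name', how='left')
print(merged.columns.tolist())

# Alternative to renaming: keep the left name and suffix only the right one.
# '_total' is an arbitrary suffix chosen for this sketch.
merged_suffixed = orders.merge(
    seg, on='Company Name', how='left', suffixes=('', '_total')
)
print(merged_suffixed.columns.tolist())
```

Renaming before the merge, as above, keeps the column names explicit, which is why I prefer it.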

Full example:

import pandas as pd
import numpy as np
data = {
    'Order ID': ['3455', '4875', '8425', '4837', '3434'],
    'Company ID': ['80EYLOKP9E762WKG', 'TLEXR1HZWTUTBHPB', '839FKFW2LLX4LMBB', '97OX39BGVMHODLJM', '5T4LGH4XGBWOD49Z'],
    'Company Name': ['Chimera-Chasing', 'Mellow Ezra', 'Chimera-Chasing', 'Worst Mali', 'Indonesian Grigory'],
    'Date': ['18-02-2017', '30-07-2015', '27-05-2016', '27-09-2018', '14-01-2016'],
    'Order Value': [2345, 3245, 4566, 5674, 7654]
}
orders = pd.DataFrame(data=data)
seg = orders.loc[:, ['Company Name', 'Order Value']].groupby('Company Name').sum()
conditions = [
    (seg['Order Value'] >= 124485),
    (seg['Order Value'] >= 105503) & (seg['Order Value'] < 124485),
    (seg['Order Value'] >= 88174) & (seg['Order Value'] < 105503),
    (seg['Order Value'] < 88174)
]
values = ['Prime', 'Platinum', 'Gold', 'Silver']
seg['Segment'] = np.select(conditions, values)
seg = seg.rename(columns={'Order Value': 'Total Order Value'})
orders = orders.merge(seg, on=['Company Name'], how='left')
print(orders)

Output:

  Order ID        Company ID        Company Name        Date  Order Value  Total Order Value Segment
0     3455  80EYLOKP9E762WKG     Chimera-Chasing  18-02-2017         2345               6911  Silver
1     4875  TLEXR1HZWTUTBHPB         Mellow Ezra  30-07-2015         3245               3245  Silver
2     8425  839FKFW2LLX4LMBB     Chimera-Chasing  27-05-2016         4566               6911  Silver
3     4837  97OX39BGVMHODLJM          Worst Mali  27-09-2018         5674               5674  Silver
4     3434  5T4LGH4XGBWOD49Z  Indonesian Grigory  14-01-2016         7654               7654  Silver
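
If you do not need the intermediate `seg` table at all, one possible alternative (not what the question asked for, just a sketch) is to compute the per-company total directly on `orders` with `groupby(...).transform('sum')` and bin it with `pd.cut`. The bin edges are the thresholds from the question; `right=False` makes each bin include its lower edge, which matches the `>=` comparisons. A side benefit is that this does not rely on the default value of `np.select`, which is the integer 0 and may raise a TypeError when combined with string labels on recent NumPy versions.

```python
import numpy as np
import pandas as pd

orders = pd.DataFrame({
    'Company Name': ['Chimera-Chasing', 'Mellow Ezra', 'Chimera-Chasing',
                     'Worst Mali', 'Indonesian Grigory'],
    'Order Value': [2345, 3245, 4566, 5674, 7654],
})

# Per-row company total, aligned to the original index (no merge needed).
totals = orders.groupby('Company Name')['Order Value'].transform('sum')

# Thresholds from the question; each bin is [lower, upper) because right=False.
bins = [-np.inf, 88174, 105503, 124485, np.inf]
labels = ['Silver', 'Gold', 'Platinum', 'Prime']

# The result is a categorical column holding one label per order row.
orders['Segment'] = pd.cut(totals, bins=bins, labels=labels, right=False)
print(orders)
```

With the sample data every company totals well under 88174, so every row lands in 'Silver', the same as in the merged output above.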

You can delete the 'Total Order Value' column with the following line if you do not want it:

orders = orders.drop(labels=['Total Order Value'], axis=1)
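
Alternatively, if the segment is the only column you want, you can skip merging and dropping and look the segment up by company name with `Series.map`. This is a sketch under the assumption that `seg` is indexed by 'Company Name', as the groupby result is; the small `seg` frame below is hypothetical and built by hand so the example runs on its own.

```python
import pandas as pd

orders = pd.DataFrame({
    'Order ID': ['3455', '4875', '8425'],
    'Company Name': ['Chimera-Chasing', 'Mellow Ezra', 'Chimera-Chasing'],
    'Order Value': [2345, 3245, 4566],
})

# Hypothetical aggregated frame, indexed by company name like the groupby result.
seg = pd.DataFrame(
    {'Total Order Value': [6911, 3245], 'Segment': ['Silver', 'Silver']},
    index=pd.Index(['Chimera-Chasing', 'Mellow Ezra'], name='Company Name'),
)

# Look up each row's company in seg's index and copy its segment over.
orders['Segment'] = orders['Company Name'].map(seg['Segment'])
print(orders)
```

Companies missing from `seg` would get NaN in the new column, the same as with a left merge.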
