How to add a new column to a DataFrame and fill its values based on a condition in Python
Question
I have a table of orders with the company name and the value of each order:
Order Id | Company Id | Company Name | Date | Order Value |
---|---|---|---|---|
3455 | 80EYLOKP9E762WKG | Chimera-Chasing | 18-02-2017 | 2345 |
4875 | TLEXR1HZWTUTBHPB | Mellow Ezra | 30-07-2015 | 3245 |
8425 | 839FKFW2LLX4LMBB | Chimera-Chasing | 27-05-2016 | 4566 |
4837 | 97OX39BGVMHODLJM | Worst Mali | 27-09-2018 | 5674 |
3434 | 5T4LGH4XGBWOD49Z | Indonesian Grigory | 14-01-2016 | 7654 |
I need to add a new column containing each company's segment, based on the total value of its orders.
I decided to divide the companies into four segments (Prime, Platinum, Gold, Silver).
My approach was to first aggregate this table into a new table with the total order value for each company,
using this code:
seg = orders.loc[:,['Company Name', 'Order Value']].groupby('Company Name').sum()
Outcome:
Company Name | Order Value |
---|---|
'48 Wills | 65325 |
10-Day Causes | 85473 |
10-Hour Leak | 83021 |
Youngish Mark'S | 120343 |
10-Year-Old Alba | 97968 |
... | ... |
Then I used conditions to create a new Segment column based on the total order value and added it to the aggregated DataFrame seg,
with this code:
conditions = [
    (seg['Order Value'] >= 124485),
    (seg['Order Value'] >= 105503) & (seg['Order Value'] < 124485),
    (seg['Order Value'] >= 88174) & (seg['Order Value'] < 105503),
    (seg['Order Value'] < 88174)
]
values = ['Prime', 'Platinum', 'Gold', 'Silver']
seg['Segment'] = np.select(conditions, values)
Now I need to add this Segment column to the original DataFrame (orders), matching the company name in seg to the company name in orders,
but I don't know how to do that.
Answer 1
Score: 0
I believe what you want is DataFrame.merge (see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html):
orders = orders.merge(seg, on=['Company Name'], how='left')
Note that both orders and seg contain an 'Order Value' column, so the merged DataFrame would end up with two suffixed copies of it ('Order Value_x' and 'Order Value_y'). To avoid this, I would rename the column in seg before the merge:
seg = seg.rename(columns={'Order Value': 'Total Order Value'})
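To make the column clash concrete, here is a minimal sketch on a three-row subset of the question's data. The variable names merged_plain and merged_renamed are illustrative and do not appear in the original code.

```python
import pandas as pd

# Three rows from the question's sample data
orders = pd.DataFrame({
    'Company Name': ['Chimera-Chasing', 'Mellow Ezra', 'Chimera-Chasing'],
    'Order Value': [2345, 3245, 4566],
})

# Total per company; 'Company Name' becomes the index of seg
seg = orders.groupby('Company Name')[['Order Value']].sum()

# Without the rename, pandas disambiguates the clashing column with suffixes
merged_plain = orders.merge(seg, on='Company Name', how='left')
print(merged_plain.columns.tolist())

# With the rename, the per-order value and the company total stay distinct
seg_renamed = seg.rename(columns={'Order Value': 'Total Order Value'})
merged_renamed = orders.merge(seg_renamed, on='Company Name', how='left')
print(merged_renamed.columns.tolist())
```

The merge works even though 'Company Name' is an index level in seg, because the on parameter accepts column or index level names.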
Full example:
import numpy as np
import pandas as pd

# Sample data taken from the question
data = {
    'Order ID': ['3455', '4875', '8425', '4837', '3434'],
    'Company ID': ['80EYLOKP9E762WKG', 'TLEXR1HZWTUTBHPB', '839FKFW2LLX4LMBB', '97OX39BGVMHODLJM', '5T4LGH4XGBWOD49Z'],
    'Company Name': ['Chimera-Chasing', 'Mellow Ezra', 'Chimera-Chasing', 'Worst Mali', 'Indonesian Grigory'],
    'Date': ['18-02-2017', '30-07-2015', '27-05-2016', '27-09-2018', '14-01-2016'],
    'Order Value': [2345, 3245, 4566, 5674, 7654]
}
orders = pd.DataFrame(data=data)

# Total order value per company (indexed by 'Company Name')
seg = orders.loc[:, ['Company Name', 'Order Value']].groupby('Company Name').sum()

# Segment thresholds, listed from highest to lowest
conditions = [
    (seg['Order Value'] >= 124485),
    (seg['Order Value'] >= 105503) & (seg['Order Value'] < 124485),
    (seg['Order Value'] >= 88174) & (seg['Order Value'] < 105503),
    (seg['Order Value'] < 88174)
]
values = ['Prime', 'Platinum', 'Gold', 'Silver']
# An explicit string default keeps the result dtype consistent; recent NumPy
# versions can reject the implicit integer default (0) next to string choices
seg['Segment'] = np.select(conditions, values, default='Unknown')

# Rename so the total does not clash with the per-order 'Order Value' column
seg = seg.rename(columns={'Order Value': 'Total Order Value'})

# Left join: every order row picks up its company's total and segment
orders = orders.merge(seg, on=['Company Name'], how='left')
print(orders)
  Order ID        Company ID        Company Name        Date  Order Value  Total Order Value  Segment
0     3455  80EYLOKP9E762WKG     Chimera-Chasing  18-02-2017         2345               6911   Silver
1     4875  TLEXR1HZWTUTBHPB         Mellow Ezra  30-07-2015         3245               3245   Silver
2     8425  839FKFW2LLX4LMBB     Chimera-Chasing  27-05-2016         4566               6911   Silver
3     4837  97OX39BGVMHODLJM          Worst Mali  27-09-2018         5674               5674   Silver
4     3434  5T4LGH4XGBWOD49Z  Indonesian Grigory  14-01-2016         7654               7654   Silver
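Every sample row lands in 'Silver' because all the totals are far below 88174. To check the other thresholds, the sketch below runs hypothetical totals sitting right at each boundary through pd.cut. This is a swapped-in alternative to the np.select call, not the code from the question, and the totals are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical totals just below and exactly at each threshold
totals = pd.Series([88173, 88174, 105502, 105503, 124484, 124485],
                   name='Total Order Value')

# right=False makes each bin closed on the left, i.e. [lower, upper),
# which matches the >= / < comparisons used in the question
bins = [-np.inf, 88174, 105503, 124485, np.inf]
labels = ['Silver', 'Gold', 'Platinum', 'Prime']
segments = pd.cut(totals, bins=bins, labels=labels, right=False)

print(pd.DataFrame({'Total Order Value': totals, 'Segment': segments}))
```

pd.cut returns a categorical column, so the segment labels keep their Silver-to-Prime order, which can be convenient for sorting.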
You can delete the 'Total Order Value' column with the following line if you do not want it:
orders = orders.drop(labels=['Total Order Value'], axis=1)
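If only the Segment column is wanted, one possible alternative to merging and then dropping is to look up each order's company in seg with Series.map. The sketch below assumes seg is indexed by 'Company Name', as it is after the groupby. The 'Unknown' default is an assumption added here to give np.select a string fallback.

```python
import numpy as np
import pandas as pd

orders = pd.DataFrame({
    'Company Name': ['Chimera-Chasing', 'Mellow Ezra', 'Chimera-Chasing',
                     'Worst Mali', 'Indonesian Grigory'],
    'Order Value': [2345, 3245, 4566, 5674, 7654],
})

# Total per company, indexed by 'Company Name'
seg = orders.groupby('Company Name')[['Order Value']].sum()

conditions = [
    (seg['Order Value'] >= 124485),
    (seg['Order Value'] >= 105503) & (seg['Order Value'] < 124485),
    (seg['Order Value'] >= 88174) & (seg['Order Value'] < 105503),
    (seg['Order Value'] < 88174),
]
values = ['Prime', 'Platinum', 'Gold', 'Silver']
seg['Segment'] = np.select(conditions, values, default='Unknown')

# Look up each order's company name in seg's index; no extra columns are added
orders['Segment'] = orders['Company Name'].map(seg['Segment'])
print(orders)
```

Company names missing from seg would get NaN here, which matches what a left merge produces for unmatched rows.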