如何在Python中向数据框添加新列,并根据条件填充其值

huangapple go评论133阅读模式
英文:

How to add new column to a dataframe and fill its values based on condition in python

问题

I can help you with the translation for the code part. Here's the translation of the code you provided:

  1. seg = orders.loc[:, ['Company Name', 'Order Value']].groupby('Company Name').sum()

Translation:

  1. seg = orders.loc[:, ['公司名称', '订单价值']].groupby('公司名称').sum()
  1. conditions = [
  2. (seg['Order Value'] >= 124485),
  3. (seg['Order Value'] >= 105503) & (seg['Order Value'] < 124485),
  4. (seg['Order Value'] >= 88174) & (seg['Order Value'] < 105503),
  5. (seg['Order Value'] < 88174)
  6. ]
  7. values = ['Prime', 'Platinum', 'Gold', 'Silver']
  8. seg['Segment'] = np.select(conditions, values)

Translation:

  1. conditions = [
  2. (seg['订单价值'] >= 124485),
  3. (seg['订单价值'] >= 105503) & (seg['订单价值'] < 124485),
  4. (seg['订单价值'] >= 88174) & (seg['订单价值'] < 105503),
  5. (seg['订单价值'] < 88174)
  6. ]
  7. values = ['Prime', 'Platinum', 'Gold', 'Silver']
  8. seg['段'] = np.select(conditions, values)

Now, you can use the translated code to perform the same operations in Chinese.

英文:

So, I have this table with company names and the value of each order they ordered

Order Id Company Id Company Name Date Order Value
3455 80EYLOKP9E762WKG Chimera-Chasing 18-02-2017 2345
4875 TLEXR1HZWTUTBHPB Mellow Ezra 30-07-2015 3245
8425 839FKFW2LLX4LMBB Chimera-Chasing 27-05-2016 4566
4837 97OX39BGVMHODLJM Worst Mali 27-09-2018 5674
3434 5T4LGH4XGBWOD49Z Indonesian Grigory 14-01-2016 7654

And, I need to add a new column which will include the segment of each company based on their total orders value

I decided to divide them into 4 segments (Prime, Platinum, Gold, Silver)

So, my approach was to first aggregate this table into a new table with total orders value for each company

with this code:

  1. seg = orders.loc[:,[&#39;Company Name&#39;, &#39;Order Value&#39;]].groupby(&#39;Company Name&#39;).sum()

Outcome:

Company Name Order Value
'48 Wills 65325
10-Day Causes 85473
10-Hour Leak 83021
Youngish Mark'S 120343
10-Year-Old Alba 97968
... ...

Then, I used conditions to create new column with segments based on total orders value and added this column to the aggregated data frame "seg"

with this code

  1. conditions = [
  2. (seg[&#39;Order Value&#39;] &gt;= 124485),
  3. (seg[&#39;Order Value&#39;] &gt;= 105503) &amp; (seg[&#39;Order Value&#39;] &lt; 124485),
  4. (seg[&#39;Order Value&#39;] &gt;= 88174) &amp; (seg[&#39;Order Value&#39;] &lt; 105503),
  5. (seg[&#39;Order Value&#39;] &lt; 88174)
  6. ]
  7. values = [&#39;Prime&#39;, &#39;Platinum&#39;, &#39;Gold&#39;, &#39;Silver&#39;]
  8. seg[&#39;Segment&#39;] = np.select(conditions, values)

Now, I need to add this segment column to the original dataframe (orders) with a condition where company name in seg match company name in orders
but I dont know how to do that

答案1

得分: 0

我相信你想要的是使用pd.merge(请参见https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html):

  1. orders = orders.merge(seg, on=['Company Name'], how='left')

请注意,你的数据框中会有一个重复的'Order Value'列。要修复这个问题,我建议在合并代码之前包括以下行:

  1. seg = seg.rename(columns={'Order Value': 'Total Order Value'})

完整示例:

  1. import pandas as pd
  2. import numpy as np
  3. data = {
  4. 'Order ID': ['3455', '4875', '8425', '4837', '3434'],
  5. 'Company ID': ['80EYLOKP9E762WKG', 'TLEXR1HZWTUTBHPB', '839FKFW2LLX4LMBB', '97OX39BGVMHODLJM', '5T4LGH4XGBWOD49Z'],
  6. 'Company Name': ['Chimera-Chasing', 'Mellow Ezra', 'Chimera-Chasing', 'Worst Mali', 'Indonesian Grigory'],
  7. 'Date': ['18-02-2017', '30-07-2015', '27-05-2016', '27-09-2018', '14-01-2016'],
  8. 'Order Value': [2345, 3245, 4566, 5674, 7654]
  9. }
  10. orders = pd.DataFrame(data=data)
  11. seg = orders.loc[:, ['Company Name', 'Order Value']].groupby('Company Name').sum()
  12. conditions = [
  13. (seg['Order Value'] >= 124485),
  14. (seg['Order Value'] >= 105503) & (seg['Order Value'] < 124485),
  15. (seg['Order Value'] >= 88174) & (seg['Order Value'] < 105503),
  16. (seg['Order Value'] < 88174)
  17. ]
  18. values = ['Prime', 'Platinum', 'Gold', 'Silver']
  19. seg['Segment'] = np.select(conditions, values)
  20. seg = seg.rename(columns={'Order Value': 'Total Order Value'})
  21. orders = orders.merge(seg, on=['Company Name'], how='left')
  22. print(orders)

如果不需要'Total Order Value'列,可以使用以下行删除它:

  1. orders = orders.drop(labels=['Total Order Value'], axis=1)
英文:

I believe what you are wanting is pd.merge (see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html):

  1. orders = orders.merge(seg, on=[&#39;Company Name&#39;], how=&#39;left&#39;)

Note that you will have a duplicated 'Order Value' column in your dataframe. To fix this I would include the following line in before the merge code:

  1. seg = seg.rename(columns={&#39;Order Value&#39;: &#39;Total Order Value&#39;})

Full example:

  1. import pandas as pd
  2. import numpy as np
  3. data = {
  4. &#39;Order ID&#39;: [&#39;3455&#39;, &#39;4875&#39;, &#39;8425&#39;, &#39;4837&#39;, &#39;3434&#39;],
  5. &#39;Company ID&#39;: [&#39;80EYLOKP9E762WKG&#39;, &#39;TLEXR1HZWTUTBHPB&#39;, &#39;839FKFW2LLX4LMBB&#39;, &#39;97OX39BGVMHODLJM&#39;, &#39;5T4LGH4XGBWOD49Z&#39;],
  6. &#39;Company Name&#39;: [&#39;Chimera-Chasing&#39;, &#39;Mellow Ezra&#39;, &#39;Chimera-Chasing&#39;, &#39;Worst Mali&#39;, &#39;Indonesian Grigory&#39;],
  7. &#39;Date&#39;: [&#39;18-02-2017&#39;, &#39;30-07-2015&#39;, &#39;27-05-2016&#39;, &#39;27-09-2018&#39;, &#39;14-01-2016&#39; ],
  8. &#39;Order Value&#39;: [2345, 3245, 4566, 5674, 7654]
  9. }
  10. orders = pd.DataFrame(data = data)
  11. seg = orders.loc[:,[&#39;Company Name&#39;, &#39;Order Value&#39;]].groupby(&#39;Company Name&#39;).sum()
  12. conditions = [
  13. (seg[&#39;Order Value&#39;] &gt;= 124485),
  14. (seg[&#39;Order Value&#39;] &gt;= 105503) &amp; (seg[&#39;Order Value&#39;] &lt; 124485),
  15. (seg[&#39;Order Value&#39;] &gt;= 88174) &amp; (seg[&#39;Order Value&#39;] &lt; 105503),
  16. (seg[&#39;Order Value&#39;] &lt; 88174)
  17. ]
  18. values = [&#39;Prime&#39;, &#39;Platinum&#39;, &#39;Gold&#39;, &#39;Silver&#39;]
  19. seg[&#39;Segment&#39;] = np.select(conditions, values)
  20. seg = seg.rename(columns={&#39;Order Value&#39;: &#39;Total Order Value&#39;})
  21. orders = orders.merge(seg, on=[&#39;Company Name&#39;], how=&#39;left&#39;)
  22. print(orders)
  23. Order ID Company ID Company Name Date Order Value Total Order Value Segment
  24. 0 3455 80EYLOKP9E762WKG Chimera-Chasing 18-02-2017 2345 6911 Silver
  25. 1 4875 TLEXR1HZWTUTBHPB Mellow Ezra 30-07-2015 3245 3245 Silver
  26. 2 8425 839FKFW2LLX4LMBB Chimera-Chasing 27-05-2016 4566 6911 Silver
  27. 3 4837 97OX39BGVMHODLJM Worst Mali 27-09-2018 5674 5674 Silver
  28. 4 3434 5T4LGH4XGBWOD49Z Indonesian Grigory 14-01-2016 7654 7654 Silver

You can delete the 'Total Order Value' column with the following line if you do not want it:

  1. orders = orders.drop(labels=[&#39;Total Order Value&#39;], axis=1)

huangapple
  • 本文由 发表于 2023年2月24日 02:06:52
  • 转载请务必保留本文链接:https://go.coder-hub.com/75548693.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定