如何在Python中向数据框添加新列,并根据条件填充其值

huangapple go评论89阅读模式
英文:

How to add new column to a dataframe and fill its values based on condition in python

问题

I can help you with the translation for the code part. Here's the translation of the code you provided:

seg = orders.loc[:, ['Company Name', 'Order Value']].groupby('Company Name').sum()

Translation:

seg = orders.loc[:, ['公司名称', '订单价值']].groupby('公司名称').sum()
conditions = [
    (seg['Order Value'] >= 124485),
    (seg['Order Value'] >= 105503) & (seg['Order Value'] < 124485),
    (seg['Order Value'] >= 88174) & (seg['Order Value'] < 105503),
    (seg['Order Value'] < 88174)
]

values = ['Prime', 'Platinum', 'Gold', 'Silver']

seg['Segment'] = np.select(conditions, values)

Translation:

conditions = [
    (seg['订单价值'] >= 124485),
    (seg['订单价值'] >= 105503) & (seg['订单价值'] < 124485),
    (seg['订单价值'] >= 88174) & (seg['订单价值'] < 105503),
    (seg['订单价值'] < 88174)
]

values = ['Prime', 'Platinum', 'Gold', 'Silver']

seg['段'] = np.select(conditions, values)

Now, you can use the translated code to perform the same operations in Chinese.

英文:

So, I have this table with company names and the value of each order they ordered

Order Id Company Id Company Name Date Order Value
3455 80EYLOKP9E762WKG Chimera-Chasing 18-02-2017 2345
4875 TLEXR1HZWTUTBHPB Mellow Ezra 30-07-2015 3245
8425 839FKFW2LLX4LMBB Chimera-Chasing 27-05-2016 4566
4837 97OX39BGVMHODLJM Worst Mali 27-09-2018 5674
3434 5T4LGH4XGBWOD49Z Indonesian Grigory 14-01-2016 7654

And, I need to add a new column which will include the segment of each company based on their total orders value

I decided to divide them into 4 segments (Prime, Platinum, Gold, Silver)

So, my approach was to first aggregate this table into a new table with total orders value for each company

with this code:

seg = orders.loc[:,[&#39;Company Name&#39;, &#39;Order Value&#39;]].groupby(&#39;Company Name&#39;).sum()

Outcome:

Company Name Order Value
'48 Wills 65325
10-Day Causes 85473
10-Hour Leak 83021
Youngish Mark'S 120343
10-Year-Old Alba 97968
... ...

Then, I used conditions to create new column with segments based on total orders value and added this column to the aggregated data frame "seg"

with this code

conditions = [
    (seg[&#39;Order Value&#39;] &gt;= 124485),
    (seg[&#39;Order Value&#39;] &gt;= 105503) &amp; (seg[&#39;Order Value&#39;] &lt; 124485),
    (seg[&#39;Order Value&#39;] &gt;= 88174) &amp; (seg[&#39;Order Value&#39;] &lt; 105503),
    (seg[&#39;Order Value&#39;] &lt; 88174)
                 ]

values = [&#39;Prime&#39;, &#39;Platinum&#39;, &#39;Gold&#39;, &#39;Silver&#39;]

seg[&#39;Segment&#39;] = np.select(conditions, values)

Now, I need to add this segment column to the original dataframe (orders) with a condition where company name in seg match company name in orders
but I dont know how to do that

答案1

得分: 0

我相信你想要的是使用pd.merge(请参见https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html):

orders = orders.merge(seg, on=['Company Name'], how='left')

请注意,你的数据框中会有一个重复的'Order Value'列。要修复这个问题,我建议在合并代码之前包括以下行:

seg = seg.rename(columns={'Order Value': 'Total Order Value'})

完整示例:

import pandas as pd
import numpy as np

data = {
    'Order ID': ['3455', '4875', '8425', '4837', '3434'],
    'Company ID': ['80EYLOKP9E762WKG', 'TLEXR1HZWTUTBHPB', '839FKFW2LLX4LMBB', '97OX39BGVMHODLJM', '5T4LGH4XGBWOD49Z'],
    'Company Name': ['Chimera-Chasing', 'Mellow Ezra', 'Chimera-Chasing', 'Worst Mali', 'Indonesian Grigory'],
    'Date': ['18-02-2017', '30-07-2015', '27-05-2016', '27-09-2018', '14-01-2016'],
    'Order Value': [2345, 3245, 4566, 5674, 7654]
}

orders = pd.DataFrame(data=data)
seg = orders.loc[:, ['Company Name', 'Order Value']].groupby('Company Name').sum()

conditions = [
    (seg['Order Value'] >= 124485),
    (seg['Order Value'] >= 105503) & (seg['Order Value'] < 124485),
    (seg['Order Value'] >= 88174) & (seg['Order Value'] < 105503),
    (seg['Order Value'] < 88174)
]

values = ['Prime', 'Platinum', 'Gold', 'Silver']

seg['Segment'] = np.select(conditions, values)
seg = seg.rename(columns={'Order Value': 'Total Order Value'})

orders = orders.merge(seg, on=['Company Name'], how='left')

print(orders)

如果不需要'Total Order Value'列,可以使用以下行删除它:

orders = orders.drop(labels=['Total Order Value'], axis=1)
英文:

I believe what you are wanting is pd.merge (see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html):

orders = orders.merge(seg, on=[&#39;Company Name&#39;], how=&#39;left&#39;)

Note that you will have a duplicated 'Order Value' column in your dataframe. To fix this I would include the following line in before the merge code:

seg = seg.rename(columns={&#39;Order Value&#39;: &#39;Total Order Value&#39;})

Full example:

import pandas as pd
import numpy as np

data = {
    &#39;Order ID&#39;: [&#39;3455&#39;, &#39;4875&#39;, &#39;8425&#39;, &#39;4837&#39;, &#39;3434&#39;],
    &#39;Company ID&#39;: [&#39;80EYLOKP9E762WKG&#39;, &#39;TLEXR1HZWTUTBHPB&#39;, &#39;839FKFW2LLX4LMBB&#39;, &#39;97OX39BGVMHODLJM&#39;, &#39;5T4LGH4XGBWOD49Z&#39;],
    &#39;Company Name&#39;: [&#39;Chimera-Chasing&#39;, &#39;Mellow Ezra&#39;, &#39;Chimera-Chasing&#39;, &#39;Worst Mali&#39;, &#39;Indonesian Grigory&#39;],
    &#39;Date&#39;: [&#39;18-02-2017&#39;, &#39;30-07-2015&#39;, &#39;27-05-2016&#39;, &#39;27-09-2018&#39;, &#39;14-01-2016&#39;	],
    &#39;Order Value&#39;: [2345, 3245, 4566, 5674, 7654]
}

orders = pd.DataFrame(data = data)
seg = orders.loc[:,[&#39;Company Name&#39;, &#39;Order Value&#39;]].groupby(&#39;Company Name&#39;).sum()

conditions = [
    (seg[&#39;Order Value&#39;] &gt;= 124485),
    (seg[&#39;Order Value&#39;] &gt;= 105503) &amp; (seg[&#39;Order Value&#39;] &lt; 124485),
    (seg[&#39;Order Value&#39;] &gt;= 88174) &amp; (seg[&#39;Order Value&#39;] &lt; 105503),
    (seg[&#39;Order Value&#39;] &lt; 88174)
                 ]

values = [&#39;Prime&#39;, &#39;Platinum&#39;, &#39;Gold&#39;, &#39;Silver&#39;]

seg[&#39;Segment&#39;] = np.select(conditions, values)
seg = seg.rename(columns={&#39;Order Value&#39;: &#39;Total Order Value&#39;})

orders = orders.merge(seg, on=[&#39;Company Name&#39;], how=&#39;left&#39;)

print(orders)
  Order ID        Company ID        Company Name        Date  Order Value  Total Order Value Segment
0     3455  80EYLOKP9E762WKG     Chimera-Chasing  18-02-2017         2345               6911  Silver
1     4875  TLEXR1HZWTUTBHPB         Mellow Ezra  30-07-2015         3245               3245  Silver
2     8425  839FKFW2LLX4LMBB     Chimera-Chasing  27-05-2016         4566               6911  Silver
3     4837  97OX39BGVMHODLJM          Worst Mali  27-09-2018         5674               5674  Silver
4     3434  5T4LGH4XGBWOD49Z  Indonesian Grigory  14-01-2016         7654               7654  Silver

You can delete the 'Total Order Value' column with the following line if you do not want it:

orders = orders.drop(labels=[&#39;Total Order Value&#39;], axis=1)

huangapple
  • 本文由 发表于 2023年2月24日 02:06:52
  • 转载请务必保留本文链接:https://go.coder-hub.com/75548693.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定