Pandas DataFrame - Add values to a new column based on criteria of other columns

Question

I am trying to add values to a column (Discount %) in a new dataframe (df2). The values in this column must be based on "Grid" and Entity from df1.
My structure is as follows:
[Image: structure of df1 and df2]

So if, for the same entity, the column in df1 is 91-120, it should add 20 to df2 under Discount %; if the column in df1 is 61-90, it must add 5 to df2, and so on.

The data is imported from a large CSV file. So far I have tried the code below, but it fills the column only with 0:

for j in range(0, len(df1)):
    for i in range(0, len(df2)):
        if grid['91-120'][j] in df2['Grid'][i]:
            # df['Grid%'][i] = grid['91-120'][j]
            df2.loc[i, 'Grid%'] = df1['91-120'][j]

thank you

Answer 1

Score: 0

I'm currently dealing with a similar problem related to iterating over a dataframe. You really want to avoid that if you can, especially if the dataframe contains duplicate values like your DF1.
I would recommend converting the reference dataframe, DF1, to a dictionary with the "index" orientation and then assigning the values to DF2 from that dictionary, as shown below.

import pandas as pd

# Reference table: one row per Entity, one column per Grid range
DF1 = pd.DataFrame({
    'Entity': ['F1', 'F2', 'F3', 'F4'],
    '0-60': [0, 0, 0, 0],
    '61-90': [0, 5, 10, 5],
    '91-120': [20, 5, 20, 20],
    '121-180': [10, 5, 12, 15],
    '181-240': [20, 5, 22, 25],
})
DF2 = pd.DataFrame({'Entity': ['F1', 'F2', 'F3', 'F4'], 'Grid': ['360+', '61-90', '0-60', '91-120']})

print('DF2 before:')
print(DF2)

# Build a nested lookup of the form {entity: {grid: discount}}
DF1.drop_duplicates(inplace=True)
DF1.set_index('Entity', inplace=True)
d = DF1.to_dict('index')

def get_discount(entity, grid):
    # Return None when the entity or the grid range is missing from DF1
    if entity in d and grid in d[entity]:
        return d[entity][grid]
    else:
        return None

# Look up the discount for each row of DF2
DF2['Discount %'] = DF2.apply(lambda x: get_discount(x['Entity'], x['Grid']), axis=1)

print('DF2 after:')
print(DF2)
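To see why the "index" orientation suits this lookup, here is a minimal sketch comparing it with the default "dict" orientation. The small `ref` frame and the names `by_column` and `by_row` are made up for illustration; they stand in for DF1 and are not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for DF1: two entities, two grid columns
ref = pd.DataFrame({'Entity': ['F1', 'F2'], '61-90': [0, 5], '91-120': [20, 5]})
ref = ref.set_index('Entity')

by_column = ref.to_dict('dict')   # default: {column: {index: value}}
by_row = ref.to_dict('index')     # {index: {column: value}}

print(by_column)  # {'61-90': {'F1': 0, 'F2': 5}, '91-120': {'F1': 20, 'F2': 5}}
print(by_row)     # {'F1': {'61-90': 0, '91-120': 20}, 'F2': {'61-90': 5, '91-120': 5}}

# Entity-first lookup, the same shape get_discount relies on
print(by_row['F2']['61-90'])  # 5
```

With "index", the outer key is the Entity and the inner key is the Grid range, so a single (entity, grid) pair maps to exactly one discount.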

I found this solution because, as I mentioned before, I'm currently working on a similar problem. Knowing how detrimental iterating over a dataframe can be to performance, I realized it would be faster to assign values from a dictionary. I looked up how to convert a dataframe to a dictionary on Stack Overflow and in the pandas documentation. Next, I looked up how to assign a value to a dataframe from a dictionary on Stack Overflow. At first I tried the "dict" orientation. I could get it to assign all discount values for each "Entity" based on "Grid", but I couldn't select the one right discount value.

I couldn't find any other solutions online for assigning a value to a dataframe from a 2D dictionary, so I turned to ChatGPT. After ChatGPT did its thing, I was getting "None" in every field. Eventually, I got it to recommend changing the orientation of the dictionary from "dict" to "series". That also didn't work, but I figured I would try all the other orientations. The "index" orientation worked.

The downside is that the Discount % values are floats, because the missing entry is stored as NaN. The upside is that it handles cases where the dictionary has no value for Grid (e.g. where Grid is "360+").
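If the float values are a problem, one possible alternative (not the method used in this answer) is to reshape DF1 with `melt`, join it to DF2 with `merge`, and cast the result to pandas' nullable `Int64` dtype. This is only a sketch under the assumption that DF1 has one row per Entity; the names `long_ref` and `result` are made up for illustration, and DF1 is shortened to three grid columns.

```python
import pandas as pd

DF1 = pd.DataFrame({'Entity': ['F1', 'F2', 'F3', 'F4'],
                    '0-60': [0, 0, 0, 0],
                    '61-90': [0, 5, 10, 5],
                    '91-120': [20, 5, 20, 20]})
DF2 = pd.DataFrame({'Entity': ['F1', 'F2', 'F3', 'F4'],
                    'Grid': ['360+', '61-90', '0-60', '91-120']})

# Reshape DF1 from wide to long: one row per (Entity, Grid) pair
long_ref = DF1.melt(id_vars='Entity', var_name='Grid', value_name='Discount %')

# A left merge keeps every DF2 row; pairs missing from DF1 get NaN
result = DF2.merge(long_ref, on=['Entity', 'Grid'], how='left')

# The nullable integer dtype keeps whole numbers and marks gaps as <NA>
result['Discount %'] = result['Discount %'].astype('Int64')

print(result)
```

If DF1 contained repeated Entity rows, the merge would repeat the matching DF2 rows, so dropping duplicates from DF1 first would still be needed.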

huangapple
  • Posted on June 14, 2023 at 23:53:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76475426.html