英文:
Panda Dataframe - Add values to new column based on criteria of other columns
问题
我正在尝试将值添加到新数据框(df2)的列(折扣%),该列的值必须基于df1中的“Grid”和实体,我的结构如下所示
所以如果对于相同的实体,DF1中的列是91-120,那么它应该在DF2下的Discount%中添加20,如果DF1中的列是61-90,那么它必须在DF2中添加5,并依此类推。
这些数据是从一个大型csv文件导入的,到目前为止我尝试过以下方法,但是只用0填充了
for j in range(0,len(df1)):
for i in range(0,len(df2)):
if grid['91-120'][j] in df2['Grid'][i]:
df2.loc[i, 'Grid%'] = df1['91-120'][j]
英文:
I am trying to add values to a new dataframe (df2) column (Discount%), the values in this column must be based on "Grid" and Entity from df1),
My structure is as following
so if for the same entity, the column in DF1 is 91-120 then it should add 20 to DF2 under Discount%, if the column in DF1 is 61-90 then it must add 5 to DF2 and so one.
The data is imported from a large csv file, so far I have tried below but if fill only with 0
for j in range(0,len(df1)):
for i in range(0,len(df2)):
if grid['91-120'][j] in df2['Grid'][i]:
#df['Grid%'][i] = grid['91-120'][j]
df2.loc[i, 'Grid%'] = df1['91-120'][j]
thank you
答案1
得分: 0
我目前正在处理一个与遍历数据框相关的类似问题。如果可能的话,你真的不希望这样做,特别是如果数据框中包含像你的DF1那样的重复值。
我建议将参考数据框DF1转换为一个以索引为导向的字典,然后从该字典中为DF2分配值,如下所示。
DF1 = pd.DataFrame({'Entity': ['F1', 'F2', 'F3', 'F4'], '0-60': [0, 0, 0, 0], '61-90': [0, 5, 10, 5], '91-120':[20, 5, 20, 20], '121-180':[10, 5, 12, 15], '181-240':[20, 5, 22, 25]})
DF2 = pd.DataFrame({'Entity': ['F1', 'F2', 'F3', 'F4'], 'Grid': ['360+', '61-90', '0-60', '91-120']})
print('DF2 before:')
print(DF2)
DF1.drop_duplicates(inplace=True)
DF1.set_index('Entity', inplace=True)
d = DF1.to_dict('index')
def get_discount(entity, grid):
if entity in d and grid in d[entity]:
return d[entity][grid]
else:
return None
DF2['Discount %'] = DF2.apply(lambda x: get_discount(x['Entity'], x['Grid']), axis=1)
print('DF2 after:')
print(DF2)
我找到了这个解决方案,因为如我之前提到的,我目前正在处理一个类似的问题。了解到遍历数据框对函数性能的不利影响后,我意识到从字典中分配值会更快。我查找了如何将数据框转换为字典的方法,网上有关于此的问题在Stack Overflow和pandas文档上都有说明。接下来,我查找了如何从字典中为数据框分配值的方法,Stack Overflow上有相关问题。起初,我尝试了“dict”导向的方法。我可以让它基于“Grid”为每个“Entity”分配所有的折扣值,但我无法选择正确的折扣值。我找不到其他关于如何从2D字典为数据框分配值的解决方案,所以我转向了ChatGPT。ChatGPT完成后,每个字段都返回了“None”。最终,我让它建议将字典的导向从“dict”更改为“series”。那也不起作用,但我想尝试所有其他的导向。最后,我使用了“Index”导向。
不足之处是折扣率的值是浮点数。好处是它可以处理字典中没有Grid值的情况(例如Grid为“360+”的情况)。
英文:
I'm currently dealing with a similar problem related to iterating over a dataframe. You really don't want to do that if it could be avoided, especially if the dataframe contains duplicate values like your DF1.
I would recommend converting the reference dataframe, DF1, to a dictionary with the index orientation and then assigning the value to DF2 from that dictionary as shown below.
DF1 = pd.DataFrame({'Entity': ['F1', 'F2', 'F3', 'F4'], '0-60': [0, 0, 0, 0], '61-90': [0, 5, 10, 5], '91-120':[20, 5, 20, 20], '121-180':[10, 5, 12, 15], '181-240':[20, 5, 22, 25]})
DF2 = pd.DataFrame({'Entity': ['F1', 'F2', 'F3', 'F4'], 'Grid': ['360+', '61-90', '0-60', '91-120']})
print('DF2 before:')
print(DF2)
DF1.drop_duplicates(inplace=True)
DF1.set_index('Entity', inplace=True)
d = DF1.to_dict('index')
def get_discount(entity, grid):
if entity in d and grid in d[entity]:
return d[entity][grid]
else:
return None
DF2['Discount %'] = DF2.apply(lambda x: get_discount(x['Entity'], x['Grid']), axis=1)
print('DF2 after:')
print(DF2)
I found this solution because, as I mentioned before, I'm currently working on a similar problem.<br/>Knowing how detrimental iteration over a dataframe can be to the performance of a function, I realized it would be faster to assign a value from a dictionary. I looked up how to convert a dataframe to a dictionary on Stack Overflow and in the pandas documentation. Next, I looked up how to assign a value to a dataframe from a dictionary on Stack Overflow. I was trying the "dict" orientation at first. I could get it to assign all discount values for each "Entity" based on "Grid", but I couldn't select the one right discount value.<br/>I couldn't find any other solutions online for assigning a value to a dataframe from a 2D dictionary, so I turned to ChatGPT. After ChatGPT did its thing, I was getting "None" in every field. Eventually, I got it to recommend changing the orientation of the dictionary from "dict" to "series". That also didn't work, but I figured I would try all the other orientations. Index worked.<br/>The downside is that Discount % values are floats. The upside is it can handle cases where you don't have a value for Grid in the dictionary (e.g. where Grid is "360+").
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论