How to resolve a 'KeyError' in a pandas DataFrame when using a custom function in a loop?

Question

Unable to find the problem with a user-defined function that is used in a loop.

The program I am trying to develop is about supply chain management: it involves planning and managing deliveries and warehouse capacities. I have written a function that works on its own, outside the main code, but when I call it in the main procedure it raises an error.
Assume that there is a pandas DataFrame called df, like the one below:

import pandas as pd

df = pd.DataFrame(
    {'Name': {0: 'a', 1: 'c', 2: 'j', 3: 'd', 4: 'e'},
     'Type': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2},
     'Number of Beams': {0: 60, 1: 60, 2: 60, 3: 60, 4: 60},
     'Number of Columns': {0: 25, 1: 25, 2: 25, 3: 25, 4: 25},
     'Total Weight': {0: 120, 1: 125, 2: 130, 3: 145, 4: 145},
     'Warehouse1 Distance to Site Location': {0: 968, 1: 447, 2: 580, 3: 245, 4: 100},
     'Warehouse2 Distance to Site Location': {0: 220, 1: 513, 2: 123, 3: 35, 4: 940},
     'Warehouse3 Distance to Site Location': {0: 215, 1: 617, 2: 319, 3: 175, 4: 228},
     'Distance to Site Location': {0: 215, 1: 447, 2: 123, 3: 35, 4: 100},
     'Date of Registeration': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
     'Earliest Time to Deliver': {0: 8, 1: 9, 2: 7, 3: 8, 4: 8},
     'Latest Time to Deliver': {0: 10, 1: 10, 2: 11, 3: 12, 4: 9},
     'Frame Cost': {0: 3720, 1: 3875, 2: 4030, 3: 4495, 4: 4495},
     'Transportation Cost': {0: 516, 1: 1117.5, 2: 319.8, 3: 101.5, 4: 290},
     'Date of Delivery': {0: 8, 1: 9, 2: 7, 3: 8, 4: 8}}
)

And there are some variables that need to be defined as below:

import copy  # needed for copy.deepcopy in the loop below

current_day = 0
Charge = [250, 130, 140, 200]
Capacities = {1: {"Warehouse1": 250, "Warehouse2": 250, "Warehouse3": 250},
              2: {"Warehouse1": 130, "Warehouse2": 130, "Warehouse3": 130},
              3: {"Warehouse1": 140, "Warehouse2": 140, "Warehouse3": 140},
              4: {"Warehouse1": 200, "Warehouse2": 200, "Warehouse3": 200}}
Daily_Capacities = []
Indicator = 0

The function that I mentioned before is:

def Select_Warhouse(df, li):
    # Initialize variables
    min_col = None
    min_val = float('inf')
    results = []
    # Loop over dictionaries
    for dict_item in li:
        for key in dict_item.keys():
            # Check if key is a partial match with any column name
            matching_col = [col for col in df.columns if key in col]
            if matching_col:
                # Retrieve column name and value
                col_name = matching_col[0]
                col_val = df.loc[0, col_name]
                results.append((col_name, col_val))
                # Update minimum value if necessary
                if col_val < min_val:
                    min_col = col_name
                    min_val = col_val
    output = [el for el in li if list(el.keys())[0] in min_col.split()[0]]
    return str(*output[0].keys())

Finally, the loop that tries to determine a feasible distribution plan for delivery is this:

while len(df) > 0:

    if current_day % 15 == 0 and current_day != 0 and Indicator == 0:
        for i, j in zip(Capacities, Charge):
            for key in Capacities[i]:
                Capacities[i][key] = Capacities[i][key] + j

    Indicator = 0

    select = df[df['Date of Delivery'] == current_day]
    if len(select) == 0:
        Daily_Capacities.append(copy.deepcopy(Capacities))
        current_day += 1
        continue

    if len(select) > 1:
        Indicator = 1
        select = select[select['Total Weight'] == select['Total Weight'].max()]

    if len(select) > 1:
        select = select[select['Latest Time to Deliver'] == select['Latest Time to Deliver'].min()]

    if len(select) > 1:
        select = select.sample()

    Available_Warehouses = []
    for key, value in Capacities[select['Type'].iloc[0]].items():
        if select['Total Weight'].iloc[0] <= value:
            Available_Warehouses.append({key: value})

    if len(Available_Warehouses) == 0:
        df.loc[select.index, 'Date of Delivery'] += 1
        if Indicator == 1:
            continue
        Daily_Capacities.append(copy.deepcopy(Capacities))
        current_day += 1
        continue

    elif len(Available_Warehouses) == 1:
        Warehouse = str(*Available_Warehouses[0].keys())

    elif len(Available_Warehouses) > 1:
        Warehouse = Select_Warhouse(select, Available_Warehouses)

    Feasible_Distribution_Plan = Feasible_Distribution_Plan.append(select.iloc[0], ignore_index=True)
    df.drop(select.index, inplace=True)
    Capacities[select['Type'].iloc[0]][Warehouse] -= select.iloc[0]['Total Weight']

    if Indicator == 1:
        continue

    for key in dailycapacities:
        Daily_Capacities.append(copy.deepcopy(Capacities))
    current_day += 1

Feasible_Distribution_Plan['Date of Delivery'] = Feasible_Distribution_Plan['Date of Delivery'].astype('int32')

Feasible_Distribution_Plan

When I run the program, it raises an error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3801             try:
-> 3802                 return self._engine.get_loc(casted_key)
   3803             except KeyError as err:

~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-8-61768d41f965> in <module>
     41 
     42     elif len(Available_Warehouses) > 1:
---> 43         Warehouse = Select_Warhouse(select, Available_Warehouses)
     44 
     45     Feasible_Distribution_Plan = Feasible_Distribution_Plan.append(select.iloc[0], ignore_index= True)

<ipython-input-7-54081f8dbf92> in Select_Warhouse(df, li)
     13                 # Retrieve column name and value
     14                 col_name = matching_col[0]
---> 15                 col_val = df.loc[0, col_name]
     16                 results.append((col_name, col_val))
     17                 # Update minimum value if necessary

~\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   1064             key = tuple(com.apply_if_callable(x, self.obj) for x in key)
   1065             if self._is_scalar_access(key):
-> 1066                 return self.obj._get_value(*key, takeable=self._takeable)
   1067             return self._getitem_tuple(key)
   1068         else:

~\anaconda3\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
   3922             #  results if our categories are integers that dont match our codes
   3923             # IntervalIndex: IntervalTree has no get_loc
-> 3924             row = self.index.get_loc(index)
   3925             return series._values[row]
   3926 

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3802                 return self._engine.get_loc(casted_key)
   3803             except KeyError as err:
-> 3804                 raise KeyError(key) from err
   3805             except TypeError:
   3806                 # If we have a listlike key, _check_indexing_error will raise

KeyError: 0

I'm fairly new to Python and cannot solve the problem. The function works outside the loop but not inside it. Do you have any idea why?
PS: If you have any questions about the program logic, please feel free to ask.

Answer 1

Score: 1

When you call your function inside the loop, it tries to access df.loc[0, col_name], but the row label 0 might not be in the DataFrame that the loop passes in. select is a filtered subset of df and keeps the original row labels, so label 0 is present only when the row labelled 0 happens to be the one selected. .loc looks rows up by label, which is why you get KeyError: 0.
To solve the problem, change df.loc[0, col_name] in your function to df.iloc[0][col_name], which takes the first row by position regardless of its label.
Also, your code never defines dailycapacities, but the last for of the loop iterates over it. That needs to be fixed, too.
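
To illustrate the label-versus-position point, here is a minimal sketch rather than the full program. It uses a three-row stand-in for df, and select_warehouse is a hypothetical rewrite of Select_Warhouse that reads the single selected row by position; the names row_df and candidates are assumptions made for this example.

```python
import pandas as pd

# Small stand-in for the question's data (only the columns needed here).
df = pd.DataFrame({
    "Name": ["a", "c", "j"],
    "Warehouse1 Distance to Site Location": [968, 447, 580],
    "Warehouse2 Distance to Site Location": [220, 513, 123],
    "Date of Delivery": [8, 9, 7],
})

# Filtering keeps the original row labels: this subset only has label 2.
select = df[df["Date of Delivery"] == 7]
print(list(select.index))

col_name = "Warehouse2 Distance to Site Location"

# Label-based lookup fails because no row in `select` is labelled 0.
try:
    select.loc[0, col_name]
    loc_failed = False
except KeyError:
    loc_failed = True
print("loc raised KeyError:", loc_failed)

# Position-based lookup returns the first row whatever its label is.
first_value = select.iloc[0][col_name]
same_value = select[col_name].iloc[0]  # equivalent, avoids building a row Series


def select_warehouse(row_df, candidates):
    """Return the name of the candidate warehouse with the smallest distance.

    row_df     : one-row DataFrame, with any index label (assumed name)
    candidates : list of single-key dicts such as {"Warehouse1": 250}
    """
    best_name, best_dist = None, float("inf")
    for candidate in candidates:
        for name in candidate:
            # Same partial match on column names as in the question's function.
            matching = [col for col in row_df.columns if name in col]
            if matching:
                dist = row_df[matching[0]].iloc[0]  # by position, not by label
                if dist < best_dist:
                    best_name, best_dist = name, dist
    return best_name


print(select_warehouse(select, [{"Warehouse1": 250}, {"Warehouse2": 250}]))
```

Returning the warehouse name directly also removes the need for the final list comprehension over min_col. Separately, if you are on pandas 2.0 or later, note that DataFrame.append no longer exists; pd.concat is the usual replacement.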

huangapple
  • Posted on May 30, 2023, 00:21:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76358881.html