Need suggestions for a better approach to handle missing keys in a function operating on dictionaries

Question

I am looking for the best practice for the following task.
I have the function below (a simplified example to illustrate the problem):

import numpy as np

def create_additional_keys(data: dict):
  data['l_t_j'] = 1 if data['a'] in [27, 11, 33] else 0
  data['b_n_j'] = 1 if data['b'] in [29, 1, 27] else 0
  data['p_t_q'] = 'ck' if data['c'] == '' else data['c']
  data['m_k_z'] = 'd12' if data['d'] in ['d1', 'd2'] else 'other'
  data['y_s_n'] = data['e1'] * data['e2'] * data['e3']
  data['h_g_p'] = np.log(data['f'])
  ...
  data['s_t_x'] = 1 if data['g'] < 0 else data['g']
  data['c_e_m'] = 1 if data['i'] in [97, 26, 57] else 2 if data['i'] in [98, 27, 58] else 3
  data['s_o_j'] = 1 if data['j'] in [82, 38, 60] else 0
  data['k_s_a'] = data['h'] // 4

The problem is that whenever I call this function, I have to make sure the dictionary contains every key, which is not always convenient.
I usually have most of the keys I need, but not always all of them.
What is the best practice for making the function work regardless of whether these keys are present?

So far I have several possible implementations, but I don't like any of them much and would like to find the best one.

  1. Wrap each statement in try-except (as I said, the dictionary usually has most of the keys).
    Example:

    try:
      data['l_t_j'] = 1 if data['a'] in [27, 11, 33] else 0
    except KeyError:
      pass

  2. Before creating a new key, check that the key it depends on exists in the dictionary.
    Example:

    if 'a' in data:
      data['l_t_j'] = 1 if data['a'] in [27, 11, 33] else 0

  3. Move the lines that create each new key into separate functions, and iterate over them in a loop with try-except.
    Example:

    formation_l_t_j = lambda data: {"l_t_j": 1 if data["a"] in [27, 11, 33] else 0}
    ...
    formation_k_s_a = lambda data: {"k_s_a": data["h"] // 4}
    for function in [formation_l_t_j, ..., formation_k_s_a]:
      try:
        data.update(function(data))
      except KeyError:
        pass

Answer 1

Score: 1


I reviewed your last proposal and implemented it in a way that I find cleaner:

data = {'a': 27, 'b': 11}

l = (
    ('l_t_j', lambda: data['a'] in [27, 11, 33]),
    ('b_n_j', lambda: data['b'] in [29, 1, 27]),
    ('p_t_q', lambda: 'ck' if data['c'] == '' else data['c']),
)

for key, compute_value in l:
    try:
        value = compute_value()
    except KeyError:
        continue

    data[key] = value

print(data)
# {'a': 27, 'b': 11, 'l_t_j': True, 'b_n_j': False}

Binding a lambda to a name is usually not recommended; a standard function definition is the usual choice instead.

https://peps.python.org/pep-0008/#programming-recommendations

> Always use a def statement instead of an assignment statement that
> binds a lambda expression directly to an identifier:
>
>     # Correct:
>     def f(x): return 2*x
>
>     # Wrong:
>     f = lambda x: 2*x
>
> The first form means that the name of the resulting function object is
> specifically ‘f’ instead of the generic ‘<lambda>’. This is more
> useful for tracebacks and string representations in general. The use
> of the assignment statement eliminates the sole benefit a lambda
> expression can offer over an explicit def statement (i.e. that it can
> be embedded inside a larger expression)
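
Following that recommendation, the table-driven loop above could be sketched with named functions and a reusable helper. This is only an illustration: the names `compute_l_t_j`, `compute_b_n_j`, `compute_p_t_q`, `RULES`, and `add_derived_keys` are hypothetical and do not come from the question, and the rules return 1/0 as in the original function rather than True/False.

```python
def compute_l_t_j(data):
    return 1 if data['a'] in [27, 11, 33] else 0


def compute_b_n_j(data):
    return 1 if data['b'] in [29, 1, 27] else 0


def compute_p_t_q(data):
    return 'ck' if data['c'] == '' else data['c']


# Hypothetical rule table: output key -> function that computes it.
RULES = {
    'l_t_j': compute_l_t_j,
    'b_n_j': compute_b_n_j,
    'p_t_q': compute_p_t_q,
}


def add_derived_keys(data, rules=RULES):
    """Return a copy of data plus every derived key whose inputs are present."""
    result = dict(data)
    for key, compute_value in rules.items():
        try:
            result[key] = compute_value(data)
        except KeyError:
            # An input key is missing, so this derived key is skipped.
            continue
    return result


print(add_derived_keys({'a': 27, 'b': 11}))
# {'a': 27, 'b': 11, 'l_t_j': 1, 'b_n_j': 0}
```

Returning a copy instead of mutating the argument is a design choice made for this sketch; updating `data` in place, as the original function does, works the same way.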

Answer 2

Score: 0


The get() method is the typical way to handle missing keys. It returns the value, or None if the key is absent. An optional second argument sets the value to return instead of None when the key is missing.
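
As a quick illustration of that behavior, here is a minimal sketch with a made-up dictionary (the keys and values are assumptions chosen only for the demonstration):

```python
data = {'a': 27}

print(data.get('a'))     # 27
print(data.get('b'))     # None
print(data.get('b', 0))  # 0

# get() never adds the missing key to the dictionary.
print('b' in data)       # False
```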

To get 1 if the value is in the list, and 0 if it is not or the key is missing:

data['l_t_j'] = 1 if data['a'] in [27, 11, 33] else 0

# becomes

data['l_t_j'] = int(data.get('a') in [27, 11, 33])

Using or to replace empty or missing (None) strings:

data['p_t_q'] = 'ck' if data['c'] == '' else data['c']

# becomes

data['p_t_q'] = data.get('c') or 'ck'

To select between two values:

data['m_k_z'] = 'd12' if data['d'] in ['d1', 'd2'] else 'other'

# becomes:

data['m_k_z'] = 'd12' if data.get('d') in ['d1', 'd2'] else 'other'

# or this way:

data['m_k_z'] = 'd12' * (data.get('d') in ['d1', 'd2']) or 'other'

Using the default argument for numerical calculations:

data['y_s_n'] = data['e1'] * data['e2'] * data['e3']

# becomes

data['y_s_n'] = data.get('e1', 0) * data.get('e2', 0) * data.get('e3', 0)
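
Note that with a default of 0, one missing factor makes the whole product 0, and that 0 is stored under y_s_n. If a missing factor should instead leave y_s_n unset, one option is to check for all inputs first. A minimal sketch with made-up values (the `factors` tuple and `with_default` name are introduced here for illustration):

```python
data = {'e1': 2, 'e2': 3}  # 'e3' is missing

# With defaults, the missing factor turns the product into 0.
with_default = data.get('e1', 0) * data.get('e2', 0) * data.get('e3', 0)
print(with_default)  # 0

# Alternative: only create the key when every factor is present.
factors = ('e1', 'e2', 'e3')
if all(name in data for name in factors):
    data['y_s_n'] = data['e1'] * data['e2'] * data['e3']

print('y_s_n' in data)  # False
```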

Function calls:

data['h_g_p'] = np.log(data['f'])

# becomes

data['h_g_p'] = np.log(data.get('f', 1))
# assuming you want zero when the key is absent, otherwise ...

data['h_g_p'] = data.get('f') and np.log(data['f'])  # None if absent
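
One caveat of the and form: it tests truthiness rather than presence, so a stored value of 0 is returned unchanged instead of being passed to the logarithm. A small sketch of this, using math.log in place of np.log so that it runs without NumPy (that substitution, and the helper name `safe_log`, are assumptions made for the illustration):

```python
import math


def safe_log(data, key='f'):
    # Hypothetical helper: returns None when the key is absent,
    # and the falsy value itself (for example 0) when one is stored.
    return data.get(key) and math.log(data[key])


print(safe_log({}))        # None
print(safe_log({'f': 0}))  # 0
print(safe_log({'f': 1}))  # 0.0
```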

A default value that selects the condition's result when the key is missing:

data['s_t_x'] = 1 if data['g'] < 0 else data['g']

# becomes

data['s_t_x'] = 1 if data.get('g', -1) < 0 else data['g']

Multiple conditions:

data['c_e_m'] = 1 if data['i'] in [97, 26, 57] else 2 if data['i'] in [98, 27, 58] else 3

# becomes

data['c_e_m'] = {97: 1, 26: 1, 57: 1, 98: 2, 27: 2, 58: 2}.get(data.get('i'), 3)
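
If the lookup dictionary grows, it could be built from the value groups rather than typed out pair by pair. A possible sketch, where `GROUPS`, `LOOKUP`, and `classify` are hypothetical names introduced only for this example:

```python
# Label -> values that map to it, taken from the condition above.
GROUPS = {1: [97, 26, 57], 2: [98, 27, 58]}

# Invert the groups into a flat value -> label lookup table.
LOOKUP = {value: label for label, values in GROUPS.items() for value in values}


def classify(data, default=3):
    # A missing 'i' gives None, which is not in LOOKUP, so the default applies.
    return LOOKUP.get(data.get('i'), default)


print(classify({'i': 26}))  # 1
print(classify({'i': 58}))  # 2
print(classify({'i': 5}))   # 3
print(classify({}))         # 3
```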

huangapple
  • Published on July 12, 2023, 22:53:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76671904.html