Recursive function in pandas

Question

The code below works as expected in some cases. For example, these results are correct:

```python
sandhi_builder('that this')
# {'thaxhis'}

sandhi_builder('this that this')
# {'thisthaxhis'}

sandhi_builder('bad boy')
# {'baaoy', 'bapoy'}
```

But this one is not correct:

```python
sandhi_builder('this is bad boy sad boy')
# {'thisisbaaoysaaoy', 'thisisbaaoysapoy'}
```

Only 2 strings are returned, but 4 are expected:

```python
{'thisisbaaoysaaoy', 'thisisbaaoysapoy',
 'thisisbapoysaaoy', 'thisisbapoysapoy'}
```

The code:

```python
# Jupyter shell commands that create the rule file (columns: a, b, c)
!echo 't t x' > sandhi_code_out.txt
!echo 'e c y' >> sandhi_code_out.txt
!echo 'e m z' >> sandhi_code_out.txt
!echo 'd b a' >> sandhi_code_out.txt
!echo 'd b p' >> sandhi_code_out.txt

import pandas as pd

# sep=r'\s+' splits on any whitespace (replaces the deprecated delim_whitespace=True)
df = pd.read_csv('sandhi_code_out.txt', sep=r'\s+', header=None)
df.columns = ['a', 'b', 'c']


def _sandhi_builder(my):
    # Join exactly two words: where the end of the first word matches column 'a'
    # and the start of the second matches column 'b', substitute column 'c'.
    mylist = my.split()
    final = list()
    check = mylist[0] + ' ' + mylist[1]
    for i in [8, 7, 6, 5, 4, 3, 2, 1]:
        for p in [0, 1, 2, 3, 4, 5, 6, 7]:
            x = mylist[0][-i:]
            y = mylist[1][:p]
            if len(x) > 0 and len(y) > 0:
                try:
                    z = df[(df['a'] == x) & (df['b'] == y)]['c']
                    if len(z) > 0:
                        for myr in z:
                            myt = [mylist[0][-i:], mylist[1][:p]]
                            final.append(check.replace(' '.join(myt), myr))
                except Exception:
                    pass
    return set(final)


def sandhi_builder(x):
    sandhi_long = [i for i in x.split()]
    for k, v in enumerate(sandhi_long):
        return_set = _sandhi_builder(sandhi_long[0] + ' ' + sandhi_long[1])
        if return_set:
            pass
        else:
            return_set = [sandhi_long[0] + sandhi_long[1]]
        for lr in list(range(2, len(sandhi_long))):
            tmp_list = list()
            if return_set:
                for eachv in return_set:
                    return_set2 = _sandhi_builder(eachv + ' ' + sandhi_long[lr])
                    if return_set2:
                        tmp_list = list(return_set2)
                    else:
                        tmp_list.append(eachv + sandhi_long[lr])
            return_set = set(tmp_list)
    return return_set
```
# Answer 1

**Score**: 1

The problem is in the `sandhi_builder` function. It seeds the result with the first two words (`sandhi_long[0]` and `sandhi_long[1]`) and then attaches the remaining words one at a time. While doing so, `tmp_list = list(return_set2)` replaces whatever was collected for earlier candidates, so only the combinations from the last candidate processed survive. That is why two of the four expected strings are missing.

To fix it, modify `sandhi_builder` so that the combinations produced for every candidate are carried forward. Here's an updated version of the function:

```python
def sandhi_builder(x):
    sandhi_long = x.split()
    return_set = {sandhi_long[0]}  # Initialize with the first word
    for lr in range(1, len(sandhi_long)):
        tmp_list = []
        for eachv in return_set:
            return_set2 = _sandhi_builder(eachv + ' ' + sandhi_long[lr])
            if return_set2:
                tmp_list.extend(return_set2)  # keep results from every candidate
            else:
                tmp_list.append(eachv + sandhi_long[lr])  # no rule matched: plain join
        return_set = set(tmp_list)
    return return_set
```

This updated version will iterate through each word in the input string (sandhi_long) and generate combinations by calling the _sandhi_builder function. It keeps track of the generated combinations in return_set and appends new combinations to tmp_list for each iteration. Finally, it returns the updated return_set containing all the generated combinations.
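To see the difference between overwriting and extending in isolation, here is a minimal pandas-free sketch. It is not the code from the question: `RULES`, `join_pair`, and `build` are hypothetical names, the rule table is a plain dict copied from the five rows of `sandhi_code_out.txt`, and `join_pair` tries every suffix/prefix length rather than the fixed ranges used in `_sandhi_builder`.

```python
# Hypothetical in-memory rule table (an assumption, not the original DataFrame):
# (end of left word, start of right word) -> possible replacements.
RULES = {
    ("t", "t"): ["x"],
    ("e", "c"): ["y"],
    ("e", "m"): ["z"],
    ("d", "b"): ["a", "p"],
}


def join_pair(left, right):
    """Return every fused form of left + right allowed by RULES (may be empty)."""
    results = set()
    for i in range(1, len(left) + 1):        # length of the suffix taken from left
        for p in range(1, len(right) + 1):   # length of the prefix taken from right
            for repl in RULES.get((left[-i:], right[:p]), []):
                results.add(left[:-i] + repl + right[p:])
    return results


def build(sentence, accumulate):
    """Attach words left to right; accumulate=False mimics the overwrite in the question."""
    words = sentence.split()
    candidates = {words[0]}
    for word in words[1:]:
        collected = []
        for cand in candidates:
            fused = join_pair(cand, word)
            if not fused:
                collected.append(cand + word)   # no rule applies: plain concatenation
            elif accumulate:
                collected.extend(fused)         # keep results for every candidate
            else:
                collected = list(fused)         # discards earlier candidates' results
        candidates = set(collected)
    return candidates


sentence = "this is bad boy sad boy"
print(len(build(sentence, accumulate=False)))   # 2
print(sorted(build(sentence, accumulate=True)))
# ['thisisbaaoysaaoy', 'thisisbaaoysapoy', 'thisisbapoysaaoy', 'thisisbapoysapoy']
```

With `accumulate=False` only the last candidate's results survive, giving two strings (which two depends on set iteration order). With `accumulate=True` all four are kept.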

With this fix, the code should generate the correct number of combinations for the given input string.
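Since the title asks about a recursive function, the same left-to-right expansion can also be written recursively. The following is only a sketch, assuming the rules fit in a plain dict instead of the DataFrame; `RULES`, `join_pair`, and `expand` are hypothetical names and are not part of the original code.

```python
# Hypothetical rule table (assumption): (suffix of left, prefix of right) -> replacements.
RULES = {
    ("t", "t"): ["x"],
    ("e", "c"): ["y"],
    ("e", "m"): ["z"],
    ("d", "b"): ["a", "p"],
}


def join_pair(left, right):
    """Return every fused form of left + right allowed by RULES (may be empty)."""
    return {
        left[:-i] + repl + right[p:]
        for i in range(1, len(left) + 1)
        for p in range(1, len(right) + 1)
        for repl in RULES.get((left[-i:], right[:p]), [])
    }


def expand(words):
    """Recursively fuse a list of words and return every possible result."""
    if len(words) == 1:
        return {words[0]}                      # base case: nothing left to join
    results = set()
    # Solve the problem for all but the last word, then attach the last word
    # to each partial result so that no candidate is dropped.
    for cand in expand(words[:-1]):
        fused = join_pair(cand, words[-1])
        results |= fused if fused else {cand + words[-1]}
    return results


print(sorted(expand("this is bad boy sad boy".split())))
# ['thisisbaaoysaaoy', 'thisisbaaoysapoy', 'thisisbapoysaaoy', 'thisisbapoysapoy']
```

The recursion depth equals the number of words, so for very long inputs the iterative loop is the safer choice.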

huangapple
  • Published on 2023-05-29 11:29:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76354521.html