Recursive function in pandas


# Question



<details>
<summary>Original question (English)</summary>

The code below works as expected in some cases.

For example, these are correct:

```python
sandhi_builder('that this')
{'thaxhis'}

sandhi_builder('this that this')
{'thisthaxhis'}

sandhi_builder('bad boy')
{'baaoy', 'bapoy'}
```

But this one is not:

```python
sandhi_builder('this is bad boy sad boy')
{'thisisbaaoysaaoy', 'thisisbaaoysapoy'}
```

Only 2 strings are returned, but 4 are expected:

```python
{'thisisbaaoysaaoy', 'thisisbaaoysapoy',
 'thisisbapoysaaoy', 'thisisbapoysapoy'}
```

The code:

```python
!echo 't t x' > sandhi_code_out.txt
!echo 'e c y' >> sandhi_code_out.txt
!echo 'e m z' >> sandhi_code_out.txt
!echo 'd b a' >> sandhi_code_out.txt
!echo 'd b p' >> sandhi_code_out.txt

import pandas as pd
# sep=r'\s+' is equivalent to delim_whitespace=True, which is deprecated since pandas 2.2
df = pd.read_csv('sandhi_code_out.txt', sep=r'\s+', header=None)

df.columns = ['a', 'b','c']

def _sandhi_builder(my):
    mylist = list()
    for i in my.split():
        mylist.append(i)

    final = list()
    nelist = list()
    check = mylist[0] + ' ' + mylist[1]
    for i in [8,7,6,5,4,3,2,1]:
        for p in [0,1,2,3,4,5,6,7]:
            x = mylist[0][-i:]
            y = mylist[1][:p]
            if len(x) > 0 and len(y) > 0:
                try:
                    z = df[(df['a'] == x) & (df['b'] == y)]['c']
                    if len(z) > 0:
                        for myr in z:
                            myt = [mylist[0][-i:], mylist[1][:p]]
                            final.append(check.replace(' '.join(myt), myr))
                except:
                    pass
    return set(final)
```

```python
def sandhi_builder(x):
    sandhi_long=[i for i in x.split()]
    for k, v in enumerate(sandhi_long):
        return_set=_sandhi_builder(sandhi_long[0] + ' ' +sandhi_long[1])
        if return_set:
            pass
        else:
            return_set = [sandhi_long[0] + sandhi_long[1]]
    for lr in list(range(2, len(sandhi_long))):
        tmp_list = list()
        if return_set:
            for eachv in return_set:
                return_set2 = _sandhi_builder(eachv + ' ' +sandhi_long[lr])
                if return_set2:
                    tmp_list = list(return_set2)
                else:
                    tmp_list.append(eachv + sandhi_long[lr])
        return_set = set(tmp_list)
    return return_set
```


</details>


# Answer 1
**Score**: 1


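The issue with the code lies in the sandhi_builder function. It treats the first two words (sandhi_long[0] and sandhi_long[1]) as a special case and then appends the remaining words one by one. Inside that loop, `tmp_list = list(return_set2)` reassigns tmp_list for every candidate in return_set, so whenever _sandhi_builder finds a match, the combinations built from the earlier candidates are discarded and only those from the last candidate are kept. For 'this is bad boy sad boy', the final boy is joined to both thisisbaaoysad and thisisbapoysad, but only one of the two resulting pairs survives.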
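The difference between reassigning and extending the accumulator can be reproduced without pandas. In this sketch, `expand` is a hypothetical stand-in for `_sandhi_builder` that always returns two variants, so every candidate should contribute two results:

```python
def expand(candidate):
    """Hypothetical stand-in for _sandhi_builder: every candidate yields two variants."""
    return {candidate + "-a", candidate + "-p"}


def combine_overwrite(candidates):
    """Mirrors the question's loop: tmp_list is reassigned for each candidate."""
    tmp_list = []
    for candidate in candidates:
        tmp_list = list(expand(candidate))  # discards results from earlier candidates
    return set(tmp_list)


def combine_extend(candidates):
    """Mirrors the fixed loop: tmp_list is extended for each candidate."""
    tmp_list = []
    for candidate in candidates:
        tmp_list.extend(expand(candidate))  # keeps results from every candidate
    return set(tmp_list)


candidates = ["x", "y"]
print(sorted(combine_overwrite(candidates)))  # ['y-a', 'y-p']
print(sorted(combine_extend(candidates)))     # ['x-a', 'x-p', 'y-a', 'y-p']
```

`combine_overwrite` corresponds to `tmp_list = list(return_set2)` and keeps only the last candidate's results, while `combine_extend` corresponds to `tmp_list.extend(return_set2)`.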

To fix the issue, start from the first word, join each remaining word onto every combination built so far, and accumulate the results with extend() instead of reassigning tmp_list. Here's an updated version of the sandhi_builder function:

```python
def sandhi_builder(x):
    sandhi_long = x.split()
    return_set = set([sandhi_long[0]])  # Initialize with the first word

    for lr in range(1, len(sandhi_long)):
        tmp_list = []
        for eachv in return_set:
            return_set2 = _sandhi_builder(eachv + ' ' + sandhi_long[lr])
            if return_set2:
                tmp_list.extend(return_set2)
            else:
                tmp_list.append(eachv + sandhi_long[lr])
        return_set = set(tmp_list)

    return return_set
```

This updated version starts with the first word of the input string (sandhi_long) and, for each remaining word, calls _sandhi_builder on every combination in return_set joined with that word. The results are collected in tmp_list with extend() (or append() for the plain concatenation when no rule matches), so nothing is overwritten, and return_set is then replaced by the new set. Finally, it returns return_set containing all the generated combinations.

With this fix, the code should generate the correct number of combinations for the given input string.
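To check the fix without creating sandhi_code_out.txt or installing pandas, here is a minimal sketch that reimplements both functions. It assumes a plain dict keyed by (a, b) is an acceptable stand-in for the DataFrame filter `df[(df['a'] == x) & (df['b'] == y)]['c']`; the names RULE_ROWS, RULES, join_pair and build are hypothetical and not part of the original code.

```python
# Hypothetical in-memory copy of the five rows in sandhi_code_out.txt:
# (end of left word, start of right word) -> replacement
RULE_ROWS = [("t", "t", "x"), ("e", "c", "y"), ("e", "m", "z"),
             ("d", "b", "a"), ("d", "b", "p")]
RULES = {}
for a, b, c in RULE_ROWS:
    RULES.setdefault((a, b), []).append(c)


def join_pair(left, right):
    """Same search as _sandhi_builder, with a dict lookup instead of a DataFrame filter."""
    check = left + " " + right
    results = set()
    for i in range(8, 0, -1):      # last i characters of the left word
        for p in range(0, 8):      # first p characters of the right word
            x, y = left[-i:], right[:p]
            if x and y:
                for c in RULES.get((x, y), []):
                    results.add(check.replace(x + " " + y, c))
    return results


def build(sentence):
    """Fixed sandhi_builder: fold the words left to right, keeping every combination."""
    words = sentence.split()
    combos = {words[0]}
    for word in words[1:]:
        next_combos = set()
        for combo in combos:
            joined = join_pair(combo, word)
            # No rule matched: plain concatenation, as in the original code.
            next_combos |= joined if joined else {combo + word}
        combos = next_combos
    return combos


print(sorted(build("this is bad boy sad boy")))
# ['thisisbaaoysaaoy', 'thisisbaaoysapoy', 'thisisbapoysaaoy', 'thisisbapoysapoy']
```

For the three inputs the question lists as correct, this sketch returns {'thaxhis'}, {'thisthaxhis'} and {'baaoy', 'bapoy'}. A dict lookup avoids building two boolean masks over the DataFrame for every (x, y) pair, which may matter if the rule table grows.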

  • Posted by huangapple on May 29, 2023, 11:29:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76354521.html