Recursive function in pandas
# Question
<details>
<summary>Question (English original)</summary>

The code below works as expected in some cases. For example, these are correct:

```python
sandhi_builder('that this')
{'thaxhis'}
sandhi_builder('this that this')
{'thisthaxhis'}
sandhi_builder('bad boy')
{'baaoy', 'bapoy'}
```

But this is not correct:

```python
sandhi_builder('this is bad boy sad boy')
{'thisisbaaoysaaoy', 'thisisbaaoysapoy'}
```

Only 2 strings are returned, but 4 are expected:

```python
{'thisisbaaoysaaoy', 'thisisbaaoysapoy',
 'thisisbapoysaaoy', 'thisisbapoysapoy'}
```
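Why four: the input contains two `d b` junctions (in `bad boy` and `sad boy`), and the table has two rows for `d b` (`d b a` and `d b p`), so each junction independently has two outcomes and 2 × 2 = 4 strings. The sketch below only enumerates those choices with `itertools.product`; the `pieces` split and `slot_options` list are hypothetical values written by hand for this one input, not derived from the table.

```python
from itertools import product

# Hypothetical, hand-written split of 'this is bad boy sad boy' once the words
# are joined: the fixed text around the two 'd b' junctions.
pieces = ['thisisba', 'oysa', 'oy']
# Rows 'd b a' and 'd b p' give two possible replacements per junction.
slot_options = [['a', 'p'], ['a', 'p']]

expected = set()
for choice in product(*slot_options):
    # Interleave the fixed pieces with one chosen replacement per junction.
    s = pieces[0]
    for replacement, piece in zip(choice, pieces[1:]):
        s += replacement + piece
    expected.add(s)

print(len(expected))  # 4
print(sorted(expected))
```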
The code (the `!echo` lines are Jupyter/IPython shell commands that write the lookup table to a file):

```python
!echo 't t x' > sandhi_code_out.txt
!echo 'e c y' >> sandhi_code_out.txt
!echo 'e m z' >> sandhi_code_out.txt
!echo 'd b a' >> sandhi_code_out.txt
!echo 'd b p' >> sandhi_code_out.txt

import pandas as pd

# sep=r'\s+' is the non-deprecated equivalent of delim_whitespace=True
df = pd.read_csv('sandhi_code_out.txt', sep=r'\s+', header=None)
df.columns = ['a', 'b', 'c']

def _sandhi_builder(my):
    mylist = list()
    for i in my.split():
        mylist.append(i)
    final = list()
    nelist = list()
    check = mylist[0] + ' ' + mylist[1]
    for i in [8, 7, 6, 5, 4, 3, 2, 1]:
        for p in [0, 1, 2, 3, 4, 5, 6, 7]:
            x = mylist[0][-i:]
            y = mylist[1][:p]
            if len(x) > 0 and len(y) > 0:
                try:
                    z = df[(df['a'] == x) & (df['b'] == y)]['c']
                    if len(z) > 0:
                        for myr in z:
                            myt = [mylist[0][-i:], mylist[1][:p]]
                            final.append(check.replace(' '.join(myt), myr))
                except:
                    pass
    return set(final)

def sandhi_builder(x):
    sandhi_long = [i for i in x.split()]
    for k, v in enumerate(sandhi_long):
        return_set = _sandhi_builder(sandhi_long[0] + ' ' + sandhi_long[1])
        if return_set:
            pass
        else:
            return_set = [sandhi_long[0] + sandhi_long[1]]
        for lr in list(range(2, len(sandhi_long))):
            tmp_list = list()
            if return_set:
                for eachv in return_set:
                    return_set2 = _sandhi_builder(eachv + ' ' + sandhi_long[lr])
                    if return_set2:
                        tmp_list = list(return_set2)
                    else:
                        tmp_list.append(eachv + sandhi_long[lr])
            return_set = set(tmp_list)
    return return_set
```

</details>
# Answer 1
**Score**: 1
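The issue is in the `sandhi_builder` function. It handles the first two words (`sandhi_long[0]` and `sandhi_long[1]`) as a special case and then appends the remaining words one at a time, but inside that loop `tmp_list = list(return_set2)` rebinds `tmp_list` for each candidate in `return_set`. Once `return_set` holds more than one candidate (after `bad boy` yields `baaoy` and `bapoy`), only the expansions of whichever candidate is visited last are kept, so the `sad boy` step returns 2 strings instead of 4.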
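The difference between rebinding the list and extending it can be seen in isolation. This is a minimal sketch, not the question's code: `candidates` and `expansions` are hypothetical values that mimic the last step of the `'this is bad boy sad boy'` example, and a list is used instead of a set so the iteration order (and therefore which candidate survives in the buggy version) is deterministic.

```python
# Hypothetical candidates after the 'sad' step, and what each one expands to
# when joined with the final 'boy' (stand-in for calling _sandhi_builder).
candidates = ['thisisbaaoysad', 'thisisbapoysad']
expansions = {
    'thisisbaaoysad': {'thisisbaaoysaaoy', 'thisisbaaoysapoy'},
    'thisisbapoysad': {'thisisbapoysaaoy', 'thisisbapoysapoy'},
}

# Pattern from the question: the name is rebound on every iteration,
# so only the last candidate's expansions remain.
tmp_list = []
for eachv in candidates:
    tmp_list = list(expansions[eachv])
buggy = set(tmp_list)

# Pattern from the answer: extend accumulates every candidate's expansions.
tmp_list = []
for eachv in candidates:
    tmp_list.extend(expansions[eachv])
fixed = set(tmp_list)

print(len(buggy), len(fixed))  # 2 4
```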
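To fix it, collect the results for every candidate with `tmp_list.extend(...)` instead of reassigning `tmp_list`. The updated `sandhi_builder` below does that, and it also drops the special case for the first two words by starting from the first word and handling every following word the same way: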
```python
def sandhi_builder(x):
    sandhi_long = x.split()
    return_set = set([sandhi_long[0]])  # Initialize with the first word
    for lr in range(1, len(sandhi_long)):
        tmp_list = []
        for eachv in return_set:
            return_set2 = _sandhi_builder(eachv + ' ' + sandhi_long[lr])
            if return_set2:
                # extend (not reassign) so the joins of every candidate are kept
                tmp_list.extend(return_set2)
            else:
                # no sandhi rule applies: join the words unchanged
                tmp_list.append(eachv + sandhi_long[lr])
        return_set = set(tmp_list)
    return return_set
```
This version walks through the words in `sandhi_long` from left to right. For each new word it calls `_sandhi_builder` on every candidate in `return_set`, collects all the resulting joins (or the plain concatenation when no rule applies) in `tmp_list`, and then makes that the new `return_set`. After the last word, `return_set` holds every combination.
With this fix, `sandhi_builder('this is bad boy sad boy')` returns all four expected strings, and the three examples that already worked give the same results as before.
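For a check that runs without the notebook setup, here is a minimal sketch of the answer's fixed loop with the pandas lookup replaced by a plain dict. `RULES` and `join_pair` are hypothetical names; `join_pair` is assumed to behave like `_sandhi_builder` (it tries suffixes of the left string up to 8 characters and prefixes of the right word up to 7 characters against the rule table) but slices the strings directly instead of calling `str.replace`.

```python
# Hypothetical stand-in for the question's DataFrame: (suffix, prefix) -> replacements.
# The rows are the same five lines written to sandhi_code_out.txt.
RULES = {}
for a, b, c in [('t', 't', 'x'), ('e', 'c', 'y'), ('e', 'm', 'z'),
                ('d', 'b', 'a'), ('d', 'b', 'p')]:
    RULES.setdefault((a, b), []).append(c)


def join_pair(left, right):
    """Return every sandhi join of `left` and `right`; an empty set if no rule matches."""
    out = set()
    for i in range(8, 0, -1):          # suffix lengths 8..1, as in _sandhi_builder
        for p in range(1, 8):          # prefix lengths 1..7 (p=0 is skipped there)
            x, y = left[-i:], right[:p]
            for c in RULES.get((x, y), []):
                # Drop the matched suffix and prefix and put the replacement between.
                out.add(left[:len(left) - len(x)] + c + right[len(y):])
    return out


def sandhi_builder(text):
    words = text.split()
    results = {words[0]}  # start from the first word, as in the answer
    for word in words[1:]:
        nxt = []
        for candidate in results:
            joined = join_pair(candidate, word)
            if joined:
                nxt.extend(joined)            # keep the joins of every candidate
            else:
                nxt.append(candidate + word)  # no rule applies: plain concatenation
        results = set(nxt)
    return results


print(sorted(sandhi_builder('this is bad boy sad boy')))
# ['thisisbaaoysaaoy', 'thisisbaaoysapoy', 'thisisbapoysaaoy', 'thisisbapoysapoy']
```

Keying a dict by `(suffix, prefix)` also avoids filtering the whole DataFrame with a boolean mask for each of the 56 suffix/prefix pairs tried per call.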