Trying to find a way to use adist() for words instead of characters in R

Question

I'd like the adist function to work on words the same way it works on characters. That is, I'd like each deletion, substitution, or insertion to apply to a whole word rather than to a single character. For example, I want "Alert 12 went off at 3am" and "Alert 17 was heard at 3am" to have a Levenshtein distance of 3, because three word substitutions are needed to get from one string to the other. Thanks.

Answer 1

Score: 0

I guess you can try the following code to count the words that differ

library(vecsets)
# split each string on spaces, then count the words of s1 that have no match in s2
d <- length(vsetdiff(unlist(strsplit(s1, " ")), unlist(strsplit(s2, " "))))

such that

> d
[1] 3

Data

s1 <- "Alert 12 went off at 3am"
s2 <- "Alert 17 was heard at 3am"
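
Note that the multiset difference only counts words of s1 that are missing from s2, so it ignores word order and is not an edit distance in general (for "a b" versus "b a" it gives 0). If a true word-level Levenshtein distance is wanted, the following is a minimal sketch in base R, not part of the original answer. The names word_lev and word_adist are hypothetical helpers introduced here. The first runs the usual dynamic-programming recurrence over word vectors. The second maps each distinct word to a single letter and then calls adist() on the encoded strings, which assumes the two strings contain at most 52 distinct words between them.

```r
# Split a string into words on runs of whitespace.
split_words <- function(s) strsplit(trimws(s), "\\s+")[[1]]

# Word-level Levenshtein distance via the standard dynamic-programming table.
word_lev <- function(a, b) {
  x <- split_words(a)
  y <- split_words(b)
  n <- length(x)
  m <- length(y)
  d <- matrix(0L, n + 1, m + 1)
  d[, 1] <- 0:n  # delete every word of x
  d[1, ] <- 0:m  # insert every word of y
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- as.integer(x[i] != y[j])
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L,  # deletion
                             d[i + 1, j] + 1L,  # insertion
                             d[i, j] + cost)    # substitution or match
    }
  }
  d[n + 1, m + 1]
}

# Alternative: encode each distinct word as one letter, then reuse adist().
# Assumption: at most 52 distinct words across both strings.
word_adist <- function(a, b) {
  x <- split_words(a)
  y <- split_words(b)
  vocab <- unique(c(x, y))
  symbols <- c(letters, LETTERS)
  if (length(vocab) > length(symbols)) stop("too many distinct words to encode")
  encode <- function(w) paste(symbols[match(w, vocab)], collapse = "")
  adist(encode(x), encode(y))[1, 1]
}

s1 <- "Alert 12 went off at 3am"
s2 <- "Alert 17 was heard at 3am"
print(word_lev(s1, s2))    # 3
print(word_adist(s1, s2))  # 3
```

Both helpers compare words case-sensitively and treat punctuation as part of a word. The hand-written table has no vocabulary limit, while the adist() variant is shorter and reuses the built-in routine.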

huangapple
  • Posted on January 3, 2020 at 20:45:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59578888.html