Julia: an elegant way to identify two strings as equivalent

Question

I wonder whether there is an elegant way in Julia to identify two strings as equivalent.
For example, take these two strings:

   1. I think this is good
   2. This is good, I think

Both have the same meaning, of course, but the word order is different.
I am not good at this kind of procedure. How do you usually do it? Do you put all the words into arrays and then check whether each word of one string exists in the other?
I believe there is a neat way to do this in Julia.

Thanks in advance.

Answer 1

Score: 4

Here is a step-by-step example:

julia> using StatsBase

julia> strs = ["I think this is good", "This is good, I think"] # initial vector of strings
2-element Vector{String}:
 "I think this is good"
 "This is good, I think"

julia> split.(strs, r"\W", keepempty=false) # split them by non-word characters
2-element Vector{Vector{SubString{String}}}:
 ["I", "think", "this", "is", "good"]
 ["This", "is", "good", "I", "think"]

julia> map(x -> lowercase.(x), (split.(strs, r"\W", keepempty=false))) # lowercase all words
2-element Vector{Vector{String}}:
 ["i", "think", "this", "is", "good"]
 ["this", "is", "good", "i", "think"]

julia> sort.(map(x -> lowercase.(x), (split.(strs, r"\W", keepempty=false)))) # sort each entry
2-element Vector{Vector{String}}:
 ["good", "i", "is", "think", "this"]
 ["good", "i", "is", "think", "this"]

julia> countmap(sort.(map(x -> lowercase.(x), (split.(strs, r"\W", keepempty=false))))) # finally count how many strings share each sorted word list; countmap comes from StatsBase
Dict{Vector{String}, Int64} with 1 entry:
  ["good", "i", "is", "think", "this"] => 2
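If you need this check more than once, the same pipeline (split on non-word characters, lowercase, sort) can be wrapped in a few small functions. This is a minimal sketch using only Base Julia, so StatsBase is not required; the names `word_key`, `same_words` and `group_equivalent` are hypothetical, not part of any library.

```julia
# Hypothetical helper names: word_key, same_words, group_equivalent.

# Normalize a string to a sorted vector of lowercase words,
# splitting on runs of non-word characters (same idea as the answer above).
word_key(s::AbstractString) =
    sort(lowercase.(split(s, r"\W+"; keepempty=false)))

# Two strings count as "the same" if they contain the same words with the
# same multiplicities, in any order and in any letter case.
same_words(a::AbstractString, b::AbstractString) = word_key(a) == word_key(b)

# Group strings by their normalized word key, keeping the original strings
# (a StatsBase-free variant of the countmap step).
function group_equivalent(strs)
    groups = Dict{Vector{String}, Vector{String}}()
    for s in strs
        push!(get!(groups, word_key(s), String[]), s)
    end
    return groups
end

println(same_words("I think this is good", "This is good, I think"))  # true
println(same_words("I think this is good", "This is bad, I think"))   # false

groups = group_equivalent(["I think this is good", "This is good, I think", "Something else"])
for (key, members) in groups
    println(length(members), " => ", members)
end
```

Note that splitting on `\W+` treats apostrophes and hyphens as separators, so "don't" becomes "don" and "t"; adjust the regex if that matters for your data.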

Answer 2

Score: -1

string1 = "I think this is good"
string2 = "This is good, I think"

# Step 1 (lowercase, and trim leading/trailing whitespace;
# strip does not remove the spaces between words)
string1 = lowercase(strip(string1))
string2 = lowercase(strip(string2))

# Step 2 (split each string into words on runs of non-word characters,
# so punctuation such as the comma in "good," is dropped)
words1 = split(string1, r"\W+", keepempty=false)
words2 = split(string2, r"\W+", keepempty=false)

# Step 3 (sort the word vectors in alphabetical order)
sorted_words1 = sort(words1)
sorted_words2 = sort(words2)

# Step 4 (compare the sorted word lists)
if sorted_words1 == sorted_words2
    println("Same")
else
    println("Different")
end
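The question also suggests putting the words into arrays and checking whether each word exists in the other string. A minimal sketch of that idea, plus a count-based variant that avoids sorting; the names `word_list`, `same_word_set`, `word_counts` and `same_word_counts` are hypothetical.

```julia
# Hypothetical helper names: word_list, same_word_set, word_counts, same_word_counts.

# Lowercase words, splitting on runs of non-word characters.
word_list(s::AbstractString) = lowercase.(split(s, r"\W+"; keepempty=false))

# Membership check, as suggested in the question: both strings use the same
# set of words. Repeated words are NOT counted, so "good good" equals "good".
same_word_set(a::AbstractString, b::AbstractString) =
    Set(word_list(a)) == Set(word_list(b))

# Multiset check without sorting: count how often each word occurs.
function word_counts(s::AbstractString)
    counts = Dict{String, Int}()
    for w in word_list(s)
        counts[w] = get(counts, w, 0) + 1
    end
    return counts
end

same_word_counts(a::AbstractString, b::AbstractString) =
    word_counts(a) == word_counts(b)

println(same_word_set("good good", "good"))     # true
println(same_word_counts("good good", "good"))  # false
println(same_word_counts("I think this is good", "This is good, I think"))  # true
```

The set version is closest to "check whether each element exists", but it ignores how often a word appears; the count version gives the same answer as comparing sorted word vectors, without the sort.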

huangapple
  • This article was published on June 1, 2023, 15:29:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76379583.html