Julia: elegant way to identify 2 strings
Question
I wonder whether there is an elegant way in Julia to tell that two strings say the same thing.
For example, take these two strings:
1. I think this is good
2. This is good, I think
Both mean the same thing, of course, but the word order differs.
I am not good at this kind of procedure. How do you usually do it? Do you put all the words into arrays and then check whether each element exists in the other?
I believe there is a marvelous way to do this in Julia.
Thanks in advance.
Answer 1
Score: 4
Here is a step-by-step example:

julia> using StatsBase

julia> strs = ["I think this is good", "This is good, I think"] # initial vector of strings
2-element Vector{String}:
 "I think this is good"
 "This is good, I think"

julia> split.(strs, r"\W", keepempty=false) # split each string on non-word characters
2-element Vector{Vector{SubString{String}}}:
 ["I", "think", "this", "is", "good"]
 ["This", "is", "good", "I", "think"]

julia> map(x -> lowercase.(x), split.(strs, r"\W", keepempty=false)) # lowercase all words
2-element Vector{Vector{String}}:
 ["i", "think", "this", "is", "good"]
 ["this", "is", "good", "i", "think"]

julia> sort.(map(x -> lowercase.(x), split.(strs, r"\W", keepempty=false))) # sort the words of each entry
2-element Vector{Vector{String}}:
 ["good", "i", "is", "think", "this"]
 ["good", "i", "is", "think", "this"]

julia> countmap(sort.(map(x -> lowercase.(x), split.(strs, r"\W", keepempty=false)))) # finally, count the duplicates (this requires StatsBase)
Dict{Vector{String}, Int64} with 1 entry:
  ["good", "i", "is", "think", "this"] => 2
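
If only a yes/no answer for two strings is needed, the same steps (lowercase, split on non-word characters, sort) can probably be wrapped in a small helper that uses Base alone. This is a minimal sketch, not part of the original answer; the names `normalized_words` and `same_words` are hypothetical.

```julia
# Hypothetical helper: lowercase the string, split on runs of non-word
# characters, drop empty pieces, and sort the resulting words.
normalized_words(s::AbstractString) =
    sort(split(lowercase(s), r"\W+"; keepempty=false))

# Two strings match when they contain the same words the same number of times.
same_words(a::AbstractString, b::AbstractString) =
    normalized_words(a) == normalized_words(b)

same_words("I think this is good", "This is good, I think")  # true
same_words("I think this is good", "I think this is bad")    # false
```

Because the comparison is on sorted word lists, a repeated word counts: "good good" and "good" would not match.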
Answer 2
Score: -1
string1 = "I think this is good"
string2 = "This is good, I think"

# Step 1: normalize case and trim leading/trailing whitespace
string1 = lowercase(strip(string1))
string2 = lowercase(strip(string2))

# Step 2: split each string into a vector of words, dropping punctuation
# (a plain split(string2) would keep the comma attached to "good,")
words1 = split(string1, r"\W+"; keepempty=false)
words2 = split(string2, r"\W+"; keepempty=false)

# Step 3: sort the word vectors alphabetically
sorted_words1 = sort(words1)
sorted_words2 = sort(words2)

# Step 4: compare the sorted vectors
if sorted_words1 == sorted_words2
    println("Same")
else
    println("Different")
end
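
Comparing sorted vectors treats repeated words as significant. If repetitions should be ignored, comparing sets of words is one possible alternative. The sketch below is an assumption about what might be wanted rather than something from the thread, and the name `word_set` is hypothetical.

```julia
# Hypothetical helper: the set of distinct lowercase words in a string.
word_set(s::AbstractString) =
    Set(split(lowercase(s), r"\W+"; keepempty=false))

# Set comparison ignores both word order and repetition.
word_set("good good idea") == word_set("Idea, good")          # true

# Sorted-vector comparison still sees the extra "good".
sort(split("good good idea")) == sort(split("idea good"))     # false
```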