英文:
SQL Case Statement multiple matches
问题
以下是你提供的内容的翻译:
I am trying to make a case statement in SQL for grouping however I am running into an issue where its grouping incorrectly if there's multiple matches.
我正在尝试在SQL中创建一个用于分组的CASE语句,但我遇到了一个问题,如果有多个匹配项,它会错误地进行分组。
Is there a way so it searches word by word as opposed to whole thing and gives results by which word shows up first?
是否有一种方法可以按单词逐个搜索,而不是整个内容,并根据哪个单词首先出现来返回结果?
Subject Examples:
主题示例:
the animal rode in the car --- EXPECTED RESULT Animal
the animal rode in the car --- 期望结果 Animal
in the car the animal rode --- EXPECTED RESULT CAR... actual result Animal
in the car the animal rode --- 期望结果 CAR... 实际结果 Animal
CASE
WHEN Subject like '%animal%' then 'Animal'
WHEN Subject like '%car%' then 'Car'
END as Category
CASE
WHEN Subject like '%animal%' then 'Animal'
WHEN Subject like '%car%' then 'Car'
END as Category
英文:
I am trying to make a case statement in SQL for grouping however I am running into an issue where its grouping incorrectly if there's multiple matches.
Is there a way so it searches word by word as opposed to whole thing and gives results by which word shows up first?
Subject Examples:
the animal rode in the car --- EXPECTED RESULT Animal
in the car the animal rode --- EXPECTED RESULT CAR... actual result Animal
CASE
WHEN Subject like '%animal%' then 'Animal'
WHEN Subject like '%car%' then 'Car'
END as Category
答案1
得分: 1
使用CASE语句的问题在于整个测试是按顺序执行的。您没有测试哪个先出现。
这可能看起来复杂,但我尝试让它对您来说易于实现。
首先让我们看一下最内部的连接:
- 第一个FROM子句查看您要测试的项。这可以转化为SELECT语句。
- 接下来是您要查找的值。
- 找到位置。
接下来的最内部查询:
- 基本上添加了一个行编号,以按最小位置(先出现的位置)对记录进行排序。
- 还删除了没有找到匹配项的测试(基本上是位置为0的测试)。
最后,最外部的查询:
- 整理您的数据并仅选择首先出现的位置。
结果:
in the car the animal rode CAR 8 1
the animal rode in the car Animal 5 1
英文:
The problem with using a CASE statement is that the entire test is performed in order. You aren't testing which comes first.
This might look complex but I tried to make it easy for implementation for you.
select *
from (
select test,Category=vals,position, rn = row_number() over (partition by test order by position)
from(
select *
from (values('the animal rode in the car'),('in the car the animal rode')) as tests(test)
cross apply (values('CAR'),('Animal')) as Lookfor(vals)
cross apply (select CHARINDEX(Lookfor.vals,tests.test)) as p(position)
) a
where position>0
) rn
where rn.rn=1
First let's look at the innermost joins:
- The first from looks at items you want to test. This can be translated into a SELECT.
- Next are the values you want to look up
- Finds the position
The next innermost query:
- This basically add a row_runmber to sort records by the smallest position (which came first)
- Also removed and tests that did not find a match (basically the position of 0).
Finally, the outermost query:
- Organizes your data and selects only the position that came first.
Results:
in the car the animal rode CAR 8 1
the animal rode in the car Animal 5 1
答案2
得分: 0
与其他答案类似,您可以使用charindex
来确定任何匹配项的索引并取第一个出现的匹配项。当然,这仅适用于您始终执行like '%something%'
比较的情况;如果您的模式更复杂(例如like '%this%and%that%'
),charindex
将不会有太多帮助。
这种方法将您的规则放入一个名为MatchLogic
的表中,允许您轻松构建规则列表而不必修改代码本身。
declare @testdata table (test nvarchar(64))
insert @testdata (test)
values ('the animal rode in the car')
, ('in the car the animal rode')
, ('the dog rode in the the van')
declare @matchLogic table (partialString nvarchar(32), category nvarchar(32))
insert @matchLogic
values ('animal', 'Animal')
, ('car', 'Car')
select test, (
select top 1 category
from @matchLogic
where test like '%' + partialString + '%'
order by charindex(partialString, test) asc
) category
from @testData t
另一种方法是从您的字符串匹配1个字符,然后是2个字符,然后是3个字符,依此类推,直到获得第一个匹配项。
注意:
- 这非常低效,不适用于大型数据集。
- 这获取第一个完整匹配项;而不是第一个字符匹配的匹配项;例如,查看
monkey ... animal ... bicycle
示例,其中animal
是第一个完整匹配项,即使monkey
出现在animal
之前,因为只有在获得完整字符串后,monkey ... bicycle
才会匹配一次。
declare @testdata table (id bigint identity(1,1),test nvarchar(64))
insert @testdata (test)
values ('the animal rode in the car')
, ('in the car the animal rode')
, ('the monkey rode the bicycle')
, ('the monkey (an animal) rode the bicycle')
, ('the dog rode in the the van')
declare @matchLogic table (pattern nvarchar(32), category nvarchar(32))
insert @matchLogic
values ('%animal%', 'Animal')
, ('%car%', 'Car')
, ('%monkey%bicycle%', 'MonkeyBike')
; with substrings as (
select id, test part, len(test) chars
from @testdata
union all
select id, substring(part, 1, chars-1), chars-1
from substrings s
where chars > 1
)
select t.id, t.test, x.category
from @testData t
outer apply (
select top 1 category
from @matchLogic m
inner join substrings s
on part like m.pattern
where s.id = t.id
order by s.chars asc
) x
英文:
Similar to other answers, you can use charindex
to determine the index of any matches and take whichever comes first. Of course, this only works if you're always doing like '%something%'
comparisons; if your pattern is more complex (e.g. like '%this%and%that%'
) charindex
won't be much help.
This approach puts your rules into a table called MatchLogic
, allowing you to easily build up your list of rules without amending the code itself.
declare @testdata table (test nvarchar(64))
insert @testdata (test)
values ('the animal rode in the car')
, ('in the car the animal rode')
, ('the dog rode in the the van')
declare @matchLogic table (partialString nvarchar(32), category nvarchar(32))
insert @matchLogic
values ('animal', 'Animal')
, ('car', 'Car')
select test, (
select top 1 category
from @matchLogic
where test like '%' + partialString + '%'
order by charindex(partialString, test) asc
) category
from @testData t
Another approach would be to match against 1 character from your string, then 2, then 3, etc until you get your first match.
Note:
- This is very inefficient, so not suitable for large datasets.
- This gets the first full match; not the match whose first character comes first; e.g. see the
monkey ... animal ... bicycle
example whereanimal
is the first complete match even thoughmonkey
comes first, asmonkey ... bicycle
only matches once we have the full string.
declare @testdata table (id bigint identity(1,1),test nvarchar(64))
insert @testdata (test)
values ('the animal rode in the car')
, ('in the car the animal rode')
, ('the monkey rode the bicycle')
, ('the monkey (an animal) rode the bicycle')
, ('the dog rode in the the van')
declare @matchLogic table (pattern nvarchar(32), category nvarchar(32))
insert @matchLogic
values ('%animal%', 'Animal')
, ('%car%', 'Car')
, ('%monkey%bicycle%', 'MonkeyBike')
; with substrings as (
select id, test part, len(test) chars
from @testdata
union all
select id, substring(part, 1, chars-1), chars-1
from substrings s
where chars > 1
)
select t.id, t.test, x.category
from @testData t
outer apply (
select top 1 category
from @matchLogic m
inner join substrings s
on part like m.pattern
where s.id = t.id
order by s.chars asc
) x
答案3
得分: -1
你关于它如何检查的理解是错误的。首先将检查第一个 when 子句,如果匹配,第二个将不会被评估。你的两个示例都匹配第一个 when 子句,所以触发的是这个。
这应该给你想要的结果:
case
when Subject like '%animal%' and (Subject not like '%car%' OR CHARINDEX('animal',Subject,1) <= CHARINDEX('car',Subject,1)) then 'Animal'
when Subject like '%car%' and (Subject not like '%animal%' OR CHARINDEX('car',Subject,1) <= CHARINDEX('animal',Subject,1)) then 'Car'
else 'UNKNOWN'
end AS Category,
英文:
You are mistaken on how it checks. The first when clause will get checked first, if it matches, the second one will not get evaluated. Both of your examples match the first when, so that is what is getting triggered.
This should give you what you want:
case
when Subject like '%animal%' and (Subject not like '%car%' OR CHARINDEX('animal',Subject,1) <= CHARINDEX('car',Subject,1)) then 'Animal'
when Subject like '%car%' and (Subject not like '%animal%' OR CHARINDEX('car',Subject,1) <= CHARINDEX('animal',Subject,1)) then 'Car'
else 'UNKNOWN'
end AS Category,
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论