在SQL Server 2016+中,是否可能找到不在一组分隔符内的字符串的第一次出现?

huangapple go评论48阅读模式
英文:

Is it possible to find the first occurrence of a string that's NOT within a set of delimiters in SQL Server 2016+?

问题

我有一个 SQL Server 表中的列,其中包含不同长度的字符串。我需要找到字符串, --的第一个出现位置,该位置不在单引号或方括号中。

例如,在以下两个字符串中,我已经加粗了我想要获取位置的部分。请注意,在第一个字符串中,, --首次出现(不在单引号或方括号定界符之间)的位置是在位置13,在第二个字符串中,它位于位置16。

'a, --'[, --]**, --**[, --]

[a, --b]aaaaaaa_ **, --**', --';

另外,我应该提到, --本身可能会多次出现在字符串中。

以下是显示字符串和我期望的输出的简单查询。

SELECT 
    t.string, t.desired_pos
FROM
    (VALUES (N'a, --'[, --], --[, --]', 14),
            (N'[a, —-b]aaaaaaa_ , --', --'', 18)) t(string, desired_pos)

是否有办法使用 SELECT 查询(或多个查询)来实现这一目标,而不使用函数?

提前感谢您!

我尝试过各种SUBSTRING、CHARINDEX的变化,甚至一些CROSS APPLY,但似乎无法得到我想要的结果。

英文:

I have a column in a SQL Server table that has strings of varying lengths. I need to find the position of the first occurrence of the string , -- that's not enclosed in single quotes or square brackets.

For example, in the following two strings, I've bolded the portion I would like to get the position of. Notice in the first string, the first time , -- appears on its own (without being between single quote or square bracket delimiters) is at position 13 and in the second string, it's at position 16.

'a, --'[, --]**, --**[, --]

[a, --b]aaaaaaa_ **, --**', --'

Also I should mention that , -- itself could appear multiple times in the string.

Here's a simple query that shows the strings and my desired output.

SELECT 
    t.string, t.desired_pos
FROM
    (VALUES (N'''a, --''[, --], --[, --]', 14),
            (N'[a, —-b]aaaaaaa_ , --'', --''', 18)) t(string, desired_pos)

Is there any way to accomplish this using a SELECT query (or multiple) without using a function?

Thank you in advance!

I've tried variations of SUBSTRING, CHARINDEX, and even some CROSS APPLYs but I can't seem to get the result I'm looking for.

答案1

得分: 0

尝试这种方法。我将替换掉你不需要的字符串为另一个相同长度的字符串。然后找到感兴趣字符串的位置。

      选择 字符串, 期望位置, 
             CHARINDEX(', --', REPLACE(REPLACE(字符串, ''''', --''', '******'), '[, --]', '******') 
             ) 起始位置
       (数值 (N''''', --''', --[, --]''', 13),
                   (N'[, --]aaaaaaa_ , --''', --''', 16)) t(字符串, 期望位置)
英文:

Try this approach. I'm replacing the strings you don't need for another string of the same length. Then look for the position of the interested string.

  SELECT string, desired_pos, 
         CHARINDEX(', --', REPLACE(REPLACE(string, ''', --''', '******'), '[, --]', '******') 
         ) start_index
  FROM (VALUES (N''', --''[, --], --[, --]', 13),
               (N'[, --]aaaaaaa_ , --'', --''', 16)) t(string, desired_pos)

答案2

得分: 0

我不知道使用C#解决方案是否合适,但这个用于CSV的类很不错:TextFieldParser

然后你只需定义分隔符等,假设输入被一致地转义,那么一切都会很顺利。

英文:

I don't know if it makes sense with a C# solution, but this class for CVS is a nice little parcer: TextFieldParser

Then you just define Delimeters etc. and assuming the input is escaped consistently then all is good.

答案3

得分: 0

在我写下我的解决方案之前,我必须警告你:不要使用它。使用一个函数,或者在其他语言中完成这个任务。这段代码可能有bug。它不能处理像转义引号等情况。

这个想法是首先移除方括号 [] 和引号 '' 中的内容,然后执行一个“简单”的 charindex 操作。为了移除方括号,我使用了一个递归的CTE,它循环处理匹配的引号部分,并将它们的内容替换为占位字符串。

一个重要的点是引号可能嵌套在彼此之内,所以你必须尝试两种变体,并选择最早的一个。

WITH CTE AS (
	SELECT	*
	FROM
	(VALUES (N'''a, --''[, --], --[, --]''', 14),
            (N'[a, —-b]aaaaaaa_ , --'', --'''', 18)) t(string, desired_pos)
           )
, cte2 AS (
	select	x.start
	,	x.finish
	,	case when x.start > 0 THEN STUFF(string, x.start, x.finish - x.start + 1, REPLICATE('a', x.finish - x.start + 1)) ELSE string END AS newString
	,	1 as level
	,	string as orig
	,	desired_pos
	from	cte
	CROSS APPLY (
		SELECT	*
		,	ROW_NUMBER() OVER(ORDER BY case when start > 0 THEN 0 ELSE 1 END, start) AS sortorder
		FROM	(
			SELECT	charindex('[', string) AS start
			,	charindex(']', string) AS finish
			UNION ALL
			SELECT	charindex('''', string) AS startQ
			,	charindex('''', string, charindex('''', string) + 1) AS finishQ
		) x
	) x
	WHERE	x.sortorder = 1
	
	UNION ALL
	select	x.start
	,	x.finish
	,	STUFF(newString, x.start, x.finish - x.start + 1, REPLICATE('a', x.finish - x.start + 1))
	,	1 as level
	,	orig
	,	desired_pos
	from	cte2
	CROSS APPLY (
		SELECT	*
		,	ROW_NUMBER() OVER(ORDER BY case when start > 0 THEN 0 ELSE 1 END, start) AS sortorder
		FROM	(
			SELECT	charindex('[', newString) AS start
			,	charindex(']', newString) AS finish
			UNION ALL
			SELECT	charindex('''', newString) AS startQ
			,	charindex('''', newString, charindex('''', newString) + 1) AS finishQ
		) x
	) x
	WHERE	x.sortorder = 1
	AND	x.start > 0
	AND	cte2.start > 0 -- 必须是匹配的
)

SELECT	PATINDEX('%, --%', newString), *
from (
	select *, row_number() over(partition by orig order by level desc) AS sort
	from cte2
	) x
where x.sort = 1
英文:

Before i write down my solution, i must warn you: DON'T USE IT. Use a function, or do this in some other language. This code is probably buggy.
It doesn't handle stuff like escaped quotes etcetc.

The idea is to first remove the stuff inside brackets [] and quotes '' and then just do a "simple" charindex.
To remove the brackets, i'm using a recursive CTE that loops ever part of matching quotes and replaces their content with placeholder strings.

One important point is that quotes might be embedded in each other, so you have to try both variants and chose the one that is earliest.

WITH CTE AS (
	SELECT	*
	FROM
	(VALUES (N'''a, --''[, --], --[, --]', 14),
            (N'[a, -b]aaaaaaa_ , --'', --''', 18)) t(string, desired_pos)
           )
, cte2 AS (
	select	x.start
	,	x.finish
	,	case when x.start > 0 THEN STUFF(string, x.start, x.finish - x.start + 1, REPLICATE('a', x.finish - x.start + 1)) ELSE string END AS newString
	,	1 as level
	,	string as orig
	,	desired_pos
	from	cte
	CROSS APPLY (
		SELECT	*
		,	ROW_NUMBER() OVER(ORDER BY case when start > 0 THEN 0 ELSE 1 END, start) AS sortorder
		FROM	(
			SELECT	charindex('[', string) AS start
			,	charindex(']', string) AS finish
			UNION ALL
			SELECT	charindex('''', string) AS startQ
			,	charindex('''', string, charindex('''', string) + 1) AS finishQ
		) x
	) x
	WHERE	x.sortorder = 1
	
	UNION ALL
	select	x.start
	,	x.finish
	,	STUFF(newString, x.start, x.finish - x.start + 1, REPLICATE('a', x.finish - x.start + 1))
	,	1 as level
	,	orig
	,	desired_pos
	from	cte2
	CROSS APPLY (
		SELECT	*
		,	ROW_NUMBER() OVER(ORDER BY case when start > 0 THEN 0 ELSE 1 END, start) AS sortorder
		FROM	(
			SELECT	charindex('[', newString) AS start
			,	charindex(']', newString) AS finish
			UNION ALL
			SELECT	charindex('''', newString) AS startQ
			,	charindex('''', newString, charindex('''', newString) + 1) AS finishQ
		) x
	) x
	WHERE	x.sortorder = 1
	AND	x.start > 0
	AND	cte2.start > 0 -- Must have been a match
)

SELECT	PATINDEX('%, --%', newString), *
from (
	select *, row_number() over(partition by orig order by level desc) AS sort
	from cte2
	) x
where x.sort = 1

答案4

得分: 0

抱歉,我无法提供代码的翻译。如果您有其他需要翻译的文字内容,请随时告诉我。

英文:

Im late the game here but This kind of thing is simple in SQL Server when leveraging NGrams8k. Not only do you not need REGEX, a CLR, C# required. Furthermore, NGrams8k will be the fastest by far. In 8 years nobody has produced anything remotely as fast. Furthermore, this code will be faster and far less complex than a recursive CTE solution (which are almost always slow in SQL Server)

;--==== Sample Data
DECLARE @T Table (String VARCHAR(100))
INSERT @T
VALUES (N'''a, --''[, --], --[, --]'),
(N'[a, —-b]aaaaaaa_ , --'', --''');
;--==== Solution
SELECT
t.String, ng.Position
FROM        @t                                       AS t
CROSS APPLY (VALUES(REPLACE(t.String,'[',CHAR(1))))  AS f(S)
CROSS APPLY samd.NGrams8k(f.S,4)                     AS ng
CROSS APPLY (VALUES(SUBSTRING(f.S,ng.Position-2,7))) AS g(String)
WHERE ng.Token = ', --'
AND  g.String NOT LIKE '%''%''%'
AND  g.String NOT LIKE '%'+CHAR(1)+'%]%';

Results:

String                        Position
----------------------------- --------------------
'a, --'[, --], --[, --]       14
[a, —-b]aaaaaaa_ , --', --'   18

huangapple
  • 本文由 发表于 2023年2月8日 10:03:43
  • 转载请务必保留本文链接:https://go.coder-hub.com/75380729.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定