如何在SQL Server中将一个值按多个特殊字符拆分成具有不同值的列?

huangapple go评论62阅读模式
英文:

How do I split a value on multiple special characters into columns with distinct values in SQL Server?

问题

col1 col2 col3 col4 col5
ModelOutput Prime Revenue Spend AtLevel

我尝试了以下的SQL:

SELECT
    CASE
        WHEN CHARINDEX('_', formula) > 0
            THEN SUBSTRING(name, 1, CHARINDEX('_', formula) - 1)
            ELSE name
    END AS col1,
    CASE
        WHEN CHARINDEX('_', formula) > 0
            THEN SUBSTRING(formula, CHARINDEX('_', formula) + 1, LEN(formula))
            ELSE NULL
    END AS col2 
FROM
    tableb

但它只会分割第一次出现的下划线,而不会识别第二次出现的下划线。

英文:

Input:

ModelOutput_Prime_AtLevel(Revenue)/Spend_Prime_AtLevel()

Output :

col1 col2 col3 col4 col5
ModelOutput Prime Revenue Spend AtLevel

I tried with this SQL:

SELECT
    CASE
        WHEN CHARINDEX('_', formula) > 0
            THEN SUBSTRING(name, 1, CHARINDEX('_', formula) -1)
            ELSE name
    END col1,
    CASE 
        WHEN CHARINDEX('_', formula) > 0
            THEN SUBSTRING(formula, CHARINDEX('_', formula) + 1, LEN(formula))
            ELSE NULL 
    END AS col2 
FROM
    tableb

But it will only split the first occurrence, not picking up the second occurrence

答案1

得分: 1

以下是你的翻译:

我将从重复我的评论开始:
> 理想情况下?首先不要将这些数据插入到数据库中,而是在它们进入表之前分割它们。

我还要指出,在评论中,你确认字符串没有任何格式,这意味着列的数量是动态的。这进一步表明T-SQL完全不适合这种情况。修复生成这些糟糕字符串的任何问题,并规范化你的设计(不管是什么)。


尽管如此,你可以这样做。我不是因为我认为这是一个可怕的想法而解释这一点;如果你真的想要/“需要”使用它,那么你需要花时间了解这个解决方案,并根据自己的需求使用它(如果你需要使用数据集,你需要调整它,你需要自己完成)。它也不会给出你问题中的结果,因为你忽略了第一个 Prime,但是第二个 AtLevel;这个解决方案始终使用第一个实例,这意味着AtLevel是你的字符串的第三列。

它还使用了我的“辅助”函数,我包括了它的定义。我假设你所说的“特

英文:

I am going to start by repeating my comment:
> Ideally? By not inserting such data into your database in the first place and splitting it before it gets anywhere near your table.

I'm also going to point out that in the comments you confirm that the string has no format, meaning that the number of columns is dynamic. This further says that T-SQL is the completely wrong place. Fix what ever it is that is generating these terrible strings and normalise your design (what ever that is).


That being said, you can do this. I don't explain this because I think it's a terrible idea; if you really want/"need" to use this then it's up to you to take the time to understand the solution and use it for your own needs (if you need to consume a dataset, you'll need to adjust this, and you're on your own there). It also does not give the results in your question because you ignore the first Prime but the second AtLevel; this solution always uses the first instance meaning AtLevel is the third column for your string.

It uses a "helper" function of mine as well, which I include the definition of. I assume by "special character" you mean anything that isn't an alphanumerical value. Finally, this uses SQL Serve 2022+ functionality. If you're on 2019 or prior you'll need to use a different splitter which provides an ordinal position (again, you're on your own there).

CREATE FUNCTION [fn].[PatternCharacterReplace] (@String varchar(8000),
                                                @Pattern varchar(100),
                                                @ReplacementCharacter varchar(1))
RETURNS table
AS RETURN
    --Copyright Thom A (https://larnu.uk) under CC-by-SA
    WITH N AS
        (SELECT N
         FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
    Tally AS
        (SELECT TOP (DATALENGTH(@String))
                ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
         FROM N N1,N N2,N N3,N N4) --10000 rows.
    SELECT STRING_AGG(CASE WHEN V.C LIKE @Pattern THEN @ReplacementCharacter ELSE V.C END, '')WITHIN GROUP(ORDER BY T.I) AS ReplacedString
    FROM Tally T
         CROSS APPLY (VALUES (SUBSTRING(@String, T.I, 1))) V (C);
GO

DECLARE @YourString varchar(8000) = 'This_is_a_(terrible)_idea/Why_would_you_do_this/Do_not_do(this)';
DECLARE @SQL nvarchar(MAX),
        @CRLF nchar(2) = NCHAR(13) + NCHAR(10);

DECLARE @Delimiter nvarchar(20) = N',' + @CRLF + N'       ';

WITH RNs AS(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY SS.value ORDER BY SS.ordinal) AS RN
    FROM fn.PatternCharacterReplace(@YourString, '[^A-z0-9]',',') PCR
         CROSS APPLY STRING_SPLIT(PCR.ReplacedString,',',1) SS
    WHERE SS.value <> ''),
ColNos AS(
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY ordinal) AS ColNo
    FROM RNs
    WHERE RN = 1)
SELECT @SQL = 
       N'WITH RNs AS(' + @CRLF +
       N'    SELECT *,' + @CRLF + 
       N'           ROW_NUMBER() OVER (PARTITION BY SS.value ORDER BY SS.ordinal) AS RN' + @CRLF + 
       N'    FROM fn.PatternCharacterReplace(@YourString, ''[^A-z0-9]'','','') PCR' + @CRLF + 
       N'         CROSS APPLY STRING_SPLIT(PCR.ReplacedString,'','',1) SS' + @CRLF + 
       N'    WHERE SS.value <> ''''),' + @CRLF + 
       N'ColNos AS(' + @CRLF + 
       N'    SELECT *,' + @CRLF + 
       N'           ROW_NUMBER() OVER (ORDER BY ordinal) AS ColNo' + @CRLF + 
       N'    FROM RNs' + @CRLF + 
       N'    WHERE RN = 1)' + @CRLF + 
       N'SELECT ' +
       STRING_AGG(CONCAT(N'MAX(CASE ColNo WHEN ', ColNo, N' THEN value END) AS ', QUOTENAME(CONCAT(N'Col',ColNo))), @Delimiter) WITHIN GROUP (ORDER BY ColNo) + @CRLF +
       N'FROM ColNos;'
FROM ColNos;

PRINT @SQL;

EXEC sys.sp_executesql @SQL, N'@YourString varchar(8000)', @YourString;

答案2

得分: 0

SQL Server XML和XQuery允许相对容易地实现它。

建议的解决方案将从SQL Server 2017开始起作用。

通过添加XPath谓词很容易排除特定单词。

值得注意的要点:

  • TRANSLATE()函数调用将输入字符串作为仅由下划线字符分隔的标记。
  • XQuery distinct-values()调用将删除重复的值。

SQL

USE tempdb;
GO

SET NOCOUNT ON;

-- DDL和示例数据填充,开始
DROP TABLE IF EXISTS dbo.tbl;

CREATE TABLE dbo.tbl (ID INT IDENTITY PRIMARY KEY, tokens NVARCHAR(MAX));
INSERT INTO dbo.tbl (tokens) VALUES
(N'ModelOutput_Prime_AtLevel(Revenue)/Spend_Prime_AtLevel()'),
(N'one_two_three_four');
-- DDL和示例数据填充,结束

DECLARE @CrLf CHAR(2) = CHAR(13) + CHAR(10)
   , @SQL AS NVARCHAR(MAX)
   , @cnt INT, @i INT;

-- 计算列的总数
SET @cnt = (SELECT MAX(cnt)
FROM dbo.tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r>' + 
      REPLACE(TRANSLATE(tokens,'()/','___'), @separator, '</r><r>') + 
      '</r></root>' AS XML)) AS t1(c)
CROSS APPLY (SELECT c.value('count(distinct-values(/root/r[text()]))', 'INT')) AS t2(cnt));

-- 构建T-SQL语句的SELECT子句
SET @SQL = 'SELECT t.*' + @CrLf;

-- 循环遍历列
SET @i = 1;
WHILE @i <= @cnt BEGIN
   SET @SQL += FORMATMESSAGE(', x.value(''(/root/r[%i]/text())[1]'', ''NVARCHAR(20)'') AS col%i%s', @i, @i, @CrLf)
   SET @i += 1;
END

-- 构建T-SQL语句的FROM子句
SET @SQL += 'FROM dbo.tbl AS t
CROSS APPLY (SELECT TRY_CAST(''<root><r>'' + 
      REPLACE(TRANSLATE(tokens,'''/''','''___'''), ''_'', ''</r><r>'') + 
      ''</r></root>'' AS XML)) AS t1(c)
CROSS APPLY (SELECT c.query(''<root>
	{ 
		for $x in distinct-values(/root/r[text()])
		return <r>{$x}</r> 
	}
</root>'')) AS t2(x);';

PRINT @SQL;
EXEC sys.sp_executesql @SQL;

输出

ID tokens col1 col2 col3 col4 col5
1 ModelOutput_Prime_AtLevel(Revenue)/Spend_Prime_AtLevel() ModelOutput Prime AtLevel Revenue Spend
2 one_two_three_four one two three four NULL
英文:

SQL Server XML and XQuery allow to implement it relatively easy.

The proposed solution will work starting from SQL Server 2017 onwards.

It is very easy to exclude certain words by adding an XPath predicate.

Notable points:

  • TRANSLATE() function call is making input string as tokens
    separated by the underscore character only.
  • XQuery distinct-values() call is removing duplicate values.

SQL

USE tempdb;
GO
SET NOCOUNT ON;
-- DDL and sample data population, start
DROP TABLE IF EXISTS dbo.tbl;
CREATE TABLE dbo.tbl (ID INT IDENTITY PRIMARY KEY, tokens NVARCHAR(MAX));
INSERT INTO dbo.tbl (tokens) VALUES
(N&#39;ModelOutput_Prime_AtLevel(Revenue)/Spend_Prime_AtLevel()&#39;),
(N&#39;one_two_three_four&#39;);
-- DDL and sample data population, end
DECLARE @CrLf CHAR(2) = CHAR(13) + CHAR(10)
, @SQL AS NVARCHAR(MAX)
, @cnt INT, @i INT;
-- count total number of columns
SET @cnt = (SELECT MAX(cnt)
FROM dbo.tbl AS t
CROSS APPLY (SELECT TRY_CAST(&#39;&lt;root&gt;&lt;r&gt;&#39; + 
REPLACE(TRANSLATE(tokens,&#39;()/&#39;,&#39;___&#39;), @separator, &#39;&lt;/r&gt;&lt;r&gt;&#39;) + 
&#39;&lt;/r&gt;&lt;/root&gt;&#39; AS XML)) AS t1(c)
CROSS APPLY (SELECT c.value(&#39;count(distinct-values(/root/r[text()]))&#39;, &#39;INT&#39;)) AS t2(cnt));
-- compose T-SQL statement SELECT clause
SET @SQL = &#39;SELECT t.*&#39; + @CrLf;
-- loop through 
SET @i = 1;
WHILE @i &lt;= @cnt BEGIN
SET @SQL += FORMATMESSAGE(&#39;, x.value(&#39;&#39;(/root/r[%i]/text())[1]&#39;&#39;, &#39;&#39;NVARCHAR(20)&#39;&#39;) AS col%i%s&#39;, @i, @i, @CrLf)
SET @i += 1;
END
-- compose T-SQL statement FROM clause
SET @SQL += &#39;FROM dbo.tbl AS t
CROSS APPLY (SELECT TRY_CAST(&#39;&#39;&lt;root&gt;&lt;r&gt;&#39;&#39; + 
REPLACE(TRANSLATE(tokens,&#39;&#39;()/&#39;&#39;,&#39;&#39;___&#39;&#39;), &#39;&#39;_&#39;&#39;, &#39;&#39;&lt;/r&gt;&lt;r&gt;&#39;&#39;) + 
&#39;&#39;&lt;/r&gt;&lt;/root&gt;&#39;&#39; AS XML)) AS t1(c)
CROSS APPLY (SELECT c.query(&#39;&#39;&lt;root&gt;
{ 
for $x in distinct-values(/root/r[text()])
return &lt;r&gt;{$x}&lt;/r&gt; 
}
&lt;/root&gt;&#39;&#39;)) AS t2(x);&#39;;
PRINT @SQL;
EXEC sys.sp_executesql @SQL;

Output

ID tokens col1 col2 col3 col4 col5
1 ModelOutput_Prime_AtLevel(Revenue)/Spend_Prime_AtLevel() ModelOutput Prime AtLevel Revenue Spend
2 one_two_three_four one two three four NULL

答案3

得分: 0

只是另一种使用JSON和translate()的选项

并不完全符合期望的结果,但至少是一致的

示例

Declare @YourTable table (ID int,Input varchar(500))
Insert Into @YourTable values 
 (1,'ModelOutput_Prime_AtLevel(Revenue)/Spend_Prime_AtLevel()')

Select A.ID
      ,Pos1 = JSON_VALUE(JS,'$[0]')
      ,Pos2 = JSON_VALUE(JS,'$[1]')
      ,Pos3 = JSON_VALUE(JS,'$[2]')
      ,Pos4 = JSON_VALUE(JS,'$[3]')
      ,Pos5 = JSON_VALUE(JS,'$[4]')
      ,Pos6 = JSON_VALUE(JS,'$[5]')
      ,Pos7 = JSON_VALUE(JS,'$[6]')
 From  @YourTable A
 Cross Apply (values (replace(translate(Input,'()/','___'),'__','_') ) ) B(CleanString)
 Cross Apply (values ('["'+replace(string_escape(CleanString,'json'),'_','","')+'"]') ) C(JS)

结果

如何在SQL Server中将一个值按多个特殊字符拆分成具有不同值的列?

英文:

Just another option using JSON and translate()

Not exactly the desired results, but at least consistent

Example

Declare @YourTable table (ID int,Input varchar(500))
Insert Into @YourTable values 
(1,&#39;ModelOutput_Prime_AtLevel(Revenue)/Spend_Prime_AtLevel()&#39;)
Select A.ID
,Pos1 = JSON_VALUE(JS,&#39;$[0]&#39;)
,Pos2 = JSON_VALUE(JS,&#39;$[1]&#39;)
,Pos3 = JSON_VALUE(JS,&#39;$[2]&#39;)
,Pos4 = JSON_VALUE(JS,&#39;$[3]&#39;)
,Pos5 = JSON_VALUE(JS,&#39;$[4]&#39;)
,Pos6 = JSON_VALUE(JS,&#39;$[5]&#39;)
,Pos7 = JSON_VALUE(JS,&#39;$[6]&#39;)
From  @YourTable A
Cross Apply (values (replace(translate(Input,&#39;()/&#39;,&#39;___&#39;),&#39;__&#39;,&#39;_&#39;) ) ) B(CleanString)
Cross Apply (values (&#39;[&quot;&#39;+replace(string_escape(CleanString,&#39;json&#39;),&#39;_&#39;,&#39;&quot;,&quot;&#39;)+&#39;&quot;]&#39;) ) C(JS)

Results

如何在SQL Server中将一个值按多个特殊字符拆分成具有不同值的列?

huangapple
  • 本文由 发表于 2023年5月29日 20:37:46
  • 转载请务必保留本文链接:https://go.coder-hub.com/76357450.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定