英文:
How do I split a value on multiple special characters into columns with distinct values in SQL Server?
问题
col1 | col2 | col3 | col4 | col5 |
---|---|---|---|---|
ModelOutput | Prime | Revenue | Spend | AtLevel |
我尝试了以下的SQL:
SELECT
CASE
WHEN CHARINDEX('_', formula) > 0
THEN SUBSTRING(name, 1, CHARINDEX('_', formula) - 1)
ELSE name
END AS col1,
CASE
WHEN CHARINDEX('_', formula) > 0
THEN SUBSTRING(formula, CHARINDEX('_', formula) + 1, LEN(formula))
ELSE NULL
END AS col2
FROM
tableb
但它只会分割第一次出现的下划线,而不会识别第二次出现的下划线。
英文:
Input:
ModelOutput_Prime_AtLevel(Revenue)/Spend_Prime_AtLevel()
Output :
col1 | col2 | col3 | col4 | col5 |
---|---|---|---|---|
ModelOutput | Prime | Revenue | Spend | AtLevel |
I tried with this SQL:
SELECT
CASE
WHEN CHARINDEX('_', formula) > 0
THEN SUBSTRING(name, 1, CHARINDEX('_', formula) -1)
ELSE name
END col1,
CASE 
WHEN CHARINDEX('_', formula) > 0
THEN SUBSTRING(formula, CHARINDEX('_', formula) + 1, LEN(formula))
ELSE NULL 
END AS col2
FROM
tableb
But it will only split the first occurrence, not picking up the second occurrence
答案1
得分: 1
以下是你的翻译:
我将从重复我的评论开始:
> 理想情况下?首先不要将这些数据插入到数据库中,而是在它们进入表之前分割它们。
我还要指出,在评论中,你确认字符串没有任何格式,这意味着列的数量是动态的。这进一步表明T-SQL完全不适合这种情况。修复生成这些糟糕字符串的任何问题,并规范化你的设计(不管是什么)。
尽管如此,你可以这样做。我不是因为我认为这是一个可怕的想法而解释这一点;如果你真的想要/“需要”使用它,那么你需要花时间了解这个解决方案,并根据自己的需求使用它(如果你需要使用数据集,你需要调整它,你需要自己完成)。它也不会给出你问题中的结果,因为你忽略了第一个 Prime
,但是第二个 AtLevel
;这个解决方案始终使用第一个实例,这意味着AtLevel
是你的字符串的第三列。
它还使用了我的“辅助”函数,我包括了它的定义。我假设你所说的“特
英文:
I am going to start by repeating my comment:
> Ideally? By not inserting such data into your database in the first place and splitting it before it gets anywhere near your table.
I'm also going to point out that in the comments you confirm that the string has no format, meaning that the number of columns is dynamic. This further says that T-SQL is the completely wrong place. Fix what ever it is that is generating these terrible strings and normalise your design (what ever that is).
That being said, you can do this. I don't explain this because I think it's a terrible idea; if you really want/"need" to use this then it's up to you to take the time to understand the solution and use it for your own needs (if you need to consume a dataset, you'll need to adjust this, and you're on your own there). It also does not give the results in your question because you ignore the first Prime
but the second AtLevel
; this solution always uses the first instance meaning AtLevel
is the third column for your string.
It uses a "helper" function of mine as well, which I include the definition of. I assume by "special character" you mean anything that isn't an alphanumerical value. Finally, this uses SQL Serve 2022+ functionality. If you're on 2019 or prior you'll need to use a different splitter which provides an ordinal position (again, you're on your own there).
CREATE FUNCTION [fn].[PatternCharacterReplace] (@String varchar(8000),
@Pattern varchar(100),
@ReplacementCharacter varchar(1))
RETURNS table
AS RETURN
--Copyright Thom A (https://larnu.uk) under CC-by-SA
WITH N AS
(SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS
(SELECT TOP (DATALENGTH(@String))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1,N N2,N N3,N N4) --10000 rows.
SELECT STRING_AGG(CASE WHEN V.C LIKE @Pattern THEN @ReplacementCharacter ELSE V.C END, '')WITHIN GROUP(ORDER BY T.I) AS ReplacedString
FROM Tally T
CROSS APPLY (VALUES (SUBSTRING(@String, T.I, 1))) V (C);
GO
DECLARE @YourString varchar(8000) = 'This_is_a_(terrible)_idea/Why_would_you_do_this/Do_not_do(this)';
DECLARE @SQL nvarchar(MAX),
@CRLF nchar(2) = NCHAR(13) + NCHAR(10);
DECLARE @Delimiter nvarchar(20) = N',' + @CRLF + N' ';
WITH RNs AS(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY SS.value ORDER BY SS.ordinal) AS RN
FROM fn.PatternCharacterReplace(@YourString, '[^A-z0-9]',',') PCR
CROSS APPLY STRING_SPLIT(PCR.ReplacedString,',',1) SS
WHERE SS.value <> ''),
ColNos AS(
SELECT *,
ROW_NUMBER() OVER (ORDER BY ordinal) AS ColNo
FROM RNs
WHERE RN = 1)
SELECT @SQL =
N'WITH RNs AS(' + @CRLF +
N' SELECT *,' + @CRLF +
N' ROW_NUMBER() OVER (PARTITION BY SS.value ORDER BY SS.ordinal) AS RN' + @CRLF +
N' FROM fn.PatternCharacterReplace(@YourString, ''[^A-z0-9]'','','') PCR' + @CRLF +
N' CROSS APPLY STRING_SPLIT(PCR.ReplacedString,'','',1) SS' + @CRLF +
N' WHERE SS.value <> ''''),' + @CRLF +
N'ColNos AS(' + @CRLF +
N' SELECT *,' + @CRLF +
N' ROW_NUMBER() OVER (ORDER BY ordinal) AS ColNo' + @CRLF +
N' FROM RNs' + @CRLF +
N' WHERE RN = 1)' + @CRLF +
N'SELECT ' +
STRING_AGG(CONCAT(N'MAX(CASE ColNo WHEN ', ColNo, N' THEN value END) AS ', QUOTENAME(CONCAT(N'Col',ColNo))), @Delimiter) WITHIN GROUP (ORDER BY ColNo) + @CRLF +
N'FROM ColNos;'
FROM ColNos;
PRINT @SQL;
EXEC sys.sp_executesql @SQL, N'@YourString varchar(8000)', @YourString;
答案2
得分: 0
SQL Server XML和XQuery允许相对容易地实现它。
建议的解决方案将从SQL Server 2017开始起作用。
通过添加XPath谓词很容易排除特定单词。
值得注意的要点:
TRANSLATE()
函数调用将输入字符串作为仅由下划线字符分隔的标记。- XQuery
distinct-values()
调用将删除重复的值。
SQL
USE tempdb;
GO
SET NOCOUNT ON;
-- DDL和示例数据填充,开始
DROP TABLE IF EXISTS dbo.tbl;
CREATE TABLE dbo.tbl (ID INT IDENTITY PRIMARY KEY, tokens NVARCHAR(MAX));
INSERT INTO dbo.tbl (tokens) VALUES
(N'ModelOutput_Prime_AtLevel(Revenue)/Spend_Prime_AtLevel()'),
(N'one_two_three_four');
-- DDL和示例数据填充,结束
DECLARE @CrLf CHAR(2) = CHAR(13) + CHAR(10)
, @SQL AS NVARCHAR(MAX)
, @cnt INT, @i INT;
-- 计算列的总数
SET @cnt = (SELECT MAX(cnt)
FROM dbo.tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r>' +
REPLACE(TRANSLATE(tokens,'()/','___'), @separator, '</r><r>') +
'</r></root>' AS XML)) AS t1(c)
CROSS APPLY (SELECT c.value('count(distinct-values(/root/r[text()]))', 'INT')) AS t2(cnt));
-- 构建T-SQL语句的SELECT子句
SET @SQL = 'SELECT t.*' + @CrLf;
-- 循环遍历列
SET @i = 1;
WHILE @i <= @cnt BEGIN
SET @SQL += FORMATMESSAGE(', x.value(''(/root/r[%i]/text())[1]'', ''NVARCHAR(20)'') AS col%i%s', @i, @i, @CrLf)
SET @i += 1;
END
-- 构建T-SQL语句的FROM子句
SET @SQL += 'FROM dbo.tbl AS t
CROSS APPLY (SELECT TRY_CAST(''<root><r>'' +
REPLACE(TRANSLATE(tokens,'''/''','''___'''), ''_'', ''</r><r>'') +
''</r></root>'' AS XML)) AS t1(c)
CROSS APPLY (SELECT c.query(''<root>
{
for $x in distinct-values(/root/r[text()])
return <r>{$x}</r>
}
</root>'')) AS t2(x);';
PRINT @SQL;
EXEC sys.sp_executesql @SQL;
输出
ID | tokens | col1 | col2 | col3 | col4 | col5 |
---|---|---|---|---|---|---|
1 | ModelOutput_Prime_AtLevel(Revenue)/Spend_Prime_AtLevel() | ModelOutput | Prime | AtLevel | Revenue | Spend |
2 | one_two_three_four | one | two | three | four | NULL |
英文:
SQL Server XML and XQuery allow to implement it relatively easy.
The proposed solution will work starting from SQL Server 2017 onwards.
It is very easy to exclude certain words by adding an XPath predicate.
Notable points:
TRANSLATE()
function call is making input string as tokens
separated by the underscore character only.- XQuery
distinct-values()
call is removing duplicate values.
SQL
USE tempdb;
GO
SET NOCOUNT ON;
-- DDL and sample data population, start
DROP TABLE IF EXISTS dbo.tbl;
CREATE TABLE dbo.tbl (ID INT IDENTITY PRIMARY KEY, tokens NVARCHAR(MAX));
INSERT INTO dbo.tbl (tokens) VALUES
(N'ModelOutput_Prime_AtLevel(Revenue)/Spend_Prime_AtLevel()'),
(N'one_two_three_four');
-- DDL and sample data population, end
DECLARE @CrLf CHAR(2) = CHAR(13) + CHAR(10)
, @SQL AS NVARCHAR(MAX)
, @cnt INT, @i INT;
-- count total number of columns
SET @cnt = (SELECT MAX(cnt)
FROM dbo.tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r>' +
REPLACE(TRANSLATE(tokens,'()/','___'), @separator, '</r><r>') +
'</r></root>' AS XML)) AS t1(c)
CROSS APPLY (SELECT c.value('count(distinct-values(/root/r[text()]))', 'INT')) AS t2(cnt));
-- compose T-SQL statement SELECT clause
SET @SQL = 'SELECT t.*' + @CrLf;
-- loop through
SET @i = 1;
WHILE @i <= @cnt BEGIN
SET @SQL += FORMATMESSAGE(', x.value(''(/root/r[%i]/text())[1]'', ''NVARCHAR(20)'') AS col%i%s', @i, @i, @CrLf)
SET @i += 1;
END
-- compose T-SQL statement FROM clause
SET @SQL += 'FROM dbo.tbl AS t
CROSS APPLY (SELECT TRY_CAST(''<root><r>'' +
REPLACE(TRANSLATE(tokens,''()/'',''___''), ''_'', ''</r><r>'') +
''</r></root>'' AS XML)) AS t1(c)
CROSS APPLY (SELECT c.query(''<root>
{
for $x in distinct-values(/root/r[text()])
return <r>{$x}</r>
}
</root>'')) AS t2(x);';
PRINT @SQL;
EXEC sys.sp_executesql @SQL;
Output
ID | tokens | col1 | col2 | col3 | col4 | col5 |
---|---|---|---|---|---|---|
1 | ModelOutput_Prime_AtLevel(Revenue)/Spend_Prime_AtLevel() | ModelOutput | Prime | AtLevel | Revenue | Spend |
2 | one_two_three_four | one | two | three | four | NULL |
答案3
得分: 0
只是另一种使用JSON和translate()
的选项
并不完全符合期望的结果,但至少是一致的
示例
Declare @YourTable table (ID int,Input varchar(500))
Insert Into @YourTable values
(1,'ModelOutput_Prime_AtLevel(Revenue)/Spend_Prime_AtLevel()')
Select A.ID
,Pos1 = JSON_VALUE(JS,'$[0]')
,Pos2 = JSON_VALUE(JS,'$[1]')
,Pos3 = JSON_VALUE(JS,'$[2]')
,Pos4 = JSON_VALUE(JS,'$[3]')
,Pos5 = JSON_VALUE(JS,'$[4]')
,Pos6 = JSON_VALUE(JS,'$[5]')
,Pos7 = JSON_VALUE(JS,'$[6]')
From @YourTable A
Cross Apply (values (replace(translate(Input,'()/','___'),'__','_') ) ) B(CleanString)
Cross Apply (values ('["'+replace(string_escape(CleanString,'json'),'_','","')+'"]') ) C(JS)
结果
英文:
Just another option using JSON and translate()
Not exactly the desired results, but at least consistent
Example
Declare @YourTable table (ID int,Input varchar(500))
Insert Into @YourTable values
(1,'ModelOutput_Prime_AtLevel(Revenue)/Spend_Prime_AtLevel()')
Select A.ID
,Pos1 = JSON_VALUE(JS,'$[0]')
,Pos2 = JSON_VALUE(JS,'$[1]')
,Pos3 = JSON_VALUE(JS,'$[2]')
,Pos4 = JSON_VALUE(JS,'$[3]')
,Pos5 = JSON_VALUE(JS,'$[4]')
,Pos6 = JSON_VALUE(JS,'$[5]')
,Pos7 = JSON_VALUE(JS,'$[6]')
From @YourTable A
Cross Apply (values (replace(translate(Input,'()/','___'),'__','_') ) ) B(CleanString)
Cross Apply (values ('["'+replace(string_escape(CleanString,'json'),'_','","')+'"]') ) C(JS)
Results
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论