英文:
INITCAP in a query in SQL Server
问题
使用SQL Server 2016,我需要以特定方式清除空格并实现INITCAP。
空格清除程序很简单。我在正确实现INITCAP替代时遇到了问题。
https://stackoverflow.com/questions/26012822/initcap-equivalent-in-mssql 上的接受答案是错误的,正如第一条评论中所指出的那样。
我的数据包含具有连续多个空格和特殊字符(&、%等)的值。
stuff()
: 在SQL Server 2016中,string_split
没有提供用于验证序数值的选项,也不能保证结果以特定顺序返回。因此,我需要编写代码来确保从split_string
返回的值按正确顺序排列。
convert(xml,...)
: 解码大多数XML编码的值。
convert(varchar(max),...)
: ...因为XML无法在需要SELECT DISTINCT
时使用
with T as (
select *
from (
values ('Minesota Mining and Manufacturing')
, ('Minesota Mining & Manufacturing ')
, (' tillamook')
, ('MUTUAL OF OMAHA')
, (' ')
) q(s)
),
scrubbed as (
select T.s as InitialValue
, CASE
WHEN LEN(RTRIM(T.s)) > 0
THEN LTRIM(RTRIM(T.s))
END as s
from T
)
select distinct s.InitialValue
, stuff(
(
SELECT ' ' + t2.word
from (
select str.value
, upper(substring(str.value, 1, 1)) +
case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
, charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
from string_split(s.s, ' ') str
) t2
order by t2.idx
FOR XML PATH('')
),
1,
1,
''
) as INITCAP_xml
, convert(
varchar(max),
convert(
xml,
stuff(
(
SELECT ' ' + t2.word
from (
select str.value
, upper(substring(str.value, 1, 1)) +
case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
, charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
from string_split(s.s, ' ') str
) t2
order by t2.idx
FOR XML PATH('')
),
1,
1,
''
)
)
) as INITCAP_decoded
from scrubbed s
您可以看到,在使用FOR XML
时,一些字符会被编码(如[space] = $#x20;
和& = &
)。通过转换为XML数据类型,一些字符将被解码。但一些字符(如&
)仍然保持编码。
InitialValue | INITCAP_attempt1 | INITCAP_xml | INITCAP_decoded |
---|---|---|---|
Minesota Mining and Manufacturing |
Minesota Mining And   Manufacturing |
Minesota Mining And Manufacturing |
Minesota Mining And Manufacturing |
Minesota Mining & Manufacturing |
Minesota Mining & Manufacturing |
Minesota Mining & Manufacturing |
Minesota Mining & Manufacturing |
tillamook |
Tillamook |
Tillamook |
Tillamook |
MUTUAL OF OMAHA |
Mutual Of Omaha |
Mutual Of Omaha |
Mutual Of Omaha |
|
null | null | null |
REPLACE(s, '&', '&')
似乎不是一个合理的选项,因为我不知道随着时间的推移会遇到什么其他值。是否有一种处理由FOR XML
编码的字符的通用方法?
在视图中(因此不使用用户定义函数或存储过程),是否有更好的方法来实现SQL Server中的INITCAP
?
英文:
Using SQL Server 2016, I have a need to scrub white space a certain way and implement INITCAP.
The whitespace scrubber is simple. I'm having trouble getting the INITCAP replacement working properly.
The accepted answer to https://stackoverflow.com/questions/26012822/initcap-equivalent-in-mssql is wrong, as noted in the first comment.
My data contains values that have multiple spaces in a row and special characters, (&, %, etc.).
stuff()
: In SQL Server 2016, string_split
does not have an option to prove an ordinal value and does not guarantee that the results are returned in any specific order. So, I need to write code to ensure values are returned from split_string
in the correct order.
convert(xml,...)
: Decodes most of the XML-encoded values.
convert(varchar(max),...)
: ...because XML can't be used when needing SELECT DISTINCT
with T as (
select *
from (
values ('Minesota Mining and Manufacturing')
, ('Minesota Mining & Manufacturing ')
, (' tillamook')
, ('MUTUAL OF OMAHA')
, (' ')
) q(s)
),
scrubbed as (
select T.s as InitialValue
, CASE
WHEN LEN(RTRIM(T.s)) > 0
THEN LTRIM(RTRIM(T.s))
END as s
from T
)
select distinct s.InitialValue
, stuff(
(
SELECT ' ' + t2.word
from (
select str.value
, upper(substring(str.value, 1, 1)) +
case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
, charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
from string_split(s.s, ' ') str
) t2
order by t2.idx
FOR XML PATH('')
),
1,
1,
''
) as INITCAP_xml
, convert(
varchar(max),
convert(
xml,
stuff(
(
SELECT ' ' + t2.word
from (
select str.value
, upper(substring(str.value, 1, 1)) +
case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
, charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
from string_split(s.s, ' ') str
) t2
order by t2.idx
FOR XML PATH('')
),
1,
1,
''
)
)
) as INITCAP_decoded
from scrubbed s
You see in the output that using FOR XML
causes some of the characters to be encoded (like [space] = $#x20;
and & = &
). By converting to XML data type, some of those characters are decoded. But some characters (like &
) remain encoded.
InitialValue | INITCAP_attempt1 | INITCAP_xml | INITCAP_decoded |
---|---|---|---|
Minesota Mining and Manufacturing |
Minesota Mining And   Manufacturing |
Minesota Mining And Manufacturing |
Minesota Mining And Manufacturing |
Minesota Mining & Manufacturing |
Minesota Mining & Manufacturing |
Minesota Mining & Manufacturing |
Minesota Mining & Manufacturing |
tillamook |
Tillamook |
Tillamook |
Tillamook |
MUTUAL OF OMAHA |
Mutual Of Omaha |
Mutual Of Omaha |
Mutual Of Omaha |
|
null | null | null |
REPLACE(s, '&', '&')
doesn't seem like a reasonable option because I don't know what other values I'll run into over time. Is there a good, general way to handle characters that will be encoded by FOR XML
?
Within a view (so, without using user defined functions or stored procedures), is there a better way to implement INITCAP
in SQL Server?
答案1
得分: 2
以下是代码部分的翻译:
select *
,[dbo].[svf-Str-Proper] (S)
from (
values ('Minesota Mining and Manufacturing')
, ('Minesota Mining & Manufacturing ')
, (' tillamook')
, ('MUTUAL OF OMAHA')
, (' ')
) q(s)
Results
s (No column name)
Minesota Mining and Manufacturing Minesota Mining And Manufacturing
Minesota Mining & Manufacturing Minesota Mining & Manufacturing
tillamook Tillamook
MUTUAL OF OMAHA Mutual Of Omaha
**The Function if Iterested**
CREATE FUNCTION [dbo].[svf-Str-Proper] (@S varchar(max))
Returns varchar(max)
As
Begin
Set @S = ' '+ltrim(rtrim(replace(replace(replace(lower(@S),' ','†‡'),'‡†',''),'†‡',' ')))+' ';
;with cte1 as (Select * From (Values(' '),('-'),('/'),('\'),('['),('{'),('('),('.'),(','),('&'),(' Mc'),(' Mac'),(' O''''') ) A(P))
,cte2 as (Select * From (Values('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('I'),('J'),('K'),('L'),('M')
,('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U'),('V'),('W'),('X'),('Y'),('Z')
,('LLC'),('PhD'),('MD'),('DDS'),('II'),('III'),('IV')
) A(S))
,cte3 as (Select F = Lower(A.P+B.S),T = A.P+B.S From cte1 A Cross Join cte2 B
Union All
Select F = Lower(B.S+A.P),T = B.S+A.P From cte1 A Cross Join cte2 B where A.P in ('&')
)
Select @S = replace(@S,F,T) From cte3
Return rtrim(ltrim(@S))
End
-- Syntax : Select [dbo].[svf-Str-Proper]('john cappelletti')
-- Select [dbo].[svf-Str-Proper]('james e. o''''neil')
-- Select [dbo].[svf-Str-Proper]('CAPPELLETTI II,john old macdonald iv phd,dds llc b&o railroad bank-one at&t BD&I Bank-Five dr. Langdon,dds')
如果还有其他需要翻译的内容,请提供具体文本。
英文:
If interested in a SVF, here is a scaled down version which allows customization and edge events. For example rather than Phd
, you would get PhD
... MacDonald, O'Neil
This is a dramatically scaled down version.. My rules/exceptions are in a generic mapping table.
Example
select *
,[dbo].[svf-Str-Proper] (S)
from (
values ('Minesota Mining and Manufacturing')
, ('Minesota Mining & Manufacturing ')
, (' tillamook')
, ('MUTUAL OF OMAHA')
, (' ')
) q(s)
Results
s (No column name)
Minesota Mining and Manufacturing Minesota Mining And Manufacturing
Minesota Mining & Manufacturing Minesota Mining & Manufacturing
tillamook Tillamook
MUTUAL OF OMAHA Mutual Of Omaha
The Function if Iterested
CREATE FUNCTION [dbo].[svf-Str-Proper] (@S varchar(max))
Returns varchar(max)
As
Begin
Set @S = ' '+ltrim(rtrim(replace(replace(replace(lower(@S),' ','†‡'),'‡†',''),'†‡',' ')))+' '
;with cte1 as (Select * From (Values(' '),('-'),('/'),('\'),('['),('{'),('('),('.'),(','),('&'),(' Mc'),(' Mac'),(' O''') ) A(P))
,cte2 as (Select * From (Values('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('I'),('J'),('K'),('L'),('M')
,('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U'),('V'),('W'),('X'),('Y'),('Z')
,('LLC'),('PhD'),('MD'),('DDS'),('II'),('III'),('IV')
) A(S))
,cte3 as (Select F = Lower(A.P+B.S),T = A.P+B.S From cte1 A Cross Join cte2 B
Union All
Select F = Lower(B.S+A.P),T = B.S+A.P From cte1 A Cross Join cte2 B where A.P in ('&')
)
Select @S = replace(@S,F,T) From cte3
Return rtrim(ltrim(@S))
End
-- Syntax : Select [dbo].[svf-Str-Proper]('john cappelletti')
-- Select [dbo].[svf-Str-Proper]('james e. o''neil')
-- Select [dbo].[svf-Str-Proper]('CAPPELLETTI II,john old macdonald iv phd,dds llc b&o railroad bank-one at&t BD&I Bank-Five dr. Langdon,dds')
答案2
得分: 1
请尝试以下解决方案。
它使用了SQL Server XML、XQuery和其FLWOR表达式。
值得注意的是:
cast as xs:token?
处理了空白字符,即:- 所有不可见的制表符、回车和换行符将被替换为空格。
- 然后从值中删除前导和尾随空格。
- 此外,多于一个空格的连续出现将被替换为一个空格。
- FLWOR表达式确保了正确的大小写。
SQL
-- DDL和示例数据填充,开始
DECLARE @tbl TABLE (tokens VARCHAR(MAX));
INSERT @tbl (tokens) VALUES
('mineSota Mining and MaNufacturing'),
('Minesota Mining & Manufacturing '),
('tillamook'),
('MUTUAL OF OMAHA'),
(' ');
-- DDL和示例数据填充,结束
DECLARE @separator CHAR(1) = SPACE(1);
SELECT t.*, scrubbed
, result = c.query('
for $x in /root/r/text()
return concat(upper-case(substring($x,1,1)),lower-case(substring($x,2,1000)))
').value('text()[1]', 'VARCHAR(MAX)')
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<r><![CDATA[' + tokens + ' ' + ']]></r>' AS XML).value('(/r/text())[1] cast as xs:token?','VARCHAR(MAX)')) AS t1(scrubbed)
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(scrubbed, @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t2(c);
输出
tokens | scrubbed | result |
---|---|---|
mineSota Mining and MaNufacturing | mineSota Mining and MaNufacturing | Minesota Mining And Manufacturing |
Minesota Mining & Manufacturing | Minesota Mining & Manufacturing | Minesota Mining & Manufacturing |
tillamook | tillamook | Tillamook |
MUTUAL OF OMAHA | MUTUAL OF OMAHA | Mutual Of Omaha |
NULL |
英文:
Please try the following solution.
It is using SQL Server XML, XQuery, and its FLWOR expression.
Notable points:
cast as xs:token?
is taking care of the whitespaces, i.e:- All invisible TAB, Carriage Return, and Line Feed characters will be
replaced with spaces. - Then leading and trailing spaces are removed from the value.
- Further, contiguous occurrences of more than one space will be replaced with a single space.
- All invisible TAB, Carriage Return, and Line Feed characters will be
- FLWOR expression is taking care of a proper case.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (tokens VARCHAR(MAX));
INSERT @tbl (tokens) VALUES
('mineSota Mining and MaNufacturing'),
('Minesota Mining & Manufacturing '),
(' tillamook'),
('MUTUAL OF OMAHA'),
(' ');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = SPACE(1);
SELECT t.*, scrubbed
, result = c.query('
for $x in /root/r/text()
return concat(upper-case(substring($x,1,1)),lower-case(substring($x,2,1000)))
').value('text()[1]', 'VARCHAR(MAX)')
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<r><![CDATA[' + tokens + ' ' + ']]></r>' AS XML).value('(/r/text())[1] cast as xs:token?','VARCHAR(MAX)')) AS t1(scrubbed)
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(scrubbed, @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t2(c);
Output
tokens | scrubbed | result |
---|---|---|
mineSota Mining and MaNufacturing | mineSota Mining and MaNufacturing | Minesota Mining And Manufacturing |
Minesota Mining & Manufacturing | Minesota Mining & Manufacturing | Minesota Mining & Manufacturing |
tillamook | tillamook | Tillamook |
MUTUAL OF OMAHA | MUTUAL OF OMAHA | Mutual Of Omaha |
NULL |
答案3
得分: 0
以下是已经翻译好的部分:
You've made the classic SQL Server XML mistake, one cannot just use the PATH('').
你犯了经典的SQL Server XML错误,不能只是使用PATH('')。
You have to do the convoluted PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') thing to get proper encoded characters.
你必须执行复杂的PATH(''), TYPE).value('.', 'NVARCHAR(MAX)')操作才能获得正确编码的字符。
Here's your fixed version:
以下是已修复的版本:
with T as (
select *
from (
values ('Minesota Mining and Manufacturing')
, ('Minesota Mining & Manufacturing ')
, (' tillamook')
, ('MUTUAL OF OMAHA')
, (' ')
) q(s)
),
scrubbed as (
select T.s as InitialValue
, CASE
WHEN LEN(RTRIM(T.s)) > 0
THEN LTRIM(RTRIM(T.s))
END as s
from T
)
select distinct s.InitialValue
, stuff(
(
SELECT ' ' + t2.word
from (
select str.value
, upper(substring(str.value, 1, 1)) +
case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
, charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
from string_split(s.s, ' ') str
) t2
order by t2.idx
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)'),
1,
1,
''
) as INITCAP
from scrubbed s
请注意,我已经保留了SQL代码的原文,只翻译了需要翻译的部分。
英文:
You've made the classic SQL Server XML mistake, one cannot just use the PATH('').
You have to do the convuluted PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') thing to get proper encoded characters.
Here's your fixed version:
with T as (
select *
from (
values ('Minesota Mining and Manufacturing')
, ('Minesota Mining & Manufacturing ')
, (' tillamook')
, ('MUTUAL OF OMAHA')
, (' ')
) q(s)
),
scrubbed as (
select T.s as InitialValue
, CASE
WHEN LEN(RTRIM(T.s)) > 0
THEN LTRIM(RTRIM(T.s))
END as s
from T
)
select distinct s.InitialValue
, stuff(
(
SELECT ' ' + t2.word
from (
select str.value
, upper(substring(str.value, 1, 1)) +
case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
, charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
from string_split(s.s, ' ') str
) t2
order by t2.idx
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)'),
1,
1,
''
) as INITCAP
from scrubbed s
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论