INITCAP在SQL Server查询中的用法

huangapple go评论60阅读模式
英文:

INITCAP in a query in SQL Server

问题

使用SQL Server 2016,我需要以特定方式清除空格并实现INITCAP。

空格清除程序很简单。我在正确实现INITCAP替代时遇到了问题。

https://stackoverflow.com/questions/26012822/initcap-equivalent-in-mssql 上的接受答案是错误的,正如第一条评论中所指出的那样。

我的数据包含具有连续多个空格和特殊字符(&、%等)的值。

stuff(): 在SQL Server 2016中,string_split没有提供用于验证序数值的选项,也不能保证结果以特定顺序返回。因此,我需要编写代码来确保从split_string返回的值按正确顺序排列。

convert(xml,...): 解码大多数XML编码的值。

convert(varchar(max),...): ...因为XML无法在需要SELECT DISTINCT时使用

SQL Fiddle

with T as (
  select *
  from (
  values ('Minesota Mining and   Manufacturing')
  , ('Minesota Mining & Manufacturing   ')
  , (' tillamook')
  , ('MUTUAL OF OMAHA')
  , ('   ')
  ) q(s)
),
scrubbed as (
  select T.s as InitialValue
  , CASE 
      WHEN LEN(RTRIM(T.s)) > 0
        THEN LTRIM(RTRIM(T.s))
    END as s
  from T
)
select distinct s.InitialValue
, stuff(
    (
      SELECT ' ' + t2.word
      from (
          select str.value
          , upper(substring(str.value, 1, 1)) + 
            case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
          , charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
          from string_split(s.s, ' ') str
        ) t2
      order by t2.idx
      FOR XML PATH('')
    ), 
    1, 
    1, 
    ''
  ) as INITCAP_xml
, convert(
    varchar(max), 
    convert(
      xml, 
      stuff(
        (
          SELECT ' ' + t2.word
          from (
              select str.value
              , upper(substring(str.value, 1, 1)) + 
                case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
              , charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
              from string_split(s.s, ' ') str
            ) t2
          order by t2.idx
          FOR XML PATH('')
        ), 
        1, 
        1, 
        ''
      )
    )
  ) as INITCAP_decoded
from scrubbed s

您可以看到,在使用FOR XML时,一些字符会被编码(如[space] = $#x20;和& = &)。通过转换为XML数据类型,一些字符将被解码。但一些字符(如&)仍然保持编码。

InitialValue INITCAP_attempt1 INITCAP_xml INITCAP_decoded
Minesota Mining and Manufacturing Minesota Mining And   Manufacturing Minesota Mining And Manufacturing Minesota Mining And Manufacturing
Minesota Mining & Manufacturing Minesota Mining & Manufacturing Minesota Mining & Manufacturing Minesota Mining & Manufacturing
 tillamook Tillamook Tillamook Tillamook
MUTUAL OF OMAHA Mutual Of Omaha Mutual Of Omaha Mutual Of Omaha
    null null null

REPLACE(s, '&', '&') 似乎不是一个合理的选项,因为我不知道随着时间的推移会遇到什么其他值。是否有一种处理由FOR XML编码的字符的通用方法?

在视图中(因此不使用用户定义函数或存储过程),是否有更好的方法来实现SQL Server中的INITCAP

英文:

Using SQL Server 2016, I have a need to scrub white space a certain way and implement INITCAP.

The whitespace scrubber is simple. I'm having trouble getting the INITCAP replacement working properly.

The accepted answer to https://stackoverflow.com/questions/26012822/initcap-equivalent-in-mssql is wrong, as noted in the first comment.

My data contains values that have multiple spaces in a row and special characters, (&, %, etc.).

stuff(): In SQL Server 2016, string_split does not have an option to prove an ordinal value and does not guarantee that the results are returned in any specific order. So, I need to write code to ensure values are returned from split_string in the correct order.

convert(xml,...): Decodes most of the XML-encoded values.

convert(varchar(max),...): ...because XML can't be used when needing SELECT DISTINCT

SQL Fiddle

with T as (
select *
from (
values ('Minesota Mining and   Manufacturing')
, ('Minesota Mining & Manufacturing   ')
, (' tillamook')
, ('MUTUAL OF OMAHA')
, ('   ')
) q(s)
),
scrubbed as (
select T.s as InitialValue
, CASE 
WHEN LEN(RTRIM(T.s)) > 0
THEN LTRIM(RTRIM(T.s))
END as s
from T
)
select distinct s.InitialValue
, stuff(
(
SELECT ' ' + t2.word
from (
select str.value
, upper(substring(str.value, 1, 1)) + 
case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
, charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
from string_split(s.s, ' ') str
) t2
order by t2.idx
FOR XML PATH('')
), 
1, 
1, 
''
) as INITCAP_xml
, convert(
varchar(max), 
convert(
xml, 
stuff(
(
SELECT ' ' + t2.word
from (
select str.value
, upper(substring(str.value, 1, 1)) + 
case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
, charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
from string_split(s.s, ' ') str
) t2
order by t2.idx
FOR XML PATH('')
), 
1, 
1, 
''
)
)
) as INITCAP_decoded
from scrubbed s

You see in the output that using FOR XML causes some of the characters to be encoded (like [space] = $#x20; and & = &). By converting to XML data type, some of those characters are decoded. But some characters (like &) remain encoded.

InitialValue INITCAP_attempt1 INITCAP_xml INITCAP_decoded
Minesota Mining and Manufacturing Minesota Mining And   Manufacturing Minesota Mining And Manufacturing Minesota Mining And Manufacturing
Minesota Mining & Manufacturing Minesota Mining & Manufacturing Minesota Mining & Manufacturing Minesota Mining & Manufacturing
 tillamook Tillamook Tillamook Tillamook
MUTUAL OF OMAHA Mutual Of Omaha Mutual Of Omaha Mutual Of Omaha
    null null null

REPLACE(s, '&', '&') doesn't seem like a reasonable option because I don't know what other values I'll run into over time. Is there a good, general way to handle characters that will be encoded by FOR XML?

Within a view (so, without using user defined functions or stored procedures), is there a better way to implement INITCAP in SQL Server?

答案1

得分: 2

以下是代码部分的翻译:

select *
       ,[dbo].[svf-Str-Proper] (S)
  from (
  values ('Minesota Mining and   Manufacturing')
  , ('Minesota Mining & Manufacturing   ')
  , (' tillamook')
  , ('MUTUAL OF OMAHA')
  , ('   ')
  ) q(s)
Results

s                                   	(No column name)
Minesota Mining and   Manufacturing 	Minesota Mining And Manufacturing
Minesota Mining & Manufacturing     	Minesota Mining & Manufacturing
 tillamook	                            Tillamook
MUTUAL OF OMAHA	                        Mutual Of Omaha
**The Function if Iterested**

CREATE FUNCTION [dbo].[svf-Str-Proper] (@S varchar(max))
Returns varchar(max)
As
Begin
    Set @S = ' '+ltrim(rtrim(replace(replace(replace(lower(@S),' ','†‡'),'‡†',''),'†‡',' ')))+' ';
    ;with cte1 as (Select * From (Values(' '),('-'),('/'),('\'),('['),('{'),('('),('.'),(','),('&'),(' Mc'),(' Mac'),(' O''''') ) A(P))
         ,cte2 as (Select * From (Values('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('I'),('J'),('K'),('L'),('M')
                                        ,('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U'),('V'),('W'),('X'),('Y'),('Z')
                                        ,('LLC'),('PhD'),('MD'),('DDS'),('II'),('III'),('IV')
                                     ) A(S))
         ,cte3 as (Select F = Lower(A.P+B.S),T = A.P+B.S From cte1 A Cross Join cte2 B 
                   Union All 
                   Select F = Lower(B.S+A.P),T = B.S+A.P From cte1 A Cross Join cte2 B where A.P in ('&') 
                  ) 
    Select @S = replace(@S,F,T) From cte3 
    Return rtrim(ltrim(@S))
End
-- Syntax : Select [dbo].[svf-Str-Proper]('john cappelletti')
--          Select [dbo].[svf-Str-Proper]('james e. o''''neil')
--          Select [dbo].[svf-Str-Proper]('CAPPELLETTI II,john old macdonald iv phd,dds llc b&o railroad bank-one at&t BD&I Bank-Five dr. Langdon,dds')

如果还有其他需要翻译的内容,请提供具体文本。

英文:

If interested in a SVF, here is a scaled down version which allows customization and edge events. For example rather than Phd, you would get PhD ... MacDonald, O'Neil

This is a dramatically scaled down version.. My rules/exceptions are in a generic mapping table.

Example

 select *
,[dbo].[svf-Str-Proper] (S)
from (
values ('Minesota Mining and   Manufacturing')
, ('Minesota Mining & Manufacturing   ')
, (' tillamook')
, ('MUTUAL OF OMAHA')
, ('   ')
) q(s)

Results

s                                   	(No column name)
Minesota Mining and   Manufacturing 	Minesota Mining And Manufacturing
Minesota Mining & Manufacturing     	Minesota Mining & Manufacturing
tillamook	                            Tillamook
MUTUAL OF OMAHA	                        Mutual Of Omaha

The Function if Iterested

CREATE FUNCTION [dbo].[svf-Str-Proper] (@S varchar(max))
Returns varchar(max)
As
Begin
Set @S = ' '+ltrim(rtrim(replace(replace(replace(lower(@S),' ','†‡'),'‡†',''),'†‡',' ')))+' '
;with cte1 as (Select * From (Values(' '),('-'),('/'),('\'),('['),('{'),('('),('.'),(','),('&'),(' Mc'),(' Mac'),(' O''') ) A(P))
,cte2 as (Select * From (Values('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('I'),('J'),('K'),('L'),('M')
,('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U'),('V'),('W'),('X'),('Y'),('Z')
,('LLC'),('PhD'),('MD'),('DDS'),('II'),('III'),('IV')
) A(S))
,cte3 as (Select F = Lower(A.P+B.S),T = A.P+B.S From cte1 A Cross Join cte2 B 
Union All 
Select F = Lower(B.S+A.P),T = B.S+A.P From cte1 A Cross Join cte2 B where A.P in ('&') 
) 
Select @S = replace(@S,F,T) From cte3 
Return rtrim(ltrim(@S))
End
-- Syntax : Select [dbo].[svf-Str-Proper]('john cappelletti')
--          Select [dbo].[svf-Str-Proper]('james e. o''neil')
--          Select [dbo].[svf-Str-Proper]('CAPPELLETTI II,john old macdonald iv phd,dds llc b&o railroad bank-one at&t BD&I Bank-Five dr. Langdon,dds')

答案2

得分: 1

请尝试以下解决方案。

它使用了SQL Server XML、XQuery和其FLWOR表达式。

值得注意的是:

  • cast as xs:token? 处理了空白字符,即:
    • 所有不可见的制表符、回车和换行符将被替换为空格。
    • 然后从值中删除前导和尾随空格。
    • 此外,多于一个空格的连续出现将被替换为一个空格。
  • FLWOR表达式确保了正确的大小写。

SQL

-- DDL和示例数据填充,开始
DECLARE @tbl TABLE (tokens VARCHAR(MAX));
INSERT @tbl (tokens) VALUES
('mineSota Mining and   MaNufacturing'),
('Minesota Mining & Manufacturing   '),
('tillamook'),
('MUTUAL  OF   OMAHA'),
('   ');
-- DDL和示例数据填充,结束
DECLARE @separator CHAR(1) = SPACE(1);
SELECT t.*, scrubbed
, result = c.query('
for $x in /root/r/text()
return concat(upper-case(substring($x,1,1)),lower-case(substring($x,2,1000)))
').value('text()[1]', 'VARCHAR(MAX)')
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<r><![CDATA[' + tokens + ' ' + ']]></r>' AS XML).value('(/r/text())[1] cast as xs:token?','VARCHAR(MAX)')) AS t1(scrubbed)
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' + 
REPLACE(scrubbed, @separator, ']]></r><r><![CDATA[') + 
']]></r></root>' AS XML)) AS t2(c);

输出

tokens scrubbed result
mineSota Mining and MaNufacturing mineSota Mining and MaNufacturing Minesota Mining And Manufacturing
Minesota Mining & Manufacturing Minesota Mining & Manufacturing Minesota Mining & Manufacturing
tillamook tillamook Tillamook
MUTUAL OF OMAHA MUTUAL OF OMAHA Mutual Of Omaha
NULL
英文:

Please try the following solution.

It is using SQL Server XML, XQuery, and its FLWOR expression.

Notable points:

  • cast as xs:token? is taking care of the whitespaces, i.e:
    • All invisible TAB, Carriage Return, and Line Feed characters will be
      replaced with spaces.
    • Then leading and trailing spaces are removed from the value.
    • Further, contiguous occurrences of more than one space will be replaced with a single space.
  • FLWOR expression is taking care of a proper case.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (tokens VARCHAR(MAX));
INSERT @tbl (tokens) VALUES
(&#39;mineSota Mining and   MaNufacturing&#39;),
(&#39;Minesota Mining &amp; Manufacturing   &#39;),
(&#39; tillamook&#39;),
(&#39;MUTUAL  OF   OMAHA&#39;),
(&#39;   &#39;);
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = SPACE(1);
SELECT t.*, scrubbed
, result = c.query(&#39;
for $x in /root/r/text()
return concat(upper-case(substring($x,1,1)),lower-case(substring($x,2,1000)))
&#39;).value(&#39;text()[1]&#39;, &#39;VARCHAR(MAX)&#39;)
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST(&#39;&lt;r&gt;&lt;![CDATA[&#39; + tokens + &#39; &#39; + &#39;]]&gt;&lt;/r&gt;&#39; AS XML).value(&#39;(/r/text())[1] cast as xs:token?&#39;,&#39;VARCHAR(MAX)&#39;)) AS t1(scrubbed)
CROSS APPLY (SELECT TRY_CAST(&#39;&lt;root&gt;&lt;r&gt;&lt;![CDATA[&#39; + 
REPLACE(scrubbed, @separator, &#39;]]&gt;&lt;/r&gt;&lt;r&gt;&lt;![CDATA[&#39;) + 
&#39;]]&gt;&lt;/r&gt;&lt;/root&gt;&#39; AS XML)) AS t2(c);

Output

tokens scrubbed result
mineSota Mining and MaNufacturing mineSota Mining and MaNufacturing Minesota Mining And Manufacturing
Minesota Mining & Manufacturing Minesota Mining & Manufacturing Minesota Mining & Manufacturing
tillamook tillamook Tillamook
MUTUAL OF OMAHA MUTUAL OF OMAHA Mutual Of Omaha
NULL

答案3

得分: 0

以下是已经翻译好的部分:

You've made the classic SQL Server XML mistake, one cannot just use the PATH('').
你犯了经典的SQL Server XML错误,不能只是使用PATH('')。

You have to do the convoluted PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') thing to get proper encoded characters.
你必须执行复杂的PATH(''), TYPE).value('.', 'NVARCHAR(MAX)')操作才能获得正确编码的字符。

Here's your fixed version:
以下是已修复的版本:

with T as (
  select *
  from (
  values ('Minesota Mining and   Manufacturing')
  , ('Minesota Mining &amp; Manufacturing   ')
  , (' tillamook')
  , ('MUTUAL OF OMAHA')
  , ('   ')
  ) q(s)
),
scrubbed as (
  select T.s as InitialValue
  , CASE 
      WHEN LEN(RTRIM(T.s)) > 0
        THEN LTRIM(RTRIM(T.s))
    END as s
  from T
)
select distinct s.InitialValue
, stuff(
    (
      SELECT ' ' + t2.word
      from (
          select str.value
          , upper(substring(str.value, 1, 1)) + 
            case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
          , charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
          from string_split(s.s, ' ') str
        ) t2
      order by t2.idx
      FOR XML PATH(''), TYPE
    ).value('.', 'NVARCHAR(MAX)'), 
    1, 
    1, 
    ''
  ) as INITCAP
from scrubbed s

请注意,我已经保留了SQL代码的原文,只翻译了需要翻译的部分。

英文:

You've made the classic SQL Server XML mistake, one cannot just use the PATH('').
You have to do the convuluted PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') thing to get proper encoded characters.

Here's your fixed version:

with T as (
  select *
  from (
  values (&#39;Minesota Mining and   Manufacturing&#39;)
  , (&#39;Minesota Mining &amp; Manufacturing   &#39;)
  , (&#39; tillamook&#39;)
  , (&#39;MUTUAL OF OMAHA&#39;)
  , (&#39;   &#39;)
  ) q(s)
),
scrubbed as (
  select T.s as InitialValue
  , CASE 
      WHEN LEN(RTRIM(T.s)) &gt; 0
        THEN LTRIM(RTRIM(T.s))
    END as s
  from T
)
select distinct s.InitialValue
, stuff(
    (
      SELECT &#39; &#39; + t2.word
      from (
          select str.value
          , upper(substring(str.value, 1, 1)) + 
            case when len(str.value) &gt; 1 then lower(substring(str.value, 2, len(str.value) - 1)) else &#39;&#39; end as word
          , charindex(&#39; &#39; + str.value + &#39; &#39;, &#39; &#39; + s.s + &#39; &#39;) as idx
          from string_split(s.s, &#39; &#39;) str
        ) t2
      order by t2.idx
      FOR XML PATH(&#39;&#39;), TYPE
    ).value(&#39;.&#39;, &#39;NVARCHAR(MAX)&#39;), 
    1, 
    1, 
    &#39;&#39;
  ) as INITCAP
from scrubbed s

huangapple
  • 本文由 发表于 2023年2月10日 03:17:49
  • 转载请务必保留本文链接:https://go.coder-hub.com/75403427.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定