Recognise repeating number in SQL Server
Question
I have a SQL Server column of type int that stores test scores from 10 to 550, but some values fall outside that range, e.g. 521521.
I want to take the repeating part of the number and save that. How can I do this in SQL?
521521 should become 521
919191 should become 91
7676 should become 76
I do not want to simply take the leftmost (up to) three digits or half of the string, because I also have values such as 123456, which should stay as they are so that they remain recognisable as unusable data. I am willing to accept the risk for cases like 123123.
So I am really looking for a way to recognise repeating digits/letters.
Answer 1
Score: 3
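You don't need string manipulation here. You can do it with modulo and integer division in one big CASE expression:
    SELECT *,
           Corrected = CASE
               -- Each branch tests one block width. The ">=" guard rejects blocks with a
               -- leading zero, so that a value such as 505 (5|05) is left alone.
               WHEN col % 100 >= 10 AND col / 100 % 100 = col % 100 AND col / 10000 IN (0, col % 100)
                   THEN col % 100       -- 2-digit block, repeated 2 or 3 times
               WHEN col % 1000 >= 100 AND col / 1000 % 1000 = col % 1000 AND col / 1000000 IN (0, col % 1000)
                   THEN col % 1000      -- 3-digit block
               WHEN col % 10000 >= 1000 AND col / 10000 % 10000 = col % 10000 AND col / 100000000 IN (0, col % 10000)
                   THEN col % 10000     -- 4-digit block
               WHEN col % 100000 >= 10000 AND col / 100000 % 100000 = col % 100000 AND col / CAST(10000000000 AS bigint) IN (0, col % 100000)
                   THEN col % 100000    -- 5-digit block
               ELSE col                 -- no repeat found: keep the value
           END
    FROM @Demo d;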
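To see why the branches match or fall through, it can help to look at the intermediate values that the first branch compares. This is a minimal sketch, assuming a table variable `@Demo` with a single int column `col` (the answer itself does not define one); the column aliases `LastTwo`, `MiddleTwo` and `LeadingPart` are made up for illustration.

```sql
-- Assumed sample data; @Demo and col mirror the names used in the answer.
DECLARE @Demo TABLE (col int);
INSERT INTO @Demo (col) VALUES (7676), (919191), (521521), (123456);

SELECT col,
       col % 100       AS LastTwo,      -- rightmost two digits
       col / 100 % 100 AS MiddleTwo,    -- the two digits in front of those
       col / 10000     AS LeadingPart   -- whatever remains further left
FROM @Demo;
```

For 919191 all three computed columns are 91, so the two-digit branch applies. For 521521 they are 21, 15 and 52, so that branch is skipped and the three-digit test (`col / 1000 % 1000 = col % 1000`) matches instead.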
Answer 2
Score: 2
[1]: https://stackoverflow.com/questions/76704556/recognise-repeating-number-in-sql-server/76705167#comment135232454_76704556
[2]: https://dbfiddle.uk/5mx09IVP
Well, string manipulation isn't T-SQL's strong suit, but I was curious how hard it would be to find a pure T-SQL solution for this. My solution is quite cumbersome, but it works for all the scenarios you've described and for a few more that I came up with based on your question. (By the way, regarding [Thom A's comment][1] about the expected result for 111111: it returns 111.)
First, create and populate a sample table (please save us this step in your future questions):
    DECLARE @Demo AS TABLE
    (
        col int,
        expected int
    );
    INSERT INTO @Demo (col, expected) VALUES
    (521521, 521),              -- repeat after 3 digits
    (919191, 91),               -- repeat after 2 digits, 3 times
    (919291, 919291),           -- no repeat - note the 92 in the middle
    (7676, 76),                 -- repeat after 2 digits
    (123456, 123456),           -- no repeat
    (2147421474, 21474),        -- repeat after 5 digits
    (2147483647, 2147483647),   -- no repeat
    (12341234, 1234),           -- repeat after 4 digits
    (123123123, 123),           -- repeat after 3 digits, 3 times
    (123456123, 123456123);     -- no repeat
Note the comments: they explain why each expected result is what it is.
Then, the query. I had to use quite a few `CROSS APPLY`s and a `CASE` expression, but it does produce the expected results:
    SELECT col, expected,
           CASE
               -- two equal halves
               WHEN HasTwoParts = 1
                    AND LEFT(StringCol, NumberOfDigits/2) = RIGHT(StringCol, NumberOfDigits/2)
                   THEN LEFT(StringCol, NumberOfDigits/2)
               -- three equal thirds
               WHEN HasThreeParts = 1
                    AND LEFT(StringCol, NumberOfDigits/3) = RIGHT(StringCol, NumberOfDigits/3)
                    AND LEFT(StringCol, NumberOfDigits/3) = SUBSTRING(StringCol, (NumberOfDigits/3)+1, (NumberOfDigits/3))
                   THEN LEFT(StringCol, NumberOfDigits/3)
               -- five equal fifths: compare all five blocks, not only the first and last,
               -- otherwise a value such as 52345 would be reduced to 5
               WHEN HasFiveParts = 1
                    AND REPLICATE(LEFT(StringCol, NumberOfDigits/5), 5) = StringCol
                   THEN LEFT(StringCol, NumberOfDigits/5)
               ELSE StringCol
           END AS Result
    FROM @Demo
    CROSS APPLY (SELECT CAST(col AS varchar(12)) AS StringCol) AS AsString
    CROSS APPLY (SELECT LEN(StringCol) AS NumberOfDigits) AS Length
    CROSS APPLY (
        SELECT IIF(NumberOfDigits % 2 = 0, 1, 0) AS HasTwoParts,
               IIF(NumberOfDigits % 3 = 0, 1, 0) AS HasThreeParts,
               IIF(NumberOfDigits % 5 = 0, 1, 0) AS HasFiveParts) AS LengthDividers;
And the results:
    col          expected     Result
    ------------------------------------
    521521       521          521
    919191       91           91
    919291       919291       919291
    7676         76           76
    123456       123456       123456
    2147421474   21474        21474
    2147483647   2147483647   2147483647
    12341234     1234         1234
    123123123    123          123
    123456123    123456123    123456123
You can see a live demo on [db<>fiddle][2].
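Since the question asks to save the corrected value and states that valid scores lie between 10 and 550, one option is to write the extracted block back only when it falls inside that range. The following is a sketch rather than a definitive implementation: `@Scores` and `Score` are hypothetical stand-ins for the real table and column, and only two or three repetitions are checked.

```sql
-- Hypothetical stand-in for the real table; names are assumptions.
DECLARE @Scores TABLE (Score int);
INSERT INTO @Scores (Score) VALUES (521521), (919191), (7676), (123456), (999999), (480);

UPDATE s
SET    Score = c.Candidate
FROM   @Scores AS s
CROSS APPLY (SELECT CAST(s.Score AS varchar(12)) AS StringCol) AS a
CROSS APPLY (SELECT LEN(a.StringCol) AS NumberOfDigits) AS l
CROSS APPLY (
    -- Candidate is NULL when the value is not made of 2 or 3 equal blocks.
    SELECT CAST(CASE
                    WHEN l.NumberOfDigits % 2 = 0
                         AND LEFT(a.StringCol, l.NumberOfDigits / 2) = RIGHT(a.StringCol, l.NumberOfDigits / 2)
                        THEN LEFT(a.StringCol, l.NumberOfDigits / 2)
                    WHEN l.NumberOfDigits % 3 = 0
                         AND REPLICATE(LEFT(a.StringCol, l.NumberOfDigits / 3), 3) = a.StringCol
                        THEN LEFT(a.StringCol, l.NumberOfDigits / 3)
                END AS int) AS Candidate
) AS c
WHERE  c.Candidate BETWEEN 10 AND 550;   -- only accept blocks that are valid scores

SELECT Score FROM @Scores;
```

With this sample data, 521521, 919191 and 7676 become 521, 91 and 76, while 123456, 999999 and 480 are left untouched (999999 repeats, but 999 is outside the valid range). The range filter also protects values that are already valid: for a number of at most three digits the candidate block has a single digit, which is below 10.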
Answer 3
Score: 1
Here's another version:
    DECLARE @data TABLE (n int, expected int);
    INSERT INTO @data
    VALUES (521521, 521)
         , (919191, 91)
         , (7676, 76)
         , (123456, 123456)
         , (123123, 123)
         , (1234567, 1234567)
         , (111111111, 111)
         , (1111111111, 11111)
         , (1000011111, 1000011111);

    SELECT n, expected, actual
    FROM (
        SELECT n, expected, actual
             , ROW_NUMBER() OVER (PARTITION BY n
                   -- 1) prefixes that rebuild the whole string when replicated come first
                   ORDER BY CASE WHEN REPLICATE(actual, LEN(nString) / z.x) = nString THEN 1 ELSE 0 END DESC
                   -- 2) a proper prefix beats the full string itself
                          , CASE WHEN actual = nString THEN 0 ELSE 1 END DESC
                   -- 3) among those, the longest prefix wins
                          , LEN(actual) DESC
               ) AS sort
        FROM @data d
        CROSS APPLY (
            SELECT CAST(n AS VARCHAR(12)) AS nString
        ) n
        CROSS APPLY (
            -- candidate prefixes of length 1 to 10 (an int has at most 10 digits)
            SELECT SUBSTRING(nString, 1, x.x) AS actual
                 , x.x
            FROM (
                VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)
            ) x(x)
        ) z
    ) x
    WHERE x.sort = 1;
What I do is take the first x characters of the string, replicate them until they fill the string's length, and compare the result with the original string. If they match, the string consists of that block repeated.
Finally, I sort the candidates so that the most desirable one comes first:
- candidates that reproduce the string when replicated come first
- among those, a proper prefix is preferred over the whole (non-repeating) string
- the longest repeating block wins, which resolves cases such as 11111111, which technically repeats by "1", "11" and "1111"
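The same replicate-and-compare idea can also be written without `ROW_NUMBER()`, by letting `OUTER APPLY` pick the longest matching prefix directly. This is an alternative sketch, not the answer's query: the names `@result`, `unitLen` and `repeatedPart` are assumptions, and candidate lengths stop at 5 because a repeating block can be at most half of a 10-digit int.

```sql
DECLARE @data TABLE (n int);
INSERT INTO @data (n) VALUES (521521), (919191), (123456), (111111111), (1111111111);

DECLARE @result TABLE (n int, actual varchar(12));

INSERT INTO @result (n, actual)
SELECT d.n,
       COALESCE(r.repeatedPart, s.nString)   -- no repeating block: keep the value
FROM @data AS d
CROSS APPLY (SELECT CAST(d.n AS varchar(12)) AS nString) AS s
OUTER APPLY (
    -- longest proper prefix that rebuilds the whole string when replicated
    SELECT TOP (1) LEFT(s.nString, v.unitLen) AS repeatedPart
    FROM (VALUES (1), (2), (3), (4), (5)) AS v(unitLen)
    WHERE v.unitLen < LEN(s.nString)
      AND LEN(s.nString) % v.unitLen = 0
      AND REPLICATE(LEFT(s.nString, v.unitLen), LEN(s.nString) / v.unitLen) = s.nString
    ORDER BY v.unitLen DESC
) AS r;

SELECT n, actual FROM @result;
```

For the sample values this yields 521, 91, 123456, 111 and 11111; values without a repeating block fall back to the original string through `COALESCE`.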