Recognise repeating number in SQL Server

Question


I have a SQL Server column of type int that stores test scores from 10 to 550, but some test scores do not fit the range, e.g. 521521.

What I want is to take the repeating part of the number and save that. How can I do this in SQL?

  • 521521 should become 521
  • 919191 should become 91
  • 7676 should become 76

I do not want to simply take the leftmost (at most) three digits or half of the string, because I also have values such as 123456, which of course should remain recognisable as unusable data. However, I am willing to take the risk for cases like 123123.

Therefore, I am really looking for something to recognise repeating numbers/letters.

Answer 1

Score: 3


You don't need string manipulation here. You can do it with some modulo and integer division, and a big CASE:

SELECT *,
  Corrected = CASE
    WHEN col / 100    % 100    = col % 100    AND col / 10000       IN (0, col % 100   )
      THEN col % 100
    WHEN col / 1000   % 1000   = col % 1000   AND col / 1000000     IN (0, col % 1000  )
      THEN col % 1000
    WHEN col / 10000  % 10000  = col % 10000  AND col / 100000000   IN (0, col % 10000 )
      THEN col % 10000
    WHEN col / 100000 % 100000 = col % 100000 AND col / CAST(10000000000 AS bigint) IN (0, col % 100000)
      THEN col % 100000
    ELSE col
  END
FROM @Demo d;

db<>fiddle
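
To see why the first branch of that CASE works, it can help to look at the intermediate values it compares. The following is only an illustrative sketch: the table variable @Sample, the column aliases, and the value 101 are assumptions made up for this example, not part of the original answer.

```sql
-- Hypothetical sample values; the column name col follows the answer above.
DECLARE @Sample TABLE (col int);
INSERT INTO @Sample (col) VALUES (7676), (919191), (521521), (123456), (101);

SELECT col,
       LowTwo      = col % 100,        -- last two digits
       NextTwo     = col / 100 % 100,  -- the two digits before those (evaluated left to right)
       LeadingPart = col / 10000       -- whatever remains in front of the last four digits
FROM @Sample;
```

The first branch fires when NextTwo equals LowTwo and LeadingPart is either 0 or the same two-digit group, which holds for 7676 and 919191 but not for 521521 or 123456. Note that 101 also satisfies it (NextTwo and LowTwo are both 1, LeadingPart is 0), so the CASE would turn the in-range score 101 into 1, and likewise for 202, 303, 404 and 505. If such values can occur, one possible safeguard is to apply the correction only to rows where col > 550.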

Answer 2

Score: 2


Well, string manipulation isn't T-SQL's strong suit, but I was curious how hard it would be to find a pure T-SQL solution for this.
My solution is quite cumbersome, but it works for all the scenarios you've described and a few more that I came up with based on your question. (By the way, regarding Thom A's comment (https://stackoverflow.com/questions/76704556/recognise-repeating-number-in-sql-server/76705167#comment135232454_76704556) about the expected result for 111111: it returns 111.)

First, create and populate a sample table (please save us this step in your future questions):

DECLARE @Demo AS TABLE 
(
  col int,
  expected int
);

INSERT INTO @Demo  (col, expected) VALUES
(521521, 521), -- repeat after 3 digits
(919191, 91), -- repeat after 2 digits, 3 times
(919291, 919291), -- no repeat - Note the 92 in the middle
(7676, 76), -- repeat after 2 digits
(123456, 123456), -- no repeat
(2147421474, 21474), -- repeat after 5 digits 
(2147483647, 2147483647), -- no repeat
(12341234, 1234), -- repeat after 4 digits
(123123123,123), -- repeat after 3 digits, 3 times
(123456123,123456123); -- no repeat

Note the comments - they specify why the expected result is what it is.

Then, the query: I've had to use quite a few CROSS APPLYs and a CASE expression, but it does provide the expected result:

SELECT col, expected,
       CASE WHEN HasTwoParts = 1
                 AND LEFT(StringCol, NumberOfDigits/2) = RIGHT(StringCol, NumberOfDigits/2)
                 THEN LEFT(StringCol, NumberOfDigits/2)
            WHEN HasThreeParts = 1
                 AND LEFT(StringCol, NumberOfDigits/3) = RIGHT(StringCol, NumberOfDigits/3)
                 AND LEFT(StringCol, NumberOfDigits/3) = SUBSTRING(StringCol, (NumberOfDigits/3)+1, (NumberOfDigits/3))
                 THEN LEFT(StringCol, NumberOfDigits/3)
            WHEN HasFiveParts = 1
                 AND LEFT(StringCol, NumberOfDigits/5) = RIGHT(StringCol, NumberOfDigits/5)
                 THEN LEFT(StringCol, NumberOfDigits/5)
            ELSE StringCol
       END As Result
FROM @Demo
CROSS APPLY (SELECT CAST(col as varchar(12)) As StringCol) As AsString
CROSS APPLY (SELECT LEN(StringCol) As NumberOfDigits) As Length
CROSS APPLY (
    SELECT IIF(NumberOfDigits % 2 = 0, 1, 0) As HasTwoParts,
           IIF(NumberOfDigits % 3 = 0, 1, 0) As HasThreeParts,
           IIF(NumberOfDigits % 5 = 0, 1, 0) As HasFiveParts) As LengthDividers;

And the results:

col			expected	Result
------------------------------
521521		521			521
919191		91			91
919291		919291		919291
7676		76			76
123456		123456		123456
2147421474	21474		21474
2147483647	2147483647	2147483647
12341234	1234		1234
123123123	123			123
123456123	123456123	123456123

You can see a live demo on db<>fiddle (https://dbfiddle.uk/5mx09IVP).
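
One thing that may be worth checking before relying on this query: the HasFiveParts branch compares only the first and the last chunk, so a ten-digit value whose first two and last two digits happen to coincide would be shortened even when the middle does not repeat. The sketch below contrasts that test with rebuilding the whole string via REPLICATE. The table variable @Check, the alias s, and both sample values are hypothetical and chosen only for illustration.

```sql
-- Hypothetical values: the first repeats "12" five times, the second only starts and ends with "12".
DECLARE @Check TABLE (col int);
INSERT INTO @Check (col) VALUES (1212121212), (1212345612);

SELECT col,
       -- the kind of test used by the HasFiveParts branch: first chunk vs. last chunk
       FirstAndLastOnly = IIF(LEFT(s, 2) = RIGHT(s, 2), 1, 0),
       -- stricter test: repeat the first chunk five times and compare with the full string
       AllFiveChunks    = IIF(REPLICATE(LEFT(s, 2), 5) = s, 1, 0)
FROM @Check
CROSS APPLY (SELECT CAST(col AS varchar(12)) AS s) AS c;
```

For 1212121212 both columns are 1, while for 1212345612 only FirstAndLastOnly is 1. If values like the second one can appear in the data, comparing against the rebuilt string is probably the safer condition.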

Answer 3

Score: 1


Here's another version:

DECLARE @data TABLE (n int, expected int)

INSERT INTO @data
VALUES	(521521, 521)
,	(919191, 91)
,	(7676, 76)
,	(123456, 123456)
,	(123123, 123)
,	(1234567, 1234567)
,	(111111111, 111)
,	(1111111111, 11111)
,	(1000011111, 1000011111)

SELECT	n, expected, actual
FROM	(

		SELECT	n, expected, actual
		,	ROW_NUMBER() OVER(PARTITION BY n
			ORDER BY CASE WHEN REPLICATE(actual, LEN(nString) / z.x) = nString THEN 1 ELSE 0 END DESC
			,	CASE WHEN actual = nString THEN 0 ELSE 1 END DESC
			,	LEN(actual) DESC
		) sort
		FROM	@data d
		CROSS APPLY (
				SELECT	CAST(n AS VARCHAR(12)) AS nString
			) n
		CROSS APPLY (
				SELECT	SUBSTRING(nString, 1, x.x) AS actual
				,	x.x
				FROM	(
					VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)
				) x(x)
			) z
		) x
WHERE	x.sort = 1

What I do is take the first x characters of the string, replicate them until they fill the length of the string, and compare the result with the original string. If it matches, the string consists of a repeating group of characters.

Finally, I sort the candidates so that the most desirable result comes first, ordered by:

  • candidates whose repetition matches the string come first
  • among those, candidates other than the whole (non-repeating) string are preferred
  • the longest repeating candidate wins, which resolves strings such as 11111111 that technically repeat by "1", "11" and "1111"
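
The replicate-and-compare step described above can be looked at in isolation for a single value. This is a minimal sketch rather than the answer's actual query: the variable @s, the column aliases, and the choice of '919191' are assumptions made for illustration.

```sql
-- Hypothetical single value; the real query applies the same idea to every row.
DECLARE @s varchar(12) = '919191';

SELECT x.x                                            AS PrefixLength,
       LEFT(@s, x.x)                                  AS Prefix,
       -- repeat the prefix as many whole times as fit into the string
       REPLICATE(LEFT(@s, x.x), LEN(@s) / x.x)        AS Rebuilt,
       IIF(REPLICATE(LEFT(@s, x.x), LEN(@s) / x.x) = @s, 1, 0) AS IsMatch
FROM (VALUES (1),(2),(3),(4),(5),(6)) AS x(x);
```

Only prefix lengths 2 and 6 rebuild the original string here. Length 6 is the whole string itself, which is why the second sort key prefers the shorter candidate and the query ends up with 91.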

huangapple
  • Posted on 2023-07-17 20:41:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76704556.html