如何查找以”INPUT”开头并包含X个字符实例的Postgres列中的值?

huangapple go评论88阅读模式
英文:

How to look for values in a Postgres column that begin with INPUT and contain X instances of a character?

问题

以下是您要翻译的内容:

我有一个Postgres表中的列,其值具有以下模式。

INPUT-X

INPUT-X-Y

INPUT-X-Y-Z

INPUT-X-Y-Z-A-...

我正在尝试计算COLUMN中以INPUT开头并包含给定数量连字符的条目数。因此,如果输入值是2023-08-04T09:00:32.822+00:00,我想计算以2023-08-04T09:00:32.822+00:00开头并包含3个连字符的所有条目的数量。如果COLUMN中的值包括:

2023-08-04T09:00:32.822+00:00-1 
2023-08-04T09:00:32.822+00:00-1-1 
2023-08-04T09:00:32.822+00:00-2 
2023-08-04T09:00:32.822+00:00-2-1 
2023-08-04T09:00:32.822+00:00-3
2023-08-04T09:00:32.822+00:00-4

这个操作的计数应该等于4。(这是为了计算系列下一个值的索引,本例中将是2023-08-04T09:00:32.822+00:00-5。每个值都是对父条目的引用,例如2023-08-04T09:00:32.822+00:00-5表示对条目2023-08-04T09:00:32.822+00:00的第5个回复。同样,2023-08-04T09:00:32.822+00:00-2-3表示对条目2023-08-04T09:00:32.822+00:00-2的第3个回复。)

我目前正在使用的代码是:

select count(*) from messages_dev where ref like input || '%-' || repeat('-', hyphens) || '%' into result;

但是对于应该有更高数量的计数,它返回0。

有人能推荐一种在Postgres中获取这个计数的方法吗?

英文:

I have a COLUMN in a Postgres table which has values with the following pattern.

INPUT-X

INPUT-X-Y

INPUT-X-Y-Z

INPUT-X-Y-Z-A-...

I am trying to count the number of entries in COLUMN where the value begins with INPUT and contains a given number of hyphens. So if the input value is 2023-08-04T09:00:32.822+00:00, I want to count all of the entries in COLUMN which begin with 2023-08-04T09:00:32.822+00:00 and contain 3 hyphens. If values in COLUMN include:

2023-08-04T09:00:32.822+00:00-1 
2023-08-04T09:00:32.822+00:00-1-1 
2023-08-04T09:00:32.822+00:00-2 
2023-08-04T09:00:32.822+00:00-2-1 
2023-08-04T09:00:32.822+00:00-3
2023-08-04T09:00:32.822+00:00-4

The count of this operation should equal 4. (The purpose of this is to calculate the index for the next value of the series, which in this case would be 2023-08-04T09:00:32.822+00:00-5. Each of the values are a reference to a parent entry e.g. 2023-08-04T09:00:32.822+00:00-5 signifies the 5th reply to entry 2023-08-04T09:00:32.822+00:00. Likewise, 2023-08-04T09:00:32.822+00:00-2-3 signifies the 3rd reply to entry 2023-08-04T09:00:32.822+00:00-2.)

The code I'm currently using is:

select count(*) from messages_dev where ref like input || '%-' || repeat('-', hyphens) || '%' into result;

but that is returning 0, for counts that should have higher numbers.

Can anybody recommend a way to get this count in Postgres?

答案1

得分: 1

这是一个很好的问题。

你的方法似乎朝着正确的方向前进,但由于查询中存在一些错误,如input || ''%-'' || repeat(''-'', hyphens) || ''%'',它还存在一些问题。让我们先处理一下这些小的模式错误,然后再讨论查询本身。

  1. 存在一个问题是repeat(''-'', hyphens)模式。这将意味着当hyphens = 3时会变成---这样。请注意,我们需要在-之后也有整数。而且,模式'%-也没有包含在计数中。将模式更改为input || repeat('-%')将使我们更接近解决方案。
  2. 查询没有计算input字符串中-的数量,因为你将输入与重复的-模式连接在一起。如果input字符串总是2023-08-04T09:00:32.822+00:00,将其更改为repeat('-%', hyphens-2)将使我们更接近答案,但这里还有另一个问题 -
    %符号代表任何字符序列(包括零个字符)。它可以匹配字符串的任何子串。
    因此,查询'2023-08-04T09:00:32.822+00:00' || repeat('-%', 1)将与所有6个值匹配。

解决方案

  1. 为了解决所有这些问题,我建议使用其他字符串函数 - 即LENGTHREPLACE函数。
  2. 我们可以找到整个字符串的长度(让我们称之为l1),并找到一个新字符串的长度 - 该字符串已删除了所有的连字符(l2)。将l2减去l1,将给出字符串中-的数量。
  3. 我们可以通过用空字符''替换-来删除字符串中的连字符。

总的来说,查询将看起来像这样:

SELECT COUNT(*) AS result
FROM messages_dev
WHERE ref LIKE  input || ''-%'' 
  AND LENGTH(ref) - LENGTH(REPLACE(ref, ''-'', ''')) = hyphens; 

你可以在这里尝试这个查询。

英文:

This is a good question.

Your approach seems to be heading in the right direction, but it falls short due to a few errors in the query input || '%-' || repeat('-', hyphens) || '%'. Let us address the small pattern errors and then move onto the query in itself.

  1. There is an issue with the repeat('-', hyphens) pattern. This would mean --- like this when hyphens = 3. Note that we need to have integers after a - as well. Also, having the pattern '%- is not included in the count as well. Changing the pattern to input || repeat('-%') brings us 1 step closer to the solution.
  2. The query does not count the number of - in the input string, because you have input concatenated with a pattern of - repeated hyphen times. In case the input string is always 2023-08-04T09:00:32.822+00:00, changing it to repeat('-%', hyphens-2) would bring us closer to the answer but there is another problem here -
    % symbol represents any sequence of characters (including zero characters). It can match any substring of a string.
    So, the query '2023-08-04T09:00:32.822+00:00' || repeat('-%', 1) would match with all 6 values.

Solution:

  1. To mitigate all these issues, I would suggest using other string functions - namely the LENGTH and REPLACE Functions.
  2. We can find the length of the entire string (let us call this l1) and find the length of a new string - which has all its hyphens removed (l2). Substracting l2 from l1, would give us the number of - in the string.
  3. We can remove the hyphens in the string by replacing the - with a null character ''.

Overall, the query would look something like this:

SELECT COUNT(*) AS result
FROM messages_dev
WHERE ref LIKE  input || '-%' 
  AND LENGTH(ref) - LENGTH(REPLACE(ref, '-', '')) = hyphens; 

You could try this out here

答案2

得分: 0

以下是您要翻译的内容:

"由于您有一个字符串,您可以使用字符串函数来获取下一个数字。

但最好不要以这种方式存储它,因为字符串函数通常很慢,规范化的表结构更适合关系型数据库。

如果您只想要特定的日期,您需要添加您实际想要的日期。

当您想在并行系统中使用它时,您应该了解并发性。

CREATE TABLE mytable
    ("mycolumn" varchar(33))
;
    
INSERT INTO mytable
    ("mycolumn")
VALUES
    ('2023-08-04T09:00:32.822+00:00-1'),
    ('2023-08-04T09:00:32.822+00:00-1-1'),
    ('2023-08-04T09:00:32.822+00:00-2'),
    ('2023-08-04T09:00:32.822+00:00-2-1'),
    ('2023-08-04T09:00:32.822+00:00-3'),
    ('2023-08-04T09:00:32.822+00:00-4')
;

WITH CTE AS (select Substring(mycolumn,0,30) datetxt,MAX(split_part(mycolumn, '-', 4))::int +  1 newnum
FROM mytable
GROUP BY 1)
SELECT CONCAT(datetxt,'-', newnum::text)
  FROM CTE
```"


<details>
<summary>英文:</summary>

As you have a string, you can use string functions, to get the next number.

but betst would be not to store it that way, as string functions are usually ver slow, and a normalized table structure suits better for relatinal databases

This query would find all sequences and get the higest number, if you want only specific ones, you need to add which dates you aczaully want.

when you want to use it in parallel systems, you should read about concrency.

CREATE TABLE mytable
("mycolumn" varchar(33))
;

INSERT INTO mytable
("mycolumn")
VALUES
('2023-08-04T09:00:32.822+00:00-1'),
('2023-08-04T09:00:32.822+00:00-1-1'),
('2023-08-04T09:00:32.822+00:00-2'),
('2023-08-04T09:00:32.822+00:00-2-1'),
('2023-08-04T09:00:32.822+00:00-3'),
('2023-08-04T09:00:32.822+00:00-4')
;



WITH CTE AS (select Substring(mycolumn,0,30) datetxt,MAX(split_part(mycolumn, '-', 4))::int + 1 newnum
FROM mytable
GROUP BY 1)
SELECT CONCAT(datetxt,'-', newnum::text)
FROM CTE

| concat |
|:-------|
| 2023-08-04T09:00:32.822+00:00-5 |


[fiddle](https://dbfiddle.uk/hHuWyzMR)


</details>



huangapple
  • 本文由 发表于 2023年8月4日 23:23:15
  • 转载请务必保留本文链接:https://go.coder-hub.com/76837280.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定