从MySQL表的每一行返回所有正则表达式匹配项。

huangapple go评论85阅读模式
英文:

return all regex match from each row of a table in mysql

问题

I have a table named 'Articles' that has some rows, inside each row has some image tags that I want to return all of those.

I used this query but it only returns one of those tags per row. And other lines are returned empty.

SELECT REGEXP_SUBSTR(body, '([0-9])+\.(jpg|png|jpeg)') from articles;

Article 1:
从MySQL表的每一行返回所有正则表达式匹配项。
从MySQL表的每一行返回所有正则表达式匹配项。
从MySQL表的每一行返回所有正则表达式匹配项。

Article 2:
从MySQL表的每一行返回所有正则表达式匹配项。

Article 3:
...

I want all of these images.

Thank you for your help.

UPDATE
Version 10.4.19-MariaDB

英文:

I have a table named 'Articles' that has some rows, inside each row has some image tag that I want to return all of those.

I used this query but it only returns one of those tags per row. And other lines are returned empty.

SELECT REGEXP_SUBSTR(body, '([0-9])+\.(jpg|png|gpeg)') from articles;

从MySQL表的每一行返回所有正则表达式匹配项。

Article 1:
    <img src"54545343.png" />
    <img src"24352445.png" />
    <img src"24352435.png" />

article 2: 
<img src"24352435.png" />

article 3:
...

I want all of these images.

thank you for your help

UPDATE
version 10.4.19-MariaDB

答案1

得分: 1

你可以使用以下这种_“不太优雅的解决方案”_,但效率非常低下:

SELECT a.id, REGEXP_SUBSTR(body, '([0-9])+\\.(jpg|png|gpeg)', 1, t.n) AS img
FROM articles a
CROSS JOIN (
    VALUES
        ROW(1), ROW(2), ROW(3), ROW(4), ROW(5),
        ROW(6), ROW(7), ROW(8), ROW(9), ROW(10)
) AS t(n)
HAVING img IS NOT NULL
ORDER BY id, img;

REGEXP_SUBSTR的第四个参数是:

> occurrence: 要搜索的匹配项的发生次数。如果省略,默认为1。

上面的查询尝试检索第1到第10个匹配项。

这可能是一个很好的例子:

> “你可以做并不意味着你应该这样做。”

但你可以使用存储函数来提取图像src属性的内容:

DELIMITER //
CREATE FUNCTION FIND_IMG_SRC(str text, occurrence int)
RETURNS VARCHAR(255)
DETERMINISTIC
BEGIN
    DECLARE loc INT DEFAULT 1;
    DECLARE src_loc INT;
    DECLARE i INT DEFAULT 0;
    DECLARE img_tag text DEFAULT '';
    WHILE(i < occurrence AND loc IS NOT NULL) DO
        SET loc = NULLIF(LOCATE('<img', str, loc + 1), 0);
        SET i = i + 1;
    END WHILE;
    
    IF loc IS NULL THEN
        RETURN NULL;
    ELSE
        # 获取完整的img标签
        SET img_tag = SUBSTR(str, loc, LOCATE('>', str, loc + 1) - loc + 1);

        # 检查img标签是否包含src属性
        SET src_loc = LOCATE('src="', img_tag);
        IF src_loc = 0 THEN
            RETURN NULL;
        ELSE
            # 返回src属性的内容
            RETURN SUBSTRING_INDEX(SUBSTR(img_tag, src_loc + 5), '"', 1);
        END IF;
    END IF;
END//
DELIMITER ;

然后可以使用以下查询:

SELECT a.id, FIND_IMG_SRC(body, t.n) AS img
from articles a
CROSS JOIN (
    SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
    SELECT 6      UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10
) AS t
HAVING img IS NOT NULL
ORDER BY id, img;

这里有一个db<>fiddle

注意:上面的存储函数期望 src="...",如果等号两边有单引号或空格,它将无法工作。如果<img>标签内部有>,它也会失败。

英文:

You could use a "dirty solution" like this, but it is horribly inefficient:

SELECT a.id, REGEXP_SUBSTR(body, &#39;([0-9])+\.(jpg|png|gpeg)&#39;, 1, t.n) AS img
FROM articles a
CROSS JOIN (
    VALUES
        ROW(1), ROW(2), ROW(3), ROW(4), ROW(5),
        ROW(6), ROW(7), ROW(8), ROW(9), ROW(10)
) AS t(n)
HAVING img IS NOT NULL
ORDER BY id, img;

The fourth parameter to REGEXP_SUBSTR is:

> occurrence: Which occurrence of a match to search for. If omitted, the default is 1.

The above query tries to retrieve occurrences 1 - 10.


This is probably a good example of

> "just because you can, does not mean you should"

but you could use a stored function to extract the content of the image src attributes:

DELIMITER //
CREATE FUNCTION FIND_IMG_SRC(str text, occurrence int)
RETURNS VARCHAR(255)
DETERMINISTIC
BEGIN
    DECLARE loc INT DEFAULT 1;
    DECLARE src_loc INT;
    DECLARE i INT DEFAULT 0;
    DECLARE img_tag text DEFAULT &#39;&#39;;
    WHILE(i &lt; occurrence AND loc IS NOT NULL) DO
        SET loc = NULLIF(LOCATE(&#39;&lt;img&#39;, str, loc + 1), 0);
        SET i = i + 1;
    END WHILE;
    
    IF loc IS NULL THEN
        RETURN NULL;
    ELSE
        # Get complete img tag
        SET img_tag = SUBSTR(str, loc, LOCATE(&#39;&gt;&#39;, str, loc + 1) - loc + 1);

        # Check that img tag contains src attribute
        SET src_loc = LOCATE(&#39;src=&quot;&#39;, img_tag);
        IF src_loc = 0 THEN
            RETURN NULL;
        ELSE
            # Return content of src attribute
            RETURN SUBSTRING_INDEX(SUBSTR(img_tag, src_loc + 5), &#39;&quot;&#39;, 1);
        END IF;
    END IF;
END//
DELIMITER ;

And then the following query:

SELECT a.id, FIND_IMG_SRC(body, t.n) AS img
from articles a
CROSS JOIN (
    SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
    SELECT 6      UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10
) AS t
HAVING img IS NOT NULL
ORDER BY id, img;

And here's a db<>fiddle.

Note: the above stored function expects src=&quot;...&quot; and will not work with single quotes or spaces either side of the =. It will also fail if there is a &gt; anywhere inside the &lt;img&gt;.

huangapple
  • 本文由 发表于 2023年6月12日 19:03:15
  • 转载请务必保留本文链接:https://go.coder-hub.com/76456021.html
  • mariadb-10.4
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定