How to replace all instances of text between two characters in a huge JSON string using T-SQL in an Azure SQL database?

huangapple go评论52阅读模式
英文:

How to replace all instances of text between two characters in a huge JSON string using T-SQL in an Azure SQL database?

问题

如何在T-SQL中多次替换一个大字符串(50,000个字符)中两个字符之间的文本?

我有一个来自Azure管道的JSON输出,我们将其放入一个表中的单个单元格中,以便存储过程解析和处理数据。但是,JSON中的一行如下所示:

"body":"bxncucbuc "what is happening" cbibciuc",
"subject:"whatever"等等。 .........  
"body":"bxncucbuc "what more things are happening" cbibciuc",
"subject:"whatever"等等。 .........
"body":"bxncucbuc "what else is happening" cbibciuc",
"subject:"whatever"等等。

openjson无法解析脚本,因为"what is happening"(或类似的内容)看起来像另一列,因为它包含" '。因此,我希望替换"body":"和"subject":之间的所有内容,因为实际上我不需要这些数据。

我可以使用以下代码替换单个实例:

select   
    stuff(jsonresponse, CHARINDEX('"body":"', jsonresponse), CHARINDEX('"subject":', jsonresponse) - CHARINDEX('"body":"', jsonresponse) + 1, '"bodyX":"replaced with a random piece of text","')
from 
    stage.JsonResponse

然而,下面的代码无限运行:

while (select CHARINDEX('"body":"<', jsonresponse) from stage.JsonResponse) != 0
begin 
    update stage.JsonResponse 
    set JsonResponse =  stuff(jsonresponse, CHARINDEX('"body":"', jsonresponse), CHARINDEX('"subject":', jsonresponse) - CHARINDEX('"body":"', jsonresponse) + 1, '"bodyX":"replaced with a random piece of text","')
    from stage.Mimecast_JsonResponse

    if (select CHARINDEX('"body":"<', jsonresponse) 
        from stage.JsonResponse) = 0
        BREAK
    else
        CONTINUE
END  

我已经更改为查找"body":"<,以防它意外更改一些脚本,然后再寻找自己,并再次更改等等。

同样,我尝试了一个公共表达式(CTE):

;WITH Cte(jsonresponse) AS 
(
    SELECT jsonresponse FROM #a
    UNION ALL
    SELECT STUFF(jsonresponse, CHARINDEX(&#39;&quot;body&quot;:&quot;&#39;, jsonresponse), CHARINDEX(&#39;&quot;subject&quot;:&quot;&#39;, jsonresponse) - CHARINDEX(&#39;&quot;body&quot;:&quot;&#39;, jsonresponse) + 1, &#39;&quot;body&quot;:&quot;replaced with a random piece of text&quot;,&quot;&#39;)
    FROM Cte
    WHERE CHARINDEX(&#39;&quot;body&quot;:&quot;&lt;&#39;, jsonresponse) != 0
)
SELECT jsonresponse 
FROM Cte
WHERE CHARINDEX(&#39;&quot;body&quot;:&quot;&lt;&#39;, jsonresponse) = 0
OPTION (MAXRECURSION 0);

但这似乎也在不断运行,实际上只更改了第一个实例。

我的JSON长度为50,000个字符,需要替换其中的所有实例,这就是我遇到困难的地方,如何更改每个问题脚本的每个实例。

任何帮助将不胜感激。

我在Azure SQL数据库上使用T-SQL。

英文:

How to replace text between 2 characters MULTIPLE TIMES in a huge string (50,000 characters) in T-SQL?

I have a JSON output from my Azure Pipeline which we drop into a single cell in a table ready for the stored procedure to parse and process the data. However, one of the lines in the JSON looks like this

&quot;body&quot;:&quot;bxncucbuc &quot;what is happening&quot; cbibciuc&quot;,
&quot;subject:&quot;whatever&quot; etc etc. .........  
&quot;body&quot;:&quot;bxncucbuc &quot;what more things are happening&quot; cbibciuc&quot;,
&quot;subject:&quot;whatever&quot; etc etc. .........
&quot;body&quot;:&quot;bxncucbuc &quot;what else is happening&quot; cbibciuc&quot;,
&quot;subject:&quot;whatever&quot; etc etc. 

The openjson cannot parse the script as the "what is happening" (or similar) looks like another column as it has the " 's in. So, I am looking to replace everything BETWEEN "body":" and "subject":, as actually I don't need this data anyway.

I can get a single instance replaced using

select   
    stuff(jsonresponse, CHARINDEX(&#39;&quot;body&quot;:&quot;&#39;, jsonresponse), CHARINDEX(&#39;&quot;subject&quot;:&#39;, jsonresponse) - CHARINDEX(&#39;&quot;body&quot;:&quot;&#39;, jsonresponse) + 1, &#39;&quot;bodyX&quot;:&quot;replaced with a random piece of text&quot;,&quot;&#39;)
from 
    stage.JsonResponse

However

while (select CHARINDEX(&#39;&quot;body&quot;:&quot;&lt;&#39;, jsonresponse) from stage.JsonResponse) != 0
begin 
    update stage.JsonResponse 
    set JsonResponse =  stuff(jsonresponse, CHARINDEX(&#39;&quot;body&quot;:&quot;&#39;, jsonresponse), CHARINDEX(&#39;&quot;subject&quot;:&#39;, jsonresponse) - CHARINDEX(&#39;&quot;body&quot;:&quot;&#39;, jsonresponse) + 1, &#39;&quot;bodyX&quot;:&quot;replaced with a random piece of text&quot;,&quot;&#39;)
    from stage.Mimecast_JsonResponse

    if (select CHARINDEX(&#39;&quot;body&quot;:&quot;&lt;&#39;, jsonresponse) 
        from stage.JsonResponse) = 0
        BREAK
    else
        CONTINUE
END  

just runs forever. I have changed it to look for "body":"< in case it was inadvertently changing some script and then looking for itself, and changing that again etc.

Similarly, I have tried a CTE:

;WITH Cte(jsonresponse) AS 
(
    SELECT jsonresponse FROM #a
    UNION ALL
    SELECT STUFF(jsonresponse, CHARINDEX(&#39;&quot;body&quot;:&quot;&#39;, jsonresponse), CHARINDEX(&#39;&quot;subject&quot;:&quot;&#39;, jsonresponse) - CHARINDEX(&#39;&quot;body&quot;:&quot;&#39;, jsonresponse) + 1, &#39;&quot;body&quot;:&quot;replaced with a random piece of text&quot;,&quot;&#39;)
    FROM Cte
    WHERE CHARINDEX(&#39;&quot;body&quot;:&quot;&lt;&#39;, jsonresponse) != 0
)
SELECT jsonresponse 
FROM Cte
WHERE CHARINDEX(&#39;&quot;body&quot;:&quot;&lt;&#39;, jsonresponse) = 0
OPTION (MAXRECURSION 0); `

but this too seems to run and run, but actually only changes the first instance.

My JSON is 50,000 characters long, and there are multiple instances in the json that I need to replace ALL OF THEM, and this is where I have struggled, how to change EVERY INSTANCE of the offending script.

Any help will be appreciated.

I am using T-SQL on an Azure SQL database.

答案1

得分: 2

使用递归公共表达式(CTE)的选项,我认为这应该可以工作 - 我进行了一些实验,似乎(可能是因为数据如何在内部被修改、引用或传递)最可靠的方法是影响字符串的长度,而只是在原地替换字符。

以下示例可用于查找每个 body/subject 元素之间的部分,并将内容替换为一个重复的单一字符,而不改变长度。

由于您不需要此内容,您可以使用 replace 完全删除它。

这对您的情况有帮助吗?

演示 DBFiddle

with p as (
  select Id, jsonCol,
    CharIndex('"body"', jsonCol) as Bpos,
    CharIndex('"subject"', jsonCol) as Spos
  from t
  union all
  select Id, j,
    CharIndex('"body"', j, Spos) as Bpos,
    CharIndex('"subject"', j, Spos + 1) as Spos
  from p
  cross apply(values( Stuff(jsoncol, bpos + 8, spos - bpos - 10, Replicate('#', spos - bpos - 10))))n(j)
  where Bpos > 0 
)
select Replace(jsoncol, '#','');
from p
where bpos = 0;
英文:

Looking at the option of using a recursive CTE, I believe this should work - I did some experimenting and it appears (probably due to how the data gets amended and referenced or passed around internally) the most reliable way is to not affect the length of the string but just to replace characters in-situ.

The following example works to find the sections between each body/subject element and replaces the content with a single repeating character without changing the length.

Since you don't need this content anyway you can then use replace to remove it completely.

Does this help for your scenario?

Demo DBFiddle

with p as (
  select Id, jsonCol,
    CharIndex(&#39;&quot;body&quot;&#39;, jsonCol) as Bpos,
    CharIndex(&#39;&quot;subject&quot;&#39;, jsonCol) as Spos
  from t
  union all
  select Id, j,
    CharIndex(&#39;&quot;body&quot;&#39;, j, Spos) as Bpos,
    CharIndex(&#39;&quot;subject&quot;&#39;, j, Spos + 1) as Spos
  from p
  cross apply(values( Stuff(jsoncol, bpos + 8, spos - bpos - 10, Replicate(&#39;#&#39;, spos - bpos - 10))))n(j)
  where Bpos &gt; 0 
)
select Replace(jsoncol, &#39;#&#39;,&#39;&#39;)
from p
where bpos = 0;

huangapple
  • 本文由 发表于 2023年5月29日 11:50:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/76354576.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定