正则表达式以忽略数据的某些部分中的分隔符

huangapple go评论64阅读模式
英文:

Regex to ignore delimiter for certain parts of the data

问题

以下是已翻译的内容:

当前的数据和使用的正则表达式

考虑以下数据;

FDTYP/TESTTYPE/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/

目前用于分隔数据的方法是使用分隔符'/'来获取此数据列表;

{"FDTYP", "TESTTYPE", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016" }

这是带有使用的正则表达式的代码段;

REGEXP_SUBSTR( ORGDATA ,'(.*?)/|$', 1, COLUMN_VALUE, NULL, 1 ) AS DATALABEL


问题

现在考虑这个数据

FDTYP/FD2323*/*/*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/

第二个数据包含分隔符'/',导致以下数据列表

{"FDTYP", "FD2323*", "*", "*-", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016" }

而正确的数据应该是

{"FDTYP", "FD2323*/*/*-", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016" }

现在我知道修复这个问题最简单的方法是更改使用的分隔符,但不幸的是,由于遗留数据的原因,这是不可能的。

那么是否有办法创建一个正则表达式来忽略第二个数据中的分隔符?

我尝试了以下正则表达式

(?<=FDTYP\/)(.*?)(?=\/FDPERIOD)

它确实获取了第二个数据,但对于其他数据不起作用。

英文:

Current data and regex used

Consider the following data;

FDTYP/TESTTYPE/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/

The current way used to separate the data is by using delimiter '/' to get this list of data;

{&quot;FDTYP&quot;, &quot;TESTTYPE&quot;, &quot;FDPERIOD&quot;, &quot;31 Day(s)&quot;, &quot;MAT_INST&quot;, &quot;CREDIT_TO_ACC&quot;, &quot;DEPTACCT&quot;, &quot;0016&quot; }

and this is the code snippet with the regex used;

REGEXP_SUBSTR( ORGDATA ,&#39;(.*?)/|$&#39;, 1, COLUMN_VALUE, NULL, 1 ) AS DATALABEL


The problem

Now consider this data

FDTYP/FD2323*/*/*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/

The second data contains the delimiter '/' which produces the following list of data

{&quot;FDTYP&quot;, &quot;FD2323*&quot;, &quot;*&quot;, &quot;*-&quot;, &quot;FDPERIOD&quot;, &quot;31 Day(s)&quot;, &quot;MAT_INST&quot;, &quot;CREDIT_TO_ACC&quot;, &quot;DEPTACCT&quot;, &quot;0016&quot; }

when the correct data should be

{&quot;FDTYP&quot;, &quot;FD2323*/*/*-&quot;, &quot;FDPERIOD&quot;, &quot;31 Day(s)&quot;, &quot;MAT_INST&quot;, &quot;CREDIT_TO_ACC&quot;, &quot;DEPTACCT&quot;, &quot;0016&quot; }

Now I know that the easiest way to fix this is by changing the delimiter used but unfortunately that is not possible because of legacy datas.

So is there any way to create a regex that ignore the delimiter for the second data?

I have tried the following regex

(?&lt;=FDTYP\/)(.*?)(?=\/FDPERIOD)

and it does get me the second data but it does not work for the rest of the data.

答案1

得分: 1

我不完全清楚你正在使用的SQL版本或者你是如何获取JSON的,但是假设问题总是出现在FDTYPE//FDPERIOD之间,我认为像这样的处理应该有效:

  1. 仅替换子字符串中的/实例,用你知道不会在数据集中出现的某个字符或字符组合来替换它。在我的示例中,我使用{!}作为替代。

  2. 执行你正常的例程。

  3. 回到提取的数据上,将你的替代{!}转换回/

SELECT
    'FDTYP/FD2323*/*/*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/' AS ORGDATA,
    CONCAT('FDTYP/', REPLACE(REGEXP_SUBSTR((SELECT ORGDATA), '(?<=FDTYP/)(.*)(?=/FDPERIOD)'), '/', '{!}'), REGEXP_SUBSTR((SELECT ORGDATA), '/FDPERIOD.*(?=/$)')) AS transformed,
    CONCAT('{"', REPLACE((SELECT transformed), '/', '","'), '"}') AS object_str,
    REPLACE((SELECT object_str), '{!}', '/') AS untransformed

这将得到:

FDTYP/FD2323*/*/*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/
FDTYP/FD2323*{!}*{!}*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016
{"FDTYP","FD2323*{!}*{!}*-","FDPERIOD","31 Day(s)","MAT_INST","CREDIT_TO_ACC","DEPTACCT","0016"}
{"FDTYP","FD2323*/*/*-","FDPERIOD","31 Day(s)","MAT_INST","CREDIT_TO_ACC","DEPTACCT","0016"}

https://www.db-fiddle.com/f/4BRg1FG23qhYYFrwFAqqEY/0

英文:

I'm not 100% clear what version of SQL you are using or how you're actually getting that JSON, but assuming the problem spot always occurs between FDTYPE/ and /FDPERIOD, I think something like this should work:

  1. Replace only the instances of / in the substring with some character or character combo that you know won't occur in your dataset. In my example I'm using {!} as the replacement.

  2. Do you normal routine

  3. Go back over extracted data and convert your replacement {!} back to /.

SELECT
	&#39;FDTYP/FD2323*/*/*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/&#39; AS ORGDATA,
	CONCAT(&#39;FDTYP/&#39;, REPLACE(REGEXP_SUBSTR((SELECT ORGDATA), &#39;(?&lt;=FDTYP/)(.*)(?=/FDPERIOD)&#39;), &#39;/&#39;, &#39;{!}&#39;), REGEXP_SUBSTR((SELECT ORGDATA), &#39;/FDPERIOD.*(?=/$)&#39;)) AS transformed,
	CONCAT(&#39;{&quot;&#39;, REPLACE((SELECT transformed), &#39;/&#39;, &#39;&quot;, &quot;&#39;), &#39;&quot;}&#39;) AS object_str,
	REPLACE((SELECT object_str), &#39;{!}&#39;, &#39;/&#39;) AS untransformed

That gets you:

FDTYP/FD2323*/*/*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/
FDTYP/FD2323*{!}*{!}*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016
{&quot;FDTYP&quot;, &quot;FD2323*{!}*{!}*-&quot;, &quot;FDPERIOD&quot;, &quot;31 Day(s)&quot;, &quot;MAT_INST&quot;, &quot;CREDIT_TO_ACC&quot;, &quot;DEPTACCT&quot;, &quot;0016&quot;}
{&quot;FDTYP&quot;, &quot;FD2323*/*/*-&quot;, &quot;FDPERIOD&quot;, &quot;31 Day(s)&quot;, &quot;MAT_INST&quot;, &quot;CREDIT_TO_ACC&quot;, &quot;DEPTACCT&quot;, &quot;0016&quot;}

https://www.db-fiddle.com/f/4BRg1FG23qhYYFrwFAqqEY/0

答案2

得分: 1

在Oracle中,您可以使用正则表达式'^(^/]+/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$'并提取捕获组:

SELECT REGEXP_SUBSTR(
         value,
         '^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
         1,
         1,
         NULL,
         1
       ) AS col1,
       REGEXP_SUBSTR(
         value,
         '^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
         1,
         1,
         NULL,
         2
       ) AS col2,
       REGEXP_SUBSTR(
         value,
         '^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
         1,
         1,
         NULL,
         3
       ) AS col3,
       REGEXP_SUBSTR(
         value,
         '^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
         1,
         1,
         NULL,
         4
       ) AS col4,
       REGEXP_SUBSTR(
         value,
         '^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
         1,
         1,
         NULL,
         5
       ) AS col5,
       REGEXP_SUBSTR(
         value,
         '^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
         1,
         1,
         NULL,
         6
       ) AS col6,
       REGEXP_SUBSTR(
         value,
         '^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
         1,
         1,
         NULL,
         7
       ) AS col7,
       REGEXP_SUBSTR(
         value,
         '^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
         1,
         1,
         NULL,
         8
       ) AS col8
FROM   table_name

对于样本数据:

CREATE TABLE table_name (value) AS
  SELECT 'FDTYP/FD2323*/*/*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/' FROM DUAL UNION ALL
  SELECT 'FDTYP/TESTTYPE/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/' FROM DUAL

输出如下:

COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8
FDTYP FD2323*/*/*- FDPERIOD 31 Day(s) MAT_INST CREDIT_TO_ACC DEPTACCT 0016
FDTYP TESTTYPE FDPERIOD 31 Day(s) MAT_INST CREDIT_TO_ACC DEPTACCT 0016

或者,要以单个格式化字符串输出:

SELECT REGEXP_REPLACE(
         value,
         '^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
         '{"", "", "", "", "", "", "", ""}'
       ) AS data
FROM   table_name;

输出如下:

DATA
{"FDTYP", "FD2323*/*/*-", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016"}
{"FDTYP", "TESTTYPE", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016"}

fiddle

英文:

In Oracle, you can use the regular expression &#39;^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$&#39; and extract the capturing groups:

SELECT REGEXP_SUBSTR(
         value,
         &#39;^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$&#39;,
         1,
         1,
         NULL,
         1
       ) AS col1,
       REGEXP_SUBSTR(
         value,
         &#39;^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$&#39;,
         1,
         1,
         NULL,
         2
       ) AS col2,
       REGEXP_SUBSTR(
         value,
         &#39;^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$&#39;,
         1,
         1,
         NULL,
         3
       ) AS col3,
       REGEXP_SUBSTR(
         value,
         &#39;^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$&#39;,
         1,
         1,
         NULL,
         4
       ) AS col4,
       REGEXP_SUBSTR(
         value,
         &#39;^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$&#39;,
         1,
         1,
         NULL,
         5
       ) AS col5,
       REGEXP_SUBSTR(
         value,
         &#39;^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$&#39;,
         1,
         1,
         NULL,
         6
       ) AS col6,
       REGEXP_SUBSTR(
         value,
         &#39;^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$&#39;,
         1,
         1,
         NULL,
         7
       ) AS col7,
       REGEXP_SUBSTR(
         value,
         &#39;^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$&#39;,
         1,
         1,
         NULL,
         8
       ) AS col8
FROM   table_name

Which, for the sample data:

CREATE TABLE table_name (value) AS
  SELECT &#39;FDTYP/FD2323*/*/*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/&#39; FROM DUAL UNION ALL
  SELECT &#39;FDTYP/TESTTYPE/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/&#39; FROM DUAL

Outputs:

COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8
FDTYP FD2323*/*/*- FDPERIOD 31 Day(s) MAT_INST CREDIT_TO_ACC DEPTACCT 0016
FDTYP TESTTYPE FDPERIOD 31 Day(s) MAT_INST CREDIT_TO_ACC DEPTACCT 0016

Or, to get the output as a single formatted string:

SELECT REGEXP_REPLACE(
         value,
         &#39;^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$&#39;,
         &#39;{&quot;&quot;, &quot;&quot;, &quot;&quot;, &quot;&quot;, &quot;&quot;, &quot;&quot;, &quot;&quot;, &quot;&quot;}&#39;
       ) AS data
FROM   table_name;

Which outputs:

DATA
{"FDTYP", "FD2323*/*/*-", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016"}
{"FDTYP", "TESTTYPE", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016"}

fiddle

答案3

得分: 0

您可以使用以下。

```java
(?<!\*)/

输出

FDTYP
TESTTYPE
FDPERIOD
31 天
MAT_INST
CREDIT_TO_ACC
DEPTACCT
0016

FDTYP
FD2323*/*/*-
FDPERIOD
31 天
MAT_INST
CREDIT_TO_ACC
DEPTACCT
0016

<details>
<summary>英文:</summary>

You can use the following.

```java
(?&lt;!\*)/

Output

FDTYP
TESTTYPE
FDPERIOD
31 Day(s)
MAT_INST
CREDIT_TO_ACC
DEPTACCT
0016


FDTYP
FD2323*/*/*-
FDPERIOD
31 Day(s)
MAT_INST
CREDIT_TO_ACC
DEPTACCT
0016

huangapple
  • 本文由 发表于 2023年6月12日 11:46:01
  • 转载请务必保留本文链接:https://go.coder-hub.com/76453537.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定