英文:
Regex to ignore delimiter for certain parts of the data
问题
以下是已翻译的内容:
当前的数据和使用的正则表达式
考虑以下数据;
FDTYP/TESTTYPE/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/
目前用于分隔数据的方法是使用分隔符'/'来获取此数据列表;
{"FDTYP", "TESTTYPE", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016" }
这是带有使用的正则表达式的代码段;
REGEXP_SUBSTR( ORGDATA ,'(.*?)/|$', 1, COLUMN_VALUE, NULL, 1 ) AS DATALABEL
问题
现在考虑这个数据
FDTYP/FD2323*/*/*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/
第二个数据包含分隔符'/',导致以下数据列表
{"FDTYP", "FD2323*", "*", "*-", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016" }
而正确的数据应该是
{"FDTYP", "FD2323*/*/*-", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016" }
现在我知道修复这个问题最简单的方法是更改使用的分隔符,但不幸的是,由于遗留数据的原因,这是不可能的。
那么是否有办法创建一个正则表达式来忽略第二个数据中的分隔符?
我尝试了以下正则表达式
(?<=FDTYP\/)(.*?)(?=\/FDPERIOD)
它确实获取了第二个数据,但对于其他数据不起作用。
英文:
Current data and regex used
Consider the following data;
FDTYP/TESTTYPE/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/
The current way used to separate the data is by using delimiter '/' to get this list of data;
{"FDTYP", "TESTTYPE", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016" }
and this is the code snippet with the regex used;
REGEXP_SUBSTR( ORGDATA ,'(.*?)/|$', 1, COLUMN_VALUE, NULL, 1 ) AS DATALABEL
The problem
Now consider this data
FDTYP/FD2323*/*/*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/
The second data contains the delimiter '/' which produces the following list of data
{"FDTYP", "FD2323*", "*", "*-", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016" }
when the correct data should be
{"FDTYP", "FD2323*/*/*-", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016" }
Now I know that the easiest way to fix this is by changing the delimiter used but unfortunately that is not possible because of legacy datas.
So is there any way to create a regex that ignore the delimiter for the second data?
I have tried the following regex
(?<=FDTYP\/)(.*?)(?=\/FDPERIOD)
and it does get me the second data but it does not work for the rest of the data.
答案1
得分: 1
我不完全清楚你正在使用的SQL版本或者你是如何获取JSON的,但是假设问题总是出现在FDTYPE/
和/FDPERIOD
之间,我认为像这样的处理应该有效:
-
仅替换子字符串中的
/
实例,用你知道不会在数据集中出现的某个字符或字符组合来替换它。在我的示例中,我使用{!}
作为替代。 -
执行你正常的例程。
-
回到提取的数据上,将你的替代
{!}
转换回/
。
SELECT
'FDTYP/FD2323*/*/*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/' AS ORGDATA,
CONCAT('FDTYP/', REPLACE(REGEXP_SUBSTR((SELECT ORGDATA), '(?<=FDTYP/)(.*)(?=/FDPERIOD)'), '/', '{!}'), REGEXP_SUBSTR((SELECT ORGDATA), '/FDPERIOD.*(?=/$)')) AS transformed,
CONCAT('{"', REPLACE((SELECT transformed), '/', '","'), '"}') AS object_str,
REPLACE((SELECT object_str), '{!}', '/') AS untransformed
这将得到:
FDTYP/FD2323*/*/*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/
FDTYP/FD2323*{!}*{!}*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016
{"FDTYP","FD2323*{!}*{!}*-","FDPERIOD","31 Day(s)","MAT_INST","CREDIT_TO_ACC","DEPTACCT","0016"}
{"FDTYP","FD2323*/*/*-","FDPERIOD","31 Day(s)","MAT_INST","CREDIT_TO_ACC","DEPTACCT","0016"}
https://www.db-fiddle.com/f/4BRg1FG23qhYYFrwFAqqEY/0
英文:
I'm not 100% clear what version of SQL you are using or how you're actually getting that JSON, but assuming the problem spot always occurs between FDTYPE/
and /FDPERIOD
, I think something like this should work:
-
Replace only the instances of
/
in the substring with some character or character combo that you know won't occur in your dataset. In my example I'm using{!}
as the replacement. -
Do you normal routine
-
Go back over extracted data and convert your replacement
{!}
back to/
.
SELECT
'FDTYP/FD2323*/*/*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/' AS ORGDATA,
CONCAT('FDTYP/', REPLACE(REGEXP_SUBSTR((SELECT ORGDATA), '(?<=FDTYP/)(.*)(?=/FDPERIOD)'), '/', '{!}'), REGEXP_SUBSTR((SELECT ORGDATA), '/FDPERIOD.*(?=/$)')) AS transformed,
CONCAT('{"', REPLACE((SELECT transformed), '/', '", "'), '"}') AS object_str,
REPLACE((SELECT object_str), '{!}', '/') AS untransformed
That gets you:
FDTYP/FD2323*/*/*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/
FDTYP/FD2323*{!}*{!}*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016
{"FDTYP", "FD2323*{!}*{!}*-", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016"}
{"FDTYP", "FD2323*/*/*-", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016"}
答案2
得分: 1
在Oracle中,您可以使用正则表达式'^(^/]+/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$'
并提取捕获组:
SELECT REGEXP_SUBSTR(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
1,
1,
NULL,
1
) AS col1,
REGEXP_SUBSTR(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
1,
1,
NULL,
2
) AS col2,
REGEXP_SUBSTR(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
1,
1,
NULL,
3
) AS col3,
REGEXP_SUBSTR(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
1,
1,
NULL,
4
) AS col4,
REGEXP_SUBSTR(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
1,
1,
NULL,
5
) AS col5,
REGEXP_SUBSTR(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
1,
1,
NULL,
6
) AS col6,
REGEXP_SUBSTR(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
1,
1,
NULL,
7
) AS col7,
REGEXP_SUBSTR(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
1,
1,
NULL,
8
) AS col8
FROM table_name
对于样本数据:
CREATE TABLE table_name (value) AS
SELECT 'FDTYP/FD2323*/*/*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/' FROM DUAL UNION ALL
SELECT 'FDTYP/TESTTYPE/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/' FROM DUAL
输出如下:
COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 | COL8 |
---|---|---|---|---|---|---|---|
FDTYP | FD2323*/*/*- | FDPERIOD | 31 Day(s) | MAT_INST | CREDIT_TO_ACC | DEPTACCT | 0016 |
FDTYP | TESTTYPE | FDPERIOD | 31 Day(s) | MAT_INST | CREDIT_TO_ACC | DEPTACCT | 0016 |
或者,要以单个格式化字符串输出:
SELECT REGEXP_REPLACE(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
'{"", "", "", "", "", "", "", ""}'
) AS data
FROM table_name;
输出如下:
DATA |
---|
{"FDTYP", "FD2323*/*/*-", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016"} |
{"FDTYP", "TESTTYPE", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016"} |
英文:
In Oracle, you can use the regular expression '^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$'
and extract the capturing groups:
SELECT REGEXP_SUBSTR(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
1,
1,
NULL,
1
) AS col1,
REGEXP_SUBSTR(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
1,
1,
NULL,
2
) AS col2,
REGEXP_SUBSTR(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
1,
1,
NULL,
3
) AS col3,
REGEXP_SUBSTR(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
1,
1,
NULL,
4
) AS col4,
REGEXP_SUBSTR(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
1,
1,
NULL,
5
) AS col5,
REGEXP_SUBSTR(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
1,
1,
NULL,
6
) AS col6,
REGEXP_SUBSTR(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
1,
1,
NULL,
7
) AS col7,
REGEXP_SUBSTR(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
1,
1,
NULL,
8
) AS col8
FROM table_name
Which, for the sample data:
CREATE TABLE table_name (value) AS
SELECT 'FDTYP/FD2323*/*/*-/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/' FROM DUAL UNION ALL
SELECT 'FDTYP/TESTTYPE/FDPERIOD/31 Day(s)/MAT_INST/CREDIT_TO_ACC/DEPTACCT/0016/' FROM DUAL
Outputs:
COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 | COL8 |
---|---|---|---|---|---|---|---|
FDTYP | FD2323*/*/*- | FDPERIOD | 31 Day(s) | MAT_INST | CREDIT_TO_ACC | DEPTACCT | 0016 |
FDTYP | TESTTYPE | FDPERIOD | 31 Day(s) | MAT_INST | CREDIT_TO_ACC | DEPTACCT | 0016 |
Or, to get the output as a single formatted string:
SELECT REGEXP_REPLACE(
value,
'^([^/]+)/(.*)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$',
'{"", "", "", "", "", "", "", ""}'
) AS data
FROM table_name;
Which outputs:
DATA |
---|
{"FDTYP", "FD2323*/*/*-", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016"} |
{"FDTYP", "TESTTYPE", "FDPERIOD", "31 Day(s)", "MAT_INST", "CREDIT_TO_ACC", "DEPTACCT", "0016"} |
答案3
得分: 0
您可以使用以下。
```java
(?<!\*)/
输出
FDTYP
TESTTYPE
FDPERIOD
31 天
MAT_INST
CREDIT_TO_ACC
DEPTACCT
0016
FDTYP
FD2323*/*/*-
FDPERIOD
31 天
MAT_INST
CREDIT_TO_ACC
DEPTACCT
0016
<details>
<summary>英文:</summary>
You can use the following.
```java
(?<!\*)/
Output
FDTYP
TESTTYPE
FDPERIOD
31 Day(s)
MAT_INST
CREDIT_TO_ACC
DEPTACCT
0016
FDTYP
FD2323*/*/*-
FDPERIOD
31 Day(s)
MAT_INST
CREDIT_TO_ACC
DEPTACCT
0016
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论