Delete a single line if a specific string is found
Question
I'm fairly new to IIB. What I want to achieve is to delete a single line from a TXT file if it contains a specific word, for example the word USA, as shown below. I read the file as a BLOB and then convert it to a string. Should I do this in a Compute node or a JavaCompute node? Thanks in advance.
For example:
Before:
Hello my name
is Malcom and I live
in the USA
After:
Hello my name
is Malcom and I live
Current flow:
FileInput -> Compute -> JavaCompute -> FileOutput
FileInput: Reads data from a specific folder
Compute: Replaces one string with another (masking)
CREATE PROCEDURE getBLOBMessage() BEGIN
    DECLARE fullBLOB CHARACTER;
    SET fullBLOB = CAST(OutputRoot.BLOB.BLOB as char CCSID 1208 Encoding 815);
    SET OutputLocalEnvironment.msg = fullBLOB;
END;

CREATE PROCEDURE maskMessage(INOUT msg CHARACTER) BEGIN
    SET msg = REPLACE (msg, '431.111.55.113', 'XXX.XXX.XX.XXX');
    SET msg = REPLACE (msg, '111.115.11.112', 'XXX.XXX.XX.XXX');
    SET msg = REPLACE (msg, '111.112.11.112', 'XXX.XXX.XX.XXX');
    SET msg = REPLACE (msg, '111.111.111.116', 'XXX.XXX.XXX.XXX');
    SET msg = REPLACE (msg, '172.16.18.72', 'XXX.XX.XX.XX');
    SET msg = REPLACE (msg, 'b1111111110', 'XXXXXXXXXXX');
    SET msg = REPLACE (msg, '11111111101', 'XXXXXXXXXXX');
    SET msg = REPLACE (msg, '11111111111', 'XXXXXXXXXXX');
    SET msg = REPLACE (msg, 'B1111111111', 'XXXXXXXXXXX');
    SET msg = REPLACE (msg, 'Q1111111', 'XXXXXXXX');
    SET msg = REPLACE (msg, '11111111111N', 'XXXXXXXXXXXX');
    SET OutputRoot.BLOB.BLOB = CAST (msg AS BLOB CCSID 1208 Encoding 815);
END;
JavaCompute: Possibly for removing the line?
FileOutput: Generates the output TXT file
Answer 1
Score: 1
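If you use the Record detection feature of the FileInput node, your requirement can be met in ESQL alone. With delimited record detection, the node propagates each line of the file as a separate message, so the Compute node can drop a line simply by returning FALSE for it.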
FileInput node:
- Records and Elements: Record detection = Delimited
- Connect the End of Data terminal to the Finish File terminal of the FileOutput node
Compute node:
CREATE COMPUTE MODULE Thaqif_Compute
    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        SET OutputRoot = InputRoot;
        -- Each message holds one record (one line) of the input file
        DECLARE line CHARACTER CAST(OutputRoot.BLOB.BLOB AS CHAR
            CCSID InputProperties.CodedCharSetId
            ENCODING InputProperties.Encoding);
        IF CONTAINS(line, 'USA') THEN
            -- Returning FALSE stops propagation, so this line never reaches the output
            RETURN FALSE;
        ELSE
            CALL maskMessage(line);
            SET OutputRoot.BLOB.BLOB = CAST(line AS BLOB
                CCSID InputProperties.CodedCharSetId
                ENCODING InputProperties.Encoding);
            RETURN TRUE;
        END IF;
    END;

    CREATE PROCEDURE maskMessage(INOUT msg CHARACTER) BEGIN
        SET msg = REPLACE (msg, '431.111.55.113', 'XXX.XXX.XX.XXX');
        -- Other patterns removed for brevity
        SET msg = REPLACE (msg, 'Q1111111', 'XXXXXXXX');
    END;
END MODULE;
FileOutput node:
- Records and Elements: Record definition = Record is delimited data
Example input:
Hello my name
is Malcom and I live
in the USA
where 431.111.55.113 is masked
but Q2222222 is still ok
Resulting output:
Hello my name
is Malcom and I live
where XXX.XXX.XX.XXX is masked
but Q2222222 is still ok
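If you would rather keep reading the whole file as a single BLOB and remove the line in the JavaCompute node, the core logic could look roughly like the sketch below. It is plain Java with no IIB classes: `LineFilter` and `dropLinesContaining` are hypothetical names chosen for illustration, and the code that takes the string from the message and writes the result back is left out. It also assumes that writing `\n` as the line separator on output is acceptable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of IIB: filters lines out of a whole-file string.
final class LineFilter {

    // Returns `text` without the lines that contain `word`.
    // Assumption: lines end in \n or \r\n; the output always uses \n.
    static String dropLinesContaining(String text, String word) {
        // Limit -1 keeps trailing empty strings, so a final newline survives.
        String[] lines = text.split("\\r?\\n", -1);
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!line.contains(word)) {
                kept.add(line);
            }
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        String input = "Hello my name\nis Malcom and I live\nin the USA\n";
        System.out.print(dropLinesContaining(input, "USA"));
    }
}
```

The record-detection approach above is probably still the simpler choice, since it avoids holding the whole file in memory and needs no extra node.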