Delete a single line if a specific string is found

Question


I'm fairly new to IIB. What I want to achieve is to delete a single line from a TXT file if it contains a specific word, for example the word USA, as shown below. I read the file as a BLOB and then convert it to a string. Should I do this with a Compute node or a JavaCompute node? Thanks in advance.

For example:

Before

Hello my name 
is Malcom and I live
in the USA

After

Hello my name 
is Malcom and I live

Current Flow
FileInput -> Compute -> JavaCompute -> FileOutput

FileInput: Reads data from a specific folder

Compute: Replaces one string with another (masking)

CREATE PROCEDURE getBLOBMessage() BEGIN
	DECLARE fullBLOB CHARACTER;
	SET fullBLOB = CAST(OutputRoot.BLOB.BLOB AS CHAR CCSID 1208 ENCODING 815);
	SET OutputLocalEnvironment.msg = fullBLOB;
END;

CREATE PROCEDURE maskMessage(INOUT msg CHARACTER) BEGIN
	SET msg = REPLACE(msg, '431.111.55.113', 'XXX.XXX.XX.XXX');
	SET msg = REPLACE(msg, '111.115.11.112', 'XXX.XXX.XX.XXX');
	SET msg = REPLACE(msg, '111.112.11.112', 'XXX.XXX.XX.XXX');
	SET msg = REPLACE(msg, '111.111.111.116', 'XXX.XXX.XXX.XXX');
	SET msg = REPLACE(msg, '172.16.18.72', 'XXX.XX.XX.XX');
	SET msg = REPLACE(msg, 'b1111111110', 'XXXXXXXXXXX');
	SET msg = REPLACE(msg, '11111111101', 'XXXXXXXXXXX');
	SET msg = REPLACE(msg, '11111111111', 'XXXXXXXXXXX');
	SET msg = REPLACE(msg, 'B1111111111', 'XXXXXXXXXXX');
	SET msg = REPLACE(msg, 'Q1111111', 'XXXXXXXX');
	SET msg = REPLACE(msg, '11111111111N', 'XXXXXXXXXXXX');
	SET OutputRoot.BLOB.BLOB = CAST(msg AS BLOB CCSID 1208 ENCODING 815);
END;

JavaCompute: Possibly for removing the line?

FileOutput: Generates the output TXT file

Answer 1

Score: 1


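If you use the Record detection feature of the FileInput node, your requirements can be fulfilled in ESQL. With delimited record detection, each line of the file reaches the Compute node as its own message, so a line can be dropped simply by not propagating it.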

FileInput node:

  • Records and Elements: Record detection = Delimited
  • Connect the End of Data terminal to the Finish File terminal of the FileOutput node

Compute node:

CREATE COMPUTE MODULE Thaqif_Compute

	CREATE FUNCTION Main() RETURNS BOOLEAN
	BEGIN
		SET OutputRoot = InputRoot;
		-- Each record (one line) arrives as a BLOB; decode it with the input's CCSID and encoding
		DECLARE line CHARACTER CAST(OutputRoot.BLOB.BLOB AS CHAR
									CCSID InputProperties.CodedCharSetId
									ENCODING InputProperties.Encoding);
		IF CONTAINS(line, 'USA') THEN
			-- Returning FALSE propagates nothing, so this line never reaches the output file
			RETURN FALSE;
		ELSE
			-- Mask the sensitive values, then re-encode the line as a BLOB
			CALL maskMessage(line);
			SET OutputRoot.BLOB.BLOB = CAST(line AS BLOB 
											CCSID InputProperties.CodedCharSetId
											ENCODING InputProperties.Encoding);
			RETURN TRUE;
		END IF;
	END;

	CREATE PROCEDURE maskMessage(INOUT msg CHARACTER) BEGIN
		SET msg = REPLACE (msg, '431.111.55.113', 'XXX.XXX.XX.XXX');
		-- Other patterns removed for brevity
		SET msg = REPLACE (msg, 'Q1111111', 'XXXXXXXX');
	END;

END MODULE;

FileOutput node:

  • Records and Elements: Record definition = Record is delimited data

Example input:

Hello my name 
is Malcom and I live
in the USA
where 431.111.55.113 is masked
but Q2222222 is still ok

Resulting output:

Hello my name 
is Malcom and I live
where XXX.XXX.XX.XXX is masked
but Q2222222 is still ok
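
Since the question also asks about the JavaCompute route, here is a minimal plain-Java sketch of the same filter-then-mask logic applied to a whole file held as one string, as in the question's original flow. It is not taken from the answer and does not use the IIB Java API; the class name LineFilter and the helpers removeLinesContaining and mask are hypothetical, and the sketch assumes it is acceptable to normalize line endings to "\n" in the output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; this logic could be called from a JavaCompute node
// after the BLOB has been decoded to a String.
class LineFilter {

    // Drops every line that contains `keyword` and keeps the others in order.
    static String removeLinesContaining(String text, String keyword) {
        List<String> kept = new ArrayList<>();
        // Split on \r\n, \n or \r; limit -1 keeps a trailing empty element,
        // so a final newline in the input survives the round trip.
        for (String line : text.split("\\r\\n|\\n|\\r", -1)) {
            if (!line.contains(keyword)) {
                kept.add(line);
            }
        }
        // Assumption: output line endings are normalized to "\n".
        return String.join("\n", kept);
    }

    // Same idea as maskMessage in the ESQL above, with only two patterns shown.
    static String mask(String text) {
        return text.replace("431.111.55.113", "XXX.XXX.XX.XXX")
                   .replace("Q1111111", "XXXXXXXX");
    }

    public static void main(String[] args) {
        String input = "Hello my name \n"
                + "is Malcom and I live\n"
                + "in the USA\n"
                + "where 431.111.55.113 is masked\n"
                + "but Q2222222 is still ok\n";
        System.out.print(mask(removeLinesContaining(input, "USA")));
    }
}
```

The record-detection approach in the answer avoids holding the whole file in memory, which may matter for large files; this sketch trades that for keeping everything in a single node.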

Posted on October 7, 2020, 10:23:33
Source: https://go.coder-hub.com/64236316.html