Special character issue: MQ message PUT error: java.nio.charset.UnmappableCharacterException

Question


I have a setup with a JMS producer and a JMS receiver. The sender application sends a message like this:

    source text ⟨е, ё, и, ю, я⟩  abcdefg

After receiving the message, the JMS receiver puts it to an IBM MQ queue using the plain IBM MQ API classes.

While putting this message to MQ, I get the exception below:

    INFO  | 2020-09-17 09:45:19 | [main] mimq.MQReceiver (MQReceiver.java:211) - IO Exception Occurred: Input length = 1
    java.nio.charset.UnmappableCharacterException: Input length = 1
            at java.nio.charset.CoderResult.throwException(CoderResult.java:282)
            at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:816)
            at com.ibm.mq.jmqi.system.JmqiCodepage.stringToBytes(JmqiCodepage.java:923)
            at com.ibm.mq.MQMessage.writeString(MQMessage.java:2848)
            at com.ibm.mimq.MQReceiver.sendToAnotherQueue(MQReceiver.java:192)
            at com.ibm.mimq.MQReceiver.main(MQReceiver.java:113)

Below is my MQ PUT code:

    public static void sendToLocalQueue(String msg) {

        int port = 1414;
        String host = "some-host";
        String channel = "some-channel";
        String manager = "some-QM";
        String user = "user";
        String passwd = "passwd";
        String qname = "TEST";
        String qmname = "some-QM";

        MQQueueManager qMgr;
        MQQueue inputQ;

        try {
            // Connect in client mode over the given channel
            Hashtable<String, String> h = new Hashtable<String, String>();
            h.put(MQC.TRANSPORT_PROPERTY, MQC.TRANSPORT_MQSERIES_CLIENT);
            MQEnvironment.properties = h;
            MQEnvironment.hostname = host;
            MQEnvironment.port = port;
            MQEnvironment.channel = channel;

            MQEnvironment.userID = user;
            MQEnvironment.password = passwd;
            MQEnvironment.disableTracing();
            MQException.log = null;
            qMgr = new MQQueueManager(manager);

            // Build the message descriptor
            MQMessage m = new MQMessage();
            m.applicationOriginData = "AMPS";
            m.messageType = MQC.MQMT_DATAGRAM;
            m.format = MQC.MQFMT_STRING;
            m.encoding = MQC.MQENC_NATIVE;
            m.priority = 4;
            m.persistence = MQC.MQPER_PERSISTENT;
            m.characterSet = MQC.MQCCSI_Q_MGR; // this line causes the problem
            //m.characterSet = 1208;           // with this line instead, the put succeeds
            m.expiry = MQC.MQEI_UNLIMITED;
            m.writeString(msg);                // the exception is thrown here

            MQPutMessageOptions putOptions = new MQPutMessageOptions();
            putOptions.options = MQC.MQPMO_SYNCPOINT | MQC.MQPMO_FAIL_IF_QUIESCING;

            logger.info("Putting message to LAN MQ (TEST queue)....");
            qMgr.put(qname, qmname, m, putOptions);
            qMgr.commit();

        } catch (MQException me) {
            logger.info("Error Code       : " + me.getErrorCode());
            logger.info("LocalizedMessage : " + me.getLocalizedMessage());
            logger.info("Message          : " + me.getMessage());
            logger.info("Reason           : " + me.getReason());
            me.printStackTrace();

        } catch (IOException e) {
            logger.info("IO Exception Occurred       : " + e.getLocalizedMessage());
            e.printStackTrace();
        }
    }

The message cannot be put to the queue because of the unmappable character. The encoding is set to UTF-8 at the queue manager level.

However, when I replace this line:

    m.characterSet = MQC.MQCCSI_Q_MGR;

with this one:

    m.characterSet = 1208;

the issue goes away.

My question is why this conversion is not done at the MQ level, and which settings I need to check to ensure the correct conversion. I tried the following, but it did not help:

  • Setting the Java parameter -Dfile.encoding=UTF-8

My environment:

  • Server: Linux
  • MQ: 9.0 or 7.5
  • Java: 1.8

One more thing to mention: the same setup worked with MQ 7.5 but stopped working after the migration to MQ 9.0. I know that the one-line code change above lets the message through, but I want to understand whether I am missing some configuration at the MQ level. Any advice would be much appreciated.

Thank You.

UPDATE

The client machine from which I send the message to MQ has this CCSID: MQMD.CodedCharSetId = 1208

The MQ server I connect to and send the message to has this:

    getDefaultProperty(Object) returns [819(0x333)] Integer
    setCCSID(int) setter [819(0x333)]

So when I set 1208 explicitly in my code, it works. When I do not, the conversion fails.

UPDATE-2

The value of MQC.MQCCSI_Q_MGR is zero, as I saw in the jar. The code is designed so that, if the value is zero, it fetches the default value from the jar, which is set to 819. I learned this when I turned on MQ tracing. The trace shows:

    getDefaultProperty(Object) returns [819(0x333)] Integer
    setCCSID(int) setter [819(0x333)]

This code is inside the jar, so we need to set the character set explicitly on the message. In my case it is 1208.
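To make the fallback described in this update concrete, here is a minimal sketch that uses only the JDK. It is not the code inside the MQ jar: `resolveCcsid` and `charsetFor` are hypothetical helpers, the default of 819 is taken from the trace lines in the question, and only the two CCSIDs discussed here (819 = ISO-8859-1, 1208 = UTF-8) are mapped.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class CcsidFallbackSketch {
    // Assumption: the default seen in the trace when characterSet is 0.
    static final int ASSUMED_DEFAULT_CCSID = 819;

    // Hypothetical helper: a requested CCSID of 0 falls back to the assumed default.
    static int resolveCcsid(int requested) {
        return requested == 0 ? ASSUMED_DEFAULT_CCSID : requested;
    }

    // Hypothetical helper: maps only the two CCSIDs discussed in this thread.
    static Charset charsetFor(int ccsid) {
        switch (ccsid) {
            case 819:  return StandardCharsets.ISO_8859_1;
            case 1208: return StandardCharsets.UTF_8;
            default:   throw new IllegalArgumentException("CCSID not covered by this sketch: " + ccsid);
        }
    }

    // Strict encode: unmappable characters raise an exception instead of being replaced.
    static byte[] encodeStrict(String text, int requestedCcsid) throws CharacterCodingException {
        ByteBuffer out = charsetFor(resolveCcsid(requestedCcsid))
                .newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        String text = "\u0435\u0451\u0438 abcdefg"; // Cyrillic letters followed by ASCII
        for (int ccsid : new int[] {1208, 0}) {
            try {
                System.out.println(ccsid + " -> " + encodeStrict(text, ccsid).length + " bytes");
            } catch (CharacterCodingException e) {
                System.out.println(ccsid + " -> " + e);
            }
        }
    }
}
```

In this sketch, requesting 1208 succeeds, while requesting 0 ends up on ISO-8859-1 and fails on the first Cyrillic character, which mirrors the behaviour reported in the question.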

Answer 1

Score: 3


When connected locally to the queue manager (i.e. using TRANSPORT_MQSERIES_BINDINGS), the code:

    m.characterSet = MQC.MQCCSI_Q_MGR;

means "take the CCSID set in the queue manager property CCSID". You can display this property with the following MQSC command:

    DISPLAY QMGR CCSID

When connected as a client (as your code shows you are), the same line:

    m.characterSet = MQC.MQCCSI_Q_MGR;

means "find the CCSID from the client machine locale".

IBM Knowledge Center states:

> For client applications, MQCCSI_Q_MGR is filled in, based on the locale of the client rather than the one on the queue manager.

It would seem that your client machine locale is not set to UTF-8, given that explicitly setting the message CCSID to 1208 fixes the problem.

You can see what the client set by browsing the message on the queue (without converting it) and looking at the MQMD.CodedCharSetId field.
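As a quick client-side diagnostic, the following JDK-only sketch prints what the JVM reports as its default charset and checks whether a given charset can represent the message text. Whether the MQ client derives its CCSID from these JVM values is an assumption that this thread does not confirm (the question reports that -Dfile.encoding=UTF-8 alone did not help), so treat it as a hint only. `EncodabilityCheck` is a hypothetical class name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodabilityCheck {
    // True if every character of the text can be represented in the given charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String text = "\u0435\u0451\u0438\u044e\u044f abcdefg"; // Cyrillic letters followed by ASCII
        Charset jvmDefault = Charset.defaultCharset();
        System.out.println("JVM default charset : " + jvmDefault);
        System.out.println("file.encoding       : " + System.getProperty("file.encoding"));
        System.out.println("Default can encode  : " + canEncode(text, jvmDefault));
        System.out.println("ISO-8859-1 can      : " + canEncode(text, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 can           : " + canEncode(text, StandardCharsets.UTF_8));
    }
}
```

A check like `canEncode` can also be run before `writeString` to log a clear message when the chosen character set cannot hold the payload.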

Answer 2

Score: 3


This worked on MQ v7.5 but not on MQ v9.0 because, prior to IBM MQ v8.0, the IBM MQ classes for Java encoded data using java.nio.charset.Charset.encode(CharBuffer), which by default replaces malformed or untranslatable data. Since the default characterSet is 819 (ISO 8859-1, a superset of ASCII), any character you sent that could not be encoded in that character set was silently replaced with the default replacement character; in most cases this means the data was replaced with the ? character.

From v8.0 onward, the default behavior is to report this situation as an error rather than replace malformed or untranslatable data.
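The difference between the two JDK code paths can be seen without MQ at all. The sketch below is a plain-JDK illustration, not MQ's implementation: `lenient` goes through `Charset.encode`, which substitutes the replacement byte, while `strictFails` uses a fresh `CharsetEncoder`, whose default action is to report unmappable characters. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReplaceVersusReport {
    // Lenient path: Charset.encode replaces unmappable characters with the
    // charset's replacement bytes; the result is decoded again for display.
    static String lenient(String text, Charset cs) {
        ByteBuffer out = cs.encode(text);
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return new String(bytes, cs);
    }

    // Strict path: a new CharsetEncoder reports unmappable characters by throwing.
    static boolean strictFails(String text, Charset cs) {
        try {
            cs.newEncoder().encode(CharBuffer.wrap(text));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String text = "\u044f abc"; // Cyrillic "ya" followed by ASCII
        System.out.println("lenient ISO-8859-1 : " + lenient(text, StandardCharsets.ISO_8859_1));
        System.out.println("strict ISO-8859-1 fails: " + strictFails(text, StandardCharsets.ISO_8859_1));
        System.out.println("strict UTF-8 fails     : " + strictFails(text, StandardCharsets.UTF_8));
    }
}
```

The lenient path silently loses the Cyrillic character, which is why the old behaviour looked like it "worked" while actually corrupting the data.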


Setting the character set to UTF-8, as you did, is the best solution, because the exact data you intended to send is what gets sent.

Another option is to tell MQ to use the prior behavior.

The new behavior, and how to configure the IBM MQ classes for Java to use the prior behavior, is described in the IBM MQ 9.0 Knowledge Center page Developing applications > Developing JMS and Java applications > Using IBM MQ classes for Java > Character string conversions in IBM MQ classes for Java:

> From IBM® MQ Version 8.0, some of the default behavior regarding
> character string conversion in the IBM MQ classes for Java™ has
> changed.
>
> Before IBM MQ Version 8.0, string conversions in IBM MQ classes for
> Java was done by calling the
> java.nio.charset.Charset.decode(ByteBuffer) and
> Charset.encode(CharBuffer) methods.
>
> Using either of these methods results in a default replacement (
> REPLACE) of malformed or untranslatable data.
>
> This behavior can obscure errors in applications, and lead to
> unexpected characters, for example ?, in translated data. From IBM MQ
> Version 8.0, to detect such issues earlier and more effectively, the
> IBM MQ classes for Java use CharsetEncoders and CharsetDecoders
> directly and configure the handling of malformed and untranslatable
> data explicitly.
>
> From IBM MQ Version 8.0, the default behavior is to REPORT such issues
> by throwing a suitable MQException.
>
> ...
>
> Setting system defaults
> --
>
> From IBM MQ Version 8.0, the following two Java system properties are
> available to configure default behavior regarding character string
> conversion.
>
> com.ibm.mq.cfg.jmqi.UnmappableCharacterAction Specifies the action to be taken for untranslatable data on encoding and decoding.
> The value can be REPORT, REPLACE, or IGNORE.
>
> com.ibm.mq.cfg.jmqi.UnmappableCharacterReplacement Sets or gets the replacement bytes to apply when a character cannot be mapped
> in an encoding operation The default Java replacement string is used
> in decoding operations.
>
> To avoid confusion between Java character and native byte
> representations, you should specify
> com.ibm.mq.cfg.jmqi.UnmappableCharacterReplacement as a decimal number
> representing the replacement byte in the native character set.
>
> For example, the decimal value of ?, as a native byte, is 63 if the
> native character set is ASCII-based, such as ISO-8859-1, while it is
> 111 if the native character set is EBCDIC.


If you want to mimic the prior behavior, set the following system properties:

    -Dcom.ibm.mq.cfg.jmqi.UnmappableCharacterAction=REPLACE
    -Dcom.ibm.mq.cfg.jmqi.UnmappableCharacterReplacement=63

You can also set them programmatically, for example:

    System.setProperty("com.ibm.mq.cfg.jmqi.UnmappableCharacterAction", "REPLACE");
    System.setProperty("com.ibm.mq.cfg.jmqi.UnmappableCharacterReplacement", "63");
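To see what a REPLACE action with replacement byte 63 amounts to, here is a JDK-only emulation. It is a sketch of the equivalent `CharsetEncoder` configuration, not the MQ classes themselves; `encodeWithReplacement` is a hypothetical helper, and the byte 63 is the ? character in ASCII-based character sets, as the quoted documentation notes.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReplacementByteSketch {
    // Encodes text, substituting the given byte for every character the charset cannot represent.
    static byte[] encodeWithReplacement(String text, Charset cs, byte replacement)
            throws CharacterCodingException {
        ByteBuffer out = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] { replacement })
                .encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    public static void main(String[] args) throws CharacterCodingException {
        String text = "\u044f abc"; // Cyrillic "ya" followed by ASCII
        byte[] bytes = encodeWithReplacement(text, StandardCharsets.ISO_8859_1, (byte) 63);
        System.out.println(Arrays.toString(bytes));
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
    }
}
```

Note that this path still discards the original characters; it only avoids the exception. Setting the message character set to 1208 remains the option that preserves the data.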

huangapple
  • Posted on 2020-09-17 18:07:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63935780.html