Decode a KOI8-R quoted-printable string in Java
Question
I have an .eml file with several attachments, one of which is a .rar file.
I am using Tika to extract this archive, but sometimes Tika fails to decode the attachment file name correctly. For example, I get a name like this:
=?koi8-r?Q?6=5F=F4=ED=5F15=2E05=2Erar?=
So I am looking for a way to convert such a string into a readable value.
Is there a Java library that can do this?
I guess it happens because the string starts with =?koi8-r?Q?, so maybe stripping that prefix would give a value that is easier to convert, like this: 6=5F=F4=ED=5F15=2E05=2E
But even after doing that, I could not find a way to decode it.
Does anybody know how to convert such a string correctly?
I have spent a lot of time on this, but still have no results...
Answer 1
Score: 1
Here is the code:
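import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
// JavaMail (javax.mail); with Jakarta Mail 2.x the package is jakarta.mail instead.
import javax.mail.MessagingException;
import javax.mail.internet.MimeUtility;

public class EncodingUtils {
    private EncodingUtils() {
    }

    // Decodes RFC 2047 encoded words first; falls back to raw quoted-printable in KOI8-R.
    public static String decodeKoi8r(String text) {
        String decode;
        try {
            decode = MimeUtility.decodeText(text);
        } catch (UnsupportedEncodingException e) {
            decode = text;
        }
        if (isQuotedKoi8r(decode)) {
            decode = decode(text, "KOI8-R", "quoted-printable", "KOI8-R");
        }
        return decode;
    }

    // Heuristic: a leftover "=" or "koi8-r" suggests the text is still encoded.
    public static boolean isQuotedKoi8r(String text) {
        return text.contains("=") || text.toLowerCase().contains("koi8-r");
    }

    public static String decode(String text, String textEncoding, String encoding, String resultCharset) {
        if (text.isEmpty()) {
            return text;
        }
        try {
            byte[] bytes = text.getBytes(textEncoding);
            InputStream decodedStream = MimeUtility.decode(new ByteArrayInputStream(bytes), encoding);
            // A single read() may return only part of the data (or -1), so read until end of stream.
            ByteArrayOutputStream res = new ByteArrayOutputStream();
            byte[] tmp = new byte[1024];
            int n;
            while ((n = decodedStream.read(tmp)) != -1) {
                res.write(tmp, 0, n);
            }
            return new String(res.toByteArray(), resultCharset);
        } catch (IOException | MessagingException e) {
            // Decoding failed: return the input unchanged.
            return text;
        }
    }
}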
And the test:
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

public class EncodingUtilsTest {
    @Test
    public void koi8r() {
        String input = "=?koi8-r?Q?11=5F=F4=ED=5F21=2E05=2Erar?=";
        String decode = EncodingUtils.decodeKoi8r(input);
        Assertions.assertEquals("11_ТМ_21.05.rar", decode);
    }

    @Test
    public void koi8rWithoutStartTag() {
        String input = "=CF=D4=C4=C5=CC=D8=CE=D9=CD =D4=D2=C1=CE=DB=C5=CD =D2=C5=DA=C0=CD=.eml";
        String decode = EncodingUtils.decodeKoi8r(input);
        // The "=" at the end is there because the source data is corrupted. If the input is replaced with
        // =CF=D4=C4=C5=CC=D8=CE=D9=CD =D4=D2=C1=CE=DB=C5=CD =D2=C5=DA=C0=CD=C5.eml, the result ends with the proper word "резюме".
        Assertions.assertEquals("отдельным траншем резюм=.eml", decode);
    }
}
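For anyone wondering what the =?koi8-r?Q?...?= wrapper actually means: it is an RFC 2047 "encoded word" made of a charset name, the letter Q (quoted-printable-like encoding), and the payload. Below is a minimal sketch of decoding one such word with the JDK only, in case JavaMail is not on the classpath. The class QWordSketch and the method decodeQWord are hypothetical names made up for this illustration, it handles a single Q-encoded word only (no B encoding, no multiple words), and it assumes the JDK in use ships the named charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QWordSketch {
    // Matches one RFC 2047 encoded word that uses the Q encoding: =?charset?Q?payload?=
    private static final Pattern Q_WORD =
            Pattern.compile("=\\?([^?]+)\\?[Qq]\\?([^?]*)\\?=");

    // Hypothetical helper: decodes a single Q-encoded word, or returns the input unchanged.
    public static String decodeQWord(String word) {
        Matcher m = Q_WORD.matcher(word);
        if (!m.matches()) {
            return word;
        }
        // Throws an unchecked exception if the charset name is illegal or unsupported.
        Charset charset = Charset.forName(m.group(1));
        String payload = m.group(2);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < payload.length(); i++) {
            char c = payload.charAt(i);
            if (c == '=' && i + 2 < payload.length()) {
                int hi = Character.digit(payload.charAt(i + 1), 16);
                int lo = Character.digit(payload.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    // "=XY" is one raw byte written as two hex digits.
                    out.write((hi << 4) | lo);
                    i += 2;
                    continue;
                }
            }
            // In the Q encoding an underscore stands for a space; other characters are literal.
            out.write(c == '_' ? ' ' : c);
        }
        // Interpret the collected bytes in the charset named by the encoded word.
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // The file name from the question; it should decode to 6_ТМ_15.05.rar
        System.out.println(decodeQWord("=?koi8-r?Q?6=5F=F4=ED=5F15=2E05=2Erar?="));
    }
}
```

In real code MimeUtility.decodeText is still the better choice, since it also covers the B encoding and several encoded words in one header; the sketch is only meant to show why stripping the =?koi8-r?Q? prefix by hand loses the charset information needed for decoding.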
Good day!