Decode koi8-r string and quoted-printable in Java

Question

I have a .eml file with several attachments.

One of the attachments is a .rar file.

I am using Tika to extract this .rar file, but sometimes Tika does not decode the file name correctly. For example, I get a name like this:

=?koi8-r?Q?6=5F=F4=ED=5F15=2E05=2Erar?=

So I am looking for a way to convert such a string into a readable value.

Is there a Java library that can do this?

I guess it happens because the string starts with =?koi8-r?Q?. Maybe stripping that prefix would leave something easier to convert, like 6=5F=F4=ED=5F15=2E05=2E, but even then I could not find a way to decode it.

Does anybody know how to convert such a string correctly?

I have spent a lot of time on this, but still have no results...

Answer 1

Score: 1

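Here is the code:

// Assumption: JavaMail (javax.mail) is on the classpath; with Jakarta Mail 2.x the package is jakarta.mail.
import javax.mail.MessagingException;
import javax.mail.internet.MimeUtility;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;

public class EncodingUtils {
    private EncodingUtils() {
    }

    public static String decodeKoi8r(String text) {
        String decode;
        try {
            // Handles RFC 2047 encoded words such as =?koi8-r?Q?...?=
            decode = MimeUtility.decodeText(text);
        } catch (UnsupportedEncodingException e) {
            decode = text;
        }

        // Fallback for raw quoted-printable text without the =?...?= wrapper
        if (isQuotedKoi8r(decode)) {
            decode = decode(text, "KOI8-R", "quoted-printable", "KOI8-R");
        }
        return decode;
    }

    // Heuristic: any '=' or a leftover "koi8-r" marker means the text still looks encoded
    public static boolean isQuotedKoi8r(String text) {
        return text.contains("=") || text.toLowerCase().contains("koi8-r");
    }

    public static String decode(String text, String textEncoding, String encoding, String resultCharset) {
        if (text.length() == 0) {
            return text;
        }

        try {
            byte[] bytes = text.getBytes(textEncoding);
            InputStream decodedStream = MimeUtility.decode(new ByteArrayInputStream(bytes), encoding);
            // For quoted-printable, the decoded output is never longer than the input
            byte[] tmp = new byte[bytes.length];
            int n = decodedStream.read(tmp);
            if (n < 0) {
                return ""; // read() returns -1 when the stream yields no bytes
            }
            return new String(tmp, 0, n, resultCharset);
        } catch (IOException | MessagingException e) {
            return text;
        }
    }
}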

And the test:

// Assumption: JUnit 5 (the original does not show its imports)
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

public class EncodingUtilsTest {
    @Test
    public void koi8r() {
        String input = "=?koi8-r?Q?11=5F=F4=ED=5F21=2E05=2Erar?=";
        String decode = EncodingUtils.decodeKoi8r(input);
        Assertions.assertEquals("11_ТМ_21.05.rar", decode);
    }

    @Test
    public void koi8rWithoutStartTag() {
        String input = "=CF=D4=C4=C5=CC=D8=CE=D9=CD =D4=D2=C1=CE=DB=C5=CD =D2=C5=DA=C0=CD=.eml";
        String decode = EncodingUtils.decodeKoi8r(input);
        // The "=" before ".eml" comes from corrupted source data. If the input is replaced with
        // =CF=D4=C4=C5=CC=D8=CE=D9=CD =D4=D2=C1=CE=DB=C5=CD =D2=C5=DA=C0=CD=C5.eml, the last word decodes to the proper "резюме".
        Assertions.assertEquals("отдельным траншем резюм=.eml", decode);
    }
}
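
For readers who do not have JavaMail on the classpath, here is a rough JDK-only sketch of what the two tests exercise. An RFC 2047 encoded word has the shape =?charset?encoding?text?=, and in the Q encoding each =XX is one byte in that charset. The names QpSketch, decodeEncodedWord and decodeQp are made up for this illustration and are not part of JavaMail or Tika. The sketch assumes ASCII input and handles only a single Q-encoded word, not B (base64) words or folded headers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Illustrative only: hypothetical helper, not a JavaMail or Tika API.
class QpSketch {

    // Decodes =XX escapes into bytes and interprets them in the given charset.
    // A malformed '=' (not followed by two hex digits) is kept literally.
    // Assumes the input contains only ASCII characters.
    public static String decodeQp(String text, Charset charset, boolean underscoreIsSpace) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '=' && i + 2 < text.length()) {
                int hi = Character.digit(text.charAt(i + 1), 16);
                int lo = Character.digit(text.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
            // In the header "Q" encoding a bare underscore stands for a space
            out.write(underscoreIsSpace && c == '_' ? ' ' : c);
        }
        return new String(out.toByteArray(), charset);
    }

    // Decodes one "=?charset?Q?text?=" word; anything else is returned unchanged.
    public static String decodeEncodedWord(String word) {
        if (word.length() < 4 || !word.startsWith("=?") || !word.endsWith("?=")) {
            return word;
        }
        String[] parts = word.substring(2, word.length() - 2).split("\\?", 3);
        if (parts.length != 3 || !parts[1].equalsIgnoreCase("Q")) {
            return word; // this sketch does not handle the B (base64) encoding
        }
        return decodeQp(parts[2], Charset.forName(parts[0]), true);
    }

    public static void main(String[] args) {
        // The file name from the question
        System.out.println(decodeEncodedWord("=?koi8-r?Q?6=5F=F4=ED=5F15=2E05=2Erar?="));
        // Raw quoted-printable without the =?...?= wrapper, including the stray '='
        System.out.println(decodeQp(
                "=CF=D4=C4=C5=CC=D8=CE=D9=CD =D4=D2=C1=CE=DB=C5=CD =D2=C5=DA=C0=CD=.eml",
                Charset.forName("KOI8-R"), false));
    }
}
```

With the string from the question, decodeEncodedWord should give 6_ТМ_15.05.rar. The second call keeps the stray "=" in the same way as the koi8rWithoutStartTag test, giving отдельным траншем резюм=.eml. For real mail headers, MimeUtility.decodeText remains the safer choice because it also covers base64 words and several encoded words in one header.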

Good day!

huangapple
  • Posted on May 5, 2020 at 01:37:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/61598288.html