Write Unicode text to a file in 'wb' mode using C and Objective C?


Question

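The problem is probably in how the character encoding is handled. First, make sure your code uses the correct character encoding. You can try setting the encoding to UTF-8, since your goal is to write the characters to the file in hex or binary format.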

Here are the changes you can try:

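  1. Make sure the file is opened in binary write mode, so that no conversion (such as newline translation) is applied to the bytes as they are written.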
FILE *fd = fopen("output.txt", "wb"); // open the file in binary write mode ("wb")
if (fd == NULL) {
    printf("Failed to open file for writing.\n");
    return 1;
}
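  2. When processing the characters, use unsigned char so that each byte keeps its raw value (0 to 255); where char is signed, bytes above 0x7F would otherwise be printed as values such as FFFFFFC3.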
const unsigned char *stringText = (const unsigned char *)[fileName UTF8String];
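  3. When writing, write each byte of the character to the file, rather than the multibyte character as a whole.
size_t len = strlen((const char *)stringText);
for (size_t i = 0; i < len; i++) {
    fprintf(fd, "%02X ", stringText[i]); // write each byte as two hex digits
}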

This way, you should be able to write the characters to the file in hex format without having to worry about multibyte characters.

Try these changes and check whether the result matches what you expect. I hope this helps you solve the problem.

English original:

I have this Unicode text, which contains non-ASCII characters:

    NSString *fileName = @"Tên tình bạn dưới tình yêu.mp3";
    const char *cStringFile = [fileName UTF8String];

Now I need to save this string to a file in hex/binary form, in this format:

 T  ê  n     t  ì  n  h     b    ạ   n
 54 EA 6E 20 74 EC 6E 68 20 62 1EA1 6E ...... and so on

As you can see, 'ê' is written as EA, while 'ạ' is written as 1E A1, which is correct according to the Vietnamese character set (https://vietunicode.sourceforge.net/charset/).

To achieve this, I used the following code to write the multibyte characters to the file:

    // Determine the required size for the wchar_t string
    size_t input_length = strlen(cStringFile);
    size_t output_length = mbstowcs(NULL, cStringFile, input_length);
    if (output_length == (size_t)-1) {
        printf("Invalid multibyte sequence for the current locale.\n");
        return 1;
    }

    // Allocate memory for the wchar_t string
    wchar_t *output = (wchar_t *)malloc((output_length + 1) * sizeof(wchar_t));
    if (output == NULL) {
        printf("Memory allocation failed.\n");
        return 1;
    }

    // Convert the C string to a wchar_t string
    mbstowcs(output, cStringFile, input_length);
    output[output_length] = L'\0'; // Add null-termination

    unsigned long length = wcslen(output);
    // Loop through each character in the Unicode text
    for (unsigned long i = 0; i < length; i++) {
        // Write the Unicode character to the file
        fwprintf(fd, L"%lc", output[i]);
    }

    // Free the allocated memory
    free(output);

The problem is that the code above does not convert the multibyte characters to the correct hex values.

Example 1) For this text = "Tên tình bạn dưới tình yêu.mp3"
Expected: 
T  ê  n     t  ì  n  h     b    ạ   n
54 EA 6E 20 74 EC 6E 68 20 62 1EA1 6E ...... and so on

Actual: Wrong!
T   ê   n     t   ì   n  h     b   ạ     n
54 C3AA 6E 20 74 C3AC 6E 68 20 62 E1BAA1 6E ...... and so on

Example 2) For this text = "最佳歌曲在这里.mp3"
Expected:
最-\u6700 佳-\u4F73 歌-\u6B4C 曲-\u66F2  在-\u5728
67 00     4F 73    6B 4C     66 F2     57 28  .....

Actual: Wrong!
最        佳        歌        曲        在
E6 9C     80 BD    B3 AD     8C 9B     B2 9C    

So I think it is writing 2 bytes for 'ê' and 'ì' and 3 bytes for 'ạ', instead of the hex equivalent of each multibyte character.

What could be the issue?
Any help would be appreciated.

=====

I tried another approach that doesn't use wchar_t: check whether each character is a multibyte character and, if so, write all of its bytes:

    NSString *fileName = @"Tên tình bạn dưới tình yêu.mp3";
    const char *stringText = [fileName UTF8String];
    unsigned long len = strlen(stringText);
    setlocale(LC_ALL, "");
    for (char character = *stringText; character != '\0'; character = *++stringText)
    {
        if (!character) {
            continue;
        }
        putchar(character);
        int byteCount = numberOfBytesInChar((unsigned char)character);
        if (byteCount <= 1) {
            //putchar(character);
            fprintf(fd, "%c", character);
        } else {
            //putchar(character);
            for (int k = 0; k < byteCount; k++)
            {
                fprintf(fd, "%c", character);
                character = *++stringText;
            }
        }
    }

    int numberOfBytesInChar(unsigned char val) {
        if (val < 128) {
            return 1;
        } else if (val < 224) {
            return 2;
        } else if (val < 240) {
            return 3;
        } else {
            return 4;
        }
    }

It still doesn't write the expected hex equivalent for multibyte characters.

Example 1) For this text = "Tên tình bạn dưới tình yêu.mp3"
Expected: 
T  ê  n     t  ì  n  h     b    ạ   n
54 EA 6E 20 74 EC 6E 68 20 62 1EA1 6E ...... and so on

Actual: Wrong!
T   ê   n     t   ì   n  h     b   ạ     n
54 C3AA 6E 20 74 C3AC 6E 68 20 62 E1BAA1 6E ...... and so on

Example 2) For this text = "最佳歌曲在这里.mp3"
Expected:
最-\u6700 佳-\u4F73 歌-\u6B4C 曲-\u66F2  在-\u5728
67 00     4F 73    6B 4C     66 F2     57 28  .....

Actual: Wrong!
最        佳        歌        曲        在
E6 9C     80 BD    B3 AD     8C 9B     B2 9C     

Any pointers?

Answer 1

Score: 1

NSString can work with encodings directly.

Extract the data from the string and write it to disk:

NSData *dataBE = [fileName dataUsingEncoding:NSUTF16BigEndianStringEncoding];
[dataBE writeToFile:@"/Users/user/Desktop/test" options:NSDataWritingAtomic error:&error];

Or write the string to disk:

[fileName writeToFile:@"/Users/user/Desktop/test" atomically:YES encoding:NSUTF16BigEndianStringEncoding error:&error];

huangapple
  • Published on June 29, 2023 16:22:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76579270.html