Creating a zip archive with Unicode filenames using Go's archive/zip

Question

package main

import (
	"archive/zip"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	var (
		Path = os.Args[1]
		Name = os.Args[2]
	)

	File, _ := os.Create(Name)
	PS := strings.Split(Path, "\\")
	PathName := strings.Join(PS[:len(PS)-1], "\\")
	os.Chdir(PathName)
	Path = PS[len(PS)-1]
	defer File.Close()
	Zip := zip.NewWriter(File)
	defer Zip.Close()
	walk := func(Path string, info os.FileInfo, err error) error {
		if err != nil {
			fmt.Println(err)
			return err
		}
		if info.IsDir() {
			return nil
		}
		Src, _ := os.Open(Path)
		defer Src.Close()
		fmt.Println(Path)
		FileName, _ := Zip.Create(Path)
		io.Copy(FileName, Src)
		Zip.Flush()
		return nil
	}
	if err := filepath.Walk(Path, walk); err != nil {
		fmt.Println(err)
	}
}

This is the layout of the directory I pass as the path:

root
|---2015-05 (dir)
|    |---中文.go
|---package (dir)
|---你好.go

When I zip this directory with the code above, the Chinese file names in the archive are garbled. How can I fix this?

Answer 1

Score: 11

The problem is that, by default, the ZIP specification does not allow arbitrary Unicode in entry names: names are expected to use IBM Code Page 437, so only the ASCII range is safe. More specifically (source: APPENDIX D of the specification):

> APPENDIX D.1 The ZIP format has historically supported only the original IBM PC character encoding set, commonly referred to as IBM Code Page 437. This limits storing file name characters to only those within the original MS-DOS range of values and does not properly support file names in other character encodings, or languages. To address this limitation, this specification will support the following change.

Support for Unicode names was added later. It is signalled by a special bit, general purpose bit 11, also called the language encoding flag (EFS):

> Section 4.4.4 - General purpose bit flag - Bit 11 - Language encoding flag (EFS). If this bit is set, the filename and comment fields for this file MUST be encoded using UTF-8.

> APPENDIX D.2 If general purpose bit 11 is unset, the file name and comment should conform to the original ZIP character encoding. If general purpose bit 11 is set, the filename and comment must support The Unicode Standard, Version 4.1.0 or greater using the character encoding form defined by the UTF-8 storage specification. The Unicode Standard is published by the The Unicode Consortium (www.unicode.org). UTF-8 encoded data stored within ZIP files is expected to not include a byte order mark (BOM).

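The general purpose bit flag is exposed in Go as the Flags field of the FileHeader struct. Go releases before 1.10 have no method for setting this bit, and it defaults to 0. From Go 1.10 on, the zip writer sets it automatically when the name or comment is valid UTF-8 containing non-ASCII characters, unless FileHeader.NonUTF8 is set.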

So the easiest way to add support for Unicode names is to set bit 11. Instead of:

FileName, _ := Zip.Create(Path)

start your zip entry with:

h := &zip.FileHeader{Name: Path, Method: zip.Deflate, Flags: 0x800}
FileName, _ := Zip.CreateHeader(h)

The first line creates a FileHeader whose Flags field is 0x800 (bit 11), which declares that the file name is encoded in UTF-8. That matches what Go writes: a Go string is written to an io.Writer byte for byte, and strings built from Go source text are UTF-8.
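A quick way to check that the header does what is described is to round-trip one entry through memory. The sketch below is only an illustration: writeEntry and readFirst are hypothetical helper names, and it assumes Go 1.16 or later for io.ReadAll.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// writeEntry (hypothetical helper) builds an in-memory archive with a single
// Deflate-compressed entry whose header has general purpose bit 11 set.
func writeEntry(name string, data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	h := &zip.FileHeader{Name: name, Method: zip.Deflate, Flags: 0x800}
	w, err := zw.CreateHeader(h)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	// Close writes the central directory; without it the archive is invalid.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readFirst (hypothetical helper) returns the name, flags and content of the
// first entry in the archive.
func readFirst(archive []byte) (string, uint16, []byte, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return "", 0, nil, err
	}
	if len(zr.File) == 0 {
		return "", 0, nil, fmt.Errorf("archive has no entries")
	}
	f := zr.File[0]
	rc, err := f.Open()
	if err != nil {
		return "", 0, nil, err
	}
	defer rc.Close()
	data, err := io.ReadAll(rc)
	return f.Name, f.Flags, data, err
}

func main() {
	archive, err := writeEntry("你好.go", []byte("package main\n"))
	if err != nil {
		fmt.Println("write:", err)
		return
	}
	name, flags, data, err := readFirst(archive)
	if err != nil {
		fmt.Println("read:", err)
		return
	}
	fmt.Printf("name=%s utf8=%t bytes=%d\n", name, flags&0x800 != 0, len(data))
}
```

The flags read back may have other bits set too (the writer adds its own), which is why the check masks with 0x800 instead of comparing for equality.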

Note:

Doing this preserves the UTF-8 file names, but not every zip reader or extractor honors the flag. For example, the built-in zip handler in Windows Explorer will not decode the names as UTF-8, whereas a more capable zip tool (e.g. SecureZip) will see the UTF-8 names and extract the files with the correct names.
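Since support differs between tools, it can be useful to check which entries of an archive actually declare UTF-8 names before blaming the extractor. The following is a rough sketch using Go's own reader; buildSample and utf8Flagged are hypothetical names introduced for the example.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// buildSample (hypothetical helper) writes two empty entries to memory:
// an ASCII name via Create, and a Chinese name with bit 11 set explicitly.
func buildSample() ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	if _, err := zw.Create("hello.go"); err != nil {
		return nil, err
	}
	h := &zip.FileHeader{Name: "你好.go", Flags: 0x800}
	if _, err := zw.CreateHeader(h); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// utf8Flagged (hypothetical helper) maps each entry name to whether its
// header has general purpose bit 11 set.
func utf8Flagged(archive []byte) (map[string]bool, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	out := make(map[string]bool, len(zr.File))
	for _, f := range zr.File {
		out[f.Name] = f.Flags&0x800 != 0
	}
	return out, nil
}

func main() {
	archive, err := buildSample()
	if err != nil {
		fmt.Println("build:", err)
		return
	}
	report, err := utf8Flagged(archive)
	if err != nil {
		fmt.Println("read:", err)
		return
	}
	for name, flagged := range report {
		fmt.Printf("%s declares UTF-8: %t\n", name, flagged)
	}
}
```

An entry that shows false here but contains non-ASCII bytes in its name is the case where extractors have to guess the encoding.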

Answer 2

Score: -2

package main

import (
	"archive/zip"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	var (
		Path = os.Args[1]
		Name = os.Args[2]
	)

	File, _ := os.Create(Name)
	PS := strings.Split(Path, "\\")
	PathName := strings.Join(PS[:len(PS)-1], "\\")
	os.Chdir(PathName)
	Path = PS[len(PS)-1]
	defer File.Close()
	Zip := zip.NewWriter(File)
	defer Zip.Close()
	walk := func(Path string, info os.FileInfo, err error) error {
		if err != nil {
			fmt.Println(err)
			return err
		}
		if info.IsDir() {
			return nil
		}
		Src, _ := os.Open(Path)
		defer Src.Close()
		//FileName, _ := Zip.Create(Path)
		h := &zip.FileHeader{Name: Path, Method: zip.Deflate, Flags: 0x800}
		FileName, _ := Zip.CreateHeader(h)
		io.Copy(FileName, Src)
		Zip.Flush()
		return nil
	}
	if err := filepath.Walk(Path, walk); err != nil {
		fmt.Println(err)
	}
}

This is a Go program that creates a zip archive: it compresses the files under the given path into the named zip file. Save the code as a Go source file, then build and run it with the Go toolchain.

Posted by huangapple on May 4, 2015, 16:49:55
When reposting, please keep the link to this article: https://go.coder-hub.com/30026083.html