What is a correct way of writing data in gzip format?

Question

My application produces a lot of text data. To reduce disk consumption, I want to write the data in gzip format.

Many goroutines call the WriteData() function concurrently.

But Linux gzip complains that the file is corrupted:

zcat ./2021-08-11-00.gz > /dev/null
gzip: ./2021-08-11-00.gz: invalid compressed data--format violated

It does not happen every time, but roughly every second or third written file is affected.

What is wrong with my code?

My DataWrite package looks like this:

package storage

import (
	"compress/gzip"
	"os"
	"sync"

	"github.com/rs/zerolog/log"
)

type Storage struct {
	handle *os.File
	writer *gzip.Writer

	lock sync.Mutex
}

func (s *Storage) Init(filename string) error {

	file, err := os.OpenFile(filename, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)

	if err != nil {
		return err
	}

	s.handle = file
	s.writer = gzip.NewWriter(file)

	return nil
}

func (s *Storage) Shutdown() {

	if err := s.writer.Close(); err != nil {
		log.Warn().Err(err).Msg("Gzip writer close failed")
	}

	if err := s.handle.Close(); err != nil {
		log.Warn().Err(err).Msg("Gzip handle close failed")
	}
}

func (s *Storage) WriteData(data *MyStruct) error {

	s.lock.Lock()
	defer s.lock.Unlock()

	buffer := data.content

	_, err := s.writer.Write([]byte(buffer))

	if err != nil {
		log.Warn().Err(err).Msg("Gzip write failed")
		return err
	}

	if err := s.writer.Flush(); err != nil {
		return err
	}

	if err := s.handle.Sync(); err != nil {
		return err
	}

	return nil
}

Answer 1

Score: 1

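You are not synchronizing Shutdown and WriteData. Shutdown closes the gzip writer and the file without holding the mutex, so it can run while another goroutine is still in the middle of a write.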

package storage

import (
	"compress/gzip"
	"os"
	"sync"

	"github.com/rs/zerolog/log"
)

type Storage struct {
	handle *os.File
	writer *gzip.Writer
	path   string // kept only for log messages

	lock sync.Mutex
}

func (s *Storage) Init(filename string) error {
	file, err := os.OpenFile(filename, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		return err
	}

	s.handle = file
	s.writer = gzip.NewWriter(file)
	s.path = filename

	return nil
}

func (s *Storage) Shutdown() {
	s.lock.Lock() // Here !! Take the same lock as WriteData, so Close never runs during a write.
	defer s.lock.Unlock()

	if err := s.writer.Close(); err != nil {
		log.Warn().Err(err).Str("fileName", s.path).Msg("Gzip writer close failed")
	}

	if err := s.handle.Close(); err != nil {
		log.Warn().Err(err).Str("fileName", s.path).Msg("Gzip handle close failed")
	}
}

// MyStruct (with its content field) is the question's own type.
func (s *Storage) WriteData(data *MyStruct) error {
	s.lock.Lock()
	defer s.lock.Unlock()

	_, err := s.writer.Write([]byte(data.content))
	if err != nil {
		log.Warn().Err(err).Msg("Gzip write failed")
		return err
	}

	if err := s.writer.Flush(); err != nil {
		return err
	}

	if err := s.handle.Sync(); err != nil {
		return err
	}

	return nil
}

Answer 2

Score: -2

Below is working code for gzip compression:

package main

import (
	"compress/gzip"
	"log"
	"os"
	"sync"
	"time"
)

type Storage struct {
	handle *os.File
	writer *gzip.Writer
	buffer []byte
	lock   sync.Mutex
}

func (s *Storage) Init(filename string) {
	// O_TRUNC overwrites an existing file completely, so no stale bytes from
	// a longer old file remain after the new gzip stream.
	file, err := os.OpenFile(filename, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0644)
	if err != nil {
		log.Fatal(err)
	}

	s.handle = file
	s.writer = gzip.NewWriter(file)
	// Optional gzip header fields; they must be set before the first Write.
	s.writer.Name = "a-new-hope.txt"
	s.writer.Comment = "an epic space opera by George Lucas"
	s.writer.ModTime = time.Date(1977, time.May, 25, 0, 0, 0, 0, time.UTC)
	s.buffer = []byte("Hello")
}

func (s *Storage) Shutdown() {
	// Close writes the final block and the CRC/size trailer.
	if err := s.writer.Close(); err != nil {
		log.Fatal("Gzip writer close failed: ", err)
	}
	if err := s.handle.Close(); err != nil {
		log.Fatal("File close failed: ", err)
	}
}

func (s *Storage) WriteData() error {
	s.lock.Lock()
	defer s.lock.Unlock()

	if _, err := s.writer.Write(s.buffer); err != nil {
		return err
	}
	if err := s.writer.Flush(); err != nil {
		return err
	}
	return s.handle.Sync()
}

func main() {
	s := Storage{}
	s.Init("sss.gzip")
	if err := s.WriteData(); err != nil {
		log.Fatal("Gzip write failed: ", err)
	}
	s.Shutdown()
}

EDIT
Modified to make it closer to the code in the question, with only small changes. WriteData takes its buffer from the Storage struct because MyStruct is not defined in the question's code.

huangapple
  • Posted on 2021-08-11 11:39:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/68735720.html