Creating a large csv file for testing file access

Question

I want to create a 10 GB file in which every line looks like this:

    prefix:username:timestamp, number

For example:

    login:jbill:2013/3/25, 1

I want to fill the file with random rows like the one above until it reaches 10 GB.

How could I do this in Go?

I can have an array of prefixes, such as:

    login, logout, register

and an array of usernames:

    jbill, dkennedy

Answer 1

Score: 5

For example,

    package main

    import (
    	"bufio"
    	"fmt"
    	"math/rand"
    	"os"
    	"strconv"
    	"time"
    )

    func main() {
    	fileSize := int64(10e9) // 10 GB (decimal: 10^10 bytes)
    	f, err := os.Create("/tmp/largefile")
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	w := bufio.NewWriter(f) // buffer writes so each line is not a separate system call
    	prefixes := []string{"login", "logout", "register"}
    	names := []string{"jbill", "dkennedy"}
    	// Timestamps are drawn uniformly from the year starting at timeStart.
    	timeStart := time.Date(2012, 1, 1, 0, 0, 0, 0, time.UTC)
    	timeDur := timeStart.AddDate(1, 0, 0).Sub(timeStart)
    	rand.Seed(time.Now().UnixNano()) // unnecessary (and deprecated) since Go 1.20
    	size := int64(0)
    	for size < fileSize {
    		// prefix:username:timestamp, number
    		// login:jbill:2012/3/25, 1
    		prefix := prefixes[rand.Intn(len(prefixes))]
    		name := names[rand.Intn(len(names))]
    		// Named date rather than time to avoid shadowing the time package.
    		date := timeStart.Add(time.Duration(rand.Int63n(int64(timeDur)))).Format("2006/1/2")
    		number := strconv.Itoa(rand.Intn(100) + 1) // 1..100
    		line := prefix + ":" + name + ":" + date + ", " + number + "\n"
    		n, err := w.WriteString(line)
    		if err != nil {
    			fmt.Println(n, err)
    			return
    		}
    		size += int64(len(line))
    	}
    	// Flush whatever is still in the buffer before closing the file.
    	err = w.Flush()
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	err = f.Close()
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	fmt.Println("Size:", size)
    }

Sample lines from the generated file (the values differ on every run):

    register:jbill:2012/8/24, 15
    login:jbill:2012/10/7, 98
    register:dkennedy:2012/8/29, 70
    register:jbill:2012/6/1, 89
    register:jbill:2012/5/24, 63
    login:dkennedy:2012/3/29, 48
    logout:jbill:2012/7/8, 93
    logout:dkennedy:2012/1/12, 74
    login:jbill:2012/4/12, 14
    login:jbill:2012/2/5, 83

Answer 2

Score: 4

Here is a naive approach (writing 1 GB):

    package main

    import (
    	"fmt"
    	"log"
    	"os"
    )

    func main() {
    	// O_TRUNC discards any previous contents if the file already exists.
    	myfile, err := os.OpenFile("myfile", os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644)
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer myfile.Close()
    	var pos int
    	// sample: login:jbill:2013/3/25, 1
    	line := fmt.Sprintf("%s:%s:%s, %d\n", "login", "jbill", "2013/3/25", 1)
    	for pos < 1024*1024*1024 {
    		// Unbuffered: every Write call goes straight to the operating system.
    		n, err := myfile.Write([]byte(line))
    		if err != nil {
    			log.Fatal(err)
    		}
    		pos += n
    	}
    }

This takes a long time (1:16 on my machine) because the output is not buffered, so every line is a separate write to the operating system. Adding bufio reduces the time dramatically:

    package main

    import (
    	"bufio"
    	"fmt"
    	"log"
    	"os"
    )

    func main() {
    	// O_TRUNC discards any previous contents if the file already exists.
    	myfile, err := os.OpenFile("myfile", os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644)
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer myfile.Close()
    	// Lines are collected in memory and written to the file in larger chunks.
    	mybufferedfile := bufio.NewWriter(myfile)
    	var pos int
    	// sample: login:jbill:2013/3/25, 1
    	line := fmt.Sprintf("%s:%s:%s, %d\n", "login", "jbill", "2013/3/25", 1)
    	for pos < 1024*1024*1024 {
    		n, err := mybufferedfile.WriteString(line)
    		if err != nil {
    			log.Fatal(err)
    		}
    		pos += n
    	}
    	// Write out whatever is still sitting in the buffer.
    	err = mybufferedfile.Flush()
    	if err != nil {
    		log.Fatal(err)
    	}
    }

This still takes 26 seconds on my machine; I'd like to see a faster solution.

BTW: you still need to randomize the fields, but that is left as an exercise for the reader.

huangapple
  • Posted on March 25, 2013, 23:55:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/15619225.html