更快的输入扫描

huangapple go评论78阅读模式
英文:

Faster input scanning

问题

我正在尝试解决SPOJ问题,可以在这里找到。

以下是我的解决方案:

package main

import "fmt"
import "bufio"
import "os"

func main() {
    var n, k int
    var num int
    var divisible int

    in := bufio.NewReader(os.Stdin)

    fmt.Fscan(in, &n)
    fmt.Fscan(in, &k)

    for n > 0 {
        fmt.Fscan(in, &num)

        if num%k == 0 {
            divisible++
        }

        n--
    }

    fmt.Println(divisible)
}

代码运行正常。问题在于当我在SPOJ上执行时,会出现超时的问题。

我最初只使用了fmt.Scan,但后来我看到了这个帖子,建议我改用bufio来进行更快的输入扫描。

但我仍然遇到超时问题。我只是在循环中获取所有的输入,并在循环内部确定输入是否可被整除。所以,我认为问题不在于循环,而是在于输入扫描上花费的时间。我该如何改进以更快地读取输入?或者问题出在其他地方吗?

英文:

I am attempting to solve the SPOJ question that can be found here

Following is my solution:

package main

import "fmt"
import "bufio"
import "os"

func main() {
	var n, k int
	var num int
	var divisible int

	in := bufio.NewReader(os.Stdin)

	fmt.Fscan(in, &n)
	fmt.Fscan(in, &k)

	for n > 0 {
		fmt.Fscan(in, &num)

		if num%k == 0 {
			divisible++
		}

		n--
	}

	fmt.Println(divisible)
}

The code works fine. The issue here is that I get a timeout when I execute it in SPOJ.

I was first using only fmt.Scan but I then came across this thread that suggested that I use bufio instead for faster input scanning.

But I still get a timeout issue. I am only looping to get all the inputs and within this loop itself I determine whether the input is divisible or not. So, I believe that its not the loop but the input scanning that's taking time. How can I improve this to read the input faster? Or is the issue somewhere else?

答案1

得分: 16

你可以使用bufio.Scanner从输入中读取行。

由于我们总是读取数字,我们可以创建一个高度优化的转换器来获取数字。我们应该避免使用Scanner.Text(),它会创建一个string,而我们可以直接从Scanner.Bytes()返回的原始字节中获取数字。Scanner.Text()返回与Scanner.Bytes()相同的标记,但它首先转换为string,显然更慢,并且会生成“垃圾”并给垃圾回收器带来额外的工作。

所以这里有一个从原始字节中获取int的转换器函数:

func toInt(buf []byte) (n int) {
    for _, v := range buf {
        n = n*10 + int(v-'0')
    }
    return
}

这个toInt()函数之所以有效,是因为[]byte包含数字的十进制表示的UTF-8编码字节序列,其中只包含'0'..'9'范围内的数字,其UTF-8编码字节是一对一映射的(一个字节用于一个数字)。从数字到字节的映射只是一个偏移:'0' -> 48'1' -> 49等等。

使用这个转换器的完整应用程序如下:

package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    var n, k, c int
    scanner := bufio.NewScanner(os.Stdin)

    scanner.Scan()
    fmt.Sscanf(scanner.Text(), "%d %d", &n, &k)

    for ; n > 0; n-- {
        scanner.Scan()
        if toInt(scanner.Bytes())%k == 0 {
            c++
        }
    }

    fmt.Println(c)
}

func toInt(buf []byte) (n int) {
    for _, v := range buf {
        n = n*10 + int(v-'0')
    }
    return
}

这个解决方案比例如调用strconv.Atoi()要快大约4倍。

注意:

在上面的解决方案中,我假设输入是有效的,即它总是包含有效的数字,并且在第一行之后至少包含n行(给出了nk)。

如果输入在第n+1行后关闭,我们可以使用简化的for循环(我们甚至不需要递减并依赖于n):

for scanner.Scan() {
    if toInt(scanner.Bytes())%k == 0 {
        c++
    }
}
英文:

You can use bufio.Scanner to read lines from the input.

And since we're always reading numbers, we can create a highly optimized converter to get the number. We should avoid using Scanner.Text() which creates a string as we can obtain the number just from the raw bytes returned by Scanner.Bytes(). Scanner.Text() returns the same token as Scanner.Bytes() but it first converts to string which is obviously slower and generates "garbage" and work for the gc.

So here is a converter function which obtains an int from the raw bytes:

func toInt(buf []byte) (n int) {
	for _, v := range buf {
		n = n*10 + int(v-'0')
	}
	return
}

This toInt() works because the []byte contains the UTF-8 encoded byte sequence of the string representation of the decimal format of the number, which contains only digits in the range of '0'..'9' whose UTF-8 encoded bytes are mapped one-to-one (one byte is used for one digit). The mapping from digit to byte is simply a shift: '0' -> 48, '1' -> 49 etc.

Using this your complete application:

package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	var n, k, c int
	scanner := bufio.NewScanner(os.Stdin)

	scanner.Scan()
	fmt.Sscanf(scanner.Text(), "%d %d", &n, &k)

	for ;n > 0; n-- {
		scanner.Scan()
		if toInt(scanner.Bytes())%k == 0 {
			c++
		}
	}

	fmt.Println(c)
}

func toInt(buf []byte) (n int) {
	for _, v := range buf {
		n = n*10 + int(v-'0')
	}
	return
}

This solution is about 4 times faster than calling strconv.Atoi() for example.

Notes:

In the above solution I assumed input is valid, that is it always contains valid numbers and contains at least n lines after the first (which gives us n and k).

If the input is closed after n+1 lines, we can use a simplified for (and we don't even need to decrement and rely on n):

for scanner.Scan() {
	if toInt(scanner.Bytes())%k == 0 {
		c++
	}
}

答案2

得分: 2

尝试使用bufio.Scanner(如你提到的线程中建议的):

fmt.Scan(&n)
fmt.Scan(&k)

scanner := bufio.NewScanner(os.Stdin)
for n > 0 {
    scanner.Scan()
    k, _ := strconv.Atoi(scanner.Text())
    ...
}
英文:

Try using bufio.Scanner (as suggested in the thread you mentioned):

fmt.Scan(&n)
fmt.Scan(&k)

scanner := bufio.NewScanner(os.Stdin)
for n > 0 {
    scanner.Scan()
    k, _ := strconv.Atoi(scanner.Text())
    ...

答案3

得分: 1

我编写了3个版本来进行比较。
第一个版本使用fmt.Scanf("%d", &v),第二个版本使用字节转换数字(像@icza一样),第三个版本使用strconv.Atoi进行转换。为了使用这些函数,我初始化了扫描器的方式如下:

scanner := bufio.NewScanner(os.Stdin)
scanner.Split(bufio.ScanWords)

因此,每次调用scanner.Scan,它都会返回一个由空格分隔的标记。
以下是这些函数的实现:

func scanFromBytes(scanner *bufio.Scanner) (n int) {
    scanner.Scan()

    buf := scanner.Bytes()
    for _, v := range buf {
        n = n*10 + int(v-'0')
    }

    return
}

和:

func scanAtoi(scanner *bufio.Scanner) (n int) {
    scanner.Scan()

    n, _ = strconv.Atoi(scanner.Text())

    return
}

我已经使用一个大文件(40k个测试),每个测试读取大约8个整数。
fmt.Scanf的解决方案大约需要1.9秒,这是预期的(比其他解决方案更长)。
在这两个函数中,我得到了大约0.8秒的时间。但是scanAtoi总是比scanFromBytes快大约0.05秒,除了第一次(可能发生了一些缓存)。

英文:

I coded 3 versions to compare them.
The first using fmt.Scanf("%d", &v), the second converting numbers from bytes (like @icza), and the third converting using strconv.Atoi. To use the functions I initializated scanner in that way:

scanner := bufio.NewScanner(os.Stdin)
scanner.Split(bufio.ScanWords)

So, every time I call scanner.Scan, it returns a token splitted by spaces.
And the functions follow:

func scanFromBytes(scanner *bufio.Scanner) (n int) {
    scanner.Scan()

    buf := scanner.Bytes()
    for _, v := range buf {
        n = n*10 + int(v-'0')
    }

    return
}

And:

func scanAtoi(scanner *bufio.Scanner) (n int) {
    scanner.Scan()

    n, _ = strconv.Atoi(scanner.Text())

    return
}

I have tested with a big file (40k tests), reading about 8 integers per test.
The fmt.Scanf solution, takes about 1.9s, as expected (more than the others).
In the two functions I got about 0.8s. But the scanAtoi always takes about 0.05s less than scanFromBytes, except for the very first time (maybe some caching occurs).

huangapple
  • 本文由 发表于 2015年7月10日 13:49:20
  • 转载请务必保留本文链接:https://go.coder-hub.com/31333353.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定