
Using Google PubSub with Golang: the most cost-efficient way to poll the service

Question


We're in the process of moving from AMQP to Google's Pubsub.

The docs suggest that pull might be the best choice for us since we're using Compute Engine and can't open our workers to receive via the push service.

It also says that pull might incur additional costs depending on usage:

> If polling is used, high network usage may be incurred if you are
> opening connections frequently and closing them immediately.

We created a test subscriber in Go that runs in a loop, like so:

package main

import (
	"io/ioutil"
	"log"

	"golang.org/x/oauth2"
	"golang.org/x/oauth2/google"
	"google.golang.org/cloud"
	"google.golang.org/cloud/pubsub"
)

func main() {
	// Authenticate with a service-account key file.
	jsonKey, err := ioutil.ReadFile("pubsub-key.json")
	if err != nil {
		log.Fatal(err)
	}
	conf, err := google.JWTConfigFromJSON(
		jsonKey,
		pubsub.ScopeCloudPlatform,
		pubsub.ScopePubSub,
	)
	if err != nil {
		log.Fatal(err)
	}
	ctx := cloud.NewContext("xxx", conf.Client(oauth2.NoContext))

	// Publish a test message.
	msgIDs, err := pubsub.Publish(ctx, "topic1", &pubsub.Message{
		Data: []byte("hello world"),
	})
	if err != nil {
		// Bail out here: msgIDs is not usable if Publish failed.
		log.Fatal(err)
	}
	log.Printf("Published a message with a message id: %s\n", msgIDs[0])

	// Poll the subscription forever, one message at a time.
	for {
		msgs, err := pubsub.Pull(ctx, "subscription1", 1)
		if err != nil {
			log.Println(err)
			continue
		}

		if len(msgs) > 0 {
			log.Printf("New message arrived: %v, len: %d\n", msgs[0].ID, len(msgs))
			if err := pubsub.Ack(ctx, "subscription1", msgs[0].AckID); err != nil {
				log.Fatal(err)
			}
			log.Println("Acknowledged message")
			log.Printf("Message: %s", msgs[0].Data)
		}
	}
}

The question I have, though, is whether this is the correct / recommended way to go about pulling messages.

We receive about 100 messages per second throughout the day. I'm not sure whether running it in an endless loop is going to bankrupt us, and I can't find any other decent Go examples.

Answer 1

Score: 4


In general, the key for pull subscribers in Cloud Pub/Sub is to make sure you always have at least a few outstanding Pull requests, with max_messages set to a value that works well for:

  • the rate at which you publish messages,
  • the size of those messages, and
  • the rate at which your subscriber can process messages.

As soon as a pull request returns, you should issue another one. That means processing and acking the messages returned to you in the pull response asynchronously (or starting up the new pull request asynchronously). If you ever find that throughput or latency isn't what you expect, the first thing to do is add more concurrent pull requests.
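As a rough sketch of that pattern, here is one way it could look with the same legacy google.golang.org/cloud/pubsub API used in the question; the puller count, batch size, and the process handler are placeholders to tune or fill in for your publish rate and workload:

package main

import (
	"io/ioutil"
	"log"

	"golang.org/x/net/context"
	"golang.org/x/oauth2"
	"golang.org/x/oauth2/google"
	"google.golang.org/cloud"
	"google.golang.org/cloud/pubsub"
)

// newContext builds an authenticated context, exactly as in the question.
func newContext() context.Context {
	jsonKey, err := ioutil.ReadFile("pubsub-key.json")
	if err != nil {
		log.Fatal(err)
	}
	conf, err := google.JWTConfigFromJSON(
		jsonKey,
		pubsub.ScopeCloudPlatform,
		pubsub.ScopePubSub,
	)
	if err != nil {
		log.Fatal(err)
	}
	return cloud.NewContext("xxx", conf.Client(oauth2.NoContext))
}

// process stands in for the application's real message handler.
func process(m *pubsub.Message) {
	log.Printf("Message: %s", m.Data)
}

// pullLoop keeps one Pull request outstanding at all times: as soon as
// a Pull returns, the batch is handed to goroutines for processing and
// acking, and the loop immediately issues the next Pull.
func pullLoop(ctx context.Context, sub string, maxMessages int) {
	for {
		msgs, err := pubsub.Pull(ctx, sub, maxMessages)
		if err != nil {
			log.Println(err)
			continue
		}
		for _, m := range msgs {
			go func(m *pubsub.Message) {
				process(m)
				if err := pubsub.Ack(ctx, sub, m.AckID); err != nil {
					log.Println(err)
				}
			}(m)
		}
	}
}

func main() {
	ctx := newContext()

	// Several concurrent pullers; add more if throughput or latency
	// isn't what you expect. 4 pullers x 25 messages are placeholders.
	for i := 0; i < 4; i++ {
		go pullLoop(ctx, "subscription1", 25)
	}
	select {} // block forever while the pullers run
}

Handing processing and acking off to goroutines keeps each puller's next Pull request in flight at all times, which is the property described above.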

The statement "if polling is used, high network usage may be incurred if you are opening connections frequently and closing them immediately" applies if your publish rate is extremely low. Imagine you only publish two or three messages in a day, but you constantly poll with pull requests. Every one of those pull requests incurs a cost for making the request, but you won't get any messages to process except for the few times when you actually have a message, so the "cost per message" is fairly high. If you are publishing at a pretty steady rate and your pull requests are returning a non-zero number of messages, then the network usage and costs will be in line with the message rate.
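For completeness, here is a minimal sketch of the same consumer written against the newer cloud.google.com/go/pubsub client (an addition beyond the original answer, which predates it): its Subscription.Receive keeps multiple streaming pull requests outstanding on your behalf, tunable through ReceiveSettings, so the concurrency pattern above comes built in.

package main

import (
	"context"
	"log"

	"cloud.google.com/go/pubsub"
)

func main() {
	ctx := context.Background()

	// "xxx" and "subscription1" are the project ID and subscription
	// names from the question.
	client, err := pubsub.NewClient(ctx, "xxx")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	sub := client.Subscription("subscription1")
	// Receive blocks, managing concurrent streaming pulls internally
	// and invoking the callback for each message.
	err = sub.Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
		log.Printf("Message: %s", m.Data)
		m.Ack()
	})
	if err != nil {
		log.Fatal(err)
	}
}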
