Use Google Go's Goroutines To Create A Bayes Network
Question
I have a large dataset of philosophical arguments, each of which connects to other arguments as proof or disproof of a given statement. A root statement can have many proofs and disproofs, each of which may also have proofs and disproofs. Statements can also be used in multiple graphs, and graphs can be analyzed under a "given context" or assumption.
I need to construct a Bayesian network of related arguments, so that each node propagates influence fairly and accurately to its connected arguments. I need to be able to calculate the probability of chains of connected nodes concurrently, with each node requiring datastore lookups that must block to get results. The process is mostly I/O bound, and my datastore connection can run asynchronously in Java, Go and Python (Google App Engine). Once each lookup completes, it propagates the effects to all other connected nodes until the probability delta drops below a threshold of irrelevance (currently 0.1%). Each node in the process must calculate chains of connections, then sum the results across all queries to adjust validity results, with results chained outward to any connected arguments.
To avoid recursing infinitely, I was thinking of using an A*-like process in goroutines to propagate updates to the argument maps, with a heuristic based on compounding influence that ignores nodes once the probability of influence dips below, say, 0.1%. I tried to set up the calculations with SQL triggers, but it got complex and messy way too fast. Then I moved to Google App Engine to take advantage of asynchronous NoSQL, and it was better, but still too slow. I need to run the updates fast enough to get a snappy UI, so that when a user creates, or votes for or against, a proof or disproof, they can see the results reflected in the UI immediately.
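The cutoff idea described above could be sketched roughly as follows. This is a minimal, single-threaded illustration and not a real Bayesian update: the `Edge` type, the `Propagate` function, and the rule of multiplying the incoming influence by an edge weight are all assumptions made for the example. It also assumes every weight has an absolute value below 1, so that influence decays around a cycle and the loop terminates.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Edge is a hypothetical weighted link between arguments. Weight is assumed
// to lie strictly between -1 and 1: positive for a proof, negative for a disproof.
type Edge struct {
	To     string
	Weight float64
}

// item is one pending update: a node and the compounded influence reaching it.
type item struct {
	node      string
	influence float64
}

// influenceQueue is a max-heap ordered by absolute influence, so the most
// significant updates are processed first.
type influenceQueue []item

func (q influenceQueue) Len() int            { return len(q) }
func (q influenceQueue) Less(i, j int) bool  { return abs(q[i].influence) > abs(q[j].influence) }
func (q influenceQueue) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *influenceQueue) Push(x interface{}) { *q = append(*q, x.(item)) }
func (q *influenceQueue) Pop() interface{} {
	old := *q
	n := len(old)
	it := old[n-1]
	*q = old[:n-1]
	return it
}

func abs(x float64) float64 {
	if x < 0 {
		return -x
	}
	return x
}

// Propagate pushes a change of size delta outward from start and returns the
// accumulated influence per node. Paths whose compounded influence falls
// below threshold are dropped, which is what stops cycles from looping forever.
func Propagate(graph map[string][]Edge, start string, delta, threshold float64) map[string]float64 {
	total := make(map[string]float64)
	q := &influenceQueue{{node: start, influence: delta}}
	for q.Len() > 0 {
		cur := heap.Pop(q).(item)
		total[cur.node] += cur.influence
		for _, e := range graph[cur.node] {
			next := cur.influence * e.Weight
			if abs(next) < threshold {
				continue // below the relevance cutoff: stop following this path
			}
			heap.Push(q, item{node: e.To, influence: next})
		}
	}
	return total
}

func main() {
	// A three-node cycle: without the cutoff this would never terminate.
	graph := map[string][]Edge{
		"A": {{To: "B", Weight: 0.5}},
		"B": {{To: "C", Weight: 0.5}},
		"C": {{To: "A", Weight: 0.5}},
	}
	for node, inf := range Propagate(graph, "A", 0.1, 0.001) {
		fmt.Printf("%s: %.6f\n", node, inf)
	}
}
```

In a concurrent version, the datastore lookup for each popped node would be the part worth running in goroutines; the queue and the accumulated totals are cheap enough to keep in a single owner.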
I think Go is the language of choice to support the concurrency I need, but I'm open to suggestions. The client is a monolithic JavaScript app that just uses XHR and WebSockets to push and pull argument maps (and their updates) in real time. I have a Java prototype that can compute large chains in 10~15s, but performance monitoring shows that most of my runtime is wasted on synchronization and overhead from ConcurrentHashMap.
If there are other highly concurrent languages worth trying out, please let me know. I know Java, Python, Go, Ruby and Scala, but I will learn any language if it suits my needs.
Similarly, if there are open source implementations of huge Bayesian networks, please leave a suggestion.
Answer 1
Score: 4
I think it's a bit difficult to tell what you are asking. Maybe you can elaborate on your question.
Goroutines are quite cheap and are a good match for modern web applications that use XHR or WebSockets heavily (and for other I/O-bound applications that have to wait for database responses and the like). Additionally, the Go runtime can execute goroutines in parallel, so Go is also a good fit for CPU-bound tasks that should take advantage of multiple cores and the speed of a natively compiled language.
But you should also keep in mind that goroutines and channels aren't free. They still require some amount of memory, and each synchronization point (e.g. a channel send or receive) comes at a cost. That's normally not a problem, since synchronization is extremely cheap compared to, for example, a database query. It might not be suited to building efficient Bayesian networks, though, especially if the actual work of each goroutine/node is negligible compared to the synchronization overhead.
Your primary goal for every concurrent program should be to avoid shared mutability as far as possible. So a Bayesian network modeled with goroutines and channels might be a good educational example and a great way to measure the performance of Go's channel implementation, but it's probably not the best fit for your problem.
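One way to read this advice is to use goroutines only for the slow, I/O-bound lookups and to keep all mutable state in a single goroutine. The sketch below illustrates that split; `FetchAll`, the `result` type, and the `lookup` callback standing in for a datastore query are hypothetical names chosen for the example, not part of any real API.

```go
package main

import (
	"fmt"
	"sync"
)

// result carries one lookup outcome back to the collecting goroutine.
type result struct {
	node  string
	score float64
}

// FetchAll runs lookup for every node using at most workers concurrent calls.
// Only the calling goroutine writes to the returned map, so no mutex or
// concurrent map is needed.
func FetchAll(nodes []string, workers int, lookup func(string) float64) map[string]float64 {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				results <- result{node: n, score: lookup(n)}
			}
		}()
	}

	// Feed the work queue, then signal that no more jobs are coming.
	go func() {
		for _, n := range nodes {
			jobs <- n
		}
		close(jobs)
	}()

	// Close results once every worker has finished, ending the loop below.
	go func() {
		wg.Wait()
		close(results)
	}()

	scores := make(map[string]float64, len(nodes))
	for r := range results {
		scores[r.node] = r.score // single writer: no shared mutable state
	}
	return scores
}

func main() {
	// fakeLookup stands in for a blocking datastore query.
	fakeLookup := func(id string) float64 { return float64(len(id)) / 10 }
	scores := FetchAll([]string{"premise", "proof", "disproof"}, 2, fakeLookup)
	fmt.Println(scores)
}
```

Each result costs one channel send and one receive, which is small next to a datastore round trip, but would dominate if `lookup` did only trivial in-memory work.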