How can I read files from a MapR cluster using Go?
Question
I have a Go application running in a Kubernetes cluster which needs to read files from a large MapR cluster. The two clusters are separate and the Kubernetes cluster does not permit us to use the CSI driver. All I can do is run userspace apps in Docker containers inside Kubernetes pods, and I am given maprtickets to connect to the MapR cluster.
I'm able to use the com.mapr.hadoop:maprfs jar to write a Java app which is able to connect and read files using a maprticket, but we need to integrate this into a Go app, which, ideally, shouldn't require a Java sidecar process.
Answer 1
Score: 3
This is a good question because it highlights the way that some environments impose limits that violate the assumptions external software may hold.
And just for reference, MapR was acquired by HPE so a MapR cluster is now an HPE Ezmeral Data Fabric cluster. I am still training myself to say that.
Anyway, the accepted method for a generic program in language X to communicate with the Ezmeral Data Fabric (the filesystem formerly known as MapR FS) is to mount the file system and just talk to it using file APIs like open/read/write and such. This applies to Go, Python, C, Julia or whatever. Inside Kubernetes, the normal way to do this mount is to use a CSI driver that has some kind of operator working in the background. That operator isn't particularly magical ... it just does what is needful. In the case of data fabric, the operator mounts the data fabric using NFS or FUSE and then bind mounts[1] part of that into the pod's awareness.
But this question is cool because it precludes all of that. If you can't install an operator, then this other stuff is just a dead letter.
There are four alternative approaches that may work.
- NFS mounts were included in Kubernetes as a native capability before the CSI plugin approach was standardized. It might still be possible to use that on a very vanilla Kubernetes cluster, and that could give access to the data cluster.
- It is possible to integrate a container into your pod that does the necessary FUSE mount in an unprivileged way. This will be kind of painful because you would have to tease apart the FUSE driver from the data fabric install and get it to work. That would let you see the data fabric inside the pod. Even then, there is no guarantee Kubernetes or the OS will allow this to work.
- There is an unpublished Go file system client that uses the low-level data fabric API directly. We don't yet release that separately. For more information on that, folks should ping me directly (my contact info is everywhere ... email to ted.dunning <at> hpe.com or <at> gmail.com works).
- The data fabric allows you to access data via S3. With the 7.0 release of Ezmeral Data Fabric, this capability was heavily revamped to give massive performance, especially since you can scale up the number of gateways essentially without limit (I have heard numbers like 3-5 GB/s per stateless connection to a gateway, but YMMV). This will require the least futzing and should give plenty of performance. You can even access files as if they were S3 objects.