Question

I want to query MongoDB with a find operation rather than loading the entire collection and then applying PySpark filters, which is the approach most of the documentation describes. Is there a way to do this?

I am looking for a way to run the query in MongoDB itself, instead of loading the whole collection into PySpark.

Answer 1

Score: 1

This can be done by passing an aggregation pipeline as a read option, so that the $match stage runs in MongoDB and only the matching documents are loaded:

df = spark.read \
    .format("com.mongodb.spark.sql.DefaultSource") \
    .option("uri", "mongodb://<host>:<port>/<database>.<collection>") \
    .option("aggregation.pipeline", "[{'$match': {<query>}}]") \
    .load()

Depending on the connector version, the option is named "pipeline" (older releases) or "aggregation.pipeline" (10.x and later), so check the documentation for the version you have installed.
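
As a rough sketch of how the snippet above could be wrapped for reuse, the helper below turns a find()-style filter (and an optional projection) into the JSON pipeline string that the read option expects. The names build_pipeline and read_with_find, and the sample filter on "status" and "age", are hypothetical and only for illustration. The format and option names are copied from the snippet above as-is and may need adjusting for your connector version.

```python
import json


def build_pipeline(query, projection=None):
    """Serialize a find()-style filter as an aggregation pipeline string.

    The filter becomes a $match stage; an optional projection becomes
    a $project stage that follows it.
    """
    stages = [{"$match": query}]
    if projection:
        stages.append({"$project": projection})
    return json.dumps(stages)


def read_with_find(spark, uri, query, projection=None):
    """Load only the documents matching `query` into a DataFrame.

    `spark` is an existing SparkSession. The format and option names
    are taken from the answer above (an assumption about your setup).
    """
    return (
        spark.read
        .format("com.mongodb.spark.sql.DefaultSource")
        .option("uri", uri)
        .option("aggregation.pipeline", build_pipeline(query, projection))
        .load()
    )


# Hypothetical filter: active users aged 30 or older, keeping two fields.
pipeline = build_pipeline(
    {"status": "active", "age": {"$gte": 30}},
    {"name": 1, "age": 1},
)
print(pipeline)
```

Building the string with json.dumps avoids hand-escaping quotes inside the option value. With a live session, the call would look like read_with_find(spark, "mongodb://<host>:<port>/<database>.<collection>", {"status": "active"}).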
