2020-04-06 19:26:49


Get a single column's values as a flat list in Apache Spark using Java

Question

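import java.util.List;

import org.apache.commons.lang3.StringUtils;  // assumed source of StringUtils (Apache Commons Lang 3)
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;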

// Create a Spark session
SparkSession sparkSession = SparkSession.builder()
    .appName("ColumnValuesExample")
    .master("local[*]")  // Use appropriate master URL for your environment
    .getOrCreate();

// Read data from a source and create a Dataset
Dataset<Row> sampleData = sparkSession.read()
    // ... other options
    .option("query", "SELECT COLUMN1, column2 from table1")
    .load();

// Select the desired column and collect values
List<String> columnValuesList = sampleData
    .select("COLUMN1")
    .where(sampleData.col("COLUMN1").isNotNull())
    .as(Encoders.STRING())  // Decode each single-column row as a String
    .collectAsList();

String result = StringUtils.join(columnValuesList, ", ");
// Result will be the desired comma-separated string of values

Note that the Spark session must be configured with the master URL and other settings appropriate for your environment. The key step is as(Encoders.STRING()), which turns the single-column Dataset<Row> into a Dataset<String>; collectAsList() then gathers the values into a List<String> on the driver, so joining them no longer prints the brackets that come from Row's string form.

Original question (English):

I am new to Java and Apache Spark and am trying to figure out how to get the values of a single column from a Dataset in Spark as a flat list.

Dataset<Row> sampleData = sparkSession.read()
                          .....
                          .option("query", "SELECT COLUMN1, column2 from table1")
                          .load();

List<Row> columnsList = sampleData.select("COLUMN1")
    .where(sampleData.col("COLUMN1").isNotNull()).collectAsList();

String result = StringUtils.join(columnsList, ", ");
// Result I am getting is
[15230321], [15306791], [15325784], [15323326], [15288338], [15322001], [15307950], [15298286], [15327223]
// What I want is:
15230321, 15306791......

How do I achieve this in Spark using Java?

Answer 1

Score: 1


A single-column Spark Row can be converted to a String with an encoder:

    List<String> result = sampleData.select("COLUMN1").as(Encoders.STRING()).collectAsList();
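
Once the values are in a List<String>, the comma-separated string the question asks for can be built with the JDK's String.join, so StringUtils is not needed. The sketch below is only an illustration: the names JoinColumnValues and joinValues are made up for the example, and the hard-coded list stands in for whatever collectAsList() returns.

```java
import java.util.Arrays;
import java.util.List;

class JoinColumnValues {

    // Joins already-decoded column values into one comma-separated string.
    static String joinValues(List<String> values) {
        return String.join(", ", values);
    }

    public static void main(String[] args) {
        // Stand-in for the List<String> returned by collectAsList();
        // these are the first three values from the question's sample output.
        List<String> columnValues = Arrays.asList("15230321", "15306791", "15325784");
        System.out.println(joinValues(columnValues)); // prints 15230321, 15306791, 15325784
    }
}
```

Because each element is already a plain String, no brackets appear in the joined result.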

Answer 2

Score: 1


I am pasting the answer in Scala. You can convert it to Java, as there are online tools available.

Also, I am not building the String result the way you specified, because that would require creating a table and running the query as in your process. Instead, I am replicating the problem variable directly:

import org.apache.spark.sql.Row
val a = List(Row("123"),Row("222"),Row("333"))

Printing a gives

List([123], [222], [333])

So apply a simple map operation together with the mkString method to flatten the List:

a.map(x => x.mkString(","))

which gives

List(123, 222, 333)

which I assume is what you expect.

Let me know if this sorts out your issue.
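
For readers who want the same map-and-join idea in Java, here is a rough sketch using streams. It runs without Spark, so a plain List of field values stands in for a Row. The names FlattenRows, mkString and flatten are hypothetical helpers for this example, not Spark API. With a real List<Row>, the equivalent would presumably be columnsList.stream().map(row -> row.mkString(",")), since Row also exposes mkString(String) to Java callers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlattenRows {

    // Hypothetical stand-in for Row.mkString(sep): joins a row's field values
    // with the separator, without the surrounding brackets of Row's toString().
    static String mkString(List<?> row, String sep) {
        return row.stream()
                  .map(v -> String.valueOf(v))
                  .collect(Collectors.joining(sep));
    }

    // Maps every row to its joined string form, yielding a flat List<String>.
    static List<String> flatten(List<? extends List<?>> rows) {
        return rows.stream()
                   .map(r -> mkString(r, ","))
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Mirrors List(Row("123"), Row("222"), Row("333")) from the Scala snippet.
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("123"),
            Arrays.asList("222"),
            Arrays.asList("333"));
        System.out.println(flatten(rows)); // prints [123, 222, 333]
    }
}
```

The brackets in the question's output come from the string form of each Row, so mapping each row to its field value before joining is what removes them.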
