How do I filter the values of a list column in a Spark Dataset in Java?
Question
I have two classes. One is NewsArticle (String id, String title, List<ContentItem> contents); the other is ContentItem (String content, String subtype, String url).
I want to keep only the content whose subtype equals "paragraph" and concatenate it into one long string (the url is not needed).
Here is what the NewsArticle Dataset looks like:
1, "Title", [{htt..., paragraph, rem...},{htt..., paragraph, rem...},{htt..., paragraph, rem...}]
where the columns are id, title, List<ContentItem>.
I extracted the contents column, where each row is one article. It looks like this:
[{http..., others, con...},{http..., paragraph, rem...},{http..., paragraph, rem...}]
where the fields are url, subtype, content.
Now I want each article (row) to look like:
1, "Title", "this is the content whose subtype equals paragraph"
Can anyone help me do this in Java?
Answer 1
Score: 1
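This works, assuming the list column is named items. It explodes the list into one row per item, keeps the rows whose subtype is "paragraph", and selects the content:
// requires: import org.apache.spark.sql.functions;
df
    .withColumn("newContent", functions.explode(functions.col("items")))
    .filter("newContent.subtype=='paragraph'")
    .selectExpr("id", "title", "newContent.content as content")
    .show(false);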
Input:
+---+--------------------------------------------------------------------------------------------------------+-----+
|id |items                                                                                                   |title|
+---+--------------------------------------------------------------------------------------------------------+-----+
|id |[[Content1, subtype1, someurl], [ContentOfParagraph, paragraph, someurl], [Content2, subtype2, someurl]]|Title|
+---+--------------------------------------------------------------------------------------------------------+-----+
Output:
+---+-----+------------------+
|id |title|content           |
+---+-----+------------------+
|id |Title|ContentOfParagraph|
+---+-----+------------------+
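Note that the pipeline above returns one row per matching item rather than one concatenated string per article. The remaining step, keeping the paragraph items and joining their content, can be sketched in plain Java as below. This is a minimal illustration, not Spark code: the class name ParagraphJoiner, the method joinParagraphs, and the single-space separator are assumptions, and ContentItem simply mirrors the fields listed in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class; the name is an assumption for illustration.
class ParagraphJoiner {

    // Mirrors the ContentItem class from the question (content, subtype, url).
    static final class ContentItem {
        final String content;
        final String subtype;
        final String url;

        ContentItem(String content, String subtype, String url) {
            this.content = content;
            this.subtype = subtype;
            this.url = url;
        }
    }

    // Keeps only the items whose subtype is "paragraph" and joins their
    // content with a single space (the separator is an assumption).
    static String joinParagraphs(List<ContentItem> contents) {
        return contents.stream()
                .filter(item -> "paragraph".equals(item.subtype))
                .map(item -> item.content)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        List<ContentItem> contents = Arrays.asList(
                new ContentItem("Content1", "subtype1", "someurl"),
                new ContentItem("First paragraph.", "paragraph", "someurl"),
                new ContentItem("Second paragraph.", "paragraph", "someurl"));

        // Prints: First paragraph. Second paragraph.
        System.out.println(joinParagraphs(contents));
    }
}
```

Logic like this could be applied once per article, for example inside a Dataset.map call, so that each output row carries id, title, and the joined string.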