Question

I'm using the Spark OrientDB connector to retrieve some data that looks like the following:

character	title
Tony Stark	["Iron Man"]
James Buchanan Barnes	["Captain America: The First Avenger","Captain America: The Winter Soldier","Captain America: Civil War","Avengers: Infinity War"]
Marcus Bledsoe	["Captain America: The Winter Soldier"]

The DataFrame comes back with the schema [character: string, title: embeddedlist]. EmbeddedList is a UDT defined here.

I would like to treat the title as an Array<String> so that I can do the following:

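import org.apache.spark.sql.functions.{array, concat, explode}
import spark.implicits._ // for the 'character / 'title syntax; assumes a SparkSession named spark

val vertices = df
  .select(explode(concat(array('character), 'title)) as "x")
  .distinct.rdd.map(_.getAs[String](0))
  .zipWithIndex.map(_.swap)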
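To make the goal concrete, here is a plain-Scala sketch (no Spark) of what this pipeline should produce once `title` is a sequence of strings. `VertexSketch` and `vertices` are illustrative names, not part of Spark or the connector. The rows are the three sample rows from the table above. `flatMap` over `character +: titles` stands in for `explode(concat(array('character), 'title))`. The index is widened to `Long` to match what `RDD.zipWithIndex` returns.

```scala
// Plain-Scala model of the vertex pipeline (illustrative only, no Spark needed).
// Assumption: title has already been converted to a Seq[String].
object VertexSketch {
  val rows: Seq[(String, Seq[String])] = Seq(
    "Tony Stark" -> Seq("Iron Man"),
    "James Buchanan Barnes" -> Seq(
      "Captain America: The First Avenger",
      "Captain America: The Winter Soldier",
      "Captain America: Civil War",
      "Avengers: Infinity War"),
    "Marcus Bledsoe" -> Seq("Captain America: The Winter Soldier")
  )

  // character +: titles  ~ concat(array(character), title) followed by explode
  // distinct             ~ .distinct
  // zipWithIndex + swap  ~ (id, name) pairs, id as Long like RDD.zipWithIndex
  def vertices(rows: Seq[(String, Seq[String])]): Seq[(Long, String)] =
    rows
      .flatMap { case (character, titles) => character +: titles }
      .distinct
      .zipWithIndex
      .map { case (name, idx) => (idx.toLong, name) }

  def main(args: Array[String]): Unit =
    vertices(rows).foreach(println)
}
```

With the sample data this gives 8 vertices: 3 characters plus 5 distinct titles, because "Captain America: The Winter Soldier" appears twice. Spark's `distinct` does not preserve row order, so the IDs assigned by the RDD version can differ from this sketch. Only the set of names is the same.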

I'm not sure how to cast/convert the EmbeddedList correctly. Running this as-is results in the error:

cannot resolve 'concat(array(name), out)' due to data type mismatch: input to function concat should have been string, binary or array, but it's [array<string>, array<string>]

Any help/pointers are appreciated.

Edit: The way I'm receiving the data is in this structure:

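import org.apache.spark.sql.DataFrame
import spark.implicits._ // for .toDF; assumes a SparkSession named spark

val df: DataFrame = Seq(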
  "Tony Stark" -> EmbeddedList(Array("Iron Man")),
  "James Buchanan Barnes" -> EmbeddedList(Array("Captain America: The First Avenger", "Captain America: The Winter Soldier", "Captain America: Civil War", "Avengers: Infinity War")),
  "Marcus Bledsoe" -> EmbeddedList(Array("Captain America: The Winter Soldier"))
).toDF("character", "title")
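One way to avoid a round trip through a single string is to convert the list element by element. The sketch below uses a hypothetical stand-in, `FakeEmbeddedList`, shaped after the `EmbeddedList(Array(...))` calls above. The field name `elements` is an assumption, since the connector's class definition is not shown in the post. `EmbeddedListConversion.toTitles` is likewise an illustrative name.

```scala
// Hypothetical stand-in for the connector's EmbeddedList; the real class's
// field name and element type are assumptions here.
final case class FakeEmbeddedList(elements: Array[Any])

object EmbeddedListConversion {
  // Convert element-wise: each non-null element becomes one String, so a title
  // that itself contains ", " stays intact. Null lists/arrays map to an empty Seq.
  def toTitles(list: FakeEmbeddedList): Seq[String] =
    if (list == null || list.elements == null) Seq.empty
    else list.elements.toSeq.flatMap(e => Option(e).map(_.toString))
}
```

In Spark, the same function body could in principle be wrapped with `org.apache.spark.sql.functions.udf` and applied to the `title` column to get an `array<string>`. That assumes the connector's `EmbeddedList` exposes its underlying array and that Spark can pass the UDT to a UDF as its Scala class. The post confirms neither.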

Answer 1

Score: 0

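I was able to solve this by casting the EmbeddedList to a string and then splitting on the comma. I have to believe there's a more elegant way to do this, but at least it works for now.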

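import org.apache.spark.sql.functions.{array, col, concat, explode, split}
import spark.implicits._ // for the 'character / 'title syntax; assumes a SparkSession named spark

val vertices = df
  // Cast the EmbeddedList UDT to its string form, then split it into array<string>.
  // Caveat: a title that itself contains ", " will be split into several values.
  .withColumn("title", col("title").cast("String"))
  .withColumn("title", split(col("title"), ", "))
  .select(explode(concat(array('character), 'title)) as "x")
  .distinct.rdd.map(_.getAs[String](0))
  .zipWithIndex.map(_.swap)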
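This workaround depends on the exact string the cast produces, which the post does not show. Below is a plain-Scala sketch of the same splitting logic. It assumes the string may or may not be wrapped in `[...]`, so it strips optional brackets first. The sketch makes the main caveat visible: a title containing `", "` is broken into pieces. `CastSplitSketch` and `splitCastString` are hypothetical names, and "Crouching Tiger, Hidden Dragon" is only an illustrative title.

```scala
// Mimics split(col("title"), ", ") applied to the cast string (illustrative only).
// Assumption: the cast output may be wrapped in [ ... ]; those brackets are stripped.
object CastSplitSketch {
  def splitCastString(s: String): Seq[String] =
    s.stripPrefix("[").stripSuffix("]").split(", ").toSeq
}
```

If the cast string does carry brackets, a Spark-side equivalent would be something like `split(regexp_replace(col("title"), "^\\[|\\]$", ""), ", ")`, using `regexp_replace` and `split` from `org.apache.spark.sql.functions`. Converting the list element by element rather than through one string avoids the comma problem entirely.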
