SPARK: Parse JSON and get the parent column name when a child column is set to true

Question

Given the following JSON data:

```json
{
    "properties": {
        "student_id": {
            "type": "string",
            "unique_id": true
        },
        "status": {
            "type": "boolean"
        },
        "name": {
            "type": "string"
        },
        "phone_number": {
            "type": "string",
            "ignore": true
        },
        "e_mail": {
            "type": "string",
            "ignore": true
        },
        "address": {
            "type": "string"
        }
    },
    "subjects": [
        "science",
        "english"
    ]
}
```

I am specifically looking for the child columns "unique_id" and "ignore", and want to get the following output:

| unique_id    | ignore                 |
|--------------|------------------------|
| [student_id] | [phone_number, e_mail] |

The parent column names are always changing.
The child columns such as "unique_id" and "ignore" can be present under any of the parent columns.
I would like to know the names of the parent columns for which "unique_id" or "ignore" is true.

Answer 1

Score: 1

I use json4s to parse, and something like this works.

**Imports and raw JSON**

```scala
import org.json4s.DefaultFormats
import org.json4s.{ JValue, JObject, JString, JBool, JField }
import org.json4s.jackson.JsonMethods.parse

// Same JSON as in the question, written as a triple-quoted string to avoid escaping
val jsonString = """{"properties":{"student_id":{"type":"string","unique_id":true},"status":{"type":"boolean"},"name":{"type":"string"},"phone_number":{"type":"string","ignore":true},"e_mail":{"type":"string","ignore":true},"address":{"type":"string"}},"subjects":["science","english"]}"""
```

**Function to extract the properties where `key` is `true`**

```scala
/** @param key  child key to look for; a property's name is returned when it contains (key, true)
  * @param json raw String of JSON data
  */
def extractProperties(key: String, json: String): List[String] = {
  // parse String => JValue
  implicit val formats = DefaultFormats
  val jval: JValue = parse(json)

  // extract "properties" from JSON (throws if the key is missing or is not an object)
  val properties: List[JField] = jval match {
    case JObject(obj) => obj.filter(_._1 == "properties").head._2 match {
      case JObject(obj2) => obj2
      case _ => throw new IllegalArgumentException("\"properties\" is not a JSON object")
    }
    case _ => throw new IllegalArgumentException("top-level JSON value is not an object")
  }

  // extract property names where key == true
  // (the tuple is passed explicitly to avoid relying on auto-tupling)
  properties
    .filter(property => property._2.asInstanceOf[JObject].obj.contains((key, JBool(true))))
    .map(_._1)
}
```
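The same filtering step can also be sketched with the Scala standard library alone. This is a minimal sketch under an assumption that is not part of the answer: the `"properties"` object has already been parsed into a list of `(parentName, childMap)` pairs, which mirrors json4s's `List[JField]`. `FlagExtractor`, `parentsWithFlag`, and `sampleProperties` are hypothetical names. A `collect` with a guard keeps only entries whose child key holds the boolean `true`, so it does the work of the `filter`/`map` pair without an `asInstanceOf` cast.

```scala
// Hypothetical, library-independent sketch of "parent names where child key == true".
object FlagExtractor {
  // Assumed shape: the parsed "properties" object as (parent name, child key -> value) pairs,
  // kept as a List so the original key order is preserved.
  type Properties = List[(String, Map[String, Any])]

  // Returns the parent names whose `key` child is the boolean `true`.
  // A missing key, `false`, or a non-boolean value such as the string "true" does not match.
  def parentsWithFlag(key: String, properties: Properties): List[String] =
    properties.collect { case (name, attrs) if attrs.get(key).contains(true) => name }

  // The "properties" object from the question, hand-converted to the assumed shape.
  val sampleProperties: Properties = List(
    "student_id"   -> Map("type" -> "string", "unique_id" -> true),
    "status"       -> Map("type" -> "boolean"),
    "name"         -> Map("type" -> "string"),
    "phone_number" -> Map("type" -> "string", "ignore" -> true),
    "e_mail"       -> Map("type" -> "string", "ignore" -> true),
    "address"      -> Map("type" -> "string")
  )

  def main(args: Array[String]): Unit = {
    println(parentsWithFlag("unique_id", sampleProperties)) // List(student_id)
    println(parentsWithFlag("ignore", sampleProperties))    // List(phone_number, e_mail)
  }
}
```

Because the parent names are never hard-coded, the sketch keeps working when they change, which is the requirement stated in the question.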

**Test it out**

```scala
val unique: List[String] = extractProperties("unique_id", jsonString)
// unique: List[String] = List(student_id)

val ignored: List[String] = extractProperties("ignore", jsonString)
// ignored: List[String] = List(phone_number, e_mail)
```
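The answer returns two separate lists, while the question asks for them side by side as a single `unique_id | ignore` row. A minimal sketch of that last step, assuming plain-text output is enough; `OutputTable` and `render` are hypothetical names, and the inputs are the two results shown in the test above.

```scala
// Hypothetical helper: renders result lists in the one-row layout the question asks for.
object OutputTable {
  // Each column is (header, values); a cell shows its values as "[a, b]".
  def render(columns: List[(String, List[String])]): String = {
    val cells = columns.map { case (name, values) => (name, values.mkString("[", ", ", "]")) }
    // Each column is as wide as the longer of its header and its cell.
    val widths = cells.map { case (name, cell) => math.max(name.length, cell.length) }
    def line(parts: List[String]): String =
      parts.zip(widths).map { case (part, width) => part.padTo(width, ' ') }.mkString("", " | ", " |")
    val header = line(cells.map(_._1))
    // Header, a dashed separator of the same width, then the single data row.
    List(header, "-" * header.length, line(cells.map(_._2))).mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    // The values returned by extractProperties in the test above.
    val unique  = List("student_id")
    val ignored = List("phone_number", "e_mail")
    println(render(List("unique_id" -> unique, "ignore" -> ignored)))
  }
}
```

If a Spark DataFrame is wanted instead of text, one option (not tested here) is `Seq((unique, ignored)).toDF("unique_id", "ignore")` with `import spark.implicits._` in scope, which should give two array columns.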

huangapple
  • Posted on June 30, 2023 at 04:52:13
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76584549.html
  • apache-spark
  • json
  • scala
  • user-defined-functions