Is there a way to convert a String to a Java type using Jackson and/or one of its associated libraries (csv, json, etc.)

Question

<details>
<summary>English:</summary>
Is there a mechanism that applies a standard set of checks to detect the type of a String and then converts it to that type, using one of Jackson's standard text-related libraries (csv, json, or even jackson-core)? I can imagine using it along with a label associated with the value (a CSV header, for example) to do something like the following:

JavaTypeAndValue typeAndValue = StringToJavaType.fromValue(Object x, String label);
typeAndValue.type()  // FQN of the Java type, maybe
typeAndValue.label() // the label might be a column header value, for example
typeAndValue.value() // returns an Object of typeAndValue.type()

A set of 'extractors' would be required to apply the transform, and the consumer of the class would have to be aware of the 'ambiguity' of the `Object` return type, but would still be able to consume and use the information, given its purpose.

The example I'm currently thinking about involves constructing SQL DDL or DML, such as a CREATE TABLE statement, using the information from a `List<JavaTypeAndValue>` derived from evaluating a row of a CSV file.
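To make the DDL use case concrete, here is a minimal sketch of how detected types might feed a CREATE TABLE statement. It is not a feature of Jackson or any other library: the class `DdlSketch`, the methods `sqlType` and `createTable`, and the Java-to-SQL type mapping are all assumptions made up for illustration.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class DdlSketch {

    // Hypothetical mapping from a detected Java type to a generic SQL column type.
    static String sqlType(Class<?> type) {
        if (type == Integer.class) return "INTEGER";
        if (type == Long.class) return "BIGINT";
        if (type == BigInteger.class || type == BigDecimal.class) return "NUMERIC";
        if (type == Float.class || type == Double.class) return "DOUBLE PRECISION";
        if (type == Boolean.class) return "BOOLEAN";
        return "VARCHAR(255)"; // fallback for String and anything unrecognized
    }

    // Builds a CREATE TABLE statement from column labels and their detected types.
    // Labels are used verbatim; real code would have to validate or quote them.
    static String createTable(String table, Map<String, Class<?>> columns) {
        String cols = columns.entrySet().stream()
                .map(e -> e.getKey() + " " + sqlType(e.getValue()))
                .collect(Collectors.joining(", "));
        return "CREATE TABLE " + table + " (" + cols + ")";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in header order.
        Map<String, Class<?>> columns = new LinkedHashMap<>();
        columns.put("id", Long.class);
        columns.put("name", String.class);
        columns.put("active", Boolean.class);
        // prints: CREATE TABLE people (id BIGINT, name VARCHAR(255), active BOOLEAN)
        System.out.println(createTable("people", columns));
    }
}
```

A single row is a weak basis for a column type (a column whose first value is `1` may later contain `1.5`), so in practice the detected types of several rows would probably have to be merged before generating DDL.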
After more digging, hoping to find something out there, I wrote the start of what I had in mind.

Please keep in mind that my intention here isn't to present something 'complete'; I'm sure there are several things missing, edge cases not addressed, etc.

The `parse(List<Map<String, String>> rows, List<String> headers)` method comes from the idea that the input could be a sample of rows from a CSV file read in with Jackson, for example.

Again, this isn't complete, so I'm not looking to pick at everything that's wrong with the following. The question isn't 'how would we write this?', it's 'is anyone familiar with something that already exists and does something like the following?'.

import gms.labs.cassandra.sandbox.extractors.Extractor;
import gms.labs.cassandra.sandbox.extractors.Extractors;
import lombok.Builder;
import lombok.Getter;
import lombok.Setter;
import lombok.experimental.Accessors;

@Accessors(fluent = true, chain = true)
public class TypeAndValue
{
    static final String DEFAULT_LABEL = "NONE";

    @Getter
    final Class<?> type;
    @Getter
    final String rawValue;
    @Setter
    @Getter
    String label;

    @Builder
    TypeAndValue(Class<?> type, String rawValue) {
        this.type = type;
        this.rawValue = rawValue;
        this.label = DEFAULT_LABEL;
    }

    // Converts the raw string using the extractor registered for the detected type.
    public Object value() {
        return Extractors.extractorFor(this).value(rawValue);
    }
}
A simple parser, where `parse` came from a context in which I have a `List<Map<String, String>>` from a CSVReader.

import org.apache.commons.lang3.ObjectUtils;
import org.apache.commons.lang3.math.NumberUtils;
import java.util.*;
import java.util.function.BiFunction;

public class JavaTypeParser
{
    public static final List<TypeAndValue> parse(List<Map<String, String>> rows, List<String> headers)
    {
        List<TypeAndValue> typesAndVals = new ArrayList<>();
        for (Map<String, String> row : rows) {
            for (String header : headers) {
                String val = row.get(header);
                // Check order: isNull, isBoolean, isNumber, then fall back to String.
                // orElseGet defers each later check, so isBoolean is never called with a null value.
                TypeAndValue typeAndValue =
                        isNull(val).orElseGet(() -> isBoolean(val).orElseGet(() -> isNumber(val).orElseGet(
                                () -> _typeAndValue.apply(String.class, val).get())));
                typesAndVals.add(typeAndValue.label(header));
            }
        }
        return typesAndVals;
    }

    public static Optional<TypeAndValue> isNumber(String val)
    {
        if (!NumberUtils.isCreatable(val)) {
            return Optional.empty();
        } else {
            // createNumber picks the narrowest numeric type that can hold the value.
            return _typeAndValue.apply(NumberUtils.createNumber(val).getClass(), val);
        }
    }

    public static Optional<TypeAndValue> isBoolean(String val)
    {
        boolean bool = (val.equalsIgnoreCase("true") || val.equalsIgnoreCase("false"));
        if (bool) {
            return _typeAndValue.apply(Boolean.class, val);
        } else {
            return Optional.empty();
        }
    }

    public static Optional<TypeAndValue> isNull(String val) {
        if (Objects.isNull(val) || val.equals("null")) {
            return _typeAndValue.apply(ObjectUtils.Null.class, val);
        } else {
            return Optional.empty();
        }
    }

    static final BiFunction<Class<?>, String, Optional<TypeAndValue>> _typeAndValue = (type, value) -> Optional.of(
            TypeAndValue.builder().type(type).rawValue(value).build());
}
Extractors. This is just an example of how the 'extractors' for the values (contained in strings) might be registered somewhere for lookup. They could be referenced in any number of other ways, too.

import gms.labs.cassandra.sandbox.TypeAndValue;
import org.apache.commons.lang3.ObjectUtils;
import org.apache.commons.lang3.math.NumberUtils;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;

public class Extractors
{
    // Numeric types that NumberUtils.createNumber may return.
    private static final List<Class<?>> NUMS = Arrays.asList(
            BigInteger.class,
            BigDecimal.class,
            Long.class,
            Integer.class,
            Double.class,
            Float.class);

    public static final Extractor<?> extractorFor(TypeAndValue typeAndValue)
    {
        if (NUMS.contains(typeAndValue.type())) {
            return (Extractor<Number>) value -> NumberUtils.createNumber(value);
        } else if (typeAndValue.type().equals(Boolean.class)) {
            return (Extractor<Boolean>) value -> Boolean.valueOf(value);
        } else if (typeAndValue.type().equals(ObjectUtils.Null.class)) {
            return (Extractor<ObjectUtils.Null>) value -> null; // should we just return the raw value? some frameworks coerce to null.
        } else if (typeAndValue.type().equals(String.class)) {
            return (Extractor<String>) value -> typeAndValue.rawValue(); // just return the raw value.
        } else {
            throw new RuntimeException("unsupported");
        }
    }
}
I ran this from within the JavaTypeParser class, for reference.

public static void main(String[] args)
{
    Optional<TypeAndValue> num = isNumber("-1230980980980980980980980980980988009808989080989809890808098292");
    num.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        System.out.println(typeAndVal.value().getClass());  // BigInteger
    });
    num = isNumber("-123098098097987");
    num.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        System.out.println(typeAndVal.value().getClass()); // Long
    });
    num = isNumber("-123098.098097987"); // Double
    num.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        System.out.println(typeAndVal.value().getClass());
    });
    num = isNumber("-123009809890898.0980979098098908080987"); // BigDecimal
    num.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        System.out.println(typeAndVal.value().getClass());
    });
    Optional<TypeAndValue> bool = isBoolean("FaLse");
    bool.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        System.out.println(typeAndVal.value().getClass()); // Boolean
    });
    Optional<TypeAndValue> nulll = isNull("null");
    nulll.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        //System.out.println(typeAndVal.value().getClass());  would throw null pointer exception
        System.out.println(typeAndVal.type()); // ObjectUtils.Null (from apache commons lang3)
    });
}
</details>
# Answer 1
**Score**: 3

I don't know of any library that does this, and I have never seen anything that works this way on an open set of possible types.

For a closed set of types (you know all the possible output types), the easier way would be to have the class FQN written in the string (from your description I couldn't tell whether you control the written string).
This could be the complete FQN, or an alias for it.
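If you do control the written string, the alias idea might look like the sketch below. The `alias:value` format, the class name `TaggedValueSketch`, and the alias table are assumptions invented for illustration, not an existing convention.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.function.Function;

public class TaggedValueSketch {

    // Hypothetical alias table: one parser per supported type (a closed set).
    static final Map<String, Function<String, Object>> PARSERS = Map.of(
            "int", Integer::valueOf,
            "decimal", BigDecimal::new,
            "bool", Boolean::valueOf,
            "string", s -> s);

    // Parses a string of the assumed form "<alias>:<raw value>".
    static Object parse(String tagged) {
        int sep = tagged.indexOf(':');
        if (sep < 0) {
            throw new IllegalArgumentException("Missing type tag: " + tagged);
        }
        String alias = tagged.substring(0, sep);
        Function<String, Object> parser = PARSERS.get(alias);
        if (parser == null) {
            throw new IllegalArgumentException("Unknown type alias: " + alias);
        }
        // Everything after the first ':' is the raw value, so values may contain ':' themselves.
        return parser.apply(tagged.substring(sep + 1));
    }

    public static void main(String[] args) {
        System.out.println(parse("int:42").getClass());            // class java.lang.Integer
        System.out.println(parse("string:2020-09-22").getClass()); // class java.lang.String
    }
}
```

With an explicit tag there is nothing to guess: `string:2020-09-22` stays a String even though it looks like a date.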

Otherwise, I don't think there is any way around writing all the checks yourself.

It will also be quite delicate; here are some edge cases that come to mind.

Suppose you use JSON as the serialization format in the string: how would you differentiate between a String value like `Hello World` and a Date written in some ISO format (e.g. `2020-09-22`)? To do it you would need to introduce some priority into the checks (first check whether it is a date using some regex; if not, go on to the next check, with the plain string check being the last one).
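The priority idea can be sketched as an ordered list of checks in which the most specific check runs first and the plain-string fallback runs last. This sketch swaps the regex for `java.time.LocalDate.parse`, which also rejects impossible dates; the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class PrioritySketch {

    // Returns a LocalDate if the text is an ISO-8601 date such as 2020-09-22.
    static Optional<Object> asDate(String s) {
        try {
            return Optional.of(LocalDate.parse(s));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    // Catch-all: any text is a valid String.
    static Optional<Object> asString(String s) {
        return Optional.of(s);
    }

    // Order matters: the date check must run before the catch-all.
    static final List<Function<String, Optional<Object>>> CHECKS =
            List.of(PrioritySketch::asDate, PrioritySketch::asString);

    // Returns the result of the first check that accepts the text.
    static Object detect(String s) {
        for (Function<String, Optional<Object>> check : CHECKS) {
            Optional<Object> result = check.apply(s);
            if (result.isPresent()) {
                return result.get();
            }
        }
        throw new IllegalStateException("unreachable: asString always matches");
    }

    public static void main(String[] args) {
        System.out.println(detect("2020-09-22").getClass());  // class java.time.LocalDate
        System.out.println(detect("Hello World").getClass()); // class java.lang.String
    }
}
```

The ambiguity does not go away: a column of free text that happens to contain `2020-09-22` will still be detected as a date, which is why the order of the checks is a design decision rather than a detail.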

What if you have two objects:

class User {
    String name;
    String surname;
}
class Employee {
    String name;
    String surname;
    Integer salary;
}

and you receive a serialized value of the second type, but with a null salary (null, or the property missing completely)? The payload would then fit both classes.

How can you tell the difference between a set and a list?

I don't know whether what you intend is this dynamic, or whether you already know all the possible deserializable types; some more details in the question might help.

UPDATE

Just saw the code; now it seems clearer.
If you know all the possible outputs, that is the way to go.
The only change I would make is to abstract the extraction process, so that it is easier to add more types later.
To do this, I think a small change is needed, like:

interface Extractor {
    // Returns true if this extractor can handle the raw value.
    boolean match(String value);
    // Converts the raw value; only call this after match returned true.
    Object extract(String value);
}

Then you can define an extractor per type:

class NumberExtractor implements Extractor {
    @Override
    public boolean match(String val) {
        return NumberUtils.isCreatable(val);
    }
    @Override
    public Object extract(String value) {
        return NumberUtils.createNumber(value);
    }
}
class StringExtractor implements Extractor {
    @Override
    public boolean match(String s) {
        return true; //<-- catch all
    }
    @Override
    public Object extract(String value) {
        return value;
    }
}

And then register the extractors and automate the checks:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class JavaTypeParser {
  // Order matters: the catch-all StringExtractor must come last.
  // NullExtractor and BooleanExtractor are not shown; they follow the same pattern as the extractors above.
  private static final List<Extractor> EXTRACTORS = List.of(
      new NullExtractor(),
      new BooleanExtractor(),
      new NumberExtractor(),
      new StringExtractor()
  );

  public static final List<TypeAndValue> parse(List<Map<String, String>> rows, List<String> headers) {
    List<TypeAndValue> typesAndVals = new ArrayList<>();
    for (Map<String, String> row : rows) {
        for (String header : headers) {
            String val = row.get(header);

            typesAndVals.add(extract(header, val));
        }
    }
    return typesAndVals;
  }

  public static final TypeAndValue extract(String header, String value) {
       // The first extractor that matches wins.
       for (Extractor e : EXTRACTORS) {
           if (e.match(value)) {
               Object v = e.extract(value);
               // Assumes TypeAndValue is changed so that its builder accepts a label and a value.
               return TypeAndValue.builder()
                         .label(header)
                         .value(v) //<-- you can put the real value here, and remove the type field
                         .build();
           }
       }
       throw new IllegalStateException("Can't find an extractor for: " + header + " | " + value);
  }
}
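The registration code above refers to `NullExtractor` and `BooleanExtractor` without showing them. The self-contained sketch below is one possible reading of the whole chain, not the answer's actual code. As assumptions for illustration, it nests everything in a hypothetical `ExtractorChainSketch` class, returns the bare value instead of a `TypeAndValue`, and uses the JDK's `BigDecimal` in place of `NumberUtils` so that it runs without Apache Commons. As a result every number comes back as a `BigDecimal`, and the accepted number syntax differs slightly from `NumberUtils`.

```java
import java.math.BigDecimal;
import java.util.List;

public class ExtractorChainSketch {

    interface Extractor {
        boolean match(String value);
        Object extract(String value);
    }

    // Treats a missing value or the literal text "null" as null.
    static class NullExtractor implements Extractor {
        public boolean match(String value) { return value == null || value.equals("null"); }
        public Object extract(String value) { return null; }
    }

    // Accepts "true"/"false" in any letter case.
    static class BooleanExtractor implements Extractor {
        public boolean match(String value) {
            return value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false");
        }
        public Object extract(String value) { return Boolean.valueOf(value); }
    }

    // Accepts anything the BigDecimal(String) constructor can parse.
    static class NumberExtractor implements Extractor {
        public boolean match(String value) {
            try {
                new BigDecimal(value);
                return true;
            } catch (NumberFormatException e) {
                return false;
            }
        }
        public Object extract(String value) { return new BigDecimal(value); }
    }

    // Catch-all: must be registered last.
    static class StringExtractor implements Extractor {
        public boolean match(String value) { return true; }
        public Object extract(String value) { return value; }
    }

    // NullExtractor comes first, so the later extractors never see a null value.
    static final List<Extractor> EXTRACTORS = List.of(
            new NullExtractor(), new BooleanExtractor(), new NumberExtractor(), new StringExtractor());

    static Object extract(String value) {
        for (Extractor e : EXTRACTORS) {
            if (e.match(value)) {
                return e.extract(value);
            }
        }
        throw new IllegalStateException("No extractor for: " + value);
    }

    public static void main(String[] args) {
        System.out.println(extract("FaLse")); // false
        System.out.println(extract("12.50")); // 12.50
        System.out.println(extract("hello")); // hello
    }
}
```

Adding a new type then means writing one more `Extractor` and placing it in the list ahead of the catch-all.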

To parse CSV I would suggest https://commons.apache.org/proper/commons-csv, as CSV parsing can run into nasty issues.

# Answer 2
**Score**: 2

What you are actually trying to do is write a parser. You translate a fragment into a parse tree. The parse tree captures the type as well as the value. For hierarchical types like arrays and objects, each tree node contains child nodes.

One of the most commonly used parser generators (albeit a bit overkill for your use case) is ANTLR. ANTLR brings out-of-the-box support for JSON.

I recommend taking the time to absorb all the concepts involved. Even though it might seem like overkill initially, it quickly pays off when you do any kind of extension. Changing a grammar is relatively easy; the generated code is quite complex. Additionally, all parser generators verify your grammars to show logic errors.

Of course, if you are limiting yourself to parsing just CSV or JSON (and not both at the same time), you should rather use the parser of an existing library. For example, Jackson has `ObjectMapper.readTree` to get the parse tree. You could also use `ObjectMapper.readValue(<fragment>, Object.class)` to simply get the canonical Java classes.

# Answer 3
**Score**: 0

<details>
<summary>English:</summary>
Try this:

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Iterator;
import java.util.Map;

String j = ...; // the JSON string

JsonFactory jsonFactory = new JsonFactory();
ObjectMapper jsonMapper = new ObjectMapper(jsonFactory);
JsonNode jsonRootNode = jsonMapper.readTree(j);
// Iterate over the top-level fields of the JSON object.
Iterator<Map.Entry<String, JsonNode>> jsonIterator = jsonRootNode.fields();
while (jsonIterator.hasNext()) {
    Map.Entry<String, JsonNode> jsonField = jsonIterator.next();
    String k = jsonField.getKey();
    String v = jsonField.getValue().toString();
    ...
}

</details>

huangapple
  • Posted on 2020-09-06 02:48:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63757433.html