Special characters other than delimiter in DelimitedLineTokenizer of FlatFileItemReader

Question

I am trying to read a .unl file in Spring Batch, using FlatFileItemReader with "|" as the delimiter.

The delimiter does not work when the reader encounters data like this:

001-A472468827" |N|100|

Data cannot be divided by the delimiter if it contains " and spaces or if it contains the @ character.
quoteCharacter doesn't seem to work.
In this situation, is there a way to import special characters such as " and @ as they are?

  @Bean
  @StepScope
  public FlatFileItemReader<ExampleDTO> unlFileReader() throws MalformedURLException {
    return new FlatFileItemReaderBuilder<ExampleDTO>()
        .name("unlFileReader")
        /*.encoding(StandardCharsets.UTF_8.name())*/
        .resource(fileService.inputFileResource(UNZIP_PATH + "example.unl"))
        .fieldSetMapper(new BeanWrapperFieldSetMapper<>())
        .targetType(ExampleDTO.class)
        .delimited().delimiter("|")
        .quoteCharacter('@')
        .quoteCharacter('"')
        .quoteCharacter(DelimitedLineTokenizer.DEFAULT_QUOTE_CHARACTER)
        .includedFields(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141
        )
        .names(ExampleDTO.getFieldNameArrays())
        .build();
  }

Answer 1

Score: 0

You are calling quoteCharacter() several times. Note that each call overrides the previous value; it does not add the quote character to a list of quote characters. Only one quote character is used (the one from the last call, if you chain such calls).
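
To make the "last call wins" behaviour concrete, here is a minimal plain-Java sketch. The class SingleQuoteCharBuilderSketch and its methods are hypothetical stand-ins written for illustration, not Spring Batch code; the only assumption carried over from the builder is that it stores a single quote character rather than a list.

```java
/** Hypothetical stand-in for a builder that has one quote-character slot. */
final class SingleQuoteCharBuilderSketch {

    // A single field, not a collection: there is room for one value only.
    private char quoteCharacter = '"';

    SingleQuoteCharBuilderSketch quoteCharacter(char c) {
        this.quoteCharacter = c; // overwrites whatever was set before
        return this;
    }

    char effectiveQuoteCharacter() {
        return quoteCharacter;
    }

    public static void main(String[] args) {
        // Same order as the chain in the question: '@', then '"', then the default ('"').
        char effective = new SingleQuoteCharBuilderSketch()
                .quoteCharacter('@')
                .quoteCharacter('"')
                .quoteCharacter('"')
                .effectiveQuoteCharacter();
        System.out.println(effective); // prints "
    }
}
```

Applied to the configuration in the question, this means the chain ends with DelimitedLineTokenizer.DEFAULT_QUOTE_CHARACTER, so the reader still uses " as its quote character and '@' is never in effect.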

> Data cannot be divided by the delimiter if it contains " and spaces or if it contains the @ character

This is because " is the default quote character. If the input contains a single (unpaired) ", you need to specify another quote character (otherwise Spring Batch treats it as a "bug" in your data, which is fair, since the field is not correctly quoted). Here is a quick test that passes:

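import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.FieldSet;

@Test
void testPipeDelimiter() {
	DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
	tokenizer.setDelimiter("|");
	// Replace the default quote character (") so a lone " in the data is treated as ordinary text
	tokenizer.setQuoteCharacter(' ');
	String s = "001-A472468827\"|N|100|";
	FieldSet fieldSet = tokenizer.tokenize(s);
	Assertions.assertEquals("001-A472468827\"", fieldSet.readString(0));
	Assertions.assertEquals("N", fieldSet.readString(1));
	Assertions.assertEquals("100", fieldSet.readString(2));
}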

This test shows that the " is part of the first field. The same test passes with an @ in the input:

@Test
void testPipeDelimiterWithAtSign() {
	DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
	tokenizer.setDelimiter("|");
	tokenizer.setQuoteCharacter(' ');
	String s = "001-A472468827@|N|100|";
	FieldSet fieldSet = tokenizer.tokenize(s);
	Assertions.assertEquals("001-A472468827@", fieldSet.readString(0));
	Assertions.assertEquals("N", fieldSet.readString(1));
	Assertions.assertEquals("100", fieldSet.readString(2));
}
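
To illustrate why a lone " stops the delimiter from taking effect, here is a simplified plain-Java model of quote-aware splitting. It is a sketch for intuition only: QuoteAwareSplitSketch and its split method are hypothetical, and DelimitedLineTokenizer's real implementation may differ in detail (for example in how it strips quotes or reports malformed input).

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of quote-aware splitting; NOT Spring Batch's actual implementation. */
final class QuoteAwareSplitSketch {

    /**
     * Splits a line on the delimiter, ignoring delimiters that appear while a
     * quoted section is open. Quote characters are kept in the output to keep
     * the sketch short.
     */
    public static List<String> split(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == quote) {
                inQuotes = !inQuotes;          // open or close a quoted section
                current.append(c);
            } else if (c == delimiter && !inQuotes) {
                fields.add(current.toString()); // delimiter outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);              // ordinary data, or a delimiter inside quotes
            }
        }
        fields.add(current.toString());         // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        String line = "001-A472468827\"|N|100|";
        // With " as the quote character, the unpaired " opens a quoted section
        // that never closes, so every following | is treated as data.
        System.out.println(split(line, '|', '"').size());  // prints 1
        // With a quote character that does not occur in the line, " is plain data.
        System.out.println(split(line, '|', '\'').size()); // prints 4
    }
}
```

In this model the fix is the same as in the tests above: pick a quote character that does not appear in the data, so that " and @ pass through as ordinary field content.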

huangapple
  • Published on February 16, 2023, 10:50:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75467359.html