February 16, 2023, 10:50:48

Special characters other than delimiter in DelimitedLineTokenizer of FlatFileItemReader

Question

I am trying to read a .unl file in Spring Batch, using a FlatFileItemReader with "|" as the delimiter.

The delimiter does not work when the reader encounters this data:

001-A472468827" |N|100|

A line cannot be split on the delimiter if it contains " and spaces, or if it contains the @ character.
Setting quoteCharacter doesn't seem to help.
In this situation, is there a way to import special characters such as " and @ as they are?

  @Bean
  @StepScope
  public FlatFileItemReader unlFileReader() throws MalformedURLException {
    return new FlatFileItemReaderBuilder<ExampleDTO>()
        .name("unlFileReader")
        /*.encoding(StandardCharsets.UTF_8.name())*/
        .resource(fileService.inputFileResource(UNZIP_PATH + "example.unl"))
        .fieldSetMapper(new BeanWrapperFieldSetMapper<>())
        .targetType(ExampleDTO.class)
        .delimited().delimiter("|")
        .quoteCharacter('@')
        .quoteCharacter('"')
        .quoteCharacter(DelimitedLineTokenizer.DEFAULT_QUOTE_CHARACTER)
        .includedFields(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141
        )
        .names(ExampleDTO.getFieldNameArrays())
        .build();
  }

Answer 1

Score: 0

You are calling quoteCharacter() several times. Note that each call overrides the previous value; it does not add the character to a list of quote characters. Only one quote character is used (the last one set, if you chain such calls).

> Data cannot be divided by the delimiter if it contains " and spaces or if it contains the @ character

This is because " is the default quote character. If the input contains a single, unpaired ", you need to specify another quote character; otherwise Spring Batch treats it as a "bug" in your data, which is fair, since the field is not correctly quoted. Here is a quick test that passes:

@Test
void testPipeDelimiterWithDoubleQuote() {
	DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
	tokenizer.setDelimiter("|");
	// Replace the default quote character (") with a space
	tokenizer.setQuoteCharacter(' ');
	String s = "001-A472468827\"|N|100|";
	FieldSet fieldSet = tokenizer.tokenize(s);
	Assertions.assertEquals("001-A472468827\"", fieldSet.readString(0));
	Assertions.assertEquals("N", fieldSet.readString(1));
	Assertions.assertEquals("100", fieldSet.readString(2));
}

This test shows that the " is kept as part of the first field. The same test passes with an @ in the input:

@Test
void testPipeDelimiterWithAtSign() {
	DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
	tokenizer.setDelimiter("|");
	// Replace the default quote character (") with a space
	tokenizer.setQuoteCharacter(' ');
	String s = "001-A472468827@|N|100|";
	FieldSet fieldSet = tokenizer.tokenize(s);
	Assertions.assertEquals("001-A472468827@", fieldSet.readString(0));
	Assertions.assertEquals("N", fieldSet.readString(1));
	Assertions.assertEquals("100", fieldSet.readString(2));
}
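
To make it easier to see why an unpaired " keeps the line from being split, here is a simplified plain-Java model of quote-aware splitting. It is only an illustrative sketch and is not Spring Batch's actual implementation; the class name QuoteAwareSplit and its split method are hypothetical. The model toggles a "quoted" state each time it sees the quote character and ignores delimiters while that state is on.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of quote-aware splitting (hypothetical; NOT Spring Batch code).
class QuoteAwareSplit {

    // Splits the line on the delimiter, ignoring delimiters seen while a quote is open.
    static List<String> split(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == quote) {
                inQuotes = !inQuotes;   // an unpaired quote leaves this state on
                current.append(c);      // this model keeps the quote character itself
            } else if (c == delimiter && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);   // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // text after the last delimiter
        return fields;
    }

    public static void main(String[] args) {
        String line = "001-A472468827\"|N|100|";
        // Quote character is ": the unpaired quote swallows every delimiter, giving 1 field.
        System.out.println(split(line, '|', '"').size());
        // Quote character is a space (absent from this line): 4 fields, the last one empty.
        System.out.println(split(line, '|', ' ').size());
    }
}
```

In this model, as in the tests above, the line splits as expected once the quote character is one that does not occur in the data.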
