Special characters other than delimiter in DelimitedLineTokenizer of FlatFileItemReader
Question
I am trying to read a .unl file in Spring Batch, using FlatFileItemReader with "|" as the delimiter.
The delimiter does not work when the reader encounters data like this:
    001-A472468827" |N|100|
A line cannot be split on the delimiter if it contains a " and spaces, or if it contains the @ character.
Setting quoteCharacter doesn't seem to help.
In this situation, is there a way to import special characters such as " and @ as they are?
    @Bean
    @StepScope
    public FlatFileItemReader unlFileReader() throws MalformedURLException {
        return new FlatFileItemReaderBuilder<ExampleDTO>()
                .name("unlFileReader")
                /*.encoding(StandardCharsets.UTF_8.name())*/
                .resource(fileService.inputFileResource(UNZIP_PATH + "example.unl"))
                .fieldSetMapper(new BeanWrapperFieldSetMapper<>())
                .targetType(ExampleDTO.class)
                .delimited().delimiter("|")
                .quoteCharacter('@')
                .quoteCharacter('"')
                .quoteCharacter(DelimitedLineTokenizer.DEFAULT_QUOTE_CHARACTER)
                .includedFields(
                        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,
                        20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,
                        40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,
                        60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,
                        80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,
                        100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,
                        120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,
                        140,141
                )
                .names(ExampleDTO.getFieldNameArrays())
                .build();
    }
Answer 1
Score: 0
You are calling `quoteCharacter()` several times. Note that each call overrides the previous value; it does not add the character to a list of quote characters. Only one quote character is used (the one from the last call, if you chain such calls). In your configuration that is `DelimitedLineTokenizer.DEFAULT_QUOTE_CHARACTER`, i.e. `"`.
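The overwrite behaviour described above can be pictured with a tiny stand-in. This is only a sketch: `QuoteSettingSketch` is a hypothetical class written for illustration, not Spring Batch code. It assumes nothing more than what is stated above, namely that the setting holds a single `char` and each call replaces it.

```java
// Hypothetical stand-in for a fluent builder that stores ONE quote character.
final class QuoteSettingSketch {
    // Starts out as the double quote, mirroring the default mentioned in this answer.
    private char quoteCharacter = '"';

    // Each call replaces the stored value; nothing is accumulated.
    QuoteSettingSketch quoteCharacter(char c) {
        this.quoteCharacter = c;
        return this;
    }

    char effectiveQuoteCharacter() {
        return quoteCharacter;
    }

    public static void main(String[] args) {
        char effective = new QuoteSettingSketch()
                .quoteCharacter('@')
                .quoteCharacter('"')
                .quoteCharacter('"') // same shape as the chain in the question
                .effectiveQuoteCharacter();
        System.out.println(effective); // prints "
    }
}
```

Under that model, the `'@'` passed first has no effect once a later call runs, which matches the symptom in the question.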
> Data cannot be divided by the delimiter if it contains " and spaces or if it contains the @ character
This is because `"` is the default quote character. If the input contains a single, unbalanced `"`, you need to specify another quote character (otherwise Spring Batch considers that a "bug" in your data, which is true, as the field is not correctly quoted). Here is a quick test that passes:
    @Test
    void testPipeDelimiter() {
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setDelimiter("|");
        // Replace the default quote character (") so the lone " in the data is ordinary text.
        // A space is used here; note that this sample line contains no spaces.
        tokenizer.setQuoteCharacter(' ');
        String s = "001-A472468827\"|N|100|";
        FieldSet fieldSet = tokenizer.tokenize(s);
        Assertions.assertEquals("001-A472468827\"", fieldSet.readString(0));
        Assertions.assertEquals("N", fieldSet.readString(1));
        Assertions.assertEquals("100", fieldSet.readString(2));
    }
This test shows that the `"` is part of the first field. The same test passes with a `@` in the input:
    @Test
    void testPipeDelimiterWithAtSign() {
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setDelimiter("|");
        // Same setup as before: the quote character is changed from the default ".
        tokenizer.setQuoteCharacter(' ');
        String s = "001-A472468827@|N|100|";
        FieldSet fieldSet = tokenizer.tokenize(s);
        // The @ is kept as part of the first field.
        Assertions.assertEquals("001-A472468827@", fieldSet.readString(0));
        Assertions.assertEquals("N", fieldSet.readString(1));
        Assertions.assertEquals("100", fieldSet.readString(2));
    }
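To see why a lone `"` stops the line from being split at all, here is a deliberately simplified model of quote-aware splitting. It is a sketch under stated assumptions, not Spring Batch's implementation: `QuoteAwareSplitSketch` and its `split` method are hypothetical names, the model only toggles an "inside quotes" flag on every quote character, and (unlike a real tokenizer) it leaves quote characters in the output. The `'\u0000'` used as the alternative quote character is likewise just an assumed placeholder for "a character that never occurs in the data".

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a delimiter split that honours one quote character.
final class QuoteAwareSplitSketch {

    static List<String> split(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == quote) {
                // Every quote character flips the state; it is kept in the field here.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == delimiter && !inQuotes) {
                // A delimiter only ends a field while we are outside quotes.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // whatever remains is the last field
        return fields;
    }

    public static void main(String[] args) {
        String line = "001-A472468827\" |N|100|";
        // With " as the quote character, the unbalanced " "opens" a quoted section
        // that never closes, so no delimiter after it is honoured.
        System.out.println(split(line, '|', '"').size());      // prints 1
        // With a quote character that does not occur in the data, the line splits normally.
        System.out.println(split(line, '|', '\u0000').size()); // prints 4
    }
}
```

In this model the second call yields four fields because the trailing `|` is followed by an empty field. The practical takeaway is the same as in the tests above: pick a quote character that cannot appear in the data. The line from the question contains a space right after the `"`, so a space would be a poor choice for that particular input.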