StreamTokenizer - How to split every character into tokens
Question
In short: how do you configure a StreamTokenizer so that it splits every character of an input file into its own token?
For example, if I have the following input:
1023021023584
How can this be read so that each individual character can be saved to a specific index of an array?
Answer 1
Score: 1
To read characters individually from a file as "tokens", use a Reader:
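import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
// Inside a method that declares "throws IOException":
try (BufferedReader in = Files.newBufferedReader(Paths.get("test.txt"))) {
    // read() returns one UTF-16 char as an int, or -1 at end of stream
    for (int charOrEOF; (charOrEOF = in.read()) != -1; ) {
        String token = String.valueOf((char) charOrEOF);
        // Use token here
    }
}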
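The question also asks how to put each character at its own array index. Below is a minimal sketch building on the Reader loop above, assuming that line terminators (such as the newline at the end of the file) should be dropped rather than stored. ReaderTokens and readTokens are hypothetical names, and a StringReader stands in for the file reader:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReaderTokens {

    // Reads the input one char at a time and stores each char as a
    // one-character String token, so tokens[i] is the i-th character.
    // Assumption: line terminators are not wanted and are skipped.
    static String[] readTokens(Reader in) throws IOException {
        List<String> tokens = new ArrayList<>();
        for (int charOrEOF; (charOrEOF = in.read()) != -1; ) {
            char c = (char) charOrEOF;
            if (c == '\n' || c == '\r') {
                continue; // skip line breaks
            }
            tokens.add(String.valueOf(c));
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the BufferedReader over test.txt
        String[] tokens = readTokens(new StringReader("1023021023584\n"));
        System.out.println(tokens.length); // 13
        System.out.println(tokens[3]);     // 3
    }
}
```

Collecting into a List first avoids having to know the number of characters in advance; the final toArray call gives the fixed-size array the question asks for.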
For full support of Unicode characters outside the Basic Multilingual Plane (e.g. emoji), which are stored as two chars (a surrogate pair), we need to read both chars of the pair:
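import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
// Inside a method that declares "throws IOException":
try (BufferedReader in = Files.newBufferedReader(Paths.get("test.txt"))) {
    for (int char1, char2; (char1 = in.read()) != -1; ) {
        // A high surrogate is combined with the following char into one two-char token
        String token = (Character.isHighSurrogate((char) char1) && (char2 = in.read()) != -1)
                ? String.valueOf(new char[] { (char) char1, (char) char2 })
                : String.valueOf((char) char1);
        // Use token here
    }
}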
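If the whole file fits in memory, an alternative to pairing surrogates by hand is to read it into a String and split it with String.codePoints(), which already combines each surrogate pair into a single code point. A sketch under that assumption; CodePointTokens and toTokens are hypothetical names, and the file contents could be loaded with, for example, new String(Files.readAllBytes(path), StandardCharsets.UTF_8):

```java
public class CodePointTokens {

    // Splits text into one token per Unicode code point, so a surrogate pair
    // (e.g. an emoji) becomes a single two-char token instead of two tokens.
    static String[] toTokens(String text) {
        return text.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // "1", "0", an emoji (U+1F600), "2", "2"
        String[] tokens = toTokens("10\uD83D\uDE0022");
        System.out.println(tokens.length);      // 5
        System.out.println(tokens[2].length()); // 2
    }
}
```

This splits by code point, not by grapheme cluster, so sequences made of several code points (such as flag emoji) still span several tokens.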
Answer 2
Score: 0
You have to call the StreamTokenizer.resetSyntax() method, which makes every character "ordinary" so that nextToken() returns each character as a separate token:
import java.io.FileReader;
import java.io.IOException;
import java.io.StreamTokenizer;

public class Main {
    public static void main(String[] args) {
        try (FileReader fileReader = new FileReader("C:\\test.txt")) {
            StreamTokenizer st = new StreamTokenizer(fileReader);
            // Make every character "ordinary": nextToken() then returns each
            // character as its own token (line breaks included)
            st.resetSyntax();
            int token;
            while ((token = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    System.out.println("Number: " + st.nval);
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    System.out.println("Word: " + st.sval);
                } else {
                    System.out.println("Ordinary Char: " + (char) token);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
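The same resetSyntax() idea can fill an array directly, one character per index. Below is a minimal sketch, assuming the input only contains characters in the range 0 to 255 (such as the digits in the question) and that line breaks should be skipped rather than stored. TokenizerToArray and tokenize are hypothetical names, and a StringReader stands in for the FileReader:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class TokenizerToArray {

    // Returns the input characters in order, one per array index.
    // Assumes characters in 0-255 only; line breaks are treated as whitespace.
    static char[] tokenize(Reader reader) throws IOException {
        StreamTokenizer st = new StreamTokenizer(reader);
        st.resetSyntax();               // every character becomes ordinary
        st.whitespaceChars('\n', '\n'); // treat line breaks as separators, not tokens
        st.whitespaceChars('\r', '\r');
        StringBuilder sb = new StringBuilder();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            sb.append((char) st.ttype); // for an ordinary char, ttype is the char itself
        }
        return sb.toString().toCharArray();
    }

    public static void main(String[] args) throws IOException {
        char[] chars = tokenize(new StringReader("1023021023584\n"));
        System.out.println(chars.length); // 13
        System.out.println(chars[12]);    // 4
    }
}
```

Making '\n' and '\r' whitespace keeps the trailing newline out of the array, which the plain resetSyntax() loop above would otherwise print as an ordinary char.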