Java StringTokenizer - Problems with nextToken() usage with substring

Question

I have a text file that I must iterate through, and I want to move certain elements of each line into an ArrayList. Each line of the file is in the format: number. String number. decimal decimal
Because the two numbers end with a full stop (.), I need to read them as Strings, remove the . using substring, and then convert them to a primitive data type (int or short).

Example on file:
294. ABC123 66. .00 .00

I get a string range error if I try this (temp is a String):

while(fileLine.hasMoreTokens())
{
    oneNumber = Integer.valueOf(fileLine.nextToken().substring(0,
                          fileLine.nextToken().indexOf('.')));
    twoString = fileLine.nextToken();
    threeNumber = Short.valueOf(fileLine.nextToken().substring(0,
                          fileLine.nextToken().indexOf('.')));
    temp = fileLine.nextToken();    //Handle attributes not required
    temp = fileLine.nextToken();    //Handle attributes not required
}

I believe this happens because the nextToken() call inside substring's parameters is confusing the StringTokenizer. So I fixed it like this:

while(fileLine.hasMoreTokens())
{
    temp = fileLine.nextToken();
    oneNumber = Integer.valueOf(temp.substring(0, temp.indexOf('.')));
    twoString = fileLine.nextToken();
    temp = fileLine.nextToken();
    threeNumber = Short.valueOf(temp.substring(0, temp.indexOf('.')));
    temp = fileLine.nextToken();
    temp = fileLine.nextToken();
}

While this works, it feels a bit redundant. Is there something I can try to make this cleaner while retaining use of the StringTokenizer?

Answer 1

Score: 1

This is the intended behavior of .nextToken(): it returns the next token and advances the tokenizer past it. When you use Integer.valueOf(fileLine.nextToken().substring(0, fileLine.nextToken().indexOf('.'))), you are calling .nextToken() twice, which means you are dealing with two distinct tokens: substring runs on the first token ("294." in your example), while indexOf('.') runs on the second ("ABC123"), which contains no '.' and so yields -1. It has nothing to do with how String#substring works. You need to store the token in a variable if you need to perform additional operations on it. The same problem can also be caused by calling BufferedReader#readLine twice when you should be storing the value.
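To make the difference concrete, here is a minimal sketch that runs both variants on the sample line from the question. The class name NextTokenDemo and the method names parseBuggy and parseFixed are made up for illustration; only StringTokenizer and the String methods come from the original code.

```java
import java.util.StringTokenizer;

class NextTokenDemo {
    // Buggy: nextToken() is called twice, so substring() runs on the first
    // token while indexOf('.') is evaluated on the second one.
    static int parseBuggy(StringTokenizer st) {
        return Integer.parseInt(
                st.nextToken().substring(0, st.nextToken().indexOf('.')));
    }

    // Fixed: read the token once and reuse the stored value.
    static int parseFixed(StringTokenizer st) {
        String token = st.nextToken();
        return Integer.parseInt(token.substring(0, token.indexOf('.')));
    }

    public static void main(String[] args) {
        String line = "294. ABC123 66. .00 .00"; // sample line from the question

        try {
            parseBuggy(new StringTokenizer(line));
        } catch (IndexOutOfBoundsException e) {
            // "ABC123".indexOf('.') is -1, so "294.".substring(0, -1) fails.
            System.out.println("buggy version failed");
        }

        StringTokenizer st = new StringTokenizer(line);
        System.out.println(parseFixed(st));  // prints 294
        System.out.println(st.nextToken());  // prints ABC123
    }
}
```

After parseFixed, the tokenizer has consumed exactly one token, which is why the following nextToken() returns ABC123.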

Answer 2

Score: 1

Yup. nextToken() is stateful: each call advances the tokenizer, so using it twice in a single expression consumes two tokens.

Your second snippet seems much easier to read to me, so I'm not sure what the problem is. Presumably you want your code to be more readable.

An easy fix is to make helper methods:

while (fileLine.hasMoreTokens()) {
    oneNumber = fetchHeadingNumber(fileLine);
    twoString = fileLine.nextToken();
    threeNumber = fetchHeadingNumber(fileLine);
    fileLine.nextToken(); // no need to assign it.
    fileLine.nextToken();
}

with this method:

int fetchHeadingNumber(StringTokenizer t) {
    String token = t.nextToken();
    return Integer.parseInt(token.substring(0, token.indexOf('.')));
}
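For reference, a self-contained version of the helper approach might look like the sketch below. It assumes threeNumber is declared as a short (the question uses Short.valueOf, but the declarations are not shown), in which case an int-returning helper would need a cast or a short-returning sibling. The names HeadingNumbers, digitsBeforeDot and fetchHeadingShort are hypothetical, and the explicit check for a missing '.' is an addition that the original helper does not have.

```java
import java.util.StringTokenizer;

class HeadingNumbers {
    // Consumes exactly one token and returns the text before its first '.'.
    private static String digitsBeforeDot(StringTokenizer t) {
        String token = t.nextToken();
        int dot = token.indexOf('.');
        if (dot < 0) {
            // Added guard (not in the original answer): fail with a clear message.
            throw new IllegalArgumentException("Expected a '.' in token: " + token);
        }
        return token.substring(0, dot);
    }

    static int fetchHeadingNumber(StringTokenizer t) {
        return Integer.parseInt(digitsBeforeDot(t));
    }

    // Hypothetical variant for a short field; not part of the original answer.
    static short fetchHeadingShort(StringTokenizer t) {
        return Short.parseShort(digitsBeforeDot(t));
    }

    public static void main(String[] args) {
        StringTokenizer fileLine = new StringTokenizer("294. ABC123 66. .00 .00");
        int oneNumber = fetchHeadingNumber(fileLine);
        String twoString = fileLine.nextToken();
        short threeNumber = fetchHeadingShort(fileLine);
        fileLine.nextToken(); // skip the two decimal columns
        fileLine.nextToken();
        System.out.println(oneNumber + " " + twoString + " " + threeNumber);
        // prints: 294 ABC123 66
    }
}
```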

You can go even further and make a class representing a line, which contains all the code needed to parse it (I made up the names; your snippet doesn't make clear what kind of thing a line represents):

@lombok.Value class InventoryItem {
    int warehouse;
    String name;
    int shelf;

    public static InventoryItem read(StringTokenizer tokenizer) {
        int warehouse = num(tokenizer);
        String name = tokenizer.nextToken();
        int shelf = num(tokenizer);
        tokenizer.nextToken();
        tokenizer.nextToken();
        return new InventoryItem(warehouse, name, shelf);
    }
    private static int num(StringTokenizer t) {
        String token = t.nextToken();
        return Integer.parseInt(token.substring(0, token.indexOf('.')));
    }
}

Reading a line and retrieving, say, the location where the item is stored then becomes much nicer, because things actually have names:

InventoryItem item = InventoryItem.read(fileLine);
System.out.println("This item is in warehouse " + item.getWarehouse());

NB: Uses lombok's @Value to avoid putting a lot of boilerplate in this answer.
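If Lombok is not available, roughly the same class can be written by hand. The sketch below spells out the private final fields, the all-arguments constructor and the getters that @Value would generate; the equals, hashCode and toString methods that @Value also provides are left out to keep it short. The field names are the made-up ones from the answer, and the main method is only there for demonstration.

```java
import java.util.StringTokenizer;

final class InventoryItem {
    private final int warehouse;
    private final String name;
    private final int shelf;

    InventoryItem(int warehouse, String name, int shelf) {
        this.warehouse = warehouse;
        this.name = name;
        this.shelf = shelf;
    }

    int getWarehouse() { return warehouse; }
    String getName() { return name; }
    int getShelf() { return shelf; }

    // Parses one line: number. String number. decimal decimal
    static InventoryItem read(StringTokenizer tokenizer) {
        int warehouse = num(tokenizer);
        String name = tokenizer.nextToken();
        int shelf = num(tokenizer);
        tokenizer.nextToken(); // skip the two decimal columns
        tokenizer.nextToken();
        return new InventoryItem(warehouse, name, shelf);
    }

    // Reads one token such as "294." and parses the digits before the '.'.
    private static int num(StringTokenizer t) {
        String token = t.nextToken();
        return Integer.parseInt(token.substring(0, token.indexOf('.')));
    }

    public static void main(String[] args) {
        InventoryItem item =
                InventoryItem.read(new StringTokenizer("294. ABC123 66. .00 .00"));
        System.out.println("This item is in warehouse " + item.getWarehouse());
        // prints: This item is in warehouse 294
    }
}
```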

huangapple
  • Posted on August 6, 2020, 22:32:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63285855.html