Taking an integer from a scanner token that may or may not include symbols

Question

I have a function that takes a scanner of a text file as input, and I need to extract integer values from each line. These lines might not follow a rigid syntax.

I have tried to use skip() to ignore specific non-integer characters, but I fear I may be using it for something it isn't capable of.

I've also tried turning the token into a string and using replaceAll(";", ""), but that quickly turns my code into a mess of if statements and String to int conversions. It gets bad quite fast considering I have a lot of different variables that need to be set here.

Is there a more elegant solution?

Here is my input file:

pop 25; // my code must accept this
pop 25 ; // and also this
house 3.2, 1; // some lines will set multiple values
house 3.2 , 1 ; // so I will need to ignore both commas and semicolons

Here is my code:

static int population = -1;
static double median = -1;
static double scatter = -1;

private static void readCommunity(Scanner sc) {
    while (sc.hasNext()) {
        String input = sc.next();
        if ("pop".equals(input)) {
            sc.skip(";*"); // my guess is this wouldn't work unless the
                           // token had a ';' BEFORE the integer
            if (sc.hasNextInt()) {
                population = sc.nextInt();
            } else {
                // throw an error. not important here
            }
            sc.nextLine();
        } else if ("house".equals(input)) {
            sc.skip(",*");
            if (sc.hasNextDouble()) {
                median = sc.nextDouble();
                sc.skip(";*");
                if (sc.hasNextDouble()) {
                    scatter = sc.nextDouble();
                } else {
                    // error
                }
            } else {
                // error
            }
            sc.nextLine();
        }
    }
}

Answer 1

Score: 1

In my opinion, it's easier to read each entire data line from the file, split that line into the parts you need, and then validate the values that were read. For example:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Assigns to the static fields population, median and scatter from the question.
private static void readCommunity(String dataFilePath) {
    File file = new File(dataFilePath);
    if (!file.exists()) {
        System.err.println("File Not Found! (" + dataFilePath + ")");
        return;
    }
    int lineCount = 0;   // For counting file lines.
    // 'Try With Resources' used here so as to auto-close the reader.
    try (Scanner sc = new Scanner(file)) {
        while (sc.hasNextLine()) {
            String fileInput = sc.nextLine().trim();
            lineCount++;   // Increment line counter.
            // Skip blank lines (if any).
            if (fileInput.isEmpty()) {
                continue;
            }
            /* Remove comments from data line (if any). Your file
               example shows comments at the end of each line. Yes,
               I realize that your file most likely doesn't contain
               these but it doesn't hurt to have this here in case
               it does or if you want to have that option. Comments
               can start with // or /*. Comments must be at the end
               of a data line. This 'does not' support any Multi-line
               comments. More code is needed for that.            */
            if (fileInput.contains("//") || fileInput.contains("/*")) {
                fileInput = fileInput.substring(0, fileInput.contains("//")
                        ? fileInput.indexOf("//") : fileInput.indexOf("/*")).trim();
                // Skip lines that held nothing but a comment.
                if (fileInput.isEmpty()) {
                    continue;
                }
            }
            // Start parsing the data line into required parts...
            // Start with semicolon portions
            String[] lineMainParts = fileInput.split("\\s{0,};\\s{0,}");
            /* Iterate through all the main elemental parts on a
               data line (if there is more than one), for example:
               pop 30; house 4.3, 1; pop 32; house 3.3, 2   */
            for (int i = 0; i < lineMainParts.length; i++) {
                // Is it a 'pop' attribute?
                if (lineMainParts[i].toLowerCase().startsWith("pop")) {
                    // Yes it is... so validate, convert, and display the value.
                    String[] attributeParts = lineMainParts[i].split("\\s+");
                    // Validate string numerical value (Integer); the length check
                    // guards against a 'pop' entry that has no value at all.
                    if (attributeParts.length > 1 && attributeParts[1].matches("-?\\d+|\\+?\\d+")) {
                        population = Integer.valueOf(attributeParts[1]);  // convert to Integer
                        System.out.println("Population:\t" + population); // display...
                    }
                    else {
                        System.err.println("Invalid population value detected in file on line "
                                + lineCount + "! (" + lineMainParts[i] + ")");
                    }
                }
                // Is it a 'house' attribute?
                else if (lineMainParts[i].toLowerCase().startsWith("house")) {
                    /* Yes it is... so split all comma delimited attribute values
                       for 'house', validate each numerical value, convert each
                       numerical value, and display each attribute and their
                       respective values.  */
                    String[] attributeParts = lineMainParts[i].split("\\s{0,},\\s{0,}|\\s+");
                    // Validate median string numerical value (Double or Integer).
                    if (attributeParts.length > 1 && attributeParts[1].matches("-?\\d+(\\.\\d+)?")) {
                        median = Double.valueOf(attributeParts[1]);        // convert to Double.
                        System.out.println("Median:     \t" + median);     // display median...
                    }
                    else {
                        System.err.println("Invalid Median value detected in file on line "
                                + lineCount + "! (" + lineMainParts[i] + ")");
                    }
                    // Validate scatter string numerical value (Integer).
                    if (attributeParts.length > 2 && attributeParts[2].matches("-?\\d+|\\+?\\d+")) {
                        scatter = Integer.valueOf(attributeParts[2]);     // convert to Integer
                        System.out.println("Scatter:    \t" + scatter);   // display scatter...
                    }
                    else {
                        System.err.println("Invalid Scatter value detected in file on line "
                                + lineCount + "! (" + lineMainParts[i] + ")");
                    }
                }
                else {
                    System.err.println("Unhandled Data Attribute detected in data file on line " + lineCount + "! ("
                            + lineMainParts[i] + ")");
                }
            }
        }
    }
    catch (FileNotFoundException ex) {
        System.err.println(ex);
    }
}

There are several Regular Expressions (RegEx) used in the code above. Here is what they mean in the order they are encountered in code:

"\\s{0,};\\s{0,}"

Used with the String#split() method for parsing a semicolon (;) delimited line. This regex pretty much covers the bases for when semicolon delimited string data needs to be split but the semicolon may be spaced in several different fashions within the string, for example:

"data;data ;data; data ; data;      data       ;data"
  • \\s{0,} 0 or more whitespaces before the semicolon.
  • ; The literal semicolon delimiter itself.
  • \\s{0,} 0 or more whitespaces after the semicolon.
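As a quick check of the semicolon split described above, here is a minimal sketch. The class name SemicolonSplitDemo, the helper splitStatements and the sample line are made up for illustration; only the regex comes from the answer.

```java
class SemicolonSplitDemo {
    // Hypothetical helper: splits on a semicolon plus any whitespace around it.
    static String[] splitStatements(String line) {
        return line.split("\\s{0,};\\s{0,}");
    }

    public static void main(String[] args) {
        // Semicolons spaced in several different ways, including a trailing one.
        String line = "pop 30; house 4.3, 1 ;pop 32 ; house 3.3, 2;";
        // String#split drops trailing empty strings, so the final ';' adds no extra part.
        System.out.println(String.join(" | ", splitStatements(line)));
        // prints: pop 30 | house 4.3, 1 | pop 32 | house 3.3, 2
    }
}
```

Note that the commas inside each part are left alone at this stage; they are handled by the later comma split.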

"\\s+"

Used with the String#split() method for parsing a whitespace (" ") delimited line. This regex pretty much covers the bases for when whitespaced delimited string data needs to be split but there may be anywhere from 1 to several whitespace or tab characters separating the string tokens for example:

"datadata"                      Split to: [datadata] (Need at least 1 space)
"data data"                     Split to: [data, data]
"data   data"                   Split to: [data, data]
"data        data       data"   Split to: [data, data, data]

"-?\\d+|\\+?\\d+"

Used with the String#matches() method for string numerics validation. This regex is used to see if the tested string is indeed a string representation of a signed or unsigned integer numerical value (of any length). Used in the code above for numerical string validation before converting that numerical value to Integer. String representations can be:

-1   1   324   +2   342345   -65379   74   etc.
  • -? If the string optionally starts with or doesn't start with the
    Hyphen character indicating a signed value.
  • \\d+ The string contains 1 or more (+) digits from 0
    to 9.
  • | Logical OR
  • \\+? If the string optionally starts with or doesn't start with the
    Plus character.
  • \\d+ The string contains 1 or more (+) digits from 0
    to 9.
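To see which strings the integer pattern above accepts, a small sketch may help. The class IntegerCheckDemo and the method isInteger are hypothetical names; the regex is the one from the answer.

```java
class IntegerCheckDemo {
    // Hypothetical helper: true when the whole string is an optionally signed run of digits.
    static boolean isInteger(String s) {
        return s.matches("-?\\d+|\\+?\\d+");
    }

    public static void main(String[] args) {
        String[] samples = {"-1", "+2", "324", "3.2", "25;", ""};
        for (String s : samples) {
            // String#matches tests the entire string, so "25;" and "3.2" are rejected.
            System.out.println("\"" + s + "\" -> " + isInteger(s));
        }
    }
}
```

The pattern only checks the shape of the string. A very long digit string still passes it, and converting such a value with Integer.valueOf would throw a NumberFormatException, so a try/catch around the conversion is still worth considering.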

"\\s{0,},\\s{0,}|\\s+" (must be in this order)

Used with the String#split() method for parsing a comma (,) delimited line. This regex pretty much covers the bases for when comma delimited string data needs to be split but the comma may be spaced in several different fashions within the string, for example:

"my data,data"            Split to: [my, data, data]
"my data ,data"           Split to: [my, data, data]
"my data, data"           Split to: [my, data, data]
"my data , data"          Split to: [my, data, data]
"my   data,      data"    Split to: [my, data, data]
"my    data      ,data"   Split to: [my, data, data]
  • \\s{0,} 0 or more whitespaces before the comma.
  • , The literal comma delimiter itself.
  • \\s{0,} 0 or more whitespaces after the comma.
  • | Logical OR split on...
  • \\s+ Just one or more whitespace delimiter.

In other words, split on any of: a comma alone; a comma followed by whitespace; whitespace followed by a comma; a comma with whitespace on both sides; or one or more whitespace characters alone.
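The "must be in this order" remark can be checked with a short sketch. CommaSplitDemo, splitAttribute and splitReversed are hypothetical names; the first regex is the answer's, and the second is the same two alternatives swapped purely for comparison.

```java
import java.util.Arrays;

class CommaSplitDemo {
    // The answer's pattern: try "comma with optional whitespace" first, then plain whitespace.
    static String[] splitAttribute(String part) {
        return part.split("\\s{0,},\\s{0,}|\\s+");
    }

    // Same alternatives in the opposite order (for comparison only).
    static String[] splitReversed(String part) {
        return part.split("\\s+|\\s{0,},\\s{0,}");
    }

    public static void main(String[] args) {
        String part = "house 3.2 , 1";
        System.out.println(Arrays.toString(splitAttribute(part))); // prints [house, 3.2, 1]
        // With \\s+ first, the space before the comma is consumed on its own,
        // so the comma becomes a second delimiter and leaves an empty token.
        System.out.println(Arrays.toString(splitReversed(part)));  // prints [house, 3.2, , 1]
    }
}
```

The empty token in the reversed version would shift the array indexes, which is why the order of the alternatives matters when the values are read by position.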


"-?\\d+(\\.\\d+)?"

Used with the String#matches() method for string numerics validation. This regex is used to see if the tested string is indeed a string representation of a signed or unsigned integer or double type numerical value (of any length). Used in the code above for numerical string validation before converting that numerical value to Double. String representations can be:

-1.34   1.34   324   2.54335   342345   -65379.7   74   etc.
  • -? If the string optionally starts with or doesn't start with the
    Hyphen character indicating a signed value.
  • \\d+ The string contains 1 or more (+) digits from 0
    to 9. [The string would be considered Integer up to this point.]
  • ( Start of a Group.
  • \\. If the string contains a literal Period (.) after the first set of digits.
  • \\d+ The string contains 1 or more (+) digits from 0 to 9 after the Period.
  • ) End of Group.
  • ? The data expressed within the Group expression may or may not be there making the Group an Option Group.
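A small sketch of the decimal pattern in use, assuming the hypothetical names DecimalCheckDemo and isDecimal; the regex itself is taken from the answer.

```java
class DecimalCheckDemo {
    // Hypothetical helper: optional minus sign, digits, then an optional ".digits" group.
    static boolean isDecimal(String s) {
        return s.matches("-?\\d+(\\.\\d+)?");
    }

    public static void main(String[] args) {
        String[] samples = {"-1.34", "324", "2.54335", "3.", ".5", "+2.5", "1e3"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isDecimal(s));
        }
        // The first three print true; "3.", ".5", "+2.5" and "1e3" print false.
    }
}
```

The pattern is deliberately stricter than the converter: Double.parseDouble would accept forms such as ".5" or "1e3", while this validation rejects them, so only plain digits with an optional fractional part reach the conversion.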

Hopefully, the above should be able to get you started.

Answer 2

Score: -1

A regex would probably be a better choice than nextInt or nextDouble. You could fetch each decimal value using:

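import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("\\d+(\\.\\d+)?");
Matcher m = p.matcher(a);   // a is the String holding the text to search
while (m.find()) {
    System.out.println(m.group());
}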

The regex checks for all occurrences of a decimal or non-decimal number in the given string.

  • \\d+ - One or more occurrences of a digit.
  • (\\.\\d+) - Followed by a decimal point and one or more digits.
  • ? - The expression in the parentheses is optional, so the numbers may or may not contain a decimal part.

This will print the following for the data you provided:

25
25
3.2
1
3.2
1
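The output above can be reproduced with a self-contained sketch. The class NumberFinderDemo and the method findNumbers are hypothetical names, and the input is simply the sample file from the question joined into one string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberFinderDemo {
    private static final Pattern NUMBER = Pattern.compile("\\d+(\\.\\d+)?");

    // Hypothetical helper: collects every match of the number pattern, in order.
    static List<String> findNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String data = String.join("\n",
                "pop 25; // my code must accept this",
                "pop 25 ; // and also this",
                "house 3.2, 1; // some lines will set multiple values",
                "house 3.2 , 1 ; // so I will need to ignore both commas and semicolons");
        System.out.println(findNumbers(data));      // prints [25, 25, 3.2, 1, 3.2, 1]
        // The pattern has no sign handling, so a leading minus is silently dropped.
        System.out.println(findNumbers("pop -5;")); // prints [5]
    }
}
```

Because find() scans for digits anywhere in the text, any digits inside a trailing comment would also be picked up; stripping comments first, or adding an optional sign to the pattern, may be needed depending on the real input.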

EDIT:

The problem you have with commas and semi-colons while parsing the line can be avoided by fetching the entire line using nextLine() instead of next(). next() only fetches one token at a time from the input. Using nextLine and a regular expression, you can read individual numbers as below.

// Requires java.util.Scanner, java.util.regex.Matcher and java.util.regex.Pattern.
// Assigns to the static fields population, median and scatter from the question.
Pattern p = Pattern.compile("\\d+(\\.\\d+)?");   // compile once, outside the loop
while (sc.hasNextLine()) {
    String input = sc.nextLine();   // fetches the entire line
    Matcher m = p.matcher(input);
    if (input.contains("pop")) {
        while (m.find()) {
            // Throws NumberFormatException if the value has a decimal part.
            population = Integer.parseInt(m.group());
        }
    } else if (input.contains("house")) {
        // Only call group() after find() has reported a match.
        if (m.find()) {
            median = Double.parseDouble(m.group());
        }
        if (m.find()) {
            scatter = Double.parseDouble(m.group());
        }
    }
}

huangapple
  • Published on October 3, 2020 at 08:48:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64179707.html