Taking an integer from a scanner token that may or may not include symbols




I have a function that takes a scanner of a text file as input, and I need to extract integer values from each line. These lines might not follow a rigid syntax.

I have tried to use skip() to ignore specific non-integers, but I fear I may be using it for something it's not capable of.

I've also tried turning the token into a string and using replaceAll(";", ""), but that quickly turns my code into a mess of if statements and String to int conversions. It gets bad quite fast considering I have a lot of different variables that need to be set here.

Is there a more elegant solution?

Here is my input file:

pop 25; // my code must accept this
pop 25 ; // and also this
house 3.2, 1; // some lines will set multiple values
house 3.2 , 1 ; // so I will need to ignore both commas and semicolons

Here is my code:

static int population = -1;
static double median = -1;
static double scatter = -1;

private static void readCommunity(Scanner sc) {
    while (sc.hasNext()) {
        String input = sc.next();
        if ("pop".equals(input)) {
            sc.skip(";*"); // my guess is this wouldn't work unless the
                           // token had a ';' BEFORE the integer
            if (sc.hasNextInt()) {
                population = sc.nextInt();
            } else {
                // throw an error. not important here
            }
        } else if ("house".equals(input)) {
            if (sc.hasNextDouble()) {
                median = sc.nextDouble();
                if (sc.hasNextDouble()) {
                    scatter = sc.nextDouble();
                } else {
                    // error
                }
            } else {
                // error
            }
        }
    }
}
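For reference, here is a small sketch of what the default, whitespace-delimited Scanner actually sees on these lines. Because only whitespace separates tokens, "pop 25;" yields the tokens pop and 25;. skip() only tries to match at the current position (right after pop), where ";*" matches zero characters, so hasNextInt() still sees 25; and returns false. The class and method names (SkipDemo, tokenAfterPop, intAvailableAfterSkip) are hypothetical.

```java
import java.util.Scanner;

// Hypothetical demo class: shows what a default (whitespace-delimited) Scanner sees.
class SkipDemo {

    // Returns the raw token that follows "pop" when only whitespace separates tokens.
    static String tokenAfterPop(String line) {
        Scanner sc = new Scanner(line);
        sc.next();          // consumes "pop"
        return sc.next();   // whatever comes next, up to the next whitespace
    }

    // Mirrors the question's attempt: read "pop", call skip(";*"), then ask for an int.
    static boolean intAvailableAfterSkip(String line) {
        Scanner sc = new Scanner(line);
        sc.next();          // consumes "pop"; the position is now just before " 25;"
        sc.skip(";*");      // ";*" can match zero characters here, so nothing is skipped
        return sc.hasNextInt();
    }

    public static void main(String[] args) {
        System.out.println(tokenAfterPop("pop 25;"));          // 25;
        System.out.println(intAvailableAfterSkip("pop 25;"));  // false
        System.out.println(intAvailableAfterSkip("pop 25 ;")); // true
    }
}
```

The same check on "pop 25 ;" returns true, so with the default delimiter it is only the spacing around the semicolon that decides whether nextInt() works.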


In my opinion, it's simply easier to read each entire line of the data file, split that line into the parts you need, and then validate the values you read in. For example:

// Requires java.io.File, java.io.FileNotFoundException and java.util.Scanner,
// plus the static population, median and scatter fields from the question.
private static void readCommunity(String dataFilePath) {
    File file = new File(dataFilePath);
    if (!file.exists()) {
        System.err.println("File Not Found! (" + dataFilePath + ")");
        return;
    }
    int lineCount = 0;   // For counting file lines.
    // 'Try With Resources' used here so as to auto-close reader.
    try (Scanner sc = new Scanner(file)) {
        while (sc.hasNextLine()) {
            String fileInput = sc.nextLine();
            lineCount++;   // Increment line counter.
            /* Remove comments from data line (if any). Your file
               example shows comments at the end of each line. Yes,
               I realize that your file most likely doesn't contain
               these, but it doesn't hurt to have this here in case
               it does or if you want to have that option. Comments
               can start with // or /*. Comments must be at the end
               of a data line. This does NOT support multi-line
               comments. More code is needed for that.             */
            if (fileInput.contains("//") || fileInput.contains("/*")) {
                fileInput = fileInput.substring(0, fileInput.contains("//")
                        ? fileInput.indexOf("//") : fileInput.indexOf("/*"));
            }
            fileInput = fileInput.trim();
            // Skip blank (or comment-only) lines.
            if (fileInput.isEmpty()) {
                continue;
            }
            // Start parsing the data line into required parts...
            // Start with semicolon portions
            String[] lineMainParts = fileInput.split("\\s{0,};\\s{0,}");
            /* Iterate through all the main elemental parts on a
               data line (if there is more than one), for example:
               pop 30; house 4.3, 1; pop 32; house 3.3, 2   */
            for (String part : lineMainParts) {
                // Is it a 'pop' attribute?
                if (part.toLowerCase().startsWith("pop")) {
                    // Yes it is... so validate, convert, and display the value.
                    String[] attributeParts = part.split("\\s+");
                    // validate string numerical value (Integer).
                    if (attributeParts.length > 1 && attributeParts[1].matches("-?\\d+|\\+?\\d+")) {
                        population = Integer.parseInt(attributeParts[1]);  // convert to int
                        System.out.println("Population:\t" + population); // display...
                    }
                    else {
                        System.err.println("Invalid population value detected in file on line "
                                + lineCount + "! (" + part + ")");
                    }
                }
                // Is it a 'house' attribute?
                else if (part.toLowerCase().startsWith("house")) {
                    /* Yes it is... so split all comma delimited attribute values
                       for 'house', validate each numerical value, convert each
                       numerical value, and display each attribute and their
                       respective values.  */
                    String[] attributeParts = part.split("\\s{0,},\\s{0,}|\\s+");
                    // validate median string numerical value (Double or Integer).
                    if (attributeParts.length > 1 && attributeParts[1].matches("-?\\d+(\\.\\d+)?")) {
                        median = Double.parseDouble(attributeParts[1]);  // convert to double
                        System.out.println("Median:     \t" + median);   // display median...
                    }
                    else {
                        System.err.println("Invalid Median value detected in file on line "
                                + lineCount + "! (" + part + ")");
                    }
                    // validate scatter string numerical value (Integer).
                    if (attributeParts.length > 2 && attributeParts[2].matches("-?\\d+|\\+?\\d+")) {
                        scatter = Integer.parseInt(attributeParts[2]);   // int, stored in the double field
                        System.out.println("Scatter:    \t" + scatter);  // display scatter...
                    }
                    else {
                        System.err.println("Invalid Scatter value detected in file on line "
                                + lineCount + "! (" + part + ")");
                    }
                }
                else {
                    System.err.println("Unhandled Data Attribute detected in data file on line "
                            + lineCount + "! (" + part + ")");
                }
            }
        }
    }
    catch (FileNotFoundException ex) {
        System.err.println(ex.getMessage());
    }
}

There are several Regular Expressions (RegEx) used in the code above. Here is what they mean in the order they are encountered in code:

"\\s{0,};\\s{0,}"

Used with the String#split() method for splitting a semicolon (;) delimited line. It handles a semicolon that may or may not have whitespace before or after it, for example:

"data;data ;data; data ; data;      data       ;data"
  • \\s{0,} Zero or more whitespace characters before the semicolon.
  • ; The literal semicolon delimiter itself.
  • \\s{0,} Zero or more whitespace characters after the semicolon.
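A quick way to check this pattern is to run it on the example string above. A minimal sketch; the class and method names are hypothetical, and \\s{0,} is equivalent to the more common \\s*:

```java
import java.util.Arrays;

// Hypothetical demo class for the semicolon split pattern.
class SemicolonSplitDemo {

    // Splits on ';' together with any whitespace on either side of it.
    static String[] splitOnSemicolons(String line) {
        return line.split("\\s{0,};\\s{0,}");   // same as "\\s*;\\s*"
    }

    public static void main(String[] args) {
        String line = "data;data ;data; data ; data;      data       ;data";
        // [data, data, data, data, data, data, data]
        System.out.println(Arrays.toString(splitOnSemicolons(line)));
        // A trailing ';' leaves no empty last element, because split() drops trailing empty strings.
        System.out.println(Arrays.toString(splitOnSemicolons("pop 25 ; ")));  // [pop 25]
    }
}
```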

"\\s+"

Used with the String#split() method for splitting a whitespace (" ") delimited line. It handles tokens separated by anything from a single space to several space or tab characters, for example:

"datadata"                      Split to: [datadata] (Need at least 1 space)
"data data"                     Split to: [data, data] 
"data   data"                   Split to: [data, data] 
"data        data       data"   Split to: [data, data, data] 

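A minimal sketch of the same split, including one case the examples above do not show: when the string starts with whitespace, String#split() returns an empty first element, which is one reason to trim each line before splitting. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class for the whitespace split pattern.
class WhitespaceSplitDemo {

    // Splits on runs of one or more whitespace characters (spaces, tabs, ...).
    static String[] splitOnWhitespace(String s) {
        return s.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnWhitespace("data        data       data"))); // [data, data, data]
        System.out.println(Arrays.toString(splitOnWhitespace("data\tdata")));                  // [data, data]
        // Leading whitespace produces an empty first element...
        System.out.println(Arrays.toString(splitOnWhitespace("  pop 25")));                    // [, pop, 25]
        // ...which trimming first avoids.
        System.out.println(Arrays.toString(splitOnWhitespace("  pop 25".trim())));             // [pop, 25]
    }
}
```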
"-?\\d+|\\+?\\d+"

Used with the String#matches() method for string numerics validation. This regex checks whether the tested string represents a signed or unsigned integer value. It is used in the code above to validate a numerical string before converting it to an Integer. The regex accepts any number of digits, but Integer.parseInt() and Integer.valueOf() still throw a NumberFormatException for values outside the int range. String representations can be:

-1   1   324   +2   342345   -65379   74   etc.
  • -? An optional hyphen (minus sign) indicating a negative value.
  • \\d+ One or more (+) digits from 0 to 9.
  • | Logical OR.
  • \\+? An optional plus sign.
  • \\d+ One or more (+) digits from 0 to 9.
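A sketch of validating with this pattern and then converting, assuming you would rather get null than an exception for values that pass the regex but do not fit in an int (that choice and the names IntegerRegexDemo, looksLikeInteger and parseIntOrNull are hypothetical, not from the answer):

```java
// Hypothetical demo class for the integer validation pattern.
class IntegerRegexDemo {

    // Same pattern as in the answer: optional '-' or optional '+', then digits.
    static boolean looksLikeInteger(String s) {
        return s.matches("-?\\d+|\\+?\\d+");
    }

    // Validates with the regex, then converts. Returns null instead of throwing
    // when the value is out of int range (the regex alone does not check range).
    static Integer parseIntOrNull(String s) {
        if (!looksLikeInteger(s)) {
            return null;
        }
        try {
            return Integer.parseInt(s);   // accepts a leading '+' (Java 7 and later)
        } catch (NumberFormatException ex) {
            return null;                  // e.g. "99999999999" is too large for an int
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"-1", "+2", "342345", "3.2", "25;", "99999999999"}) {
            System.out.println(s + " -> matches: " + looksLikeInteger(s) + ", value: " + parseIntOrNull(s));
        }
    }
}
```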

"\\s{0,},\\s{0,}|\\s+" (must be in this order: if \\s+ came first, the space before the comma in "3.2 , 1" would be consumed on its own and the split would produce an empty element, [3.2, , 1])

Used with the String#split() method to split a line on commas as well as on whitespace. It handles a comma that may or may not be surrounded by whitespace, for example:

"my data,data"            Split to: [my, data, data] 
"my data ,data"           Split to: [my, data, data] 
"my data, data"           Split to: [my, data, data] 
"my data , data"          Split to: [my, data, data] 
"my   data,      data"    Split to: [my, data, data] 
"my    data      ,data"   Split to: [my, data, data] 
  • \\s{0,} Zero or more whitespace characters before the comma.
  • , The literal comma delimiter itself.
  • \\s{0,} Zero or more whitespace characters after the comma.
  • | Logical OR, split on...
  • \\s+ Just one or more whitespace characters.

In other words, split on a comma together with any whitespace around it; where there is no comma, split on a run of one or more whitespace characters.
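The sketch below runs this pattern on one of the question's lines and, for comparison, the same two alternatives in the opposite order, which shows why the order matters. The class, constant and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class for the comma/whitespace split pattern.
class CommaSplitDemo {

    // The answer's pattern: comma with optional surrounding whitespace, OR a whitespace run.
    static final String COMMA_FIRST = "\\s{0,},\\s{0,}|\\s+";
    // The same alternatives in the opposite order, for comparison only.
    static final String WHITESPACE_FIRST = "\\s+|\\s{0,},\\s{0,}";

    static String[] split(String s, String regex) {
        return s.split(regex);
    }

    public static void main(String[] args) {
        // [house, 3.2, 1]
        System.out.println(Arrays.toString(split("house 3.2 , 1", COMMA_FIRST)));
        // [house, 3.2, , 1]: the space before ',' is matched on its own by \s+,
        // so ", " becomes a second delimiter with an empty string between the two.
        System.out.println(Arrays.toString(split("house 3.2 , 1", WHITESPACE_FIRST)));
    }
}
```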

"-?\\d+(\\.\\d+)?"

Used with the String#matches() method for string numerics validation. This regex checks whether the tested string represents an integer or decimal value, optionally negative. It is used in the code above to validate a numerical string before converting it to a Double. Unlike the integer pattern, it does not accept a leading plus sign. String representations can be:

-1.34   1.34   324   2.54335   342345   -65379.7   74   etc.
  • -? An optional hyphen (minus sign) indicating a negative value.
  • \\d+ One or more (+) digits from 0 to 9. [The string would be considered an Integer up to this point.]
  • ( Start of a group.
  • \\. A literal period (.) after the first set of digits.
  • \\d+ One or more (+) digits from 0 to 9 after the period.
  • ) End of group.
  • ? The group may or may not be present, which makes it optional.
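A minimal check of this pattern against some of the examples above and a few strings it rejects. The class and method names are hypothetical.

```java
// Hypothetical demo class for the decimal validation pattern.
class DecimalRegexDemo {

    // Same pattern as in the answer: optional '-', digits, then an optional '.' followed by digits.
    static boolean looksLikeNumber(String s) {
        return s.matches("-?\\d+(\\.\\d+)?");
    }

    public static void main(String[] args) {
        // Accepted: -1.34, 324, 2.54335. Rejected: +2, 3., .5, 3.2,
        for (String s : new String[] {"-1.34", "324", "2.54335", "+2", "3.", ".5", "3.2,"}) {
            System.out.println(s + " -> " + looksLikeNumber(s));
        }
    }
}
```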

Hopefully the above will get you started.


A regex would probably be a better choice than nextInt() or nextDouble(). You could fetch each decimal value using:

Pattern p = Pattern.compile("\\d+(\\.\\d+)?");
Matcher m = p.matcher(a);   // a is the input String to search
while (m.find()) {
    System.out.println(m.group());
}

The regex matches every integer or decimal number in the given string.

\\d+ - One or more occurrences of a digit

(\\.\\d+) - Followed by a decimal point and one or more digits

? - The expression in the parentheses is optional, so the numbers may or may not contain decimals.

For the data you provided, this prints each number it finds on its own line.
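A self-contained sketch of that find() loop run on the question's sample lines, assuming the whole sample is searched as one string. The class and method names are hypothetical. Keep in mind that this pattern drops a leading minus sign and would also pick up digits that appear inside other words or comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the find() loop.
class FindNumbersDemo {

    private static final Pattern NUMBER = Pattern.compile("\\d+(\\.\\d+)?");

    // Collects every integer or decimal number found anywhere in the text, in order.
    static List<String> findNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String data = String.join("\n",
                "pop 25; // my code must accept this",
                "pop 25 ; // and also this",
                "house 3.2, 1; // some lines will set multiple values",
                "house 3.2 , 1 ; // so I will need to ignore both commas and semicolons");
        // Prints 25, 25, 3.2, 1, 3.2, 1 (one per line)
        for (String number : findNumbers(data)) {
            System.out.println(number);
        }
    }
}
```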



The problem you have with commas and semicolons can be avoided by reading the entire line with nextLine() instead of next(). next() only returns one whitespace-delimited token at a time, so a token such as 25; keeps its semicolon. Using nextLine() and a regular expression, you can read the individual numbers as below.

// Requires java.util.Scanner, java.util.regex.Matcher and java.util.regex.Pattern.
Pattern p = Pattern.compile("\\d+(\\.\\d+)?");   // an integer or decimal number
int population = -1;
double median = -1;
double scatter = -1;
while (sc.hasNextLine()) {
    String input = sc.nextLine();   // fetches the entire line
    Matcher m = p.matcher(input);
    if (input.contains("pop")) {
        if (m.find()) {
            population = Integer.parseInt(m.group());
        }
    } else if (input.contains("house")) {
        if (m.find()) {
            median = Double.parseDouble(m.group());    // first number on the line
        }
        if (m.find()) {
            scatter = Double.parseDouble(m.group());   // second number on the line
        }
    }
}
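A different option, not taken from the answers above, is to keep the token-based loop from the question and instead change what Scanner treats as a delimiter, so commas and semicolons are skipped along with whitespace. A minimal sketch, assuming the input contains no // comments (comment words would be read as ordinary tokens) and that numbers use a period as the decimal separator; the class name DelimiterDemo is hypothetical:

```java
import java.util.Locale;
import java.util.Scanner;

// Hypothetical demo class; the fields mirror the question's static fields.
class DelimiterDemo {

    static int population = -1;
    static double median = -1;
    static double scatter = -1;

    static void readCommunity(Scanner sc) {
        // Any run of whitespace, commas and semicolons separates tokens,
        // so "25;" and "3.2," come back as "25" and "3.2".
        sc.useDelimiter("[\\s,;]+");
        // Parse numbers with '.' as the decimal separator regardless of the default locale.
        sc.useLocale(Locale.US);
        while (sc.hasNext()) {
            String input = sc.next();
            if ("pop".equals(input) && sc.hasNextInt()) {
                population = sc.nextInt();
            } else if ("house".equals(input) && sc.hasNextDouble()) {
                median = sc.nextDouble();
                if (sc.hasNextDouble()) {
                    scatter = sc.nextDouble();
                }
            }
            // Any other token (including a malformed value) is simply read
            // and ignored on the next pass through the loop.
        }
    }

    public static void main(String[] args) {
        readCommunity(new Scanner("pop 25 ;\nhouse 3.2 , 1 ;"));
        System.out.println(population + " " + median + " " + scatter);  // 25 3.2 1.0
    }
}
```

The useLocale(Locale.US) call is there because nextDouble() otherwise follows the default locale, in which a comma may be the decimal separator.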

