Making a Java Scanner program be more robust?

Question

I don't normally do Java in my daily work, although I wish I did, since there is a class for everything (sometimes too many). Yesterday I spent the bulk of my day writing the program below. Tiny rant: java.util.Scanner is less than intuitive, IMO. What I want to do with it is scan a log file for two particular patterns, do some date arithmetic on the matches, and print the results. The program works if I first delete the non-matching lines from my log file. I can do that with vi, sed, whatever, but I'm more interested in making this utility usable for someone who isn't comfortable with shell scripting or vi. I'll be hammering on this a little more today, but I wonder if there is some expertise here that can help me move forward more quickly.

import java.util.*;
import java.util.concurrent.*;
import java.text.SimpleDateFormat;
import java.util.regex.Pattern;
import java.util.regex.MatchResult;
import java.io.*;
import java.time.*;
import java.time.format.*;

public class TimeDiff {
    private static SimpleDateFormat m_formatter = new SimpleDateFormat("yyyy-MM-dd-HH.mm.ss");
    private static Pattern m_startRunPattern = Pattern.compile("start of run=([^,]+)");
    private static Pattern m_currentTimePattern = Pattern.compile("current time=(.+)");

    private String m_fileArg;
    private File m_file;
    private Scanner m_scanner;

    public TimeDiff(String[] args) {
        if (args.length == 0) {
            System.err.println("nope.");
            System.exit(1);
        }
        m_fileArg = args[0];
        m_file = new File(m_fileArg);
    }

    public String findPattern(Scanner fileScan, Pattern pattern) {
        String ret_val = null;
        try {
            ret_val = fileScan.findInLine(pattern);
            MatchResult result = fileScan.match();
            if (result.groupCount() > 0) {
                ret_val = result.group(1);
            }
        } catch (java.util.InputMismatchException e) {
            System.out.println("failed at second");
        } catch (java.lang.IllegalStateException e) {
            System.out.println("failed at second match " + e);
        }
        return ret_val;
    }

    public void run(String[] args) throws Exception {
        try (Scanner fileScan = new Scanner(m_file)) {
            while (fileScan.hasNext()) {
                String beginTimeStr = findPattern(fileScan, m_startRunPattern);
                String endTimeStr = findPattern(fileScan, m_currentTimePattern);
                if (beginTimeStr == null && endTimeStr == null) {
                    if (fileScan.hasNext()) {
                        fileScan.next();
                    }
                } else {
                    Date startDate = m_formatter.parse(beginTimeStr);
                    Date endDate = m_formatter.parse(endTimeStr);
                    long duration = endDate.getTime() - startDate.getTime();
                    long diffInSeconds = TimeUnit.MILLISECONDS.toSeconds(duration);
                    long diffInMinutes = TimeUnit.MILLISECONDS.toMinutes(duration);
                    // Seconds left over after removing the whole minutes.
                    long remainderSeconds = diffInSeconds % 60;
                    System.out.println("elapsed seconds: " + diffInSeconds + ", (" + diffInMinutes + " minutes, " + remainderSeconds + " seconds).");
                    if (fileScan.hasNext()) {
                        fileScan.next();
                    }
                }
            }
        } catch (IOException exception) {
            System.out.println(exception);
        }
    }

    public static void main(java.lang.String args[]) {
        try {
            TimeDiff app = new TimeDiff(args);
            app.run(args);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The massaged log file entries look like:

DealWithResponse.cpp, DealWithResponse(XMLSocketApp &, DOMDocument *), 247 2020-07-29 17:54:13  start of run=2020-07-29-17.53.31.216800, current time=2020-07-29-17.54.13.530384
DealWithResponse.cpp, DealWithResponse(XMLSocketApp &, DOMDocument *), 247 2020-07-29 17:54:13  start of run=2020-07-29-17.53.29.903984, current time=2020-07-29-17.54.13.805200
DealWithResponse.cpp, DealWithResponse(XMLSocketApp &, DOMDocument *), 247 2020-07-29 17:54:13  start of run=2020-07-29-17.53.14.356440, current time=2020-07-29-17.54.13.907528
DealWithResponse.cpp, DealWithResponse(XMLSocketApp &, DOMDocument *), 247 2020-07-29 23:16:01  start of run=2020-07-29-23.15.27.722784, current time=2020-07-29-23.16.01.016640
DealWithResponse.cpp, DealWithResponse(XMLSocketApp &, DOMDocument *), 247 2020-07-29 23:16:04  start of run=2020-07-29-23.15.39.955272, current time=2020-07-29-23.16.04.418160
DealWithResponse.cpp, DealWithResponse(XMLSocketApp &, DOMDocument *), 247 2020-07-29 23:16:05  start of run=2020-07-29-23.15.52.154920, current time=2020-07-29-23.16.05.480384

Of course, the rest of the log file contains business logic stuff (SQL, etc.).

Answer 1

Score: 1

From the docs of Scanner's findInLine:

> Attempts to find the next occurrence of the specified pattern ignoring delimiters. If the pattern is found before the next line separator, the scanner advances past the input that matched and returns the string that matched the pattern. If no such pattern is detected in the input up to the next line separator, then null is returned and the scanner's position is unchanged. This method may block waiting for input that matches the pattern.

What you're observing is findInLine not finding anything matching the pattern specified before hitting a newline, and thus returning null and not changing the position whatsoever.
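To make that behaviour concrete, here is a minimal sketch. The class name FindInLineDemo, the helper groupOnCurrentLine, and the two-line sample input are made up for illustration; only the Scanner calls come from the JDK. It checks the result of findInLine before calling match(), because match() throws IllegalStateException when the last find did not succeed, which is likely where the "failed at second match" output in the question comes from.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class FindInLineDemo {
    // Hypothetical helper: returns group 1 of the first match on the scanner's
    // current line, or null if that line contains no match.
    static String groupOnCurrentLine(Scanner scanner, Pattern pattern) {
        // findInLine returns null and leaves the position unchanged on a miss.
        if (scanner.findInLine(pattern) == null) {
            return null;
        }
        // Only safe after a successful find; otherwise match() throws
        // IllegalStateException.
        return scanner.match().group(1);
    }

    public static void main(String[] args) {
        Pattern startRun = Pattern.compile("start of run=([^,]+)");
        // Made-up sample input: one non-matching line, then one matching line.
        String log = "SELECT * FROM orders\n"
                + "start of run=2020-07-29-17.53.31.216800, current time=2020-07-29-17.54.13.530384\n";
        try (Scanner scanner = new Scanner(log)) {
            // The first line has no match, so this prints null.
            System.out.println(groupOnCurrentLine(scanner, startRun));
            // The scanner has not moved; skip the line explicitly.
            scanner.nextLine();
            // Now the matching line is current and the timestamp is printed.
            System.out.println(groupOnCurrentLine(scanner, startRun));
        }
    }
}
```

The point of the sketch is that a miss does not consume anything, so the caller has to move past the non-matching line (here with nextLine()) before trying again.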

Perhaps findWithinHorizon(pattern, 0) is more to your liking? This will keep looking, forever (until end of input) if need be, until it finds a match on your regexp.
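A rough sketch of that suggestion, assuming the whole log fits comfortably in memory (with a horizon of 0 the Scanner may buffer all of the input while it searches). The class name HorizonDemo, the helper allMatches, and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class HorizonDemo {
    // Hypothetical helper: collects group 1 of every match in the input,
    // silently skipping any lines that do not match.
    static List<String> allMatches(String input, Pattern pattern) {
        List<String> found = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            // A horizon of 0 means "no limit": keep searching until end of input.
            while (scanner.findWithinHorizon(pattern, 0) != null) {
                found.add(scanner.match().group(1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Pattern startRun = Pattern.compile("start of run=([^,]+)");
        // Made-up sample input with unrelated lines between the interesting ones.
        String log = "SELECT * FROM orders\n"
                + "start of run=2020-07-29-17.53.31.216800, current time=2020-07-29-17.54.13.530384\n"
                + "UPDATE orders SET status = 1\n"
                + "start of run=2020-07-29-17.53.29.903984, current time=2020-07-29-17.54.13.805200\n";
        // Prints the two "start of run" timestamps; the SQL lines are skipped.
        System.out.println(allMatches(log, startRun));
    }
}
```

Because each call resumes after the previous match, no explicit next() or nextLine() is needed to get past the non-matching lines.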

It then returns the match. If you need the entire line, just expand your regexp: "^.*current time=(.*)$", compiled with Pattern.MULTILINE so that ^ and $ anchor at line boundaries, would match the entire line.

A second tip: your exception handling is atrocious. If you catch an exception, handle it. 'Print some text and carry right on as if nothing is wrong' is not handling it. Trivial solution: add throws Exception to your main method (which is almost always a good idea in any case). Then just get rid of every try { and catch {} block in your code (keep the try-with-resources that closes the Scanner, minus its catch). That makes it way shorter and easier to read, and better to boot!
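As a sketch of what that can look like, the date arithmetic from the question is shown below with no catch blocks at all. The class name ElapsedDemo and the helper elapsedSeconds are invented for this example; the format string and the timestamps are taken from the question. A malformed timestamp simply propagates as a ParseException and ends the program with a stack trace instead of being printed and ignored.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class ElapsedDemo {
    // Hypothetical helper: parses both timestamps and returns the elapsed whole
    // seconds. Nothing is caught here; a bad timestamp surfaces to the caller.
    static long elapsedSeconds(String start, String end) throws ParseException {
        // parse(String) reads from the start of the text and ignores the
        // trailing fractional part such as ".216800".
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd-HH.mm.ss");
        Date startDate = formatter.parse(start);
        Date endDate = formatter.parse(end);
        return (endDate.getTime() - startDate.getTime()) / 1000;
    }

    // main declares throws Exception, so no try/catch is needed anywhere.
    public static void main(String[] args) throws Exception {
        long seconds = elapsedSeconds("2020-07-29-17.53.31.216800",
                                      "2020-07-29-17.54.13.530384");
        System.out.println("elapsed seconds: " + seconds
                + " (" + (seconds / 60) + " minutes, " + (seconds % 60) + " seconds)");
    }
}
```

If some failure really should be survivable (say, one bad line in an otherwise good log), that is the place for a targeted catch that does something meaningful, such as skipping the line and reporting which one it was.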

huangapple
  • Published on July 30, 2020, 22:22:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63175224.html