Using streams for parsing string from file

Question

I am learning how to use the Java 8 API. I have a simple log file with the following contents:

SVF2018-05-24_12:02:58.917
NHR2018-05-24_12:02:49.914
FAM2018-05-24_12:13:04.512
KRF2018-05-24_12:03:01.250
SVM2018-05-24_12:18:37.735
MES2018-05-24_12:04:45.513
LSW2018-05-24_12:06:13.511
BHS2018-05-24_12:14:51.985
EOF2018-05-24_12:17:58.810
RGH2018-05-24_12:05:14.511
SSW2018-05-24_12:16:11.648
KMH2018-05-24_12:02:51.003
PGS2018-05-24_12:07:23.645
CSR2018-05-24_12:03:15.145
SPF2018-05-24_12:12:01.035
DRR2018-05-24_12:14:12.054
LHM2018-05-24_12:18:20.125
CLS2018-05-24_12:09:41.921
VBM2018-05-24_12:00:00.000

My goal is to parse it using streams. The desired output is the following:

[{SVF = [2018-05-24, 12:02:58.917]}, {NHR = [2018-05-24, 12:02:49.914]}...]

I already have the following:

public class FileParser {
    Stream<String> outputStream;

    public FileParser(String fileName) throws IOException, URISyntaxException {
        FileReader fr = new FileReader();
        this.outputStream = fr.getStreamFromFile(fileName);
    }

    public List<HashMap<String, ArrayList<String>>> getRacersInfo() {
        return outputStream.map(line -> Arrays.asList(line.substring(0, 3))
                .collect(Collectors.toMap(???)); // Some code here which I cannot come up with.
    }
}

Any help appreciated. If you need any additional information feel free to ask, I'll be glad to provide it.

Answer 1

Score: 4

Problem: FileReader

FileReader is obsolete, don't use it. It's an outdated API, and it's problematic in that it presumes 'platform default encoding' (unless you use the Charset-accepting constructors, which were only added in Java 11), which is a different way of saying 'a bug waiting to happen that no test will catch but that will blow up in your face later'. You never want 'platform default encoding', especially as a silent default.

There's a new file API (java.nio.file), and it lets you specify the encoding explicitly. Also, in the new API, if you don't specify one, UTF-8 is assumed, which is a far saner default than 'platform default'.
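As a rough illustration of spelling out the encoding with the new API, here is a minimal sketch. The class name ExplicitCharsetDemo and the helpers readLines and roundTrip are made up for this example; Files.lines(Path, Charset) is the standard overload that takes the charset.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ExplicitCharsetDemo {
    // Hypothetical helper: reads all lines of a file using the charset the caller names.
    static List<String> readLines(Path path, Charset charset) throws IOException {
        try (Stream<String> lines = Files.lines(path, charset)) {
            return lines.collect(Collectors.toList());
        }
    }

    // Demo only: writes the text to a temp file as UTF-8, reads it back, deletes the file.
    static List<String> roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("racers", ".log");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readLines(tmp, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("SVF2018-05-24_12:02:58.917\nNHR2018-05-24_12:02:49.914\n"));
    }
}
```

Because the charset is an explicit argument, the code behaves the same on every machine, whatever the platform default happens to be.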

Problem: resources

Resources are objects that represent something that takes up OS-level handles. Files, network connections, database connections: those are some common examples of resources. The thing is, unlike normal objects, you MUST explicitly CLOSE those. If you don't, your process leaks handles and will, eventually, run out of them and crash. That means you should basically never put readers/inputstreams/outputstreams/writers in fields, because how do you guarantee closing them? The only way is to make your own class a resource too (a thing that must explicitly be closed), which you can do, but it is complicated, and not a good idea here.

You should never make resources unless you do so safely:

bad:

FileReader fr = new FileReader(..);

good:

try (FileReader fr = new FileReader(..)) {
    // use here
}
// it's gone here

It does require you to restyle things a bit. You have to open a resource, use the resource, and close it. This meshes well with pragmatic concerns: resources are a drain on the OS, and you don't want to keep 'em open any longer than you must, so 'open it, use it, and lose it' is the right mindset.
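To make 'open it, use it, and lose it' concrete, here is a small sketch showing that try-with-resources calls close() even when the body throws. TrackedResource is a made-up stand-in for a real file or connection, used only so the effect can be observed.

```java
final class TryWithResourcesDemo {
    // Hypothetical resource that only remembers whether close() was called.
    static final class TrackedResource implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    // Returns true if the resource was closed although the try body failed.
    static boolean closedAfterFailure() {
        TrackedResource tracked = new TrackedResource();
        try (TrackedResource r = tracked) {
            throw new IllegalStateException("simulated failure while using the resource");
        } catch (IllegalStateException e) {
            // close() has already run by the time the catch block executes
        }
        return tracked.closed;
    }

    public static void main(String[] args) {
        System.out.println(closedAfterFailure());
    }
}
```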

Furthermore, of course, resources as a concept are generally 'once-through-only'. For example, when reading a file, well, you read it once, from the top to the bottom, and then any further attempts to read from it don't work anymore. So, in your example, the first time I call getRacersInfo(), it works. But the second time I call it, it won't, as the stream has now been consumed (a java.util.stream.Stream throws IllegalStateException if you try to use it again).
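The 'once-through-only' behaviour is easy to reproduce without a file. In this sketch (class and method names are invented for the example) a Stream is kept in a local variable and consumed twice; the second terminal operation fails.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StreamReuseDemo {
    // Returns true if running a second terminal operation on the same Stream fails.
    static boolean secondUseFails() {
        Stream<String> lines = Arrays.asList(
                "SVF2018-05-24_12:02:58.917",
                "NHR2018-05-24_12:02:49.914").stream();

        List<String> first = lines.collect(Collectors.toList()); // first pass works
        try {
            lines.collect(Collectors.toList());                  // stream is already consumed
            return false;
        } catch (IllegalStateException e) {
            return first.size() == 2;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails());
    }
}
```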

The solution to both problems is to do the reading in the constructor*.

*) See later - we're going to move this out of the constructor eventually, but that's a separate concern.

Problem: Misunderstanding of responsibilities of constructors

This class is called a FileParser. So, its job is to parse files (that, or this class has a bad name). Generally, your constructors represent the 'data gathering' phase, not the 'do the job' phase. Therefore, parsing the file in the first place, in that constructor, is bad code style. You should not do this: your constructors should as a rule do as little as possible and definitely nothing tricky, such as opening files or actually parsing things. Again, the JOB of a FileParser is to parse files, and constructors should not do the job. They just set up the object so that it can do the job later.

The proper design, then, is:

public class FileParser {
    private final Path path;

    public FileParser(Path path) {
        this.path = path;
    }

    public List<Map<String, List<String>>> parseRacersInfo() throws IOException {
        try (Stream<String> lines = Files.lines(path)) {
            lines.map(.... parse the content here ....);
        }
    }
}

We have now:

  1. Moved the 'job' part to a method that accurately describes the job.
  2. Ensured the constructor is simple and just gathers information to do the job.
  3. Used resources safely by applying the try (...) {} construct (try-with-resources).
  4. Used the new API (java.nio.file.Files and java.nio.file.Path).
  5. Clarified our typing: that parameter to the constructor represents a path. If I call new FileParser("Hello, IceTea, how's life?"), that call makes no sense. Path is more descriptive than String, and if your method makes sense looking only at the types of the parameters, that's better than if you need to read the docs too.

Problem: Not using Java the way it wants to be used

Java is typed. Nominally so. Things should be stored in types that represent that thing. Thus, the string 2018-05-24_12:18:20.125 should be represented by an object that represents a time of some sort, not by a List<String> containing the strings 2018-05-24 and 12:18:20.125.
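One practical payoff of a real time type, sketched under the assumption that you eventually want to compare or subtract these timestamps: with LocalDateTime the arithmetic comes for free, whereas a List<String> would have to be re-parsed every time. The class name TypedTimeDemo and the method between are invented for this example.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class TypedTimeDemo {
    // Matches timestamps such as 2018-05-24_12:18:20.125
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd_HH:mm:ss.SSS", Locale.ENGLISH);

    // Hypothetical helper: elapsed time between two timestamps in the log's format.
    static Duration between(String earlier, String later) {
        return Duration.between(
                LocalDateTime.parse(earlier, FORMAT),
                LocalDateTime.parse(later, FORMAT));
    }

    public static void main(String[] args) {
        System.out.println(between("2018-05-24_12:00:00.000", "2018-05-24_12:02:58.917"));
    }
}
```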

Finally: How do I actually write the mapping?

Streams work by zooming in on a single element in the stream, and doing a series of operations on these elements, transforming them, filtering some out, etcetera. You cannot 'go back' in the process (once you map a thing to another thing, you can't go back to what it used to be), and you can't refer to other objects in your stream (you can't ask: Give me the item before me in the stream).

Thus, once you go: line.substring(0, 3), you've thrown out the date, and that's a problem because we need that info. Therefore, you can't do that; not in a .map() operation, at any rate.
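If you really do want an intermediate .map() step, one possible workaround is to map each line to a pair, so that nothing is thrown away. The sketch below uses AbstractMap.SimpleEntry as the pair type; that choice, and the names KeepBothHalvesDemo and split, are assumptions made for illustration only.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class KeepBothHalvesDemo {
    // Maps each line to a (key, rest-of-line) pair, then collects the pairs into a Map.
    static Map<String, String> split(List<String> lines) {
        return lines.stream()
                .map(line -> new SimpleEntry<String, String>(
                        line.substring(0, 3),   // 3-letter code
                        line.substring(3)))     // everything after the code is kept, not discarded
                .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue()));
    }

    public static void main(String[] args) {
        System.out.println(split(Arrays.asList("SVF2018-05-24_12:02:58.917")));
    }
}
```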

In fact, we can go straight to collecting the stream back into a map here - we need that entire string and we can derive the key from it (SVF), and we need that entire string and we can derive the value from it (the date).

Let's write these conversion functions, and let's translate our string representing a time to a proper type for it (also new in Java 8): java.time.LocalDateTime:

Function<String, String> toKey = in -> in.substring(0, 3);

DateTimeFormatter DATETIME_FORMAT =
    DateTimeFormatter.ofPattern("uuuu-MM-dd_HH:mm:ss.SSS", Locale.ENGLISH);
Function<String, LocalDateTime> toValue = in ->
    LocalDateTime.parse(in.substring(3), DATETIME_FORMAT);

These are simple and we can test them:

assertEquals("VBM", toKey.apply("VBM2018-05-24_12:00:00.000"));
assertEquals(LocalDateTime.of(2018, 5, 24, 12, 0, 0),
    toValue.apply("VBM2018-05-24_12:00:00.000"));

Then we put it all together:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileParser {
    private static final DateTimeFormatter DATETIME_FORMAT =
        DateTimeFormatter.ofPattern("uuuu-MM-dd_HH:mm:ss.SSS", Locale.ENGLISH);

    private final Path path;

    public FileParser(Path path) {
        this.path = path;
    }

    // Opens the file, parses every line, and closes the file again before returning.
    public Map<String, LocalDateTime> parseRacersInfo() throws IOException {
        try (Stream<String> lines = Files.lines(path)) {
            return lines.collect(Collectors.toMap(
                in -> in.substring(0, 3),                                      // key: 3-letter code
                in -> LocalDateTime.parse(in.substring(3), DATETIME_FORMAT))); // value: timestamp
        }
    }

    public static void main(String[] args) throws Exception {
        // The constructor takes a Path, not a String.
        System.out.println(new FileParser(Paths.get("test.txt")).parseRacersInfo());
    }
}
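Two things worth knowing about the two-argument Collectors.toMap: it throws IllegalStateException when two lines share a key, and it makes no promise about the order of the resulting map. If either matters, the four-argument overload is one way to handle it. The sketch below is a variation under assumptions of my own: the class name OrderedRacerParser is invented, it takes a List<String> instead of a file to stay short, and 'keep the earliest timestamp on a duplicate key' is an arbitrary policy chosen only for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

final class OrderedRacerParser {
    private static final DateTimeFormatter DATETIME_FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd_HH:mm:ss.SSS", Locale.ENGLISH);

    static Map<String, LocalDateTime> parse(List<String> lines) {
        return lines.stream().collect(Collectors.toMap(
                in -> in.substring(0, 3),
                in -> LocalDateTime.parse(in.substring(3), DATETIME_FORMAT),
                (a, b) -> a.isBefore(b) ? a : b, // assumed policy: keep the earliest timestamp
                LinkedHashMap::new));            // keeps keys in first-seen order
    }

    public static void main(String[] args) {
        System.out.println(parse(Arrays.asList(
                "SVF2018-05-24_12:02:58.917",
                "NHR2018-05-24_12:02:49.914",
                "SVF2018-05-24_12:00:00.000")));
    }
}
```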

Answer 2

Score: 1

Path path = Paths.get(fileName);
try (Stream<String> lines = Files.lines(path)) {
    Map<String, LocalDateTime> map = lines.collect(Collectors.toMap(
        line -> line.substring(0, 3),
        line -> LocalDateTime.parse(line.substring(3).replace('_', 'T'))));
}

toMap receives a key mapper and a value mapper. Here I keep working on the Stream of lines and let each mapper extract its part from the whole line.

The resulting map is declared as just a Map. Don't ask for a specific implementation such as HashMap, so that collect may return whatever implementation it likes. (In fact, you could supply an implementation if you needed one: toMap has an overload that takes a map supplier.)

(I used Files.lines, which defaults to UTF-8 encoding, but you can pass a Charset explicitly. I used Path rather than File because Path is more general.)
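If a key can occur on more than one line and you want to keep every timestamp, Collectors.groupingBy with a downstream collector is one option. This is only a sketch: the class name GroupedRacerParser is invented, and it reads from a List<String> rather than a file to keep the example short.

```java
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class GroupedRacerParser {
    // Groups lines by their 3-letter code and parses each remainder as an ISO local date-time.
    static Map<String, List<LocalDateTime>> group(List<String> lines) {
        return lines.stream().collect(Collectors.groupingBy(
                line -> line.substring(0, 3),
                Collectors.mapping(
                        // 2018-05-24_12:02:58.917 becomes 2018-05-24T12:02:58.917
                        line -> LocalDateTime.parse(line.substring(3).replace('_', 'T')),
                        Collectors.toList())));
    }

    public static void main(String[] args) {
        System.out.println(group(Arrays.asList(
                "SVF2018-05-24_12:02:58.917",
                "SVF2018-05-24_12:00:00.000",
                "NHR2018-05-24_12:02:49.914")));
    }
}
```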

Answer 3

Score: 1

Something like:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Test {

    public static void main(String[] args) {
        String fileName = "C:\\Users\\Asmir\\Desktop\\input1.txt";
        Map<String, List<String>> map = new HashMap<>();

        try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
            map = stream
                    .collect(Collectors.toMap(s -> s.substring(0, 3), s -> Arrays.asList(s.substring(3).split("_"))));

        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(map);
    }
}
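For the exact shape the question asks for (a list with one single-entry map per line), a sketch along the same lines could look as follows. The class name RacerListParser is invented, and the input is a List<String> rather than a file to keep it short. Note that Java prints maps as key=value, without the spaces around = shown in the question.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class RacerListParser {
    // Turns each line into a one-entry map: code -> [date, time].
    static List<Map<String, List<String>>> parse(List<String> lines) {
        return lines.stream()
                .map(line -> Collections.<String, List<String>>singletonMap(
                        line.substring(0, 3),
                        Arrays.asList(line.substring(3).split("_"))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parse(Arrays.asList(
                "SVF2018-05-24_12:02:58.917",
                "NHR2018-05-24_12:02:49.914")));
    }
}
```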

Posted by huangapple on October 9, 2020 at 21:35:21
Source: https://go.coder-hub.com/64281185.html