How can I get specific words from a URL in Java?
Question
How can I get specific words from a URL in Java? For example, I want to take the data from a class that is named something like blablabla.
Here is my code:
URL url = new URL("https://www.doviz.com/");
URLConnection connect = url.openConnection();
InputStream is = connect.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line = null;
while ((line = br.readLine()) != null) {
    System.out.println(line);
}
Answer 1
Score: 1
Take a look at Jsoup. It lets you get the text content of a web page rather than the raw HTML code. Roughly speaking, it does the parsing work a browser would do: it reads the HTML tags and gives you the human-readable text.
Once you have the content of the page in a String, you can count the occurrences of your word with any occurrence-counting algorithm.
A simple example of using it:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
/* ........ */
String url = "https://www.doviz.com/"; // lowercase name avoids confusion with the java.net.URL class
Document doc = Jsoup.connect(url).get();
String text = doc.body().text();
System.out.println(text);
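The counting step mentioned above could be sketched as follows. This is only a minimal illustration using the standard library: the class name WordCounter, the method countOccurrences, and the sample sentence are made up for the example, and the sample String stands in for the value returned by doc.body().text(). It counts whole-word, case-insensitive matches, which may or may not be the matching rule you want.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup or the original answer.
class WordCounter {

    /** Counts whole-word, case-insensitive occurrences of word in text. */
    static int countOccurrences(String text, String word) {
        // Pattern.quote makes the word literal; \b anchors it at word boundaries.
        // UNICODE_CHARACTER_CLASS makes \b and case folding Unicode-aware.
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up stand-in for the String returned by doc.body().text()
        String text = "Dolar rises, euro falls; the dolar rate is updated hourly.";
        System.out.println(countOccurrences(text, "dolar")); // prints 2
    }
}
```

In real use you would pass the text variable from the Jsoup snippet instead of the hard-coded sentence.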
EDIT
If you don't want to use a parser (you mentioned in the comments that you don't want external libraries), you will get the whole HTML code of the page instead. This is how you can do it:
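import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
/* ........ */
try {
    URL url = new URL("https://www.doviz.com/");
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
        String str;
        // readLine() returns null at end of stream, so call it only once per iteration
        while ((str = in.readLine()) != null) {
            System.out.println(str);
            /* str holds a new line on each iteration; to keep the whole text,
               append each line to a StringBuilder declared before the loop */
        }
    }
} catch (IOException e) {
    e.printStackTrace(); // report the failure instead of silently ignoring it
}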
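Since the question asks for the data inside elements of a particular class, here is a rough sketch of how that might be done on the raw HTML without an external library. It relies on a regular expression, which is fragile for real-world HTML: it assumes the class attribute is double-quoted and that the element contains only plain text with no nested tags. The names ClassTextExtractor and textByClass and the sample markup are invented for illustration; a real parser is usually the more robust choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, shown only as a sketch under the assumptions above.
class ClassTextExtractor {

    // group 1: tag name, group 2: class attribute value, group 3: inner text
    private static final Pattern ELEMENT = Pattern.compile(
            "<(\\w+)[^>]*?\\sclass\\s*=\\s*\"([^\"]*)\"[^>]*>([^<]*)</\\1>");

    /** Returns the inner text of each simple element whose class list contains className. */
    static List<String> textByClass(String html, String className) {
        List<String> result = new ArrayList<>();
        Matcher m = ELEMENT.matcher(html);
        while (m.find()) {
            // A class attribute may list several names separated by whitespace.
            List<String> classes = Arrays.asList(m.group(2).trim().split("\\s+"));
            if (classes.contains(className)) {
                result.add(m.group(3).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up markup; in practice this would be the page text
        // collected line by line in the loop above.
        String html = "<div class=\"item\"><span class=\"name\">USD</span>"
                + "<span class=\"value big\">8,45</span></div>";
        System.out.println(textByClass(html, "value"));
    }
}
```

To use it with the loop above, append each line to a StringBuilder and pass the resulting String to textByClass together with the class name you are looking for.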