How do I fetch a website's HTML with Spring Boot and store the whole page in a single String variable?

Question

I have been looking for a way to fetch the HTML of an arbitrary website from a Spring Boot application and keep the whole page in a String, but I could not find a good example. Can anyone suggest a solution?

Answer 1

Score: 0

You can use an HTML parser such as jsoup (a third-party library that has to be added as a dependency) to do this.

Demo:

    import java.io.IOException;

    import org.jsoup.Jsoup;

    public class JSoupDemo {
        public static void main(String[] args) throws IOException {
            String webPage = "http://www.example.com";
            // Fetch and parse the page, then serialize the document back into one String
            String html = Jsoup.connect(webPage).get().html();
            System.out.println(html);
        }
    }

Output:

    <!doctype html>
    <html>
    <head>
        <title>Example Domain</title>
        <meta charset="utf-8">
        <meta http-equiv="Content-type" content="text/html; charset=utf-8">
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <style type="text/css">
        body {
            background-color: #f0f0f2;
            margin: 0;
            padding: 0;
            font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
        }
        div {
            width: 600px;
            margin: 5em auto;
            padding: 2em;
            background-color: #fdfdff;
            border-radius: 0.5em;
            box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
        }
        a:link, a:visited {
            color: #38488f;
            text-decoration: none;
        }
        @media (max-width: 700px) {
            div {
                margin: 0 auto;
                width: auto;
            }
        }
        </style>
    </head>
    <body>
    <div>
        <h1>Example Domain</h1>
        <p>This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.</p>
        <p><a href="https://www.iana.org/domains/example">More information...</a></p>
    </div>
    </body>
    </html>

Alternatively, you can do it using java.io.BufferedReader as follows:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class Main {
        public static void main(String[] args) {
            // Decode as UTF-8 explicitly (example.com declares utf-8) instead of
            // relying on the platform default charset
            try (BufferedReader br = new BufferedReader(new InputStreamReader(
                    new URL("http://www.example.com").openStream(), StandardCharsets.UTF_8))) {
                String line;
                StringBuilder sb = new StringBuilder();
                // readLine() strips line terminators, so add one back per line
                while ((line = br.readLine()) != null) {
                    sb.append(line);
                    sb.append(System.lineSeparator());
                }
                // The whole page in a single String variable
                String html = sb.toString();
                System.out.println(html);
            } catch (MalformedURLException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
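
If you are on Java 11 or later, a third option that needs no extra dependency is the JDK's built-in java.net.http.HttpClient, which can hand back the response body as a String directly. The following is only a minimal sketch under a few assumptions: the class name HttpClientDemo, the method name fetchHtml, and the 10-second timeouts are made up for illustration, and the HTTP status code is not checked.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class HttpClientDemo {

    // Hypothetical helper: downloads the page at the given URL and returns
    // the response body as one String. The status code is not inspected.
    public static String fetchHtml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10)) // assumed timeout
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10)) // assumed timeout
                .GET()
                .build();
        // ofString() decodes using the charset from the Content-Type header,
        // falling back to UTF-8 when none is given
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        try {
            String html = fetchHtml("http://www.example.com");
            System.out.println(html);
        } catch (IOException e) {
            System.err.println("Request failed: " + e.getMessage());
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can see it
            Thread.currentThread().interrupt();
        }
    }
}
```

Unlike the jsoup version, this returns the markup as the server sent it, without parsing it first. In a Spring Boot application a method like fetchHtml could be called from any bean, since it is plain Java.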

huangapple
  • Published on April 9, 2020, 23:09:04
  • If you repost this article, please keep the link to it: https://go.coder-hub.com/61124319.html