Getting a 403 response code even though the site loads properly in the browser

Question

I am trying to get the response code using HttpURLConnection, but I get "403" even though the site loads properly in the browser.

URL: "https://www.texanscu.org/home/home"

Here is the code I am using:

try {
    String url = "https://www.texanscu.org/home/home";

    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();

    conn.setRequestMethod("GET");
    // Send a browser-like User-Agent header.
    conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36");
    conn.setConnectTimeout(2000);
    // Report the first response as-is instead of following redirects.
    conn.setInstanceFollowRedirects(false);
    conn.setReadTimeout(100000);
    conn.connect();

    int responseCode = conn.getResponseCode();

} catch (Exception e) {
    logger.error("Caught exception : {}", e.getMessage());
}

There is no exception. It's just that I am getting the response code as "403".

Answer 1

Score: 1

I modified your code a little to see the actual response from the server.

try {
    String url = "https://www.texanscu.org/";

    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();

    conn.setRequestMethod("GET");
    conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36");
    conn.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
    conn.setConnectTimeout(100000);
    conn.setInstanceFollowRedirects(false);
    conn.setReadTimeout(100000);
    conn.connect();

    int responseCode = conn.getResponseCode();
    System.out.println(responseCode);

    // For an error status such as 403, the response body is on the error stream.
    BufferedReader br = new BufferedReader(new InputStreamReader(conn.getErrorStream()));
    String strCurrentLine;
    while ((strCurrentLine = br.readLine()) != null) {
        System.out.println(strCurrentLine);
    }

} catch (Exception e) {
    e.printStackTrace();
}

When I execute this code, I see the following output:

<head>
<title>Attention Required! | Cloudflare</title>
<meta name="captcha-bypass" id="captcha-bypass" />
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
<style type="text/css">body{margin:0;padding:0}</style>
<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script><!--<![endif]-->
<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script><!--<![endif]-->
</head>
<body>
<div id="cf-wrapper">
<div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>
<div id="cf-error-details" class="cf-error-details-wrapper">
<div class="cf-wrapper cf-header cf-error-overview">
<h1 data-translate="challenge_headline">One more step</h1>
<h2 class="cf-subheadline"><span data-translate="complete_sec_check">Please complete the security check to access</span> www.texanscu.org</h2>
</div><!-- /.header -->
<div class="cf-section cf-highlight cf-captcha-container">
<div class="cf-wrapper">
<div class="cf-columns two">
<div class="cf-column">
<div class="cf-highlight-inverse cf-form-stacked">
<form class="challenge-form" id="challenge-form" action="/?__cf_chl_captcha_tk__=8f811f0d4e8be53ef88568630d8c627b6a8639a6-1598364414-0-AXcy5nmycYBIOZVpr4NiQNNpsvz-TeYA4kD6NYOYQq8A9OjdxedaBdcfaEp4DM-P6EnhMFryAMIv8_Vi3PM3ukkKw8W4aFv0W4FXhYk4eJgcmPWlA6XdiAQBfIRWqmV7ORtKClPdGo9CgujUYWdpkGr_3hGiUU_bLFp9jf8mF-nCM3s9nex_0MiA916wQSCs-nhaM8_jFGdJ2VmJjczihFz8MFed_zVHNzLG4HHQdcrOl13P4jZy9Y_nhJfAyhVG0ngQXE8y-Slb_c5gvcfGGDa8vrxTpLgrQEF2-SwqkjhJTApfSUn6Y3mtjZ9ZYrA28NDZC1ngeit3IOga5pxB2wcZuYHfatTHy832J_itWa8MrtbDQV_DSWwGroAdC9q10MVYI0CIPzcxzvOrWSUYjlPYRxEKE_cw1mvO5hxsQuPtSlHIMs0bIHUpZl88F16Ki1xr8FEgqGM8aU2-VFlzYjKHh89qHe1MoapqHmZ31Na5Q0LAbGJdl69lGFGhUczHqWL9D015U4Jfpmim3203E23qb5vLnzBu8kJf6ygKDvKn" method="POST" enctype="application/x-www-form-urlencoded">
<input type="hidden" name="cf_captcha_kind" value="h">
<input type="hidden" name="vc" value="">
<script type="text/javascript" src="/cdn-cgi/scripts/hcaptcha.challenge.js" data-type="normal"  data-ray="5c85e0576bec0faa" async data-sitekey="33f96e6a-38cd-421b-bb68-7806e1764460"></script>
<noscript id="cf-captcha-bookmark" class="cf-captcha-info">
<h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1>
</noscript>
<div id="no-cookie-warning" data-translate="turn_on_cookies" style="display:none">
<h1 data-translate="turn_on_cookies" style="color:#bd2426;">Please enable Cookies.</h1>
</div>
<script type="text/javascript">
//<![CDATA[
var a = function() {try{return !!window.addEventListener} catch(e) {return !1} },
b = function(b, c) {a() ? document.addEventListener("DOMContentLoaded", b, c) : document.attachEvent("onreadystatechange", b)};
b(function(){
var cookiesEnabled=(navigator.cookieEnabled)? true : false;
if(!cookiesEnabled){
var q = document.getElementById('no-cookie-warning');q.style.display = 'block';
}
});
//]]>
</script>
<div id="trk_captcha_js" style="background-image:url('/cdn-cgi/images/trace/captcha/nojs/h/transparent.gif?ray=5c85e0576bec0faa')"></div>
</form>
</div>
</div>
<div class="cf-column">
<div class="cf-screenshot-container">
<span class="cf-no-screenshot"></span>
</div>
</div>
</div><!-- /.columns -->
</div>
</div><!-- /.captcha-container -->
<div class="cf-section cf-wrapper">
<div class="cf-columns two">
<div class="cf-column">
<h2 data-translate="why_captcha_headline">Why do I have to complete a CAPTCHA?</h2>
<p data-translate="why_captcha_detail">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>
</div>
<div class="cf-column">
<h2 data-translate="resolve_captcha_headline">What can I do to prevent this in the future?</h2>
<p data-translate="resolve_captcha_antivirus">If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.</p>
<p data-translate="resolve_captcha_network">If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.</p>
<p data-translate="resolve_captcha_privacy_pass"> Another way to prevent getting this page in the future is to use Privacy Pass. You may need to download version 2.0 now from the <a href="https://chrome.google.com/webstore/detail/privacy-pass/ajhmfdgkijocedmfjonnpjfojldioehi">Chrome Web Store</a>.</p>
</div>
</div>
</div><!-- /.section -->
<div class="cf-error-footer cf-wrapper">
<p>
<span class="cf-footer-item">Cloudflare Ray ID: <strong>5c85e0576bec0faa</strong></span>
<span class="cf-footer-separator">&bull;</span>
<span class="cf-footer-item"><span>Your IP</span>: 178.221.185.37</span>
<span class="cf-footer-separator">&bull;</span>
<span class="cf-footer-item"><span>Performance &amp; security by</span> <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link" target="_blank">Cloudflare</a></span>
</p>
</div><!-- /.error-footer -->
</div><!-- /#cf-error-details -->
</div><!-- /#cf-wrapper -->
<script type="text/javascript">
window._cf_translation = {};
</script>
</body>
</html>
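
The snippet above always reads from getErrorStream(), which works here because the status is 403. As a minimal sketch (the class name ResponseBodyReader and its two methods are made up for illustration, not part of any library), a helper could pick the stream based on the status code, so the same code also prints the page once the request succeeds:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class ResponseBodyReader {

    // Returns the response body for both successful and error responses.
    static String readBody(HttpURLConnection conn) throws IOException {
        int status = conn.getResponseCode();
        // getInputStream() throws for 4xx/5xx responses; their body is on getErrorStream().
        InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        return readAll(in);
    }

    // Reads the stream as text, assuming UTF-8 (the challenge page above declares it).
    // A null stream (the server sent no body) yields an empty string.
    static String readAll(InputStream in) throws IOException {
        if (in == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

With this, System.out.println(ResponseBodyReader.readBody(conn)) would replace the BufferedReader loop in the code above.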

Cloudflare is protecting the website you are trying to reach, so the server sits behind an anti-DoS system. "DoS" stands for Denial of Service: for example, someone configures thousands of machines to hit a website in an attempt to overload it. These automated attacks are carried out by robots, or simply "bots".
Evidently, the system thinks you are a bot, so you cannot reach this endpoint through Java code like this.
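
If you want your code to tell this case apart from an ordinary 403, one option is to look for markers of the challenge page in the body. The following is only a heuristic sketch: the class and method names are hypothetical, and the markers are simply copied from the output above, so they may stop matching whenever Cloudflare changes its page.

```java
// Hypothetical helper, for illustration only.
final class CloudflareChallengeDetector {

    // Strings taken from the challenge page printed above (assumed, not guaranteed, to be stable).
    private static final String[] MARKERS = {
        "Attention Required! | Cloudflare",
        "cf-captcha-container",
        "id=\"challenge-form\""
    };

    // Returns true when a 403 response body contains one of the known challenge markers.
    static boolean looksLikeChallenge(int status, String body) {
        if (status != 403 || body == null) {
            return false;
        }
        for (String marker : MARKERS) {
            if (body.contains(marker)) {
                return true;
            }
        }
        return false;
    }
}
```

A caller could then log something more helpful than a bare "403", for example that the request was stopped by a CAPTCHA challenge rather than rejected by the site itself.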

EDIT: I found this library:
https://github.com/iambluedev1/cfscrape-java
It is used to bypass Cloudflare's anti-bot page. You can try it.

huangapple
  • Posted on 2020-08-25 19:40:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63578062.html