Getting a 403 response code even though the site loads properly in the browser
Question
I am trying to get the response code using HttpURLConnection, but I get "403" even though the site loads properly in the browser.
URL: "https://www.texanscu.org/home/home"
Here is the code I am using:
try {
    String url = "https://www.texanscu.org/home/home";
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setRequestMethod("GET");
    conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36");
    conn.setConnectTimeout(2000);
    conn.setInstanceFollowRedirects(false);
    conn.setReadTimeout(100000);
    conn.connect();
    int responseCode = conn.getResponseCode();
} catch (Exception e) {
    logger.error("Caught exception : {}", e.getMessage());
}
No exception is thrown. I simply get "403" as the response code.
Answer 1
Score: 1
I modified your code a little to see the actual response from the server.
try {
    String url = "https://www.texanscu.org/";
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setRequestMethod("GET");
    // Send browser-like headers so the request resembles a normal page load.
    conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36");
    conn.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
    conn.setConnectTimeout(100000);
    conn.setInstanceFollowRedirects(false);
    conn.setReadTimeout(100000);
    conn.connect();
    int responseCode = conn.getResponseCode();
    System.out.println(responseCode);
    // For 4xx/5xx responses the body is on the error stream, not the input stream.
    try (BufferedReader br = new BufferedReader(new InputStreamReader(conn.getErrorStream()))) {
        String strCurrentLine;
        while ((strCurrentLine = br.readLine()) != null) {
            System.out.println(strCurrentLine);
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
When I execute this code, I see the following output:
<head>
<title>Attention Required! | Cloudflare</title>
<meta name="captcha-bypass" id="captcha-bypass" />
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
<style type="text/css">body{margin:0;padding:0}</style>
<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script><!--<![endif]-->
<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script><!--<![endif]-->
</head>
<body>
<div id="cf-wrapper">
<div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>
<div id="cf-error-details" class="cf-error-details-wrapper">
<div class="cf-wrapper cf-header cf-error-overview">
<h1 data-translate="challenge_headline">One more step</h1>
<h2 class="cf-subheadline"><span data-translate="complete_sec_check">Please complete the security check to access</span> www.texanscu.org</h2>
</div><!-- /.header -->
<div class="cf-section cf-highlight cf-captcha-container">
<div class="cf-wrapper">
<div class="cf-columns two">
<div class="cf-column">
<div class="cf-highlight-inverse cf-form-stacked">
<form class="challenge-form" id="challenge-form" action="/?__cf_chl_captcha_tk__=8f811f0d4e8be53ef88568630d8c627b6a8639a6-1598364414-0-AXcy5nmycYBIOZVpr4NiQNNpsvz-TeYA4kD6NYOYQq8A9OjdxedaBdcfaEp4DM-P6EnhMFryAMIv8_Vi3PM3ukkKw8W4aFv0W4FXhYk4eJgcmPWlA6XdiAQBfIRWqmV7ORtKClPdGo9CgujUYWdpkGr_3hGiUU_bLFp9jf8mF-nCM3s9nex_0MiA916wQSCs-nhaM8_jFGdJ2VmJjczihFz8MFed_zVHNzLG4HHQdcrOl13P4jZy9Y_nhJfAyhVG0ngQXE8y-Slb_c5gvcfGGDa8vrxTpLgrQEF2-SwqkjhJTApfSUn6Y3mtjZ9ZYrA28NDZC1ngeit3IOga5pxB2wcZuYHfatTHy832J_itWa8MrtbDQV_DSWwGroAdC9q10MVYI0CIPzcxzvOrWSUYjlPYRxEKE_cw1mvO5hxsQuPtSlHIMs0bIHUpZl88F16Ki1xr8FEgqGM8aU2-VFlzYjKHh89qHe1MoapqHmZ31Na5Q0LAbGJdl69lGFGhUczHqWL9D015U4Jfpmim3203E23qb5vLnzBu8kJf6ygKDvKn" method="POST" enctype="application/x-www-form-urlencoded">
<input type="hidden" name="cf_captcha_kind" value="h">
<input type="hidden" name="vc" value="">
<script type="text/javascript" src="/cdn-cgi/scripts/hcaptcha.challenge.js" data-type="normal" data-ray="5c85e0576bec0faa" async data-sitekey="33f96e6a-38cd-421b-bb68-7806e1764460"></script>
<noscript id="cf-captcha-bookmark" class="cf-captcha-info">
<h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1>
</noscript>
<div id="no-cookie-warning" data-translate="turn_on_cookies" style="display:none">
<h1 data-translate="turn_on_cookies" style="color:#bd2426;">Please enable Cookies.</h1>
</div>
<script type="text/javascript">
//<![CDATA[
var a = function() {try{return !!window.addEventListener} catch(e) {return !1} },
b = function(b, c) {a() ? document.addEventListener("DOMContentLoaded", b, c) : document.attachEvent("onreadystatechange", b)};
b(function(){
var cookiesEnabled=(navigator.cookieEnabled)? true : false;
if(!cookiesEnabled){
var q = document.getElementById('no-cookie-warning');q.style.display = 'block';
}
});
//]]>
</script>
<div id="trk_captcha_js" style="background-image:url('/cdn-cgi/images/trace/captcha/nojs/h/transparent.gif?ray=5c85e0576bec0faa')"></div>
</form>
</div>
</div>
<div class="cf-column">
<div class="cf-screenshot-container">
<span class="cf-no-screenshot"></span>
</div>
</div>
</div><!-- /.columns -->
</div>
</div><!-- /.captcha-container -->
<div class="cf-section cf-wrapper">
<div class="cf-columns two">
<div class="cf-column">
<h2 data-translate="why_captcha_headline">Why do I have to complete a CAPTCHA?</h2>
<p data-translate="why_captcha_detail">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>
</div>
<div class="cf-column">
<h2 data-translate="resolve_captcha_headline">What can I do to prevent this in the future?</h2>
<p data-translate="resolve_captcha_antivirus">If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.</p>
<p data-translate="resolve_captcha_network">If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.</p>
<p data-translate="resolve_captcha_privacy_pass"> Another way to prevent getting this page in the future is to use Privacy Pass. You may need to download version 2.0 now from the <a href="https://chrome.google.com/webstore/detail/privacy-pass/ajhmfdgkijocedmfjonnpjfojldioehi">Chrome Web Store</a>.</p>
</div>
</div>
</div><!-- /.section -->
<div class="cf-error-footer cf-wrapper">
<p>
<span class="cf-footer-item">Cloudflare Ray ID: <strong>5c85e0576bec0faa</strong></span>
<span class="cf-footer-separator">&bull;</span>
<span class="cf-footer-item"><span>Your IP</span>: 178.221.185.37</span>
<span class="cf-footer-separator">&bull;</span>
<span class="cf-footer-item"><span>Performance &amp; security by</span> <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link" target="_blank">Cloudflare</a></span>
</p>
</div><!-- /.error-footer -->
</div><!-- /#cf-error-details -->
</div><!-- /#cf-wrapper -->
<script type="text/javascript">
window._cf_translation = {};
</script>
</body>
</html>
Cloudflare is protecting the website you are trying to reach, and it runs an anti-DoS system in front of the server. "DoS" means denial of service: for example, someone configures thousands of machines to hit a website in an attempt to overload it. These automated attacks are carried out by robots, or simply "bots".
Evidently, the system thinks you are a bot, so you cannot reach this endpoint with this Java code.
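If you want your code to tell this situation apart from an ordinary 403, one option is to look for Cloudflare markers in the response. The following is only a heuristic sketch: the class and method names are made up for illustration, the body markers are copied from the output above and may change at any time, and the `Server: cloudflare` header check assumes the site is served through Cloudflare's proxy.

```java
public class CloudflareCheck {

    /**
     * Heuristic only: returns true when an error response looks like the
     * Cloudflare challenge page shown in the output above.
     * (Hypothetical helper, not part of any library.)
     */
    public static boolean looksLikeCloudflareChallenge(int status, String serverHeader, String body) {
        // A challenge page is delivered as an error response, never as a 2xx/3xx.
        if (status < 400) {
            return false;
        }
        // Assumption: Cloudflare's proxy identifies itself in the Server header.
        boolean cloudflareServer = serverHeader != null
                && serverHeader.toLowerCase().contains("cloudflare");
        // Markers taken from the HTML output above; they are not a stable API.
        boolean challengeMarkup = body != null
                && (body.contains("Attention Required! | Cloudflare")
                    || body.contains("id=\"challenge-form\""));
        return cloudflareServer || challengeMarkup;
    }

    public static void main(String[] args) {
        String body = "<title>Attention Required! | Cloudflare</title>";
        System.out.println(looksLikeCloudflareChallenge(403, "cloudflare", body)); // true
        System.out.println(looksLikeCloudflareChallenge(200, "cloudflare", body)); // false
    }
}
```

With a real connection you would pass `conn.getResponseCode()`, `conn.getHeaderField("Server")`, and the text read from the error stream. A positive result only tells you why the request was blocked; it does not get you past the check.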
EDIT: I found this library:
https://github.com/iambluedev1/cfscrape-java
It's used to bypass Cloudflare's anti-bot page. You can try it.
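One caveat if you reuse the diagnostic snippet from this answer for other URLs: it reads `conn.getErrorStream()` unconditionally, and that method returns null when the request succeeds, so the reader would fail with a NullPointerException on a 200 response. Here is a minimal sketch of a helper that picks the stream by status code. The class and method names are hypothetical, and UTF-8 is assumed because the page above declares that charset.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

public class ResponseBody {

    /** Picks the stream that carries the body for this status code. */
    public static InputStream bodyStream(HttpURLConnection conn) throws IOException {
        int status = conn.getResponseCode();
        // 4xx/5xx bodies are on the error stream; everything else is on the input stream.
        InputStream stream = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        // getErrorStream() may return null when the server sent no body.
        return stream != null ? stream : new ByteArrayInputStream(new byte[0]);
    }

    /** Reads the whole stream as UTF-8 text (assumed charset) and closes it. */
    public static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a real response body, so the example runs without network access.
        InputStream fake = new ByteArrayInputStream(
                "One more step\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(fake));
    }
}
```

In the snippet above you would replace the `BufferedReader` loop with `System.out.println(ResponseBody.readAll(ResponseBody.bodyStream(conn)));`.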