wget distinguish header from body in output
Question
In the output of
wget www.google.com --save-headers --output-document - --quiet
how can you tell which lines are the headers and where the body starts (e.g., to tee the different parts into different pipelines)?
Update
# r=$(wget www.google.com --save-headers --output-document - --quiet)
# status=$(echo $r | grep HTTP | awk '{ print $2 }')
# body=$(echo $r | awk '{ if( body ){ print $0 };if( $0 ~ /^$/ ){ body=1 } }')
However, $body is empty.
Update 2
body=$(echo "$r" | awk '{ if( $1 ~ /^[\s\r\n]*$/ ) { b=1 }; if( b ) { print $0 } }')
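The fix was quotes around $r. Unquoted, echo $r word-splits the response and prints it as a single line, so awk never sees an empty line. What a bugger.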
Answer 1
Score: 1
> how can you tell which lines are the headers and where the body starts
RFC 1945 (HTTP/1.0) stipulates that
> The entity body is separated from the headers by a null line (i.e., a
> line with nothing preceding the CRLF).
so in an HTTP response the headers come before the first blank line and the body comes after it. GNU wget's --save-headers option follows suit:
> Save the headers sent by the HTTP server to the file, preceding the
> actual contents, with an empty line as the separator.
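The rule quoted above ("headers end at the first empty line") can also be applied line by line. This avoids reading the whole response into memory and lets each part go to a different destination. The following is a minimal sketch, not a definitive implementation. It assumes the raw --save-headers output arrives as a binary stream. It treats a bare-LF empty line as a separator too. The function name split_stream is hypothetical.

```python
from typing import BinaryIO


def split_stream(src: BinaryIO, head_out: BinaryIO, body_out: BinaryIO) -> None:
    """Write header lines to head_out and everything after the first empty line to body_out.

    The empty separator line itself is dropped; all other bytes are copied unchanged.
    """
    in_body = False
    for line in src:  # a binary stream yields chunks ending in b"\n" (the last may not)
        if in_body:
            body_out.write(line)
        elif line.rstrip(b"\r\n") == b"":
            # Nothing before the CRLF (or a bare LF): this is the header/body separator.
            in_body = True
        else:
            head_out.write(line)
```

To use it in a pipeline, a small script could import sys and call split_stream(sys.stdin.buffer, sys.stderr.buffer, sys.stdout.buffer). The script name split_stream.py is hypothetical. You would then run it as wget www.example.com --save-headers --output-document - --quiet | python split_stream.py > body 2> headers. Memory use is bounded by the longest line rather than by the whole response.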
Since CRLF line endings are used, the headers come before the first CRLFCRLF (\r\n\r\n) and the body comes after it. I would use Python for that part. First, download the response to a file named response:
wget www.example.com --save-headers --output-document response --quiet
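Then create splitter.py as follows:
# Read the whole saved response (headers + body) as raw bytes
with open("response", "rb") as f:
    # Split once, at the first empty line (CRLF CRLF)
    headers, body = f.read().split(b"\r\n\r\n", 1)
with open("headers", "wb") as f:
    f.write(headers)
    f.write(b"\r\n")  # restore the CRLF that ended the last header line
with open("body", "wb") as f:
    f.write(body)
Finally, run it: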
python splitter.py
I use binary (b) mode so that it works with any encoding. I write \r\n after the headers because split() consumes the CRLF that ends the last key-value pair. Feel free to use any other tool you are comfortable with for the split.
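If the saved response might lack the empty line (for example after an interrupted download), the unpacking in splitter.py raises ValueError, because split() then returns a single element. A more forgiving variant can use bytes.partition instead. This is a sketch assuming Python 3.9+ for the tuple[...] annotation. The function name split_saved_response is hypothetical.

```python
def split_saved_response(raw: bytes) -> tuple[bytes, bytes]:
    """Split raw --save-headers output into (headers, body).

    Uses bytes.partition instead of bytes.split, so input without an empty
    line yields (raw, b"") rather than raising ValueError.
    """
    headers, sep, body = raw.partition(b"\r\n\r\n")
    if sep:
        # Put back the CRLF that terminated the last header line.
        headers += b"\r\n"
    return headers, body
```

In splitter.py, it would replace the split inside the first with block: headers, body = split_saved_response(f.read()).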
Answer 2
Score: 0
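r=$(wget www.example.com --save-headers --quiet --load-cookies /root/cookies.txt --save-cookies /root/cookies.txt --keep-session-cookies --output-document - 2>/dev/null )
# The status code is the second field of the first line (the status line);
# grepping for "HTTP" could also match lines in the body.
status=$(printf '%s\n' "$r" | awk 'NR == 1 { print $2 }')
if [ "$status" = "200" ]; then
  # Print every line after the first empty line (which may consist of just a \r).
  body=$(printf '%s\n' "$r" | awk '{ if( body ){ print $0 }; if( $0 ~ /^\r?$/ ){ body=1 } }')
else
  exit 1
fi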
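The same flow in Python reads the status code from the first line (the status line). Any line containing "HTTP" could also be part of the body, so the first line is the safer source. This is a sketch that assumes the saved output holds a single response with CRLF header line endings. The names status_code and body_if_ok are hypothetical.

```python
from typing import Optional


def status_code(raw: bytes) -> int:
    """Return the HTTP status code from the first line of --save-headers output."""
    # The status line looks like b"HTTP/1.1 200 OK"; bytes.split() also drops a trailing \r.
    status_line = raw.split(b"\n", 1)[0]
    return int(status_line.split()[1])


def body_if_ok(raw: bytes) -> Optional[bytes]:
    """Return the body for a 200 response, otherwise None (the script's exit 1 case)."""
    if status_code(raw) != 200:
        return None
    _headers, sep, body = raw.partition(b"\r\n\r\n")
    return body if sep else b""
```

Checking the status before splitting mirrors the shell script. The body is only extracted when the status code is 200.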