February 16, 2023, 17:54:46

wget distinguish header from body in output

Question

In the output of

wget www.google.com --save-headers --output-document - --quiet

how can you tell which lines are the headers and where the body starts (e.g., to tee the different parts into different pipelines)?

Update

# r=$(wget www.google.com --save-headers --output-document - --quiet)
# status=$(echo $r | grep HTTP | awk '{ print $2 }')
# body=$(echo $r | awk '{ if( body ){ print $0 };if( $0 ~ /^$/ ){ body=1 } }')

However, $body is empty.

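Update 2

body=$(echo "$r" | awk '{ if( $1 ~ /^[\s\r\n]*$/ ) { b=1 }; if( b ) { print $0 } }')

The fix was the quotes around $r: without them the shell word-splits the value and echo prints it as a single line, so awk never sees an empty line. What a bugger.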
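
To make the quoting problem concrete, here is a minimal Python sketch that approximates what the shell does to an unquoted $r. The function name unquoted_echo and the sample response are made up for illustration, and the sketch assumes the default IFS (space, tab, newline) and ignores glob expansion.

```python
import re


def unquoted_echo(value: str) -> str:
    """Approximate `echo $r`: split on runs of space, tab and newline
    (the default IFS) and rejoin the words with single spaces."""
    words = [w for w in re.split(r"[ \t\n]+", value) if w]
    return " ".join(words)


# Hypothetical response text, with the CRLF line endings HTTP uses.
r = "HTTP/1.1 200 OK\r\nServer: demo\r\n\r\n<html></html>"

# Unquoted: every newline is gone, so awk would see a single record.
print(repr(unquoted_echo(r)))

# Quoted: the lines survive, and the separator line is a lone "\r".
print(r.split("\n"))
```

Note that even with the quotes in place the separator line still holds a carriage return, which is presumably why the pattern in Update 2 also allows \r instead of matching only /^$/.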

Answer 1

Score: 1

> how can you tell which lines are the headers and where the body starts

RFC 1945 stipulates that

> The entity body is separated from the headers by a null line (i.e., a
> line with nothing preceding the CRLF).

so in an HTTP response the headers come before the first blank line and the body comes after it. The --save-headers option of GNU wget follows suit:

> Save the headers sent by the HTTP server to the file, preceding the
> actual contents, with an empty line as the separator.

Because CRLF line endings are used, the headers come before the first CRLFCRLF (\r\n\r\n) and the body comes after it. I would use Python for this part as follows. First, download the response to a file named response:

wget www.example.com --save-headers --output-document response --quiet

then create splitter.py as follows:

# Read the saved response as bytes and split at the first empty line.
with open("response", "rb") as f:
    headers, body = f.read().split(b"\r\n\r\n", 1)
# Restore the CRLF that terminates the last header line.
with open("headers", "wb") as f:
    f.write(headers)
    f.write(b"\r\n")
with open("body", "wb") as f:
    f.write(body)

and run it:

python splitter.py

I use binary (b) mode so that it works with any encoding, and I write \r\n after the headers because the split consumes the CRLF that ends the last header line. Feel free to use any other tool you are comfortable with to do the split.
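
If you would rather keep everything in memory than write intermediate files, the same split can be sketched as a small function. This is only an illustration of the CRLFCRLF rule above: the name split_response and the sample bytes are assumptions, and it uses partition instead of split so that a response without a separator does not raise.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into (status_code, header_lines, body)."""
    # partition never raises: if the separator is missing, body is b"".
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    # The status line looks like b"HTTP/1.1 200 OK"; the code is field two.
    parts = lines[0].split(None, 2)
    status = int(parts[1]) if len(parts) >= 2 and parts[1].isdigit() else None
    return status, lines[1:], body


# Hypothetical response; a blank line inside the body is left untouched
# because only the first CRLFCRLF is treated as the separator.
sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>\r\n\r\n</html>"
status, headers, body = split_response(sample)
print(status)
print(headers)
print(body)
```

With a file produced by the wget command above, you could pass open("response", "rb").read() as raw.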

Answer 2

Score: 0

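# Fetch headers and body together; cookies persist across runs.
r=$(wget www.example.com --save-headers --quiet --load-cookies /root/cookies.txt --save-cookies /root/cookies.txt --keep-session-cookies --output-document - 2>/dev/null)

# The status code is the second field of the status line (the first line).
status=$(printf '%s\n' "$r" | awk 'NR == 1 { print $2 }')

if [ "$status" = "200" ]; then
    # Print everything after the first empty line (a lone CR counts as empty).
    body=$(printf '%s\n' "$r" | awk 'body { print } /^\r?$/ { body = 1 }')
else
    exit 1
fi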
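
For the original goal of sending the two parts to different pipelines, a streaming variant avoids holding the whole response in a shell variable. The following is a rough Python sketch, not a tested tool: the name tee_response, its parameters, and the demo data are all hypothetical, and it assumes the input starts with the header block as produced by --save-headers.

```python
import io


def tee_response(stream, headers_out, body_out, chunk_size=65536):
    """Copy header lines to headers_out and everything after the first
    empty line to body_out. All three arguments are binary file objects."""
    # Header phase: read line by line until the empty separator line.
    for line in iter(stream.readline, b""):
        if line in (b"\r\n", b"\n"):
            break
        headers_out.write(line)
    # Body phase: copy the rest in chunks, without interpreting it.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        body_out.write(chunk)


# Demo with in-memory streams standing in for real pipes.
response = io.BytesIO(b"HTTP/1.1 200 OK\r\nServer: demo\r\n\r\n<html>\r\n\r\n</html>\r\n")
headers, body = io.BytesIO(), io.BytesIO()
tee_response(response, headers, body)
print(headers.getvalue())
print(body.getvalue())
```

In a real pipeline you could pass sys.stdin.buffer as stream and, for example, sys.stderr.buffer and sys.stdout.buffer as the two outputs, then redirect each one wherever it is needed.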

