June 27, 2023 19:10:14


Download zip files via wget

Question


I like to play chess and would like to download the Grandmasters' games from the internet as zip files, starting from Monday, 25 June 2012 up to today, and then continuously every week on Monday. The zip files are freely available. Their names are ordered by number, e.g. twic920g.zip to twic1493g.zip; the following week the number increases by 1, to twic1494g.zip. For the first run, this script works.

Here are my questions:

How do I increase the counter by 1 every week?
When unpacking, the locally saved zip files are unpacked again as well, not only the currently downloaded file. With the cat command the old and new files are merged, so master.pgn contains the games twice.

#!/bin/bash

dir="pgn/zip"

if [[ ! -d $dir ]]; then
    mkdir -p $dir
fi

cd $dir

# Download all PGN files
for i in {920..1493}; do
    wget -nc  https://www.theweekinchess.com/zips/twic"$i"g.zip
    unzip twic"$i"g.zip
    cat twic"$i".pgn >> ../master.pgn
    rm twic"$i".pgn
done

Answer 1

Score: 1


> how do I increase the counter by plus 1 every week?

I think once you've downloaded the historic games you don't need to worry about incrementing a counter: you can get the link for the "current" game by parsing content from <https://theweekinchess.com/zips/>.

A more robust solution would probably require something other than a shell script, but this works:

curl https://theweekinchess.com/zips/ | grep 'twic[0-9]*g.zip' | cut -f2 -d'"'

For example, running that right now produces:

http://www.theweekinchess.com/zips/twic973g.zip

Just run a script to download the latest archive once a week (e.g., using cron).
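
As a rough sketch of the "fetch only the latest archive" idea, the pipeline above can be wrapped in a small function that picks the highest-numbered link out of the index page. The function name `latest_archive`, the script path in the crontab line, and the assumption that the page lists archives as twicNNNg.zip links are illustrative and not taken from the site itself:

```shell
#!/bin/bash

# Read HTML on stdin and print the highest-numbered twicNNNg.zip name.
# Prints nothing and returns non-zero if no such name is found.
latest_archive() {
    n=$(grep -oE 'twic[0-9]+g\.zip' | grep -oE '[0-9]+' | sort -n | tail -n 1)
    [ -n "$n" ] && printf 'twic%sg.zip\n' "$n"
}

# Example use (needs network access, so it is left commented out):
#   name=$(curl -sSfL https://theweekinchess.com/zips/ | latest_archive)
#   wget -nc "https://theweekinchess.com/zips/$name"
#
# Example crontab entry, every Monday at 06:00 (hypothetical script path):
#   0 6 * * 1 /home/user/bin/fetch-twic.sh
```

Sorting numerically rather than taking the first or last match avoids depending on the order in which the page happens to list the archives.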

Alternately, you could write the number of the last file downloaded successfully to a file, and use that as the starting value next time it runs:

#!/bin/bash

dir="pgn/zip"

if [[ ! -d $dir ]]; then
	mkdir -p "$dir"
fi

# stop here if the directory cannot be entered
cd "$dir" || exit 1

# figure out number of last successfully fetched game
last_fetched=$(cat last_fetched 2> /dev/null || echo 0)

if (( last_fetched == 0 )); then
	first=920
else
	first=$(( last_fetched + 1 ))
fi

echo "starting with: $first"

# Download all PGN files
for (( i=first; 1; i++ )); do
	# don't download a file if it already exists
	[[ -f "twic${i}g.zip" ]] && continue

	echo "fetching game $i"
	# -f makes curl fail on an HTTP error, which ends the loop at the first missing archive
	curl -sSfLO  "https://www.theweekinchess.com/zips/twic${i}g.zip" || break
	echo "$i" > last_fetched
	# -p writes the archive's contents to stdout instead of to a file
	unzip -p "twic${i}g.zip" >> ../master.pgn
done

> when unpacking, the locally saved zip data is also unpacked again and not only the ... downloaded file. With the cat command the old and new files are merged. So the master.pgn has the games twice.

I'm not sure what you're saying here. You're only unpacking the file you've just downloaded, so any existing zip files shouldn't matter.

Instead of appending to master.pgn in every loop iteration, you could leave the unpacked files on disk and completely regenerate master.pgn at the end of the script:

for (( i=first; 1; i++ )); do
	# don't download a file if it already exists
	[[ -f "twic${i}g.zip" ]] && continue

	echo "fetching game $i"
	curl -sSfLO  "https://www.theweekinchess.com/zips/twic${i}g.zip" || break
	echo "$i" > last_fetched
	unzip twic"$i"g.zip
done

cat *.pgn > ../master.pgn
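
One caveat with `cat *.pgn`: the shell expands the glob in lexical order, so twic1000.pgn would be concatenated before twic920.pgn. If the order of games in master.pgn matters, a sketch like the following concatenates in numeric order instead. It assumes the unpacked files are named twicNNN.pgn, as in the question's script; `rebuild_master` is a hypothetical helper name:

```shell
#!/bin/bash

# Rebuild the master file $2 from all twicNNN.pgn files in directory $1,
# concatenated in ascending numeric order.
rebuild_master() {
    src_dir=$1
    out=$2
    : > "$out"    # truncate, so repeated runs never duplicate games
    ls "$src_dir" | grep -E '^twic[0-9]+\.pgn$' | sort -k1.5n |
    while IFS= read -r f; do
        cat "$src_dir/$f" >> "$out"
    done
}

# Example use from inside pgn/zip:
#   rebuild_master . ../master.pgn
```

The sort key `-k1.5n` starts comparing at the fifth character of each name, that is, right after the `twic` prefix, and treats what follows as a number.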

Answer 2

Score: 0


I propose the following approach, using just wget, to download the most recent ...g.zip file:

wget -nc -r -nd -A g.zip https://theweekinchess.com/zips/

Explanation: I use the recursive download feature of GNU wget, which means wget will traverse the links it finds at the given URL (note that the URL leads to a page, not to a particular zip file). The resources found will be downloaded into the current directory (-nd) if they do not already exist (-nc), and only files whose names end with g.zip (-A g.zip) will be kept.
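
The wget command only downloads; it does not track which archives have already been merged into master.pgn. One possible way to handle that, sketched here under the assumption that a plain text file (hypothetically named merged.list) records each archive once it has been appended, is to list only the zip files not yet recorded. The helper name `unmerged_archives` is likewise made up for illustration:

```shell
#!/bin/bash

# Print the twicNNNg.zip names found in directory $1 that are not yet listed
# in the record file $2 (one name per line), in ascending numeric order.
unmerged_archives() {
    zip_dir=$1
    done_list=$2
    touch "$done_list"    # make sure the record file exists
    ls "$zip_dir" | grep -E '^twic[0-9]+g\.zip$' | sort -k1.5n |
        grep -vxF -f "$done_list"
}

# Example use after the wget run, from the download directory
# (unzip -p writes the archive's contents to stdout):
#   unmerged_archives . merged.list | while IFS= read -r z; do
#       unzip -p "$z" >> ../master.pgn && echo "$z" >> merged.list
#   done
```

Because an archive is recorded only after it has been appended successfully, rerunning the script does not add the same games to master.pgn a second time.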
