Delete lines in blocklist file, where the end of those lines match an entry from an allowlist file (Using Dnsmasq syntax)
Question
This is a modification of a question I've previously asked, this time to account for Dnsmasq syntax in the blocklist. I'm trying to delete lines in a blocklist file, but only if the end of the blocklist line matches an entry in an allowlist file. Subdomain blocklist entries should therefore also be removed. I'm trying to stick to awk since that package is included in OpenWRT, as opposed to e.g. gawk, which is available but is an additional download.
Blocklist file (with dnsmasq style syntax)
local=/randomsites.com/
local=/calendar.google.com/
local=/google.com/
local=/google.com.fake.com/
local=/fakegoogle.com/
Allowlist file
google.com
Desired output to new_blocklist
local=/randomsites.com/
local=/google.com.fake.com/
local=/fakegoogle.com/
Below is where I've tried to modify a solution from a previous question I asked, but this time to account for dnsmasq syntax in the blocklist. This method was extremely fast to process large lists (~300k lines in around 10 seconds on dual core router), which is especially useful for lower powered routers.
$ cat tst.awk
BEGIN { FS="." }
NR==FNR {
allow[$0"/"]
next
}
{
addr = $NF
for ( i=NF-1; i>=1; i-- ) {
addr = $i FS addr
if ( substr(addr,8) in allow ) {
next
}
}
}
{ print }
awk -f tst.awk allow block
is producing output:
local=/randomsites.com/
local=/calendar.google.com/
local=/google.com.fake.com/
local=/fakegoogle.com/
And so is not removing the local=/calendar.google.com/ entry as desired.
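To see why that entry survives, here is a minimal diagnostic sketch (the function name show_candidates is hypothetical, not part of the original script). It repeats the loop above for a single line and prints each key that gets looked up in allow. With FS=".", the first field still carries the local=/ prefix, so substr(addr,8) only yields a clean domain once the whole line has been rebuilt, and the intermediate suffix google.com/ is never tested.

```shell
# Hypothetical helper: print every key the attempted loop would test
# for one blocklist line (same FS="." splitting and substr(addr,8) logic).
show_candidates() {
    printf '%s\n' "$1" | awk -F. '{
        addr = $NF
        for (i = NF - 1; i >= 1; i--) {
            addr = $i FS addr
            print substr(addr, 8)
        }
    }'
}

show_candidates 'local=/calendar.google.com/'
# com/
# calendar.google.com/
```

Neither printed key equals google.com/, which is the only key stored in allow, so the line is kept.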
The exact previous solution by Ed Morton, which worked perfectly for a blocklist of plain domains such as google.com and calendar.google.com (i.e. without the dnsmasq syntax of local=/...../), was:
$ cat tst.awk
BEGIN { FS="." }
NR==FNR {
allow[$0]
next
}
{
addr = $NF
for ( i=NF-1; i>=1; i-- ) {
addr = $i FS addr
if ( addr in allow ) {
next
}
}
}
{ print }
I realise I haven't modified this solution correctly, but I have spent quite a lot of time trying and reading in order to at least attempt to solve it myself first.
REPORTING ON SOLUTIONS BELOW
Both the @markp-fuso and @Ed Morton (author of the original awk solution, excluding dnsmasq syntax) solutions produce exactly the same, correct result. The interesting part is the run times of the two solutions on a Netgear R7800 OpenWrt router, which has a dual-core CPU from 2015. Multiple runs of each produced consistent run times:
@markp-fuso solution:
300k lines blocklist, 13 lines allowlist = 23.3 seconds
300k lines blocklist, 300k lines allowlist = 21.5 seconds
@Ed Morton solution:
300k lines blocklist, 13 lines allowlist = 47.4 seconds
300k lines blocklist, 300k lines allowlist = 46 seconds
To note, both solutions have a faster runtime with the larger allowlist!
Thank you both! This is really great, and a big contribution to a little project we have going for OpenWRT to block ads on the router. Please delete the link if it is not allowed here:
https://forum.openwrt.org/t/adblock-lean-set-up-adblock-using-dnsmasq-blocklist/157076/35
Answer 1
Score: 2
One idea for modifying the answer to OP's previous question:
$ cat tst.awk
BEGIN { FS="/" } # split on "/"
NR==FNR { allow[$0]; next }
{ n=split($2,arr,".") # further split on "."
# process as with answer to previous question
addr = arr[n]
for ( i=n-1; i>=1; i-- ) {
addr = arr[i] "." addr
if ( addr in allow )
next
}
}
1 # print current line; behaves identically to "{ print }"
NOTE: by using `FS="/"` and then referencing `$2` we are stripping off the `local=/` and (trailing) `/`
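As a quick illustration of that note, the sketch below splits one blocklist line on "/" and prints the second field. The function name strip_dnsmasq is hypothetical and only exists for this demonstration.

```shell
# Hypothetical helper: with FS="/", a line such as local=/example.com/
# splits into "local=", "example.com" and an empty trailing field,
# so $2 is the bare domain.
strip_dnsmasq() {
    awk -F/ '{ print $2 }'
}

printf '%s\n' 'local=/calendar.google.com/' | strip_dnsmasq
# calendar.google.com
```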
Taking for a test drive:
$ awk -f tst.awk allow block
local=/randomsites.com/
local=/google.com.fake.com/
local=/fakegoogle.com/
Answer 2
Score: 2
$ cat tst.awk
BEGIN { FS="." }
NR==FNR {
allow[$0]
next
}
{
orig = $0
gsub("^local=/|/$","")
addr = $NF
for ( i=NF-1; i>=1; i-- ) {
addr = $i FS addr
if ( addr in allow ) {
next
}
}
}
{ print orig }
$ awk -f tst.awk allowlist blocklist
local=/randomsites.com/
local=/google.com.fake.com/
local=/fakegoogle.com/
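This script relies on awk re-splitting the record into fields whenever $0 is modified, which is what gsub() does when no target is given. A minimal sketch of that behaviour follows (the function name fields_after_gsub is hypothetical):

```shell
# Hypothetical helper: show the first and last "." separated fields
# before and after gsub() strips the dnsmasq wrapper from $0.
fields_after_gsub() {
    printf '%s\n' "$1" | awk -F. '{
        print "before: " $1 " ... " $NF
        gsub("^local=/|/$", "")   # modifies $0, so the fields are re-split
        print "after:  " $1 " ... " $NF
    }'
}

fields_after_gsub 'local=/calendar.google.com/'
# before: local=/calendar ... com/
# after:  calendar ... com
```

After the substitution the fields are plain domain labels, so the suffix loop from the previous question can be reused unchanged, and orig keeps the untouched line for printing.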
Answer 3
Score: 1
In your specific case you could also use `grep`. If you have the contents of the allowlist, you can simply use `grep -v` with the pattern, e.g.
grep -v '\([.]\|/\)google.com/$' blocklist
If `allowlist` contains a single entry as shown, then in bash you could do:
grep -v '\([.]\|/\)'"$(<allowlist)"'/$' blocklist
And if you are using POSIX shell, then you could combine `read -r` and build the pattern from the single entry in `allowlist`, e.g.
read -r match <allowlist
pattern="\([.]\|/\)$match/$"
grep -v "$pattern" blocklist
Example Use/Output
All of the above provide the same output with the contents in `allowlist` and `blocklist` as you show, e.g.
$ grep -v '\([.]\|/\)'"$(<allowlist)"'/$' blocklist
local=/randomsites.com/
local=/google.com.fake.com/
local=/fakegoogle.com/
Note: if `allowlist` contains multiple entries, then you would need to use `readarray -t` in bash to fill an indexed array with the contents and build your pattern from there. In POSIX shell you would just loop `while read -r match; do ... done < allowlist`.
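For the multiple-entry case, one possible alternative to looping is to turn the allowlist into a pattern file and call grep once with `-f`. The sketch below is an assumption-laden illustration, not something taken from the answers above: the function name filter_blocklist is hypothetical, it assumes `mktemp` and a `grep` supporting `-f` are available, and it has not been timed against large lists.

```shell
# Hypothetical helper: filter_blocklist ALLOWLIST BLOCKLIST
# Prints the blocklist lines that do not end in an allowlisted domain.
filter_blocklist() {
    patterns=$(mktemp) || return 1
    # Drop empty lines, escape literal dots, then wrap each entry as
    # "[./]entry/$" so it matches only the domain itself or a subdomain.
    sed -e '/^$/d' -e 's/[.]/\\./g' -e 's|.*|[./]&/$|' "$1" > "$patterns"
    grep -v -f "$patterns" "$2"
    status=$?
    rm -f "$patterns"
    return "$status"
}
```

It could be invoked as `filter_blocklist allowlist blocklist > new_blocklist`. Escaping the dots also avoids the unescaped `.` in the patterns above matching any character.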
`grep` just provides another approach. With a single entry in `allowlist` there would be little, if any, difference in efficiency between the use of `awk` and `grep`. If, however, `allowlist` contains multiple entries, then `awk` would provide a better solution, avoiding the subshells created with multiple calls to `grep`. Let me know if you have questions.