How to create and run a filter combination script à la that of EasyList?

Carried over from https://github.com/DandelionSprout/adfilt/issues/7#issuecomment-488537971, due to a lack of response from its posters:

I’m looking for a way to create a script that combines different entry categories into one list, like the ones Frellwit and EasyList use. That way I could accomplish some things that I can’t do today (unless I put in twice as many work hours on my lists), such as creating list versions that are more suitable for non-uBO adblockers, and including the contents of someone else’s list in any of my lists.

My knowledge of coding languages in general remains questionable, but here’s a mockup of something that would take the uBO-tailored main version of Dandelion Sprout’s Nordic Filters and create, in order, an AdGuard version, an ABP/AdBlock version, and an ABP-only version that follows EasyList’s criteria:

# Use the uBO Nordic list as the source
fetch https://raw.githubusercontent.com/DandelionSprout/adfilt/master/NorwegianList.txt
fetch https://raw.githubusercontent.com/DandelionSprout/adfilt/master/uBO%20list%20extensions/NordicExtensionsForUBO%26Nano.txt

# Remove $document-related text from entries, as only Nano and uBO support it
replace /\$document/.* with /\$important

# Output a version for AdGuard whose sole change is that $document entries have been replaced with $important
output https://github.com/DandelionSprout/adfilt/tree/master/NorwegianExperimentalList%20alternate%20versions/NordicFilters-AdGuard.txt

# Then take the AdGuard version, and remove entries that are not supported by ABP or AdBlock
remove text ",important"
remove text "$important"
remove text ",redirect=noopjs"
exclude entries with ":style|##+js|.*#|:xpath|:matches-css|:matches-css-before"

output > https://github.com/DandelionSprout/adfilt/tree/master/NorwegianExperimentalList%20alternate%20versions/NordicFilters-ABPAdBlock.txt

# Creating a version solely for ABP, that intends to meet ABP's list inclusion requirements.
abp_base='paragraph does not contain "Leftover empty spaces"|paragraph does not contain "Empty divider spaces"|paragraph does not contain "Distracting background"|paragraph does not contain "De-blurrers"|paragraph does not contain "anti-anti-adblocking"'

output abp_base > https://github.com/DandelionSprout/adfilt/tree/master/NorwegianExperimentalList%20alternate%20versions/NordicFilters-ABPBaseVersion.txt

How would I have had to rewrite such a script to make it runnable, which file format would be the best to use, and how could I make it autorun from GitHub’s GUI (or alternately from one-ish desktop script)?

Should this turn out to be a success, I have similar ideas in mind for at least 10 of my 54 other lists.


Why don’t you use !#include pre-processor directive instead?

It is supported by AdGuard and uBO/Nano, and ABP will simply discard that line.

If you still want to have a script like that, the question is can you code in any programming language?
Javascript, python, bash, anything?

You are aware that !#include doesn’t work on custom (non-included) lists in AdGuard, e.g. https://raw.githubusercontent.com/DandelionSprout/adfilt/master/BrowseWebsitesWithoutLoggingIn.txt, at least not that I am aware of?

But yes, I do use it very extensively for Nano and uBO… except that it doesn’t allow adding someone else’s lists. This means that I am currently stumped if I e.g. want to combine my hosts file with Frellwit’s hosts file to create an all-Nordic hosts file with the aim of getting it included in Blokada or uMatrix.

I am not able to code in any programming language at all. I have some cursory idea how to edit JSON files and filetrees, but nothing else.

I’d have to think on it some more. I’ve thought about a feature for FilterLists that could do something like this at some point… But, briefly thought about is as far as I’ve gotten. So many ideas, so little time.

It works in the browser extensions, and we’ll add this functionality to other versions as well.


I’ll try to find some time and show you an example of a script that does it.
Meanwhile, install python:)

If you use Windows, PowerShell is already included (Start -> Accessories -> PowerShell). It is a powerful language built on .NET, but simpler to use. And yes, it can import JSON natively and easily. For example:


https://4sysops.com/archives/convert-json-with-the-powershell-cmdlets-convertfrom-json-and-convertto-json/

Here is a python script that does what you want:

It should be pretty obvious from the code how it works, and you’ll be able to modify it by yourself.
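The script itself is not reproduced in this thread. As a rough illustration only, here is a minimal sketch of the approach from the mockup, assuming the list has already been read into a list of text lines. The function names `prepare_adguard` and `prepare_abp`, the token list, and the sample entries are hypothetical and are not taken from the posted script.

```python
import re

# Syntax that the ABP/AdBlock version should not contain.
# This list is an assumption based on the mockup, not a complete reference.
UNSUPPORTED_BY_ABP = (
    ":style", "##+js", ":xpath", ":matches-css",
    "$important", ",important", "redirect=noopjs",
)


def prepare_adguard(lines):
    """Return the lines with the $document option replaced by $important."""
    return [re.sub(r"\$document\b", "$important", line) for line in lines]


def prepare_abp(lines):
    """Return only the lines that contain none of the unsupported tokens."""
    return [
        line for line in lines
        if not any(token in line for token in UNSUPPORTED_BY_ABP)
    ]


if __name__ == "__main__":
    source = [
        "||ads.example.com^$document",
        "example.com##.banner",
        "example.com##.box:style(display: none !important)",
    ]
    adguard_lines = prepare_adguard(source)
    abp_lines = prepare_abp(adguard_lines)
    print("\n".join(adguard_lines))
    print("\n".join(abp_lines))
```

Each version is just a list of lines, so writing it to a file is a matter of joining the list with newlines and saving the result under the desired filename.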


Thanks! I’ll try it out later tonight or tomorrow.

I can confirm that the script works, and I thank you for the help so far.

Now it’s time to up the ante a slight bit, as I didn’t really intend to remove $document/$important entries as a whole from the ABP version, but rather to just remove those parts of the lines in somewhat the same fashion as how the AdGuard version replaces $document with $empty,important.

That way, as an example, something like ||ssl.p.jwpcdn.com^*/jwpsrv.js$important,script,domain=vg.no would’ve been turned into ||ssl.p.jwpcdn.com^*/jwpsrv.js$script,domain=vg.no.

However, I have spent the past hour at https://raw.githubusercontent.com/DandelionSprout/adfilt/master/Sandbox/NordicFiltersScriptTesting.py trying to stack multiple for line in lines: paragraphs in various ways, and it has led to either duplicated entries, only the first for line in lines: paragraph being applied, or command-line errors (usually indentation errors), so it seems I need some more help with this. :sweat_smile:

Check the updated version.

Also, look at what I’ve done to get the idea:
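The linked revision is not reproduced here. One likely cause of duplicated entries is that each separate `for line in lines:` loop appends every line to the output again, so a common pattern is to apply all per-line changes inside a single loop and append each line exactly once. The sketch below illustrates that pattern under stated assumptions: `strip_option` is a hypothetical helper, and it assumes that the last `$` in a line starts the option list, which holds for ordinary network filters but may not hold for every filter type.

```python
def strip_option(line, option):
    """Remove one option from the `$...` part of a filter line."""
    rule, sep, options = line.rpartition("$")
    if not sep:
        return line  # the line has no `$`, so there is nothing to remove
    kept = [opt for opt in options.split(",") if opt != option]
    # Drop the `$` as well when no options are left.
    return rule + "$" + ",".join(kept) if kept else rule


def prepare_abp(lines):
    """Apply every per-line change in one loop and append each line once."""
    result = []
    for line in lines:
        line = strip_option(line, "important")
        line = strip_option(line, "redirect=noopjs")
        result.append(line)
    return result


if __name__ == "__main__":
    sample = ["||ssl.p.jwpcdn.com^*/jwpsrv.js$important,script,domain=vg.no"]
    print(prepare_abp(sample)[0])
```

Splitting the options on commas and rejoining them avoids leaving a stray `$,` or trailing comma behind, which a plain text replacement of `$important` could do.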


Thanks yet again. I believe I’ve even found a fix for the $document-removal regex on my own, but I’ll have to post and talk about it later tonight, because I need to get back to my relatives’ Constitution Day celebrations just about now.

After I got home from the party, I realised that this was the moment where all the pieces began to link together in my brain, leading to me tinkering and improving the script like I was Tinker Bell: https://raw.githubusercontent.com/DandelionSprout/adfilt/master/NorwegianExperimentalList%20alternate%20versions/nordic_prepare_filters.py

And now the goal has been achieved: I have now become able to craft AdGuard- and ABP-specific versions of my Nordic list. :green_heart:

Now I’ve got another half-hour of work ahead of me to ensure that the new versions are correctly included in FilterLists, AdBlock, and in ABP’s secondary list archive…

Yay, congrats on your first program :)

Okay, so I’ve been holding off on asking this for as long as possible, as I expected to figure it out eventually (which I haven’t). Using https://github.com/DandelionSprout/adfilt/blob/master/NorwegianExperimentalList%20alternate%20versions/XYZPrepareFilters.py from lines 1 through 821 as the practical example:

What kind of Python call/addition/syntax/paragraph should I add in order to

  1. remove empty lines, and
  2. remove duplicate text lines?

Duplicates:

    def prepare_ag(lines) -> str:
        text = ''
        
        previous_line = '' # or maybe: = None

        for line in lines:
            
            if line == previous_line:
                continue
                
            previous_line = line

Empty lines:

        )
        
        if line: # or maybe: if not line == '':
            text += line + '\r\n'

    return text

The one for duplicates appears at first sight to work splendidly, but in the one for empty lines the # is interpreted as the start of a comment (and changing it to \# breaks the script).

If Sublime Text’s syntax checker is any indication, then the # in the one for duplicates also seems to be interpreted as the start of a comment.

Because these are comments :slight_smile: My adventure with Python ended around version 3.1, now it’s on 3.7 - I might have forgotten a few things.

In “duplicates”, your first line may be empty, so setting previous_line to an empty string at the beginning may not be a good idea: that first empty line would be removed. None should work better.

In “empty lines”, I’m not sure whether “if” will treat empty strings as False, so if line: can be replaced by an explicit comparison: if not line == '':
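For reference, Python does treat an empty string as false in a condition, so `if line:` and `if not line == '':` behave the same for strings. Putting the two fragments together gives something like the following minimal sketch; the function name `clean_lines` is hypothetical, and the `\r\n` line ending is kept from the fragments above.

```python
def clean_lines(lines):
    """Drop empty lines and lines identical to the input line before them."""
    text = ""
    previous_line = None  # None never equals a string, so line 1 is kept

    for line in lines:
        if line == previous_line:
            continue  # same as the previous input line: skip it
        previous_line = line

        if line:  # an empty string is falsy, so empty lines are skipped
            text += line + "\r\n"

    return text


if __name__ == "__main__":
    print(repr(clean_lines(["a", "a", "", "b", ""])))
```

Note that this only catches duplicates that are directly next to each other in the input; two identical lines separated by other content both stay in.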

Ah, so those really were comments. That explains it. My fault. :sweat_smile:

I’ll be experimenting and fiddling around with them sometime later today, then.

I figured you’d want an update: I have now got the empty line remover to work. As I chiefly needed it for e.g. the TPL versions of my lists (with the option to use it for other list versions later on), it took me some on-and-off tries before I arrived at if is_supported_tpl(line) and not line == '':.

I’ll try out the duplicate stuff the next time I want to do an update of https://raw.githubusercontent.com/DandelionSprout/adfilt/master/AdGuard%20Home%20Compilation%20List/AdGuardHomeCompilationList.txt.
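In a compilation built from several lists, duplicates are often far apart instead of directly after one another, so comparing against the previous line alone would miss them. A minimal sketch of one way to handle that, assuming the whole list fits in memory; the function name `remove_all_duplicates` is hypothetical, and it treats comment lines like any other line.

```python
def remove_all_duplicates(lines):
    """Keep the first occurrence of every line, preserving the order."""
    seen = set()
    result = []
    for line in lines:
        if line in seen:
            continue  # this exact line already appeared earlier
        seen.add(line)
        result.append(line)
    return result


if __name__ == "__main__":
    merged = ["||a.example^", "||b.example^", "||a.example^"]
    print(remove_all_duplicates(merged))
```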