Blacklist and whitelist URLs in HtmlUnitDriver

Question

Blacklisting URLs in PhantomJS with GhostDriver is straightforward. First, initialize the driver with a handler:

PhantomJSDriver driver = new PhantomJSDriver();
driver.executePhantomJS(loadFile("/phantomjs/handlers.js"));

Then configure the handler:

this.onResourceRequested = function (requestData, networkRequest) {
    var allowedUrls = [
        /https?:\/\/localhost.*/,
        /https?:\/\/.*\.example\.com\/?.*/
    ];
    var disallowedUrls = [
        /https?:\/\/nonono\.com.*/
    ];

    function isUrlAllowed(url) {
        function matches(url) {
            return function(re) {
                return re.test(url);
            };
        }
        return allowedUrls.some(matches(url)) && !disallowedUrls.some(matches(url));
    }

    if (!isUrlAllowed(requestData.url)) {
        console.log("Aborting disallowed request (# " + requestData.id + ") to url: '" + requestData.url + "'");
        networkRequest.abort();
    }
};

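I haven't found a good way to do this with HtmlUnitDriver. The question "How to filter JavaScript from specific URLs in HtmlUnit" covers a related approach, but it uses WebClient, not HtmlUnitDriver. Any ideas?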

Recommended answer

Extend HtmlUnitDriver and supply a ScriptPreProcessor (for editing content) and an HttpWebConnection (for allowing or blocking URLs):

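// Package names below assume HtmlUnit 2.x (com.gargoylesoftware.htmlunit);
// HtmlUnit 3.x moved the same classes to org.htmlunit.
import com.gargoylesoftware.htmlunit.HttpWebConnection;
import com.gargoylesoftware.htmlunit.ScriptPreProcessor;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebConnection;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.WebResponseData;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;
import org.openqa.selenium.remote.DesiredCapabilities;

import java.io.IOException;
import java.util.ArrayList;

// Assumption: SC_OK (200) is taken from Apache HttpClient, which HtmlUnit 2.x depends on.
import static org.apache.http.HttpStatus.SC_OK;

public class FilteringHtmlUnitDriver extends HtmlUnitDriver {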

    private static final String[] ALLOWED_URLS = {
            "https?://localhost.*",
            "https?://.*\\.yes\\.yes/?.*",
    };
    private static final String[] DISALLOWED_URLS = {
            "https?://spam\\.nono.*"
    };

    public FilteringHtmlUnitDriver(DesiredCapabilities capabilities) {
        super(capabilities);
    }

    @Override
    protected WebClient modifyWebClient(WebClient client) {
        WebConnection connection = filteringWebConnection(client);
        ScriptPreProcessor preProcessor = filteringPreProcessor();

        client.setWebConnection(connection);
        client.setScriptPreProcessor(preProcessor);

        return client;
    }

    private ScriptPreProcessor filteringPreProcessor() {
        return (htmlPage, sourceCode, sourceName, lineNumber, htmlElement) -> editContent(sourceCode);
    }

    private String editContent(String sourceCode) {
        return sourceCode.replaceAll("foo", "bar");
    }

    private WebConnection filteringWebConnection(WebClient client) {
        return new HttpWebConnection(client) {
            @Override
            public WebResponse getResponse(WebRequest request) throws IOException {
                String url = request.getUrl().toString();
                WebResponse emptyResponse = new WebResponse(
                        new WebResponseData("".getBytes(), SC_OK, "", new ArrayList<>()), request, 0);

                for (String disallowed : DISALLOWED_URLS) {
                    if (url.matches(disallowed)) {
                        return emptyResponse;
                    }
                }
                for (String allowed : ALLOWED_URLS) {
                    if (url.matches(allowed)) {
                        return super.getResponse(request);
                    }
                }
                return emptyResponse;
            }
        };
    }
}

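This both edits script content and blocks URLs. Blocked requests, and any request that matches no allowed pattern, receive an empty 200 response instead of being sent.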
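
The allow/block decision made inside getResponse can also be pulled out into a small class that precompiles the patterns and can be tested without HtmlUnit or Selenium. The following is only a sketch under assumptions: the class name UrlFilter and its method isAllowed are hypothetical and do not belong to either library. It follows the same order as the answer: blocked patterns are checked first, then allowed patterns, and anything else is denied.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of HtmlUnit or Selenium.
final class UrlFilter {

    private final List<Pattern> allowed;
    private final List<Pattern> disallowed;

    UrlFilter(List<String> allowedRegexes, List<String> disallowedRegexes) {
        this.allowed = compile(allowedRegexes);
        this.disallowed = compile(disallowedRegexes);
    }

    // Compile each regex once instead of on every request.
    private static List<Pattern> compile(List<String> regexes) {
        return regexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    // A URL passes only if no blocked pattern matches the whole URL
    // and at least one allowed pattern does; everything else is denied.
    boolean isAllowed(String url) {
        boolean blocked = disallowed.stream().anyMatch(p -> p.matcher(url).matches());
        if (blocked) {
            return false;
        }
        return allowed.stream().anyMatch(p -> p.matcher(url).matches());
    }
}
```

With such a helper, the two loops in getResponse would reduce to a single check: return super.getResponse(request) when isAllowed(url) is true, and the empty response otherwise.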
