Use Google cache

Google's cache often holds a full copy of an article.

Before using it, check whether a cached copy actually exists for the given URL, since some URLs are not cached.

I had originally used curl for this.

Circumventing paywalls with eww browser and curl // Bodacious Blog

This is not an exact science. Sometimes the cache is needed and sometimes it is not, but the more information we have (such as whether a cached copy exists), the better informed the decision.
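The cache address itself is just the original URL appended to Google's cache endpoint, the same prefix used throughout this post. As a minimal sketch, that step can live in its own helper. The name google-cache-url is hypothetical (it is not part of eww or url.el), and it assumes the target URL can be appended verbatim without extra escaping.

```elisp
;; Hypothetical helper: build the Google cache URL for URL.
;; Assumes the target URL is appended verbatim (no extra escaping).
(defun google-cache-url (url)
  "Return the Google cache URL for URL."
  (concat "http://webcache.googleusercontent.com/search?q=cache:" url))

;; (google-cache-url "https://news.ycombinator.com/")
;; => "http://webcache.googleusercontent.com/search?q=cache:https://news.ycombinator.com/"
```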

Create the url-found-p function

Thanks to github-alphapapa for this one.

Weekly tips/trick/etc/ thread : emacs

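(require 'url-http)

(defun url-found-p (url)
  "Return non-nil if URL is found, i.e. HTTP 200."
  ;; Timeouts (5 s) and network errors count as "not found".
  (let ((buf (ignore-errors (url-retrieve-synchronously url nil t 5))))
    (when buf
      (with-current-buffer buf
        (prog1 (eq url-http-response-status 200)
          (kill-buffer))))))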

Try them out

(url-found-p "http://webcache.googleusercontent.com/search?q=cache:https://medium.com/riselab/functional-rl-with-keras-and-tensorflow-eager-7973f81d6345")
(url-found-p "http://webcache.googleusercontent.com/search?q=cache:https://news.ycombinator.com/")

Add advice to eww--dwim-expand-url, the eww function that expands URLs just before they are loaded

;; For certain URLs, load the Google cache copy instead, but only if
;; a cached copy actually exists.
(defun eww--dwim-expand-url-around-advice (proc url &rest args)
  (let ((cached-url (concat "http://webcache.googleusercontent.com/search?q=cache:" url)))
    (when (and (string-match-p (rx (or "towardsdatascience" "medium.com")) url)
               (not (string-match-p (regexp-quote "webcache.google") url))
               ;; Checked last because it makes a network request.
               (url-found-p cached-url))
      (setq url cached-url))
    (apply proc url args)))
(advice-add 'eww--dwim-expand-url :around #'eww--dwim-expand-url-around-advice)
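If the list of sites grows, hard-coding each one in the advice gets unwieldy. One option, sketched here on the assumption that matching substrings of the URL is still good enough, is to keep the sites in a variable and build a single regexp from it with regexp-opt. Both eww-google-cache-domains and eww-google-cache-wanted-p are hypothetical names, not part of eww.

```elisp
;; Hypothetical variable: URL substrings whose pages should be loaded
;; from the Google cache.
(defvar eww-google-cache-domains '("towardsdatascience" "medium.com")
  "URL substrings that should be redirected to the Google cache.")

(defun eww-google-cache-wanted-p (url)
  "Return non-nil if URL matches `eww-google-cache-domains'.
URLs that already point at the Google cache never match."
  (and (not (string-match-p (regexp-quote "webcache.google") url))
       ;; regexp-opt quotes each string, so the dot in "medium.com"
       ;; only matches a literal dot.
       (string-match-p (regexp-opt eww-google-cache-domains) url)))

;; Usage inside the advice: replace the domain checks with
;;   (eww-google-cache-wanted-p url)
;; To turn the redirect off again:
;;   (advice-remove 'eww--dwim-expand-url #'eww--dwim-expand-url-around-advice)
```

Adding a site then only means pushing another string onto eww-google-cache-domains; the advice itself does not change.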