Hi Emacs community,
I’m an elisp noob, and I recently wrote a function to get the references on a Wikipedia page. I plan on using it for org-mode/org-roam so I can do research faster (even though there’s probably already a package for that sort of thing). Unfortunately, it’s probably not as robust as I would like to think it is, as some of the DOIs/ISBNs appear to be missing on some of the Wikipedia pages I’ve tested. Here it is for reference:
(defun get-wikipedia-references (subject)
  "Gets references for a wikipedia article"
  (let ((wikipedia-prefix-url "https://en.wikipedia.org/wiki/"))
    (with-current-buffer
        (url-retrieve-synchronously (concat wikipedia-prefix-url subject))
      (let* ((html-start (progn (goto-char (point-min))
                                (re-search-forward "^$")))
             (dom (libxml-parse-html-region (1+ (point)) (point-max)))
             (result))
        (dolist (cite-tag (dom-by-tag dom 'cite) result)
          (let ((cite-class (dom-attr cite-tag 'class)))
            (cond ((string-search "journal" cite-class)
                   (let ((a-tag (dom-search cite-tag (lambda (tag) (string-prefix-p "https://doi.org" (dom-attr tag 'href))))))
                     (setq result (cons (cons (concat "doi:" (dom-text a-tag))
                                              (let* ((cite-texts (dom-texts cite-tag))
                                                     (title-beg (1+ (string-search "\"" cite-texts)))
                                                     (title-end (string-search "\"" cite-texts (1+ title-beg))))
                                                (substring cite-texts title-beg title-end)
                                                ))
                                        result))))
                  ((string-search "book" cite-class)
                   (let ((a-tag (dom-search cite-tag (lambda (tag) (string-prefix-p "/wiki/Special:BookSources" (dom-attr tag 'href))))))
                     (setq result (cons (cons (concat "isbn:" (dom-text (dom-child-by-tag a-tag 'bdi)))
                                              (dom-text (dom-child-by-tag cite-tag 'i)))
                                        result))))
                  (t
                   (let ((a-tag (assoc 'a cite-tag)))
                     (setq result (cons (cons (dom-attr a-tag 'href) (dom-text a-tag)) result))))
                  ))
          )))))
(get-wikipedia-references "Graph_traversal")
(("doi:10.1109/SFCS.1979.34" . "Random walks, universal traversal sequences, and the complexity of maze problems")
("doi:10.1016/j.tcs.2015.11.017" . "Lower and upper competitive bounds for online directed graph exploration")
("doi:10.1016/j.tcs.2020.06.007" . "Online graph exploration on a restricted graph class: Optimal solutions for tadpole graphs")
("doi:10.1587/transinf.E92.D.1620" . "The Online Graph Exploration Problem on Restricted Graphs")
("doi:10.1016/j.tcs.2021.04.003" . "An improved lower bound for competitive graph exploration")
("doi:10.1137/0206041" . "An Analysis of Several Heuristics for the Traveling Salesman Problem"))
And yes, I know that I could probably use a library like s, dash, seq, or cl, but I try to keep my elisp functions free of those kinds of things. I would appreciate any criticism from the Emacs community about my elisp!
You don’t have anything to guard against a bad response from the server. e.g.
(unless (equal url-http-response-status 200) (error "Server responded with status: %S" url-http-response-status))
To position point at the end of the headers:
(goto-char url-http-end-of-headers)
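Putting those two suggestions together, a minimal sketch might look like the following. The names my/checked-response-body and my/fetch-body are made up for illustration, and the sketch assumes the buffer comes from url-retrieve-synchronously, which is where url-http-response-status and url-http-end-of-headers get set buffer-locally.

```elisp
(require 'url)

;; Both variables are buffer-local in the response buffer created by url.el;
;; declaring them here just tells the byte compiler they are dynamic.
(defvar url-http-response-status)
(defvar url-http-end-of-headers)

(defun my/checked-response-body ()
  "Return the body of the HTTP response held in the current buffer.
Signal an error unless the server answered with status 200.
Hypothetical helper, not part of url.el."
  (unless (equal url-http-response-status 200)
    (error "Server responded with status: %S" url-http-response-status))
  ;; Everything after the end of the headers is the body.  In a real
  ;; response it may start with the blank separator line.
  (buffer-substring-no-properties url-http-end-of-headers (point-max)))

(defun my/fetch-body (url)
  "Fetch URL synchronously and return its body as a string.
Hypothetical helper; kills the response buffer when done."
  (let ((buffer (url-retrieve-synchronously url)))
    (unless buffer
      (error "No response from %s" url))
    (unwind-protect
        (with-current-buffer buffer
          (my/checked-response-body))
      (kill-buffer buffer))))
```

Keeping the status check in its own function means it can be exercised on a hand-made buffer without touching the network. To get a DOM instead of a string, the same check could be followed by libxml-parse-html-region on the region from url-http-end-of-headers to point-max.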
This:
(setq result (cons (cons ...) result))
is more clearly expressed as:
(push (cons ...) result)
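As a small illustration of that equivalence, here are two toy functions (the names and the squares example are invented for this sketch) that build the same alist, one with each idiom:

```elisp
(defun my/squares-with-setq (numbers)
  "Collect (N . N*N) pairs from NUMBERS with the setq/cons idiom."
  (let (result)
    (dolist (n numbers result)
      (setq result (cons (cons n (* n n)) result)))))

(defun my/squares-with-push (numbers)
  "Collect (N . N*N) pairs from NUMBERS with `push'."
  (let (result)
    (dolist (n numbers result)
      ;; `push' expands to the same setq/cons form as above.
      (push (cons n (* n n)) result))))
```

Both versions prepend, so the pairs come out in reverse input order; wrapping the final result in nreverse restores the original order if that matters.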
Better yet, you could map over the elements you’re interested in and accumulate the results via mapcar or cl-loop. That would obviate the need for the “result” variable.
You could probably shorten things by using the dom-elements function to directly search for the hrefs you’re interested in, in combination with dom-parent to get at the parent elements.
Overall your function gets a 65 out of 130 ERU (elisp rating units).
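Here is a rough sketch of that idea, run against a tiny hand-written DOM rather than a real Wikipedia page. The sample data and the name my/doi-references are invented for illustration; dom-elements takes a regexp to match against the attribute value, and dom-parent needs the whole DOM to look the parent up.

```elisp
(require 'dom)

(defvar my/sample-dom
  '(html nil
         (body nil
               (cite ((class . "citation journal"))
                     "Doe, J. \"A made-up title\". "
                     (a ((href . "https://doi.org/10.1000/example"))
                        "10.1000/example"))
               (cite ((class . "citation web"))
                     (a ((href . "https://example.org/page"))
                        "A web page"))))
  "Hand-written DOM standing in for a parsed article (illustration only).")

(defun my/doi-references (dom)
  "Return a list of (DOI . CITE-CLASS) pairs found in DOM.
Search for the DOI links directly, then walk up to each parent."
  (mapcar (lambda (a-tag)
            (cons (concat "doi:" (dom-text a-tag))
                  ;; `dom-parent' finds the node that contains A-TAG.
                  (dom-attr (dom-parent dom a-tag) 'class)))
          (dom-elements dom 'href "\\`https://doi\\.org/")))
```

Because mapcar returns the collected list itself, no accumulator variable is needed. On a real page the parent of the link may not be the cite element directly, so the walk up might need more than one dom-parent call.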
My first suggestion would be to use plz for HTTP. Then I’d use cl-loop and pcase to simplify the rest of the code. Here’s a partial rewrite with a TODO for further exercise. :)
(defun wikipedia-article-references (subject)
  (let* ((url (format "https://en.wikipedia.org/wiki/%s" (url-hexify-string subject)))
         (dom (plz 'get url :as #'libxml-parse-html-region)))
    (cl-loop for cite-tag in (dom-by-tag dom 'cite)
             for cite-class = (dom-attr cite-tag 'class)
             collect (pcase cite-class
                       ((rx "journal")
                        (let ((a-tag (dom-search cite-tag (lambda (tag) (string-prefix-p "https://doi.org" (dom-attr tag 'href))))))
                          (cons (concat "doi:" (dom-text a-tag))
                                ;; TODO: Use `string-match' with `rx' and `match-string' here.
                                (let* ((cite-texts (dom-texts cite-tag))
                                       (title-beg (1+ (string-search "\"" cite-texts)))
                                       (title-end (string-search "\"" cite-texts (1+ title-beg))))
                                  (substring cite-texts title-beg title-end)))))
                       ((rx "book")
                        (let ((a-tag (dom-search cite-tag (lambda (tag) (string-prefix-p "/wiki/Special:BookSources" (dom-attr tag 'href))))))
                          (cons (concat "isbn:" (dom-text (dom-child-by-tag a-tag 'bdi)))
                                (dom-text (dom-child-by-tag cite-tag 'i)))))
                       (_
                        (let ((a-tag (assoc 'a cite-tag)))
                          (cons (dom-attr a-tag 'href) (dom-text a-tag))))))))
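For anyone who wants a hint on the TODO, one possible way to pull out the quoted title with string-match, rx and match-string is sketched below. The helper name my/quoted-title is invented here, and the sketch assumes the title is the first run of text between straight double quotes, which is what the original string-search code assumed too.

```elisp
(require 'rx)

(defun my/quoted-title (cite-texts)
  "Return the first double-quoted substring of CITE-TEXTS, or nil.
Hypothetical helper for the TODO above."
  (when (string-match (rx "\""
                          (group (+ (not (any "\""))))
                          "\"")
                      cite-texts)
    ;; Group 1 is the text between the two quote characters.
    (match-string 1 cite-texts)))
```

Unlike the string-search version, this returns nil instead of signalling an error when the citation text contains no quoted title, so the caller can decide what to do with such entries.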
Regarding this:
And yes, I know that I could probably use a library like s, dash, seq, or cl, but I try to keep my elisp functions free of those kinds of things
First of all, cl and seq are built into Emacs and are used in core Emacs code. There’s no reason not to use them. Second, dash and s are on ELPA and are widely used; it’s largely a matter of style, but they are solid libraries, so again, no reason not to use them. They don’t have cooties. ;)
I read a reddit post saying that using cl-lib was kind of a bad thing, and I think I’ve always had a fear that using libraries in my config would just make it more bloated/slow Emacs down. But after all the comments here, I think I’ll change my stance on that.
Having closing parentheses on lines of their own, like
))
)))))
is not very lispy.