Mastodon Profile Scraper 1
I’ve returned to my Mastodon profile scraper, arachnea.py, and am writing a new version for the new Mastodon web interface. Eugen has rewritten the interface to be dynamic and JavaScript-dependent. In particular, the following and followers pages now use infinite scroll (and therefore lazy loading).
I’m using Selenium to puppet a headless Firefox instance that loads the page and lets all the JS execute, so the page is actually readable. (Loaded without JS execution, there’s nothing but a <noscript>.) It’s taken some experimentation to devise an algorithm that can systematically scroll through the following/followers page, forcing every <article> tag to load.
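Here is a minimal sketch of the kind of scroll loop this involves. It is not the actual arachnea.py code, just one plausible approach: keep scrolling the last loaded <article> into view and stop once the count has stopped growing for a few rounds. The function names, the "article" selector, and the pause/patience values are all assumptions for illustration, and the real page may need a different scroll target.

```python
import time

# Assumption: each profile entry in the list is rendered as an <article>.
ARTICLE_SELECTOR = "article"


def open_headless_firefox():
    """Start a headless Firefox via Selenium (imported lazily on purpose)."""
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)


def scroll_until_stable(driver, pause=1.0, patience=3, max_rounds=1000):
    """Scroll until the number of <article> elements stops growing.

    Returns the final article count. `patience` is how many consecutive
    rounds without growth are tolerated before giving up.
    """
    seen = 0
    stalled = 0
    for _ in range(max_rounds):
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        articles = driver.find_elements("css selector", ARTICLE_SELECTOR)
        if len(articles) > seen:
            seen = len(articles)
            stalled = 0
        else:
            stalled += 1
            if stalled >= patience:
                break
        if articles:
            # Bring the last loaded entry into view to trigger the next batch.
            driver.execute_script("arguments[0].scrollIntoView();", articles[-1])
        else:
            # Nothing loaded yet: fall back to scrolling the whole window.
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the lazy loader time to fetch and render
    return seen
```

Usage would be roughly: driver = open_headless_firefox(), then driver.get(...) on a following or followers URL, then scroll_until_stable(driver), and finally driver.quit(). The fixed pause is the crude part; a slow instance may need a longer pause or a higher patience value to avoid stopping early.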
30 October 2022