A friend of mine approached me with a request to, let us say, automate a web request to a not-to-be-named raffle. I looked into it and was instantly hooked. Both the prize and the challenge are interesting.
I have written quite a few scrapers in my time, so after a few lines of good ol’ Ruby, with the help of mechanize, the first shot was ready and failed miserably due to CSRF.
A quick check confirmed it: yes, they really use a pseudo-random token, which is injected into the DOM and a hidden input field via JS.
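To illustrate the failure, here is a rough sketch of what a scraper without a JS engine gets to see. The field name `csrf_token` and the helper `csrf_token_from` are assumptions for illustration, not taken from the actual page:

```ruby
# Returns the value of the hidden CSRF field of the first form on `page`,
# or nil when the form or field is missing or the field is still empty.
# `page` is expected to be a Mechanize::Page; "csrf_token" is a made-up field name.
def csrf_token_from(page, field_name = "csrf_token")
  form = page.forms.first
  return nil if form.nil?

  field = form.field_with(name: field_name)
  value = field&.value

  (value.nil? || value.empty?) ? nil : value
end
```

Fed with a page fetched by mechanize, this would come back as `nil` in my case, because mechanize parses the HTML but never runs the script that fills in the token.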
I had two options now:
- Understand the code that writes the CSRF token into the DOM
- Find a scraper with a JS engine
In my day job, we always like to play with e2e-testing, which mostly involves scripts that remote-control a web browser.
The API of watir, which does exactly that, is really amazing and easy to use:
    require "watir"

    browser = Watir::Browser.new
    browser.goto "https://blog.unexist.dev"
    browser.close
I think the example is pretty self-explanatory: it opens up a remote session and points the browser to the given URL. When you start that in e.g. irb, you can REPL your way to the desired outcome.
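From there, interacting with the page is mostly a matter of locating an element and calling a method on it. A minimal sketch of how that might look, assuming watir's `link` locator with a `text:` pattern; the helper names `click_link_if_present` and `visit_and_click` are made up for illustration:

```ruby
# Clicks the first link whose text matches `pattern`, if it is present.
# Returns true when a click happened, false otherwise.
# `browser` is expected to be a Watir::Browser (or anything that quacks like one).
def click_link_if_present(browser, pattern)
  link = browser.link(text: pattern)
  return false unless link.present?

  link.click
  true
end

# Hypothetical wrapper: opens the page, tries the click and always closes the browser.
def visit_and_click(url, pattern)
  require "watir" # loaded lazily, so the helper above works without the gem

  browser = Watir::Browser.new
  browser.goto url
  click_link_if_present(browser, pattern)
ensure
  browser&.close
end
```

Calling `visit_and_click("https://blog.unexist.dev", /GitHub/)` would then try this against my blog.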
The above example looks for a link with GitHub in its visible text and clicks it, when present. Easy as that.
One problem solved, this runs nicely on my local machine. Now it would be best if I could just deploy it on a server without installing the whole Docker stack.
Since we are targeting Linux, headless support is kind of built-in. And after a quick search I found headless.
This gem wraps the handling of a virtual framebuffer for you and, as it turns out, works pretty well with my stack:
    require "watir"
    require "headless"

    Headless.ly do
      browser = Watir::Browser.new
      browser.goto "https://blog.unexist.dev"
      browser.close
    end