
Mike Kreuzer

Urlanguage

7 October 2018

Ruby's slow. I hear that all the time. All. The. Time. So I decided to rewrite the Ruby script that generates Ripley (the Reddit programming language index) to try & make it faster.

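The script that generates the Ripley stats has two main parts: first it fetches data for each language subreddit from the Reddit API, then it processes that data into the index.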

I already thought Ruby was pretty fast for the second step. It takes only two or three milliseconds on my laptop, which is plenty fast enough. I had my doubts about the first step, though.

While I was benchmarking it, I got rid of an external dependency. Not because there was anything wrong with it, but because it dealt with the Reddit API, and that's the core business of this script… And while I was doing that… I also rewrote that code in Elixir, assuming it would be much faster. Elixir's known for not being slow, insofar as it's known at all. So Elixir should've been faster… Turns out no.

The Elixir code runs at about the same speed as the Ruby. Once you're hitting the network concurrently, it seems the limiting factor is network speed and not much else. Ruby and Elixir both took between two and four seconds to scrape the 66 language subreddits I'm now looking at, with Ruby (randomly) getting the fastest time. The concurrent-ruby gem is the key to that.
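To illustrate why concurrency leaves network speed as the bottleneck, here is a minimal sketch in plain Ruby. It uses standard-library threads rather than the concurrent-ruby gem the script actually uses, and a hypothetical `fake_fetch` that sleeps instead of calling Reddit, so it runs offline. With 66 simulated requests of 0.1 s each, total wall time should land near the latency of a single request rather than the 6.6 s a sequential loop would take.

```ruby
# Hypothetical stand-in for one network request: it sleeps instead of
# calling Reddit, so the sketch runs without network access.
def fake_fetch(name, latency)
  sleep(latency)
  { name: name, size: name.length } # placeholder data, not real stats
end

# Start one thread per fetch, then wait for all of them.
def fetch_all(names, latency: 0.1)
  threads = names.map { |name| Thread.new { fake_fetch(name, latency) } }
  threads.map(&:value) # Thread#value joins the thread and returns its result
end

names = (1..66).map { |i| "lang#{i}" } # 66 made-up subreddit names
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = fetch_all(names)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

puts "#{results.size} fetches in #{elapsed.round(2)}s"
```

Threads suit this kind of job because MRI releases its global lock while a thread sleeps or waits on I/O. For CPU-bound work, like the two-or-three-millisecond second step, they would buy little.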

Elixir offers a lot of things for bigger, longer-running apps, supervision for example. But supervision is overkill for such a small task, and while

Task.async(fn ->

is not much trickier than

Concurrent.dataflow do

underlying any Elixir code you write is Erlang. Erlang, in all its subterranean eldritch horror. I made GET & POST requests with Erlang's :httpc.request – which, it turns out, takes four parameters…

:get, {url, headers}, [], []

and

:post, {url, headers, content_type, body}, [], []

Both seemed to work, but I had to work that out myself; you wouldn't know it from the "documentation". I'd show the rest of the code, but working out those parameters really was most of the effort.
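For comparison, here is roughly how the same two requests could be built in Ruby with the standard library's `Net::HTTP`. This is a sketch, not the script's own code; the URLs and the `User-Agent` value are illustrative assumptions. The request objects are only constructed, not sent, so it runs without network access.

```ruby
require "net/http"
require "uri"

# GET: a URL plus request headers, the same pieces as {url, headers} in :httpc.
get_uri = URI("https://www.reddit.com/r/ruby/about.json") # illustrative URL
get_req = Net::HTTP::Get.new(get_uri)
get_req["User-Agent"] = "ripley-sketch/0.1" # hypothetical user agent

# POST: URL, headers, content type and body, like {url, headers, content_type, body}.
post_uri = URI("https://example.com/api/token") # hypothetical endpoint
post_req = Net::HTTP::Post.new(post_uri)
post_req["User-Agent"] = "ripley-sketch/0.1"
# set_form_data URL-encodes the hash into the body and sets the content type.
post_req.set_form_data("grant_type" => "client_credentials")

puts get_req.method, get_req.path
puts post_req.method, post_req.content_type, post_req.body

# Sending one would look like this (needs network access):
# Net::HTTP.start(get_uri.host, get_uri.port, use_ssl: true) { |http| http.request(get_req) }
```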

Erlang's "documentation", like all of Erlang, is comically user-hostile.

Erlang's :httpc offers a timeout, which is probably handy, but not much more. Once I'd worked out how the thing worked at all, the main pain point was probably having to turn strings into character lists – [73, 32, 115, 104, 105, 116, 32, 121, 111, 117, 32, 110, 111, 116]. So all up, moving to Elixir seemed to cost nothing much, but also to offer nothing much…
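An Elixir charlist is just a list of integer code points (the charlist for "hi" is [104, 105]), so the conversion Elixir's `String.to_charlist/1` performs is easy to mirror in Ruby. A small sketch, with hypothetical helper names `to_charlist` and `from_charlist`, built on `String#codepoints` and `Array#pack("U*")`:

```ruby
# String -> list of integer code points, the shape of an Elixir charlist.
def to_charlist(str)
  str.codepoints
end

# List of code points -> UTF-8 string.
def from_charlist(codes)
  codes.pack("U*")
end

url = "https://www.reddit.com/r/elixir/about.json" # illustrative URL
codes = to_charlist(url)
p codes.first(5)              # code points for "https"
p from_charlist(codes) == url # round trip back to the original string
```

Feeding the list quoted above into `from_charlist` decodes it back into the author's original phrase.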

Unless there's something lurking in that "documentation." Is there? What's that, what even is… *screams*

Update November 2023: I've taken my code off GitHub, so this code's no longer available there.