Blog (elixir) - Fred Wu's Tech

How I Built a Mostly Feature-Complete MVP in 3 Months Whilst Working Full-Time

2023-08-09T09:28:12.135201Z

A few weeks ago I soft launched an MVP - you are looking at it right now.

In this post I’ll talk about the features, the tech stack and the globally distributed infrastructure behind building this MVP, and of course, with a sprinkle of learnings too.

The “MVP”

Deciding what makes up an “MVP” is always interesting - I’ve heard some saying if the product isn’t embarrassing you’re releasing it too late, and also if the product is embarrassing, you’re not gonna make it.

For me, I’ve always had the idea of building all the essential features as part of the MVP, with one or two “hero” features that would differentiate the product from the competitions, and then build out more premium features over time.

If you can’t already tell, Persumi is a content creation platform with some social networking features. It may sound bland, but what I believe makes it stand out, is the desire of putting the focus back onto the content, rather than the VC-fuelled, ever increasing appetite for more ads and user hostile features.

The Long Nights and Weekends

There is no magic beans for productivity - especially when I have a full time job. Working on a side hustle means giving up on almost all social and entertainment activities. It’s not for everyone, but I didn’t mind it too much. Being an introvert definitely helped - I was happy to see night by night the MVP gradually taking shape to become more and more real.

Looking back, I spent about three months to build out most of the MVP, then another week or two on infrastructure, and another week or two for polishing, all whilst having a full time job.

It’s been a journey, I’m glad that it “only” took me 3-4 months to get to this stage, as initially I estimated for a 6+ months MVP build.

The Features

With all that in mind, I’ve set out to build the essential features that make a blogging and social networking platform:

Short form content like a tweet
Long form content like a blog post or a book chapter
RSS feeds
Communities similar to forums and sub-reddits
Direct messaging between users
A voting (like/dislike) system
A bunch of CRUD glue pieces to make all these things work

The “Hero” Features

Beyond these seemly unremarkable features, I’ve also had in mind two key features that would differentiate the platform from the rest:

The “persona” concept, whereby each user is allowed to have multiple personas to hold different content or topics of interest, e.g. a persona for professional stuff, a persona for gaming stuff and a persona for travel stuff, etc
AI generated audio content for text (also known as Text-to-Speech)

These two “hero” features are what drove me to build Persumi in the first place. Together, they solve some very real pain points for me, namely:

Following specific topics of interest from people is difficult, with the algorithms taking over people’s home feeds, there are simply way too much noise, thanks to VC-fuelled “user engagement” metrics
Content consumption on the go (e.g. during commute or during workouts, etc) is becoming more and more prevalent, but the traditional platforms haven’t adopted to this new lifestyle other than shoving short form content down our throats

There’s also a third “hero” feature: the Aura system. Unlike the upvote/downvote or like/dislike buttons in many social platforms that only serve the algorithm to push more content to you, Persumi’s Aura system keeps track of user’s content quality over time, and would punish the low quality content and promote high quality content using visual cues - lower quality content has a much lower contrast making them easy to ignore. In the age of social media, self-curating content becomes essential to keep a platform healthy, engaging and usable.

The Non-MVP Features, a.k.a. The Future

There are many features that didn’t make the MVP cut, most of these are value-added features that will eventually make their way into paid subscriptions - if Persumi gains enough traction to attract users who don’t mind paying for premium features.

A prime example of such paid features is ones that help users monetise their content, e.g. ad revenue sharing and paid subscribers (like Patreon).

I also have the ambition of building out Persumi’s features so it can eventually compete against the likes of LinkedIn and Tinder.

Wouldn’t it be better for the world to have a platform like Persumi that doesn’t focus on dark patterns and exploiting users? 😉

The Tech Stack

Over the past decade or so I’ve mainly worked with two tech stacks: Ruby and Elixir. So naturally, Persumi was going to be built using one of them.

After some consideration, I’ve decided to go ahead with Elixir, the main reasons were:

Elixir and Erlang/OTP support distributed systems out of box
I’ve been writing more Elixir than Ruby lately, so I’m more productive in Elixir
I really wanted to try and use LiveView in production
I prefer Phoenix’s application architecture more than Rails’

On top of Elixir, I’ve decided early on a few other things to go with it:

Tailwind for CSS
Postgres for database, preferably a serverless option
A search engine
An easy to maintain infrastructure that doesn’t cost an arm and a leg

Elixir

I first discovered Elixir in 2014 while I was still actively involved in the Ruby and Rails communities, but it was two years later that I had the opportunity to really dive into it. I built a few open source libraries to help me learn Elixir and OTP:

Crawler - a high performance web scraper.
OPQ: One Pooled Queue - a simple, in-memory FIFO queue with back-pressure support, built for Crawler.
Simple Bayes - a Naive Bayes machine learning implementation. Hey, I was doing machine learning before it was mainstream! 😆
Stemmer - an English (Porter2) stemming implementation, built for Simple Bayes.

With a decent amount of experience building Phoenix web apps over the years, I unfortunately never had the opportunity to use LiveView…

I guess that’s finally changed now.

LiveView has been amazing, not only does it drastically reduce the amount of front-end code you have to write, make the entire web app feel super snappy, but it also virtually eliminates any front-end and back-end code/logic duplications. It is such an awesome piece of technology that improves both the user experience and the developer experience.

Petal Pro

During my initial tech research, I came across Petal Pro which is a boilerplate starter template built on top of Phoenix. It handles things like user authentication which almost every web app needs, but is somewhat tedious to build.

Petal Pro isn’t free but it ended up saving me so much time. I also started contributing small bug fixes and features to it too. If you are about to build something in Phoenix, check it out!

Tailwind CSS

As I kept progressing my career, there have been fewer and fewer opportunities for me to write front-end and CSS code. Last time I rebuilt my blog, I used Bulma - that was 2019. Since then Tailwind has gained a lot more traction, so I wanted an excuse to try finally give it a shot.

There is a debate on how many new things you should try for building your MVP - the more you have to learn, the slower your MVP progresses. That said, given CSS is reasonably straightforward, I figured it wouldn’t slow me down too much, if anything, Tailwind’s flexibility might just eventually make up any time lost in learning.

I’m happy to report that it is indeed true - by using Tailwind, it became significantly easier for me to customise my components and elements. I can see why it became so popular. It’s not for everyone, but I like it.

Postgres

Choosing Postgres as a database was a no brainer, given how popular and versatile it is. I did briefly consider NoSQL options like DynamoDB but quickly wrote them off as I needed an RDBMS to get things off the ground quickly, and the DB is unlikely to be the bottleneck for a long time anyway.

In Elixir, the Ecto library works wonders for Postgres.

Later in the post I’ll touch on how I deploy and run Postgres in production.

Search Engine

For a search engine, my requirements were:

The ability to search across multiple fields of a schema
The ability to rank them
The ability to have typo tolerance, word stemming and other similar language features to make search more intuitive
The ability to search multiple languages, including CJK (Chinese/Japanese/Korean) characters
Simple to run
Cheap to run

Using Postgres’ full text search was probably going to be the simplest but it doesn’t offer all the functionalities I need without a bunch of set up and manual SQL queries so I didn’t pursue it.

Elasticsearch on the other hand, offers good search functionalities but takes a bit of effort to set up and maintain, and can be costly to run.

After doing some more research, I found the following three options that would fit my needs:

Both Meilisearch and Typesense are open source, with commercial SaaS offerings, whilst Algolia is SaaS-only.

It’s been an interesting journey. I started with Typesense as I liked what I read, but I quickly discovered that it doesn’t search Chinese characters properly.

I then turned to Meilisearch. I especially liked the fact that they offered a generous free tier SaaS to get you off the ground running. Spoiler: during my implementation they did a bait and switch and removed the free tier.

At the time the Elixir support for Meilisearch wasn’t up to date, so I ended up contributing to a community library to add the features I needed.

Curious timing, after Meilisearch removed their free tier, I discovered that even though they officially support searching for Chinese characters, the implementation wasn’t perfect. I found some edge cases where characters weren’t detected properly, making the search results unreliable.

So, my last hope was Algolia. Despite them being the more expensive option out of the three, it does offer a free tier. It turns out, their search results for Chinese characters were much better than Meilisearch’s. Luckily, re-implementing the search from Meilisearch to Algolia didn’t take too much effort, it was pretty much done in one night.

Infrastructure

Early on during the development I’d already determined I wanted to try Fly and Neon, for web and DB, respectively.

I am in no way associated with either company, I was curious about Fly due to its tie-in with the Elixir community (Phoenix Framework’s author Chris McCord works there), and Neon due to its serverless nature.

Globally Distributed Infra

With Fly, the infrastructure automatically becomes globally distributed as soon as I started provisioning servers in more than one region. As of the time of writing, Persumi is deployed to US West, Australia and EU.

Despite being simple to use, making Fly work initially actually took quite a bit of finessing due to its incomplete official documentation and flakiness. Some of the services were having issues during the course of my MVP development. Worse, they don’t report (or sometimes even acknowledge) the issues unless they are region-wide outages. To this date, I believe their blue/green deployment strategy which was recently introduced, is still buggy, I often have to use their rolling deployment strategy instead. Deployment logs were provided to Fly but I think they’re too busy with other things…

Still, I’m sticking with them for now due to the ease of use after the initial hurdle, and their globally distributed infrastructure without asking for my kidney.

To augment Fly’s web servers, I also use Cloudflare’s CDN as well as R2 to serve asset files and audio files.

Funny tangent, initially I used Bunny for asset files and CDN, as I misread Cloudflare’s terms and thought I couldn’t serve audio files from Cloudflare. Bunny worked okay but their dashboard for some reason was painfully slow - not a good look for a CDN company. Like the search engine switch, it didn’t take me too long to switch over to Cloudflare.

Serverless Postgres

There are a few options to run Postgres:

Run on a standard server for maximum portability, but it requires more server maintenance overhead
Run on AWS RDS/Aurora or a similar managed service, easy but can be costly
Run on a serverless option such as Aurora Serverless or Neon

For my use case, I think option 2 or 3 are better fitting. As I mentioned earlier, I started the experiment with Neon.

Neon worked well initially, until I started deploying Fly instances in multiple regions. Due to Neon being only available in one region (I chose US West), and I live in Australia, the round trips between Fly’s Australian instance and Neon’s US instance were a show stopper - especially when complex DB transactions were involved. Actions sometimes took seconds to complete, yikes.

Despite Fly not offering a managed Postgres service, I ended up trying it anyway due to its distributed nature. After incorporating Fly Postgres in the app, all DB operations immediately became more responsive. Paired with LiveView, it feels like running the application locally.

The current Persumi infra looks like:

1 x Fly instance in US West, always on
1 x Fly instance in Australia, auto-shutdown when there’s no traffic
1 x Fly instance in Netherlands, auto-shutdown when there’s no traffic
1 x Fly Postgres writer instance in US West, always on
1 x Fly Postgres read replica instance in Australia, always on
1 x Fly Postgres read replica instance in Netherlands, always on

With this setup, I think I’m quite happy with the cost and scalability balance - it costs ~$20/m to run, with the potential of both vertical and horizontal scaling with ease.

The Missteps

The search engine and CDN swaps mentioned earlier certainly took away some of my time, but they were nothing compared to a major misstep I encountered.

And that was: the choice of how machine learning is done.

Let me explain.

Machine Learning, and Inference

Even before I started the first line of code, I already painted a picture in my head on the machine learning needed: a TTS (text-to-speech) model that I could run inference locally on the instance.

The reason being I believed it was the more flexible approach to gradually improve the inference and therefore the end result by training my own AI models over time.

Given I didn’t want to rent expensive GPU instances, I opted for fast TTS models that could do near real-time inference on CPUs. I used Coqui TTS.

The resulting out-of-box audio wasn’t great, but I kept pressing on.

The show stopper came when it was time to deploy everything onto Fly. Due to Fly’s architecture (they deploy small-ish Docker images, < 2GB each, onto their global network), I struggled to keep the Docker image file small enough to be able to deploy. With Coqui TTS, I would need Python and all the dependencies that resulted in a Docker image around 4-5GB in size.

With my tunnel vision, I then chose to offload the entire Python and Coqui TTS dependency tree onto Fly’s persistent volumes. I knew it wasn’t a great option, as that meant my infrastructure (other than the database) was no longer immutable.

Sometimes it’s necessary to take a step back, re-evaluate, and then press on in a different direction. Which thankfully I did.

The new direction is quite simple really: instead of performing inference locally, use an external service instead.

After doing a quick comparison between the offerings from AWS, Azure and GCP, I ended up using Google’s TTS. Honestly I think I would’ve been happy with any of the options, they all seem to have decent neural based TTS.

In hindsight, these giant corporations have much more resources and expertise to train better models than I ever could on my own.

The end result:

The TTS sounds significantly better than before
It’s just as cheap to run (Google offers a certain amount of free TTS API calls per month)
It no longer needs complex Python calls and FFmpeg calls to make local TTS work
The Fly infrastructure is simple and immutable again

In hindsight, I never should’ve even entertained the idea of running ML locally on CPUs, no matter how simple and efficient a model might be.

That said, with TTS, it wasn’t as simple as just calling the APIs and getting the perfect resulting audio back. Some pre and post processing were needed, but that’s a topic for another time.

More Machine Learning

The cherry on top - now that Google’s APIs were integrated into the app, I ended up also using Google’s PaLM 2 to do text summarisation (it was initially done locally too) as well as for a ChatGPT-like AI prompt service, to power Persumi’s AI writing assistance feature.

The Closing

If you read this far, thank you! I hope you enjoyed reading (or listening) to this post. Please look around and kick tyres, I would love your feedback on how to improve Persumi.

]]>

Coding and Learning Should Never Stop, Open Sourcing is Caring

2017-08-27T11:21:01.000000Z

I’ve had a productive coding weekend, and so I decided to share my experience. Now, many developers choose to treat their career as a series of 9-5 jobs, but if you’re reading this, I assume you’re like the rest of us who love continuous learning and self improvement.

Preface

About a year ago I started learning Elixir. So as part of the learning experience, I wrote two matching learning related libraries: Stemmer and Simple Bayes. It was a great, really enjoyable experience and I learnt a lot about the concepts of word stemming, naive Bayes classification and of course, functional programming.

These topics have been interesting to learn about, but until one gets to use them on a daily basis for a while, key concepts are less likely to convert from short term memory to long term memory. Given my day job is not about writing Elixir (yet), I needed to find other ways to keep my skill-level up and to continue exploring new things.

So about a month ago, I picked up a project I started a year ago but gave up shortly after: Crawler. At the time GenStage was just announced and I was interested in incorporating it into my project as I thought it’d be a great fit. But due to varies reasons - mostly not having a firm grasp of the GenStage concept and implementation, as well as taking on a CTO role at a startup, I couldn’t find enough time and patience to make it work so I had to let it go.

Until now.

Crawler, on Steroids

I’d realised that the way I was trying to incorporate GenStage into my project was never going to work. Not because of GenStage itself, but because of the way I approached learning it. At the time I was so eager to make use of GenStage, and coming off the back of my good streak of releasing the aforementioned machine learning libraries, I thought I could take shortcuts and things would all work out perfectly.

No.

So I licked the wounds, learnt my mistakes and changed tactics. This time, I scoped out and encapsulated my learnings (just as one would in designing a software system), and eventually I came up with another library - OPQ: One Pooled Queue.

OPQ: One Pooled Queue

I knew Crawler could technically work without any queueing system, or GenStage. So I kept on building Crawler, until I felt productive in writing Elixir again, and touched enough areas and concepts that I knew exactly what I needed from GenStage.

And luckily for me too, GenStage over the past year has matured and more importantly, has better documentation with more code examples. Upon closer investigation of the code examples, I found that their GenEvent and RateLimiter examples were almost exactly what I needed. It was an epiphany moment for me after reading and understanding these examples, all of a sudden I “get it”.

If you take a look at the source code of OPQ you’ll notice that the heavy lifting logic was mostly inspired (or even copy-pasted) from those examples.

Open Sourcing is Contributing and Caring

Up until this point, as far as writing open source Elixir code goes, it had mostly been me writing my own code for my own projects. But open sourcing is much more than just writing one’s own code and publishing them on Github.

If you’ve followed my work you’ll know that I’m a big fan of contributing to other projects, some of which are well-known ones like Rails and Slim.

It just so happens, that over the weekend I ran into situations where I needed to contribute to other Elixir projects - and spoiler alert: one of which is Elixir itself.

ElixirRetry

One of the features I set out to add to Crawler, was to allow failed crawls to retry before giving up. Naturally, the first thing I did was to try find an existing package to support the retry functionality.

After some googling and digging into some source code, I’ve found and settled on ElixirRetry - a neat library that was cleverly built.

Soon I found out that since its last release, it offered both retry/2 and retry/3, the latter of which was to support an extra argument that specifies which exceptions to allow as part of the retry flow. It was a great addition, but it doesn’t effect me as Crawler doesn’t need it.

As a developer who cares deeply about code quality and clarity, I immediately thought about how the interfaces for retry/2 and retry/3 could be improved, by simply combining them and making the extra argument in retry/3 an option.

So I did, and issued a pull request here: https://github.com/safwank/ElixirRetry/pull/12 You’ll notice that I’ve also made some improvements to the test suite while I was at it. ;)

Elixir Typespec

One issue I ran into while building Crawler, was the static analysis via Dialyzer - some code that actually ran and passed tests functionally, failed the type checking.

As someone who isn’t as experienced in Elixir, I first opened an issue and phrased it more as a question seeking validity of the issue. I then jumped on Elixir’s Slack group and asked people in the group about this issue.

Fortunately, Ben Wilson immediately came to aid by verifying my issue and validating my suspicion of it being an issue in Elixir’s typespec documentation.

And so, a pull request was created and approved shortly after.

Sharing is Caring

The bulk of Crawler as well as the entirety of OPQ were built in the past month or so. I hope some people will benefit from having these libraries around. And, I hope people also enjoyed me sharing my experience, and perhaps be inspired to start sharing more too.

I will leave you all with a Chinese saying: 滴水石穿. The literal translation is “dripping water penetrates the stone”, and what it means is “constant perseverance yields success“.

]]>

Elixir and Doctest - Help Writing Better Programs, One Function At A Time

2017-08-07T10:00:20.000000Z

Preface

If memory serves right, it’s been several years since I first dabbled in Elixir, but it was about a year ago I really started putting some serious effort into learning Elixir, and as a result I made two libraries in the machine learning space: Simple Bayes, a Naive Bayes text classifier implementation, and Stemmer, an English (Porter2) stemming implementation.

Unfortunately after I’ve released those two libraries, I hadn’t had much opportunities to work with Elixir. My day jobs have been mostly Ruby, JavaScript, PHP and a dash of Golang. And so, after being silent for a year, I’ve decided to pick up something I had started a year ago - a web Crawler. If you are new to Elixir, feel free to follow this project as I am actively developing it.

Learnings

The preface is to give a bit of background of when and how I started learning Elixir, now, let me talk about one of my favourite features of Elixir, and how it helps me write better code not just in Elixir, but in virtually any other language.

Introducing the topic of today, a really simple feature, and in fact it has been part of Python for years - the doctest.

Doctest

In short, a doctest is pieces of code examples that run as part of the test suite, and show up as part of the documentation. For example:

defmodule Greeting do
  @doc """
  ## Examples

      iex> hello("world")
      "hello world"

      iex> hello("dear")
      "hello dear"
  """
  def hello(input) do
    "hello #{input}"
  end
end

And then all you need is in your corresponding test file, to enable doctest:

defmodule GreetingTest do
  use ExUnit.Case

  doctest Greeting
end

Even though I knew about Python’s doctest for years, I’d never realised how impactful it can be given its simplistic nature, probably due to the fact that I’ve never used Python for anything serious.

Until Elixir.

There are three things I find the most impactful as I write more doctests: clarity, scope, and design.

Clarity

As a ruby programmer, I appreciate greatly the beauty of not just the main code base, but also its test suite. Hence I’ve always preferred to use RSpec in larger code bases. However, as the application gets more complex, and as the number of test files grows, the cognitive overhead of reading and processing all the files and lines becomes higher and higher.

Doctest solves this perfectly - no longer do we have to crawl through the right file and the right line for a particular test case, all the test cases are neatly presented right in front of you as you read the function itself.

You might think this as trivial, but just like many organisations spend time and effort to optimise for effective communication by studying proxemics, proxemics between different components of a software code base also plays a role in improving the code clarity, and ultimately the code quality.

Scope

Doctest is purposely simple, and is designed for unit tests. There have been many times when I found myself realising my function was too dependant on external states, or are doing too many things because it was hard to write simple doctests. In a way, the constraints of doctests have forced me to rethink the scope of my function, and that would often lead to an overall better designed system.

Design

As much as I’d like to think about the SOLID principles all the time, it is often too easy to dig deep wholes in the midst of building things.

Every now and then I find myself extracting a piece of logic to a private function and calling it a day. In Elixir, only public functions can have doctests - again, this constraint pushes you to think about the importance and the role of a particular function, perhaps it is better to be moved to another module as a public function therefore can have its own doctests. Here is an example when I did some refactorings on Crawler.

Closing

I wanted to write this article for a while now - as I truly love and appreciate Elixir’s asthetics and features. Many developers might find functional programming as a barrier, but I can assure you that with Elixir’s tooling and ecosystem, and of course doctest (wink), building software feels like a breeze.

Last time when I was this happy building software was when I first discovered ruby. If you haven’t given Elixir a try yet, I encourage you to do so sooner rather than later, it will not only give you a functional programming perspective, but will also help you write better code in other languages.

]]>

I Accidentally Some Machine Learning - My Story of A Month of Learning Elixir

2016-07-23T18:33:29.000000Z

About a month ago I was in-between jobs - I had two weeks to rest up, recharge and get ready for my new job. So I thought, I should use those two weeks to learn something new.

Years ago I briefly looked into Elixir when it was first released to the wild, at the time I wasn’t interested in picking it up due to its syntax similarity to Ruby, despite their vastly different underlying semantics. I love Ruby, and it’s been my weapon of choice for the past 6-7 years, so when it came time for me to learn something new, I naturally wanted to learn something a bit more different than Ruby, syntax-wise.

Fast-forward a few years, I am more mature and open-minded, and are now in a position to welcome Elixir and to embrace the Ruby-like syntax as well as the functional programming mindset with open arms.

Learning Elixir

Given the strong influence of Ruby in Elixir, I can’t help but get nostalgic about learning it by reading another great book by Dave Thomas: Programming Elixir.

Dave Thomas’ original Programming Ruby was instrumental in the success of the Ruby programming language and its communities in the west, and it certainly has helped me to greatly widen my exposure to not only the wonderful world of Ruby but object-oriented programming in general as a PHP developer.

Learning Elixir has been really fun, not only does it share the same developer-friendliness championed by Ruby, it also does so with remarkable strength in concurrency thanks to BEAM (Erlang’s VM).

Many people are drawn to Elixir due to its Ruby influence, its functional and immutability nature, and its Actor-based concurrency model. I would however like to call out one of my favourite features of Elixir that sometimes gets overlooked, and that is how you could write unit tests using ExUnit.DocTest.

Elixir Makes Writing Unit Tests Effortless

Python programmers have been writing doctests for years, but Ruby due to not having an equivalent standard library, has never had writing doctests taken off in its community. I’m glad Elixir has.

Take a look at the code snippet below:

defmodule Stemmer.Step0 do
  @doc """
  ## Examples

      iex> Stemmer.Step0.trim_apostrophes("'ok")
      "ok"

      iex> Stemmer.Step0.trim_apostrophes("o'k")
      "o'k"

      iex> Stemmer.Step0.trim_apostrophes("'o'k'")
      "o'k"
  """
  def trim_apostrophes(word) do
    word
    |> String.replace_prefix("'", "")
    |> String.replace_suffix("'", "")
  end
end

The three test cases given will actually be tested by ExUnit, provided you ask it to do so as below:

defmodule Stemmer.Step0Test do
  use ExUnit.Case, async: true

  doctest Stemmer.Step0
end

You may think having doctests is not a big deal, what I can say is that I am really enjoying the fact that unit tests can be written with minimal friction and high visibility (as they sit with the implementation, rather than within a big test suite with a sea of files). Besides, not all of us practice TDD religiously or 100% of the time, and when I don’t or can’t do TDD, having doctests ensures I don’t forget adding test cases. :)

Toy Robot in Elixir

The best way to learn something new is to practice it as you learn it. Instead of diving straight into a complicated system, or to write a hello world program, I thought I would dig out a code test and re-implement it in Elixir.

The code test is the rather infamous Toy Robot test. As an interviewer, I must have reviewed a few hundreds of these tests, most of which in Ruby. I know what a good OO solution looks like, so naturally I was looking forward to “rethink” the problem and have a crack at it using Elixir.

Here is the result: https://github.com/fredwu/toy-robot-elixir

As a learner, I wanted to practice as many language features as possible, so included in the Toy Robot test code I tried:

Different data structures such as Struct and List
Pattern matching
Data piping (|>)
Agent for managing states
Macros for validation rules

I kept the test code small and nimble on purpose to illustrate the readability of a highly expressive language. I may not have exercised all the language features (such as supervisors and protocols), but at that point I felt I could start the Elixir journey with a big smile on my face already.

Learning Phoenix

I have to admit, a big part of the reason that got me started on learning Elixir is to build a side project. Initially I wanted to simply use Ruby and Rails because I can be very productive using them, but in the end I decided on learning Elixir and Phoenix, because most side projects fail, and when they do, how much learning can you extract out of those experiences? My answer to this question, is to learn a new language, a new framework and most importantly a new programming paradigm to drastically increase the amount of learning I could gain.

But please, before you jump on Phoenix, learn Elixir properly first! Over the years I’ve seen far too many cases of people jumping on Rails without understanding the language features of Ruby…

To learn Phoenix, the Programming Phoenix book is pretty much the bible on this subject aside from the official guide, and it is written by Chris McCord, the author of Phoenix, Bruce Tate, and Jose Valim, the author of Elixir.

As an experienced Rails developer, it only took me a day or two to go through the book (I largely skimmed the chapters on Channels as I don’t intend to build a real-time app).

Now, as a Ruby developer, the biggest pain point when working on Rails or other sizeable Ruby MVC projects for me, is ActiveRecord. I got so frustrated to the status quo I even attempted to fix it.

Fortunately, in Elixir and Phoenix we have Ecto. One of my favourite features of Ecto is the concept of Changeset.

Take a look at the code snippet below:

defmodule MySecretApp.User do
  use MySecretApp.Web, :model

  schema "users" do
    field :username, :string
    field :email, :string
    field :password, :string, virtual: true
    field :encrypted_password, :string

    has_one :profile, MySecretApp.Profile

    timestamps()
  end

  def changeset(struct, params \\ %{}) do
    struct
    |> cast(params, [:username, :email])
    |> validate_required([:username, :email])
    |> validate_length(:username, min: 3, max: 20)
    |> validate_format(:username, ~r/\A[a-zA-Z0-9_]+\z/, message: "alphanumeric and underscores only")
    |> validate_format(:email, ~r/@/)
    |> unique_constraint(:username)
    |> unique_constraint(:email)
  end

  def creation_changeset(struct, params \\ %{}) do
    struct
    |> changeset(params)
    |> password_changeset(params)
  end


  defp password_changeset(struct, params) do
    struct
    |> cast(params, [:password])
    |> validate_required([:password])
    |> validate_length(:password, min: 8, max: 200)
    |> encrypt_password
  end

  defp encrypt_password(changeset) do
    case changeset do
      %Ecto.Changeset{valid?: true, changes: %{password: password}} ->
        put_change(changeset, :encrypted_password, Comeonin.Bcrypt.hashpwsalt(password))
      _ ->
        changeset
    end
  end
end

The changset/2 function can be used when a user needs to be updated, whereas the creation_changeset/2 is to be used only when a user is first created. Sure, you can achieve similar result in Rails by using custom validators, but the fact that this practice is enforced by the library and the framework, is encouraging.

One other thing that I’ve seen in larger Rails apps, is the leaky abstraction of view-level logic, they typically sit in controllers, helpers (which are globally available) or worse, models. Phoenix in this case follows what Hanami, Trailblazer and many other frameworks do: it introduces a “view model” layer.

Something along the lines of:

defmodule MySecretApp.UserView do
  use MySecretApp.Web, :view

  def full_name do
    "#{title} #{first_name} #{last_name}"
  end
end

In a nutshell, Phoenix is just like Rails, but less magical (in a good way) and faster. :) Like the Elixir eco-system, most libraries and concepts feel like their ruby/rails counterparts, but more refined.

There are a few popular Hex packages if you’re familiar with their ruby counterparts:

	Elixir	Ruby
Code analysis	credo	rubocop
Testing	espec	rspec
Browser testing	wallaby	capybara
Test coverage	excoveralls	simplecov
Test factory	ex_machina	factory_girl
Authentication	guardian	devise

And the list goes on and on… Be sure to check out Awesome Elixir for more community curated Elixir libraries.

Learning Machine Learning

As I started building the foundation of my side project, you know, the usual user management and session management, etc, etc, by chance I came across mentions of Bayesian inference.

One thing led to another, very soon I started looking into varies different machine learning algorithms such as Naive Bayes and Random Forest. These are all subjects I had never come across or even heard of before, as I have no strong mathematics, statistics or computer science background. I was however, intrigued and inspired nonetheless, and wanted to employ some machine learning in my side project.

And the best way to understand and learn about machine learning algorithms? You guessed, is to write one! After some research, I came to the conclusion that Naive Bayes is one of the simplest to implement, is great for text classification which is useful for my side project, and has good accuracy given its simplicity and fast speed.

Introducing Simple Bayes

So, after a few train rides to and from work, on my newly purchased Macbook (yes, the one with only a single USB-C port), I built a Naive Bayes library: Simple Bayes.

As of writing, this is the only Naive Bayes library in Elixir that supports the following features:

Naive Bayes algorithm with different models
- Multinomial
- Binarized (boolean) multinomial
- Bernoulli
No external dependencies
Ignores stop words
Additive smoothing
TF-IDF
Optional keywords weighting
Optional word stemming via Stemmer

Let’s see it in action shall we? :)

SimpleBayes.init
|> SimpleBayes.train(:ruby, "I enjoyed using Rails and ActiveRecord for the most part.")
|> SimpleBayes.train(:ruby, "The Ruby community is awesome.")
|> SimpleBayes.train(:ruby, "There is a new framework called Hanami that's promising.")
|> SimpleBayes.train(:ruby, "Please learn Ruby before you learn Rails.")
|> SimpleBayes.train(:ruby, "We use Rails at work.")
|> SimpleBayes.train(:elixir, "It has Phoenix which is a Rails-like framework.")
|> SimpleBayes.train(:elixir, "Its author is a Rails core member, Jose Valim.")
|> SimpleBayes.train(:elixir, "Phoenix and Rails are on many levels, comparable.")
|> SimpleBayes.train(:elixir, "Phoenix has great performance.")
|> SimpleBayes.train(:elixir, "I love Elixir.")
|> SimpleBayes.train(:php, "I haven't written any PHP in years.")
|> SimpleBayes.train(:php, "The PHP framework Laravel is inspired by Rails.")
|> SimpleBayes.classify("I wrote some Rails code at work today.")
# => [
# ruby: 0.20761437345986136,
# elixir: 0.08101868169313056,
# php: 0.019047884912605735
# ]

As you can see, provided with reasonable training data, Naive Bayes can work extremely well.

Introducing Stemmer

Something I discovered as I was building Simple Bayes, is something called stemming. Let’s see another example:

SimpleBayes.init(stem: false)
|> SimpleBayes.train(:apple, "buying apple")
|> SimpleBayes.train(:banana, "buy banana")
|> SimpleBayes.classify("buy apple")
# => [
# banana: 0.057143654737817205,
# apple: 0.057143654737817205
# ]

Oops, the probabilities of the sentence “buy apple” of being apple and banana are the same, that’s not good, as we know “buying” and “buy” should mean the same thing. This is where stemming comes in handy. Let’s enable stemming and run the example again:

SimpleBayes.init(stem: true)
|> SimpleBayes.train(:apple, "buying apple")
|> SimpleBayes.train(:banana, "buy banana")
|> SimpleBayes.classify("buy apple")
# => [
# apple: 0.18096114003107086,
# banana: 0.1505149978319906
# ]

Stemming in this case correctly identified “buy” and “buying” as the same thing (they both have the stemmed root of “buy”).

How did I include stemming in Simple Bayes you wonder? Let me introduce you to Stemmer - as of writing this is the only Porter2 stemmer available in Elixir. :)

Elixir and BEAM may not be known for their raw performance, but compared to a similar Porter2 stemmer implemented in pure ruby, my Elixir version runs about five times faster (under 5s to stem 29000+ words, compared to 25s with the ruby version).

The Learning Continues…

I’ve only been writing Elixir for a month, and there are still heaps to learn and to practice. So far though, I have to say my experience with both Elixir and Phoenix has been fantastic - I get the same satisfying and pumped feeling I got when I was learning Ruby and Rails years ago.

If you haven’t checked out Elixir, I strongly encourage you to do so, after all, the world seems to be moving in the concurrency direction at an increasingly rapid speed, including Ruby 3x3. :)

]]>