<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <link>http://persumi.com/c/product-builders/t/ai</link>
    <generator>Persumi - Level up your writing and blogging with AI</generator>
    <category>ai</category>
    <category>Product Builders</category>
    <pubDate>Tue, 14 Apr 2026 23:10:30 +0000</pubDate>
    <description>Are you a product builder, an indie hacker or an entrepreneur? Join us, tell your stories around the campfire.</description>
    <title>Product Builders - Communities</title>
    <atom:link type="application/rss+xml" rel="self" href="http://persumi.com/c/product-builders/t/ai/feed/rss"></atom:link>
    <item>
      <pubDate>Sat, 06 Jan 2024 13:10:47 +0000</pubDate>
      <guid>http://persumi.com/c/product-builders/u/fredwu/p/introducing-rizz-farm-an-ai-assisted-lead-generation-tool-for-reddit</guid>
      <comments>http://persumi.com/c/product-builders/u/fredwu/p/introducing-rizz-farm-an-ai-assisted-lead-generation-tool-for-reddit</comments>
      <category>ai</category>
      <category>Product Builders</category>
      <author>ifredwu@gmail.com (Fred Wu)</author>
      <description>&lt;![CDATA[&lt;p&gt;
After soft launching &lt;a href=&quot;https://persumi.com&quot;&gt;Persumi&lt;/a&gt; in 2023, I quickly found myself wanting to explore better ways to promote the product. Unfortunately, most digital marketing products focus on three main areas:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;
Keyword bidding, e.g. Google Ads or Facebook Ads  &lt;/li&gt;
  &lt;li&gt;
Cold outreach, e.g. Apollo  &lt;/li&gt;
  &lt;li&gt;
Search Engine Optimisation (SEO)  &lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;
Each has its own pros and cons, but as an indie developer without a huge budget, none of these tools works effectively for me.&lt;/p&gt;
&lt;h2&gt;
The Challenges&lt;/h2&gt;
&lt;h3&gt;
Keyword bidding and targeted ads are expensive&lt;/h3&gt;
&lt;p&gt;
Bidding wars drive up the cost of targeted ads, sometimes so significantly they price out many small businesses.&lt;/p&gt;
&lt;p&gt;
Users are increasingly using adblockers, making targeted ads ineffective.&lt;/p&gt;
&lt;h3&gt;
Cold outreach is rarely answered&lt;/h3&gt;
&lt;p&gt;
Highly sought-after leads are bombarded with cold emails every day, making the tactic ineffective.&lt;/p&gt;
&lt;p&gt;
Most cold emails end up in users’ spam bin, never to be seen again.&lt;/p&gt;
&lt;h3&gt;
Search Engine Optimisation is a lost cause&lt;/h3&gt;
&lt;p&gt;
SEO’ed content is often produced hastily, making it low quality and untrustworthy.&lt;/p&gt;
&lt;p&gt;
Sophisticated users have learnt to ignore highly SEO’ed content from untrusted sources.&lt;/p&gt;
&lt;h2&gt;
The Epiphany&lt;/h2&gt;
&lt;p&gt;
Reflecting on my own behaviour when choosing a product or service, I realised that over the years I have developed a good sense for weeding out the noise. Namely, I am very skeptical of highly ranked websites, I always use an adblocker (uBlock) to block annoying ads, and both my personal and work email inboxes overflow with cold emails I never read.&lt;/p&gt;
&lt;p&gt;
So how do I do research, say when I’m buying a new TV, or subscribing to a new software package? I read user reviews. And not just questionable reviews on Amazon or review sites, but on Reddit, where (mostly) real users discuss and review products and services.&lt;/p&gt;
&lt;p&gt;
Of course, like any other place, Reddit can be filled with bots and fake content too. However, in my experience these are easily distinguishable, especially when paired with the content’s upvotes and the user’s post history. Combined with Reddit’s popularity, this makes it a highly desirable place for people both to look for answers and to provide them.&lt;/p&gt;
&lt;p&gt;
So I asked myself, what if there was a smart lead generation tool to help me promote my product on Reddit? What should the tool be capable of?&lt;/p&gt;
&lt;h2&gt;
The Solution&lt;/h2&gt;
&lt;p&gt;
In order to really stand out, and provide tangible value, I’ve focused on the following areas for the lead gen tool.&lt;/p&gt;
&lt;h3&gt;
Warm leads over cold leads&lt;/h3&gt;
&lt;p&gt;
By searching for and engaging with relevant content, the leads the tool finds for you are warm and targeted.&lt;/p&gt;
&lt;p&gt;
Many of the leads are already highly ranked on search engines, further boosting the content’s reach.&lt;/p&gt;
&lt;h3&gt;
Personalised and public&lt;/h3&gt;
&lt;p&gt;
Content the tool produces for the user is context-aware and personalised, making it highly relevant and trustworthy.&lt;/p&gt;
&lt;p&gt;
Even better, the content is made publicly available, boosting the user’s reputation and reach even further over time.&lt;/p&gt;
&lt;h3&gt;
Infinite scale at a fixed cost&lt;/h3&gt;
&lt;p&gt;
The tool finds leads and suggests responses for the user 24/7, non-stop, at a fraction of the usual marketing costs.&lt;/p&gt;
&lt;p&gt;
The pricing is fixed and transparent, so the user can scale their business without worrying about blowing the budget.&lt;/p&gt;
&lt;h2&gt;
The Bottom Line&lt;/h2&gt;
&lt;p&gt;
Ultimately, I want a lead generation tool that really provides value to not only its user (the marketer), but its user’s users (the target audience). Instead of focusing on &lt;strong&gt;selling&lt;/strong&gt;, it should focus on &lt;strong&gt;helping people&lt;/strong&gt;. This is where &lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; comes in.&lt;/p&gt;
&lt;h2&gt;
How &lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; Works&lt;/h2&gt;
&lt;p&gt;
With &lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt;, machine learning is put to good use. It’s not just another AI wrapper; it leverages many of the latest AI innovations to take lead generation to the next level.&lt;/p&gt;
&lt;h3&gt;
Smart search&lt;/h3&gt;
&lt;p&gt;
&lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; widens the search radius to find the most relevant or highly ranked content for the user. It may find leads a simple Google search couldn’t. Behind the scenes, keywords are expanded and multiple search queries are submitted to search engines.&lt;/p&gt;
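Behind-the-scenes query expansion of this kind might be sketched as follows. This is a hypothetical illustration only: Rizz.farm's actual implementation is not public, and the `expand_keywords` name and synonym-table approach are invented for this sketch.

```python
from itertools import product

def expand_keywords(seed_terms, synonyms):
    """Expand seed keywords into a wider set of search queries:
    each term plus its synonyms, and all two-term combinations."""
    queries = set()
    for term in seed_terms:
        queries.add(term)
        for alt in synonyms.get(term, []):
            queries.add(alt)
    # Pair up distinct terms to form more specific two-word queries.
    single_terms = sorted(queries)
    for a, b in product(single_terms, repeat=2):
        if a != b:
            queries.add(f"{a} {b}")
    return sorted(queries)
```

Each generated query would then be submitted as a separate search, widening the net beyond what a single hand-typed query could catch.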
&lt;h3&gt;
Continuous monitoring&lt;/h3&gt;
&lt;p&gt;
&lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; continuously monitors social media platforms (Reddit to start off with) to find new leads 24/7, delivering them right to the user’s inbox. Whether they are highly ranked popular posts or the latest posts in a particular subreddit, &lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; quickly delivers them to the inbox without fuss.&lt;/p&gt;
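At its core, a monitoring loop like the one described needs to deduplicate newly fetched posts against those already delivered. A minimal sketch of that step, where the `filter_new_posts` helper and the dict shape are assumptions (the fetching and scheduling layers are deliberately left out):

```python
def filter_new_posts(posts, seen_ids):
    """Return only the posts not seen in previous polling cycles,
    and record their ids so the next cycle skips them.

    `posts` is a list of dicts with at least an 'id' key (as in
    Reddit's public JSON listings).
    """
    fresh = [post for post in posts if post["id"] not in seen_ids]
    seen_ids.update(post["id"] for post in fresh)
    return fresh
```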
&lt;h3&gt;
Leads inbox, as simple as Gmail&lt;/h3&gt;
&lt;p&gt;
Leads are automatically sent to the user’s inbox, where they can review and decide whether to engage with them. It’s as simple as using email.&lt;/p&gt;
&lt;h3&gt;
“Rizz Score”&lt;/h3&gt;
&lt;p&gt;
As mentioned earlier, the tool should focus on helping people rather than hard selling. To make this easier for the user, &lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; calculates a score based on how relevant, helpful and spam-free the generated or user-edited response is. This ensures it always publishes high quality content for the target audience, making it a win-win.&lt;/p&gt;
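As an illustration only, a score along these lines could combine the three signals mentioned. The weights and the `rizz_score` name here are invented for this sketch and are not Rizz.farm's actual formula:

```python
def rizz_score(relevance, helpfulness, spamminess):
    """Combine three sub-scores (each in 0.0-1.0) into a 0-100 score.

    Invented weights for illustration: relevance and helpfulness
    raise the score, spamminess lowers it.
    """
    combined = 0.4 * relevance + 0.4 * helpfulness + 0.2 * (1.0 - spamminess)
    return round(100 * combined)
```

A response that is relevant and helpful with no spam signals would score 100; a purely spammy one would score 0, and could be held back from publishing.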
&lt;h3&gt;
Self-Learning AI&lt;/h3&gt;
&lt;p&gt;
Last but not least, by using state-of-the-art AI technologies that adapt to their users, &lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; is able to help them automate and scale their lead generation and reputation building with ease.&lt;/p&gt;
&lt;h2&gt;
Taking It For a Spin&lt;/h2&gt;
&lt;p&gt;
As soon as I deployed &lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; to production, I started using it to promote &lt;a href=&quot;https://persumi.com&quot;&gt;Persumi&lt;/a&gt;, and dare I say I am very happy and impressed with the results. Not only does it find relevant leads right away, it drafts responses that are genuinely helpful and relevant to the OP’s questions or posts, and it is very subtle in pushing the “promotion” agenda. A lot of subreddits shun or outright ban self-promotion, so it’s extremely important to publish helpful posts.&lt;/p&gt;
&lt;p&gt;
If you are looking for an affordable way to promote your product, service, event or anything really, please give &lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; a try. It comes with a 7-day free trial and a 30-day no-questions-asked money-back guarantee. For a limited time, you can also get 50% off for six months using the coupon code &lt;code class=&quot;inline&quot;&gt;LAUNCH2024&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;
Happy lead generation and growth hacking everyone!&lt;/p&gt;
]]&gt;</description>
      <link>http://persumi.com/c/product-builders/u/fredwu/p/introducing-rizz-farm-an-ai-assisted-lead-generation-tool-for-reddit</link>
      <title>Introducing Rizz.farm - An AI-Assisted Lead Generation Tool for Reddit</title>
    </item>
    <item>
      <pubDate>Sun, 17 Sep 2023 03:44:24 +0000</pubDate>
      <guid>http://persumi.com/c/product-builders/u/fredwu/p/comparison-of-ai-ocr-tools-microsoft-azure-ai-document-intelligence-google-cloud-document-ai-aws-textract-and-others</guid>
      <comments>http://persumi.com/c/product-builders/u/fredwu/p/comparison-of-ai-ocr-tools-microsoft-azure-ai-document-intelligence-google-cloud-document-ai-aws-textract-and-others</comments>
      <category>ai</category>
      <category>Product Builders</category>
      <author>ifredwu@gmail.com (Fred Wu)</author>
      <description>&lt;![CDATA[&lt;p&gt;
For a project I’m working on, we are looking at different AI OCR (Optical Character Recognition) options that would allow us to import documents with various layouts and extract relevant data from them with high enough accuracy. Given the nature of these documents and the information they contain, it is paramount that there be an easy way for us to train the AI models using our own documents.&lt;/p&gt;
&lt;p&gt;
Without revealing exactly what types of documents we are working with, as they’re commercially sensitive, the basic premise is to:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
Import a document of one or more pages; the document itself may be a high quality PDF, or a low quality scanned document that may or may not be skewed  &lt;/li&gt;
  &lt;li&gt;
Each page may or may not contain relevant information we want to extract  &lt;/li&gt;
  &lt;li&gt;
Relevant information may be structured in different ways, but usually in a tabular form (but each “line item” can either be horizontal or vertical)  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
In our case, a traditional OCR solution wouldn’t work, as the documents often contain heaps of irrelevant data, and for the relevant data, the solution needs to understand the context in order to work out which data elements to extract and how they relate to each other. Therefore, a machine learning based OCR solution that could adapt to our documents is highly desirable.&lt;/p&gt;
&lt;p&gt;
The success of the project is measured by the end results of imported data:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
  How much data is detected and recognised  &lt;/li&gt;
  &lt;li&gt;
  How accurate the recognised data elements are  &lt;/li&gt;
  &lt;li&gt;
  At a minimum, 80%+ accuracy is needed in order to save time on manual data entry, and to reduce the amount of incorrect data a human operator has to catch before it enters the system  &lt;/li&gt;
&lt;/ul&gt;
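The accuracy target above implies a simple field-level accuracy measure. It might be computed along these lines (a sketch; the project's actual metric is not specified, and the `field_accuracy` helper is hypothetical):

```python
def field_accuracy(expected, extracted):
    """Fraction of expected fields whose extracted value exactly
    matches the ground truth (1.0 when there is nothing to check)."""
    if not expected:
        return 1.0
    correct = sum(
        1 for field, value in expected.items()
        if extracted.get(field) == value
    )
    return correct / len(expected)
```

Averaging this over a held-out set of documents gives a single number to compare against the 80% threshold.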
&lt;h2&gt;
Available OCR Tools&lt;/h2&gt;
&lt;p&gt;
There are a number of OCR tools available, for our project the minimum criteria are:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
AI powered solution so it can be further developed and tweaked by us for our documents  &lt;/li&gt;
  &lt;li&gt;
A UI solution for data training, preferably easy to use so that non-technical team members can help with data training too  &lt;/li&gt;
  &lt;li&gt;
A quick and easy way to test and verify trained models  &lt;/li&gt;
  &lt;li&gt;
A solution that is compliant with the ISO standards relevant for us  &lt;/li&gt;
  &lt;li&gt;
Pay by usage pricing model that’s not cost prohibitive  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
The tools I have discovered and considered were:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
AWS Textract  &lt;/li&gt;
  &lt;li&gt;
Microsoft Azure Document Intelligence (formerly known as Azure Form Recognizer)  &lt;/li&gt;
  &lt;li&gt;
Google Cloud Document AI  &lt;/li&gt;
  &lt;li&gt;
Rossum.ai  &lt;/li&gt;
  &lt;li&gt;
Super.ai  &lt;/li&gt;
  &lt;li&gt;
Eden.ai  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
All six options are AI based OCR solutions. Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP) are the obvious “big three”, each with their own AI models, whereas Super and Eden are aggregators that farm out AI calls to providers like AWS, Azure and Google, and provide a middle layer that could potentially make the end result better (or worse). It’s unclear what model Rossum uses.&lt;/p&gt;
&lt;p&gt;
After some consideration, Rossum, Super and Eden were all taken off the table due to:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
  Rossum and Super not having transparent pricing  &lt;/li&gt;
  &lt;li&gt;
  Eden being an API aggregator built by a small team, making its usefulness and longevity unclear  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
With the longevity of the company and product in mind, as well as transparent pricing and commercial support, AWS, Azure and Google remained the leading choices.&lt;/p&gt;
&lt;h2&gt;
The Big Three: AWS, Azure and Google&lt;/h2&gt;
&lt;h3&gt;
AWS Textract&lt;/h3&gt;
&lt;p&gt;
Out of the big three, AWS is our preferred vendor due to its market leading position as a cloud service provider, as well as our existing projects already using AWS.&lt;/p&gt;
&lt;p&gt;
However, upon closer inspection, it appears Textract &lt;a href=&quot;https://stackoverflow.com/questions/68044070/how-to-customise-aws-textract&quot;&gt;does not allow custom data training&lt;/a&gt;. Textract is provided “as-is” using Amazon’s pre-trained models; it does not allow customers like us to provide our own training data to improve the detection and recognition of different types of documents.&lt;/p&gt;
&lt;p&gt;
AWS’ machine learning blog published an article on an “&lt;a href=&quot;https://aws.amazon.com/blogs/machine-learning/building-an-end-to-end-intelligent-document-processing-solution-using-aws/&quot;&gt;end-to-end intelligent document processing solution&lt;/a&gt;“ in 2020; however, it does not explain in detail exactly what the capabilities are, and the solution requires a complex architecture that takes a significant amount of time and effort to set up and maintain.&lt;/p&gt;
&lt;h3&gt;
Azure Document Intelligence&lt;/h3&gt;
&lt;p&gt;
Azure’s offering, formerly known as Form Recognizer, is an end-to-end solution that offers &lt;a href=&quot;https://learn.microsoft.com/en-au/azure/ai-services/document-intelligence/concept-custom&quot;&gt;custom data labelling and training&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
Google Cloud Document AI&lt;/h3&gt;
&lt;p&gt;
Google’s offering, similar to Azure’s, is also an end-to-end solution that offers &lt;a href=&quot;https://cloud.google.com/document-ai/docs/workbench/build-custom-processor&quot;&gt;custom data labelling and training&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
Comparison: Azure vs Google&lt;/h2&gt;
&lt;p&gt;
With the initial assessment done, we are left with Azure and Google, so let’s dive into the deep comparison of the two platforms.&lt;/p&gt;
&lt;p&gt;
We will compare the two platforms in several aspects:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
The ease of initial set up, i.e. how quickly can we get up and running right away  &lt;/li&gt;
  &lt;li&gt;
Whether a “base” pre-trained model is provided for documents with high enough accuracy so we save time in customisation and training  &lt;/li&gt;
  &lt;li&gt;
The detection and recognition – how much data is accurately detected and recognised  &lt;/li&gt;
  &lt;li&gt;
The custom training process, how easy is it to label custom data  &lt;/li&gt;
  &lt;li&gt;
How fast or slow the training is  &lt;/li&gt;
  &lt;li&gt;
The end result – provided with the same training dataset, which platform offers better results  &lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
The Initial Setup&lt;/h3&gt;
&lt;h4&gt;
Microsoft Azure&lt;/h4&gt;
&lt;p&gt;
Azure’s initial set up is relatively straightforward. There is a four-step dialog to set up the required Azure resources. After that, documents can simply be uploaded for immediate use in labelling and training, which will be covered in later sections.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/08d15780-8791-4671-b162-0261dc1bbf65.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/137edab2-4967-426a-b459-4f0760526274.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;h4&gt;
Google Cloud&lt;/h4&gt;
&lt;p&gt;
Google works slightly differently here. Setting up a project is very simple: there is literally only the “Processor name” to fill out. However, it does require a separate step to set up the required storage bucket.&lt;/p&gt;
&lt;p&gt;
Also, instead of allowing files to be uploaded within the same UI like Azure, Google requires files to be uploaded to the cloud storage bucket first, then imported into Document AI.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/123124bc-8fc5-4204-9cd8-50385496603f.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/a0eced5c-9c94-4a62-8322-3f9ca3eae8c1.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;h3&gt;
Auto Labelling Data&lt;/h3&gt;
&lt;h4&gt;
Microsoft Azure&lt;/h4&gt;
&lt;p&gt;
“Auto-labelling” refers to the AI OCR’s ability to do the initial labelling automatically, to save time and effort in the manual labelling work. It can be seen as the starting point of training the AI model.&lt;/p&gt;
&lt;p&gt;
In this case, the Azure experience is significantly better. It allows files to be uploaded, then auto-labelled by “current”, “unlabelled” or “all” documents. This gives users (i.e. us) the flexibility to choose when a document should be auto-labelled. Auto-labelling does take a bit of time, so if a document is not very well structured and would require more manual labelling anyway, it’s best to skip the auto-label step.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/380e737f-29ba-4b37-bdc6-809e2b308cc3.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;h4&gt;
Google Cloud&lt;/h4&gt;
&lt;p&gt;
Google, on the other hand, has a more cumbersome process. We first need to upload the files to the storage bucket, as mentioned before. Then we select the path/folder of the uploaded files, and “auto-labelling” is enabled or disabled for the entire import. The import also sometimes takes a long time, whereas on Azure it’s always instant.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/5e4f3304-92b0-42de-bde1-fc1fd4eb9a06.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
More on the accuracy of auto-labelling later.&lt;/p&gt;
&lt;h3&gt;
Text Detection and Recognition&lt;/h3&gt;
&lt;p&gt;
A big part of OCR is obviously text detection and recognition. Even without training, a good OCR solution should detect and recognise clear text effortlessly.&lt;/p&gt;
&lt;p&gt;
During our initial research, we’ve found &lt;a href=&quot;https://cazton.com/blogs/executive/handwriting-recognition-azure-aws-gcp&quot;&gt;this article&lt;/a&gt; that compared the text recognition accuracy between AWS, Azure and Google. Their finding was that Azure and Google were comparable with a slight edge to Azure (Azure performed better on 2 out of 3 documents, and Google performed better on 1 out of 3 documents), both were way ahead of AWS in terms of accuracy.&lt;/p&gt;
&lt;p&gt;
Our own testing somewhat mirrored their experience. We found that, in general, Azure performed better both in the number of detected text elements and in the correctness of that text.&lt;/p&gt;
&lt;p&gt;
On Azure, it’s very rare for text to go undetected, whereas on Google, we found several instances where the text was very clear, yet Google was unable to detect it at all.&lt;/p&gt;
&lt;p&gt;
Similarly on accuracy, Azure rarely recognised clear text incorrectly, whereas on Google, units such as &lt;code class=&quot;inline&quot;&gt;x10*9/L&lt;/code&gt; often get recognised incorrectly even though all instances look similarly clear, and sometimes dashes don’t get recognised at all (e.g. &lt;code class=&quot;inline&quot;&gt;10.0 - 12.50&lt;/code&gt; gets recognised as &lt;code class=&quot;inline&quot;&gt;10.0 12.50&lt;/code&gt;).&lt;/p&gt;
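Systematic errors like the dropped dash can be partially repaired in post-processing. A minimal sketch, assuming a reference range is always two numbers separated only by whitespace; the `repair_range` helper is hypothetical and not part of either platform:

```python
import re

# Two numbers separated only by whitespace, e.g. "10.0 12.50", are
# assumed to be a reference range whose dash the OCR dropped.
RANGE_RE = re.compile(r"^(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)$")

def repair_range(text):
    """Re-insert the dash when the text matches the two-number
    pattern; otherwise return the text unchanged."""
    match = RANGE_RE.match(text.strip())
    if match:
        return f"{match.group(1)} - {match.group(2)}"
    return text
```

This kind of rule only helps with predictable errors, of course; it cannot recover text the OCR failed to detect in the first place.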
&lt;h3&gt;
Custom Labelling&lt;/h3&gt;
&lt;p&gt;
During training, a big portion of time is spent on custom labelling, meaning we look for text and assign it to our pre-defined data elements such as name and occupation (these are just fictional examples).&lt;/p&gt;
&lt;p&gt;
Overall, Azure and Google each have their strengths and weaknesses. Let’s go through them in detail.&lt;/p&gt;
&lt;h4&gt;
Auto Layout&lt;/h4&gt;
&lt;p&gt;
On Azure, there is a “Run layout” step that recognises all the text elements, as well as tabular data, in a given document.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/dfcad327-d48d-4a66-be65-8a4661ffa5af.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
Once “run layout” is performed, all the recognised text is highlighted in yellow, giving a quick and easy overview of the usable data elements.&lt;/p&gt;
&lt;p&gt;
Google, on the other hand, does not offer a similar function, so labelling needs to be performed with more effort; more on this later.&lt;/p&gt;
&lt;h3&gt;
Schema vs Schema-less&lt;/h3&gt;
&lt;p&gt;
One key difference between Azure’s and Google’s solutions is that the Azure solution is schema-less, whereas the Google solution is based on defined schemas.&lt;/p&gt;
&lt;p&gt;
On Azure, we can add a “field”, and then assign the field a type (string, number, etc). A field can then be renamed or re-assigned with a different type at any time.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/940cc894-20ba-4d22-8ecc-3b156adf1859.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/1514d2c8-1ed0-49ca-8669-68318db39e1a.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
There is a special field type called “table field”; we can create tabular data using this field type.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/577d29d1-ff02-4eff-959c-70f5cd1a898b.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/9875641e-469a-4511-9230-fb949661f655.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
Google works very differently. Instead of adding fields on the fly, Google requires a schema (a.k.a. field definitions) to be created first. Once a field is created and data is trained, the field cannot be edited or deleted.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/00a08a47-6ff7-41a0-b970-9ece1642237a.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
One advantage of Google’s approach is a more definitive structure for OCR’ed data. When creating a label, not only do we have to choose a type (similar to Azure), we can also choose its occurrence logic; see the screenshot below.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/6390ae94-f3dc-4a96-8e5a-331f0e144032.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
Despite this, we found that the schema-less approach on Azure offers far better versatility without being confined to a pre-defined schema, making training new revisions of models far easier.&lt;/p&gt;
&lt;h3&gt;
Labelling Data&lt;/h3&gt;
&lt;p&gt;
Most of the time spent on training is the manual labelling of data elements on the documents. Azure’s approach here is significantly faster and more accurate than Google’s.&lt;/p&gt;
&lt;h4&gt;
Microsoft Azure&lt;/h4&gt;
&lt;p&gt;
On Azure, because of the “run layout” step, all text elements are already detected, so labelling them is very simple: you click on the yellow elements to select the text you want, then click on the table cell on the right to assign them.&lt;/p&gt;
&lt;p&gt;
However, if Azure detects the data incorrectly, there is no way for us to provide a correction. In this case, we can choose to either skip the cell, or simply assign the cell the incorrect data; if more training data is provided over time, these edge cases will not have a big impact on the final accuracy of the model.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/d0a08e48-73a1-4757-9028-ef566773abe8.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
In the screenshot above, sensitive data elements are blacked out for the purpose of this blog post.&lt;/p&gt;
&lt;h4&gt;
Google Cloud&lt;/h4&gt;
&lt;p&gt;
On Google, this process is significantly more involved. Without a “run layout”-like step, we need to draw over the text to select it, and there’s no way to tell whether Google has detected the text beforehand, so sometimes you’ll draw over some text and get an error saying, “Cannot create labels with empty values”.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/f858e169-3f1b-47b5-a9cf-f7c5dc836d83.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
Again, in the screenshot above, sensitive data elements are blacked out for the purpose of this blog post.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/f057079c-bd70-490d-9e8f-3c72bdd5fd8f.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
And because we can’t simply click on already-detected highlighted text, as on Azure, the whole process takes significantly longer: we have to carefully draw over the text to ensure the boundary is accurate (for training purposes), and sometimes we have to zoom in and reposition the document for accurate drawing.&lt;/p&gt;
&lt;p&gt;
The schema assignment is also less intuitive than Azure’s. Instead of having a table to easily see tabular data, Google’s schema is more like a series of key-value pairs, making it much harder to see assigned data at a glance; and the key-value elements are ordered alphabetically, instead of logically as on Azure, making assigning data unnatural.&lt;/p&gt;
&lt;p&gt;
The one advantage of Google’s approach, though, is that when the recognised data is incorrect, we can manually override it with the correct data. However, after using both solutions for a while, we found that Google’s has a much higher error rate than Azure’s to begin with.&lt;/p&gt;
&lt;h3&gt;
Auto-Label Accuracy&lt;/h3&gt;
&lt;p&gt;
In order to assess the auto-labelling accuracy, we trained on several of our documents with varying quality and layouts. We then uploaded a new document with a layout that had not been trained.&lt;/p&gt;
&lt;p&gt;
The results are very telling.&lt;/p&gt;
&lt;h4&gt;
Microsoft Azure&lt;/h4&gt;
&lt;p&gt;
Azure labelled things almost perfectly: all the data elements were detected correctly and within the correct contexts, and the second comment block was also detected correctly.&lt;/p&gt;
&lt;p&gt;
In this case, the Azure auto-label has only missed the first comment block.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/27360181-ec22-46df-abc1-215124f568cf.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
The screenshot above is blurred to protect the sensitive nature of it.&lt;/p&gt;
&lt;h4&gt;
Google Cloud&lt;/h4&gt;
&lt;p&gt;
Google’s result unfortunately falls short of expectation.&lt;/p&gt;
&lt;p&gt;
As seen in the screenshot below, Google has missed several data elements, and incorrectly detected elements that should not be part of the result. Again, the screenshot is blurred to protect the sensitive nature of it.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/b0fdd124-69a1-4df6-8008-401c03cc02eb.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
This suggests two possible things about Google’s OCR solution:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
  The default text recognition is poor, so it misses data elements even though they are presented clearly, in the same way as other elements  &lt;/li&gt;
  &lt;li&gt;
  The underlying language model is poor, so it incorrectly detects text that shouldn’t be part of any result  &lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
Auto-Label Result Verification&lt;/h3&gt;
&lt;h4&gt;
Microsoft Azure&lt;/h4&gt;
&lt;p&gt;
Once Azure has performed the auto-labelling, it’s very easy to check the results: as seen in the blurred screenshot below, we simply look at the table to verify correctness. However, as mentioned before, we can only correct the detected regions of text, not the content itself.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/28acf5ba-e055-4269-989b-bc144a97f0a0.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;h4&gt;
Google Cloud&lt;/h4&gt;
&lt;p&gt;
On Google, it is more involved. As seen in the blurred screenshot below, it requires a lot of scrolling to find the auto-labelled data. Alternatively, we can hover over the highlighted text to inspect the auto-labelled text, although the pop-up then gets in the way of other data, which makes it an annoyance at times.&lt;/p&gt;
&lt;p&gt;
Curiously, despite having a schema with certain elements set to “required once”, Google still went ahead and falsely detected multiple results for those fields.&lt;/p&gt;
&lt;p&gt;
But as mentioned before, Google does offer the ability to manually correct the text content, which is a big plus.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/6602041a-1c8a-4eb4-8ccd-d0c6008ca98c.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;h3&gt;
Data Training Speed&lt;/h3&gt;
&lt;p&gt;
We have found training to be significantly faster on Azure.&lt;/p&gt;
&lt;p&gt;
With about 45 documents, training takes about 30 minutes on Azure.&lt;/p&gt;
&lt;p&gt;
Google takes about twice as long: about an hour to train, plus another several minutes to deploy the trained model. Azure does not need a separate deployment step.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/9ef5c376-5136-4e88-a989-0c1e5e244833.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
Interestingly, on Google, training a model requires having enough test data, so documents need to be split into training and test groups. It is unclear whether the test data is used for actual training.&lt;/p&gt;
&lt;p&gt;
Google also requires a minimum number of labelled instances per element (10 minimum, 50 recommended) before a model can be trained. Azure has no such limitation.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/0b1b9f24-76d5-4b5b-8997-6982e0d17fe5.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;h3&gt;
Data Regions and Compliance&lt;/h3&gt;
&lt;p&gt;
While both Azure and Google comply with many standards (ISO, GDPR, etc.), Google’s Document AI can only be hosted in their US or EU regions, whereas Azure has no such limitation and can be hosted in any of its available regions.&lt;/p&gt;
&lt;p&gt;
This has a side effect on UI performance. Azure feels quick and snappy because it is deployed in our local AU region, whereas Google, deployed in the US region, is a little slow every time you open or close a document, for example.&lt;/p&gt;
&lt;h2&gt;
Conclusion&lt;/h2&gt;
&lt;p&gt;
With the in-depth analysis done, it should come as no surprise that we ultimately went with Microsoft Azure’s Document Intelligence for our AI OCR needs. I hope these findings are useful to others too.&lt;/p&gt;
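For anyone wanting to try the winning option, here is a minimal sketch of what calling a custom-trained Azure AI Document Intelligence model can look like from Elixir. This is my own illustration, not Persumi's code: the endpoint, model ID and key are placeholders, and the API version string should be checked against Azure's current documentation.

```elixir
defmodule AzureOCRSketch do
  # Sketch only: builds the pieces of an "analyze" request for a
  # custom-trained Azure AI Document Intelligence model, ready to
  # hand to any HTTP client. All identifiers here are placeholders.
  @api_version "2023-07-31"

  def build_analyze_request(endpoint, model_id, api_key, document_url) do
    url =
      "#{endpoint}/formrecognizer/documentModels/#{model_id}:analyze?api-version=#{@api_version}"

    headers = [
      {"Ocp-Apim-Subscription-Key", api_key},
      {"Content-Type", "application/json"}
    ]

    # Azure also accepts a base64Source field for inline documents.
    body = ~s({"urlSource": "#{document_url}"})

    {url, headers, body}
  end
end
```

The analyze call is asynchronous: the response's Operation-Location header points at a result URL to poll for the extracted fields.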
]]&gt;</description>
      <link>http://persumi.com/c/product-builders/u/fredwu/p/comparison-of-ai-ocr-tools-microsoft-azure-ai-document-intelligence-google-cloud-document-ai-aws-textract-and-others</link>
      <title>Comparison of AI OCR Tools: Microsoft Azure AI Document Intelligence, Google Cloud Document AI, AWS Textract and Others</title>
    </item>
    <item>
      <pubDate>Wed, 09 Aug 2023 09:28:12 +0000</pubDate>
      <guid>http://persumi.com/c/persumi/u/fredwu/p/how-i-built-a-mostly-feature-complete-mvp-in-3-months-whilst-working-full-time</guid>
      <comments>http://persumi.com/c/persumi/u/fredwu/p/how-i-built-a-mostly-feature-complete-mvp-in-3-months-whilst-working-full-time</comments>
      <category>ai</category>
      <category>Product Builders</category>
      <author>ifredwu@gmail.com (Fred Wu)</author>
      <description>&lt;![CDATA[&lt;p&gt;
A few weeks ago I &lt;a href=&quot;welcome-to-persumi-a-modern-platform-for-content-creation&quot;&gt;soft launched&lt;/a&gt; an MVP - you are looking at it right now.&lt;/p&gt;
&lt;p&gt;
In this post I’ll talk about the features, the tech stack and the globally distributed infrastructure behind this MVP - with, of course, a sprinkle of learnings too.&lt;/p&gt;
&lt;h2&gt;
The “MVP”&lt;/h2&gt;
&lt;p&gt;
Deciding what makes up an “MVP” is always interesting - I’ve heard some say that if the product &lt;em&gt;isn’t&lt;/em&gt; embarrassing, you’re releasing it too late, and also that if the product &lt;em&gt;is&lt;/em&gt; embarrassing, you’re not gonna make it.&lt;/p&gt;
&lt;p&gt;
For me, I’ve always had the idea of building all the essential features as part of the MVP, with one or two “hero” features to differentiate the product from the competition, and then building out more premium features over time.&lt;/p&gt;
&lt;p&gt;
If you can’t already tell, Persumi is a content creation platform with some social networking features. It may sound bland, but what I believe makes it stand out is the desire to put the focus back on the content, rather than on the VC-fuelled, ever-increasing appetite for more ads and user-hostile features.&lt;/p&gt;
&lt;h2&gt;
The Long Nights and Weekends&lt;/h2&gt;
&lt;p&gt;
There are no magic beans for productivity - especially when you have a full-time job. Working on a side hustle means giving up almost all social and entertainment activities. It’s not for everyone, but I didn’t mind it too much. Being an introvert definitely helped - I was happy to watch, night by night, the MVP gradually take shape and become more and more real.&lt;/p&gt;
&lt;p&gt;
Looking back, I spent about three months building out most of the MVP, then another week or two on infrastructure, and another week or two on polishing - all whilst holding down a full-time job.&lt;/p&gt;
&lt;p&gt;
It’s been a journey. I’m glad it “only” took me 3-4 months to get to this stage, as I had initially estimated a 6+ month MVP build.&lt;/p&gt;
&lt;h2&gt;
The Features&lt;/h2&gt;
&lt;p&gt;
With all that in mind, I set out to build the essential features that make up a blogging and social networking platform:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
Short form content like a tweet  &lt;/li&gt;
  &lt;li&gt;
Long form content like a blog post or a book chapter  &lt;/li&gt;
  &lt;li&gt;
RSS feeds  &lt;/li&gt;
  &lt;li&gt;
Communities similar to forums and sub-reddits  &lt;/li&gt;
  &lt;li&gt;
Direct messaging between users  &lt;/li&gt;
  &lt;li&gt;
A voting (like/dislike) system  &lt;/li&gt;
  &lt;li&gt;
A bunch of CRUD glue pieces to make all these things work  &lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
The “Hero” Features&lt;/h3&gt;
&lt;p&gt;
Beyond these seemingly unremarkable features, I also had in mind two key features that would differentiate the platform from the rest:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
The “persona” concept, whereby each user can have multiple personas holding different content or topics of interest - e.g. a persona for professional stuff, one for gaming and one for travel  &lt;/li&gt;
  &lt;li&gt;
AI generated audio content for text (also known as Text-to-Speech)  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
These two “hero” features are what drove me to build Persumi in the first place. Together, they solve some very real pain points for me, namely:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
Following specific topics of interest from people is difficult: with algorithms taking over people’s home feeds, there is simply way too much noise, thanks to VC-fuelled “user engagement” metrics  &lt;/li&gt;
  &lt;li&gt;
Content consumption on the go (e.g. during commutes or workouts) is becoming more and more prevalent, but the traditional platforms haven’t adapted to this new lifestyle other than shoving short-form content down our throats  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
There’s also a third “hero” feature: the Aura system. Unlike the upvote/downvote or like/dislike buttons on many social platforms, which only serve the algorithm to push more content at you, Persumi’s Aura system tracks each user’s content quality over time, punishing low-quality content and promoting high-quality content via visual cues - lower-quality content is rendered with much lower contrast, making it easy to ignore. In the age of social media, self-curating content is essential to keeping a platform healthy, engaging and usable.&lt;/p&gt;
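To make the visual-cue idea concrete, here is a tiny sketch of how an aura score could map to contrast. The thresholds and Tailwind opacity classes are made up for illustration - Persumi's real Aura scoring is not public:

```elixir
defmodule AuraSketch do
  # Illustrative thresholds only. Maps a rolling content-quality
  # score (0..100) to a Tailwind opacity class, so lower-quality
  # content literally fades into the background.
  def contrast_class(aura) when aura >= 75, do: "opacity-100"
  def contrast_class(aura) when aura >= 50, do: "opacity-80"
  def contrast_class(aura) when aura >= 25, do: "opacity-60"
  def contrast_class(_aura), do: "opacity-40"
end
```

The nice property of this approach is that nothing is hidden outright - readers can still see everything, but their attention is steered toward quality.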
&lt;h3&gt;
The Non-MVP Features, a.k.a. The Future&lt;/h3&gt;
&lt;p&gt;
There are many features that didn’t make the MVP cut; most of them are value-added features that will eventually make their way into paid subscriptions - if Persumi gains enough traction to attract users who don’t mind paying for premium features.&lt;/p&gt;
&lt;p&gt;
Prime examples of such paid features are ones that help users monetise their content, e.g. ad revenue sharing and paid subscriptions (like Patreon).&lt;/p&gt;
&lt;p&gt;
I also have the ambition of building out Persumi’s features so it can eventually compete against the likes of LinkedIn and Tinder.&lt;/p&gt;
&lt;p&gt;
Wouldn’t it be better for the world to have a platform like Persumi that doesn’t focus on dark patterns and exploiting users? 😉&lt;/p&gt;
&lt;h2&gt;
The Tech Stack&lt;/h2&gt;
&lt;p&gt;
Over the past decade or so I’ve mainly worked with two tech stacks: Ruby and Elixir. So naturally, Persumi was going to be built using one of them.&lt;/p&gt;
&lt;p&gt;
After some consideration, I’ve decided to go ahead with Elixir, the main reasons were:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
Elixir and Erlang/OTP support distributed systems out of the box  &lt;/li&gt;
  &lt;li&gt;
I’ve been writing more Elixir than Ruby lately, so I’m more productive in Elixir  &lt;/li&gt;
  &lt;li&gt;
I really wanted to try and use &lt;a href=&quot;https://github.com/phoenixframework/phoenix_live_view&quot;&gt;LiveView&lt;/a&gt; in production  &lt;/li&gt;
  &lt;li&gt;
I prefer Phoenix’s application architecture to Rails’  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
On top of Elixir, I decided early on a few other things to go with it:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
Tailwind for CSS  &lt;/li&gt;
  &lt;li&gt;
Postgres for the database, preferably a serverless option  &lt;/li&gt;
  &lt;li&gt;
A search engine  &lt;/li&gt;
  &lt;li&gt;
An easy-to-maintain infrastructure that doesn’t cost an arm and a leg  &lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
Elixir&lt;/h3&gt;
&lt;p&gt;
I first discovered Elixir in 2014 while I was still actively involved in the Ruby and Rails communities, but it was two years later that I had the opportunity to really dive into it. I built a few open source libraries to help me learn Elixir and OTP:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
&lt;a href=&quot;https://github.com/fredwu/crawler&quot;&gt;Crawler&lt;/a&gt; - a high performance web scraper.  &lt;/li&gt;
  &lt;li&gt;
&lt;a href=&quot;https://github.com/fredwu/opq&quot;&gt;OPQ: One Pooled Queue&lt;/a&gt; - a simple, in-memory FIFO queue with back-pressure support, built for Crawler.  &lt;/li&gt;
  &lt;li&gt;
&lt;a href=&quot;https://github.com/fredwu/simple_bayes&quot;&gt;Simple Bayes&lt;/a&gt; - a Naive Bayes machine learning implementation. Hey, I was doing machine learning before it was mainstream! 😆  &lt;/li&gt;
  &lt;li&gt;
&lt;a href=&quot;https://github.com/fredwu/stemmer&quot;&gt;Stemmer&lt;/a&gt; - an English (Porter2) stemming implementation, built for Simple Bayes.  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
Despite a decent amount of experience building Phoenix web apps over the years, I unfortunately never had the opportunity to use LiveView…&lt;/p&gt;
&lt;p&gt;
I guess that’s finally changed now.&lt;/p&gt;
&lt;p&gt;
LiveView has been amazing: not only does it drastically reduce the amount of front-end code you have to write and make the entire web app feel super snappy, it also virtually eliminates duplication of code and logic between the front end and the back end. It is an awesome piece of technology that improves both the user experience and the developer experience.&lt;/p&gt;
&lt;h4&gt;
Petal Pro&lt;/h4&gt;
&lt;p&gt;
During my initial tech research, I came across &lt;a href=&quot;https://petal.build/&quot;&gt;Petal Pro&lt;/a&gt;, a boilerplate starter template built on top of Phoenix. It handles things like user authentication, which almost every web app needs but which is somewhat tedious to build.&lt;/p&gt;
&lt;p&gt;
Petal Pro isn’t free, but it ended up saving me a great deal of time. I’ve also started contributing small bug fixes and features to it. If you are about to build something in Phoenix, check it out!&lt;/p&gt;
&lt;h3&gt;
Tailwind CSS&lt;/h3&gt;
&lt;p&gt;
As my career progressed, there were fewer and fewer opportunities for me to write front-end and CSS code. The last time I rebuilt my blog, back in 2019, I used &lt;a href=&quot;https://bulma.io/&quot;&gt;Bulma&lt;/a&gt;. Since then Tailwind has gained a lot more traction, so I wanted an excuse to finally give it a shot.&lt;/p&gt;
&lt;p&gt;
There is a debate about how many new things you should try when building your MVP - the more you have to learn, the slower your MVP progresses. That said, given CSS is reasonably straightforward, I figured Tailwind wouldn’t slow me down too much; if anything, its flexibility might eventually make up for any time lost learning it.&lt;/p&gt;
&lt;p&gt;
I’m happy to report that this was indeed the case - Tailwind made it significantly easier for me to customise my components and elements. I can see why it became so popular. It’s not for everyone, but I like it.&lt;/p&gt;
&lt;h3&gt;
Postgres&lt;/h3&gt;
&lt;p&gt;
Choosing Postgres as the database was a no-brainer, given how popular and versatile it is. I did briefly consider NoSQL options like DynamoDB, but quickly wrote them off: I needed an RDBMS to get things off the ground quickly, and the DB is unlikely to be the bottleneck for a long time anyway.&lt;/p&gt;
&lt;p&gt;
In Elixir, the &lt;a href=&quot;https://hexdocs.pm/ecto/Ecto.html&quot;&gt;Ecto library&lt;/a&gt; works wonders for Postgres.&lt;/p&gt;
&lt;p&gt;
Later in the post I’ll touch on how I deploy and run Postgres in production.&lt;/p&gt;
&lt;h3&gt;
Search Engine&lt;/h3&gt;
&lt;p&gt;
For a search engine, my requirements were:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
The ability to search across multiple fields of a schema  &lt;/li&gt;
  &lt;li&gt;
The ability to rank them  &lt;/li&gt;
  &lt;li&gt;
The ability to have typo tolerance, word stemming and other similar language features to make search more intuitive  &lt;/li&gt;
  &lt;li&gt;
The ability to search multiple languages, including CJK (Chinese/Japanese/Korean) characters  &lt;/li&gt;
  &lt;li&gt;
Simple to run  &lt;/li&gt;
  &lt;li&gt;
Cheap to run  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
Postgres’ full-text search would probably have been the simplest option, but it doesn’t offer all the functionality I need without a fair amount of setup and manual SQL queries, so I didn’t pursue it.&lt;/p&gt;
&lt;p&gt;
Elasticsearch, on the other hand, offers good search functionality, but takes a fair bit of effort to set up and maintain, and can be costly to run.&lt;/p&gt;
&lt;p&gt;
After doing some more research, I found the following three options that would fit my needs:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
&lt;a href=&quot;https://www.algolia.com/&quot;&gt;Algolia&lt;/a&gt;  &lt;/li&gt;
  &lt;li&gt;
&lt;a href=&quot;https://www.meilisearch.com/&quot;&gt;Meilisearch&lt;/a&gt;  &lt;/li&gt;
  &lt;li&gt;
&lt;a href=&quot;https://typesense.org/&quot;&gt;Typesense&lt;/a&gt;  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
Both Meilisearch and Typesense are open source, with commercial SaaS offerings, whilst Algolia is SaaS-only.&lt;/p&gt;
&lt;p&gt;
It’s been an interesting journey. I started with Typesense as I liked what I read, but I quickly discovered that it doesn’t search Chinese characters properly.&lt;/p&gt;
&lt;p&gt;
I then turned to Meilisearch. I especially liked that they offered a generous SaaS free tier to get you up and running. Spoiler: partway through my implementation, they pulled a bait-and-switch and removed the free tier.&lt;/p&gt;
&lt;p&gt;
At the time the Elixir support for Meilisearch wasn’t up to date, so I ended up &lt;a href=&quot;https://github.com/nutshell-lab/meilisearch-ex/pull/5&quot;&gt;contributing to a community library&lt;/a&gt; to add the features I needed.&lt;/p&gt;
&lt;p&gt;
In a case of curious timing, after Meilisearch removed their free tier I discovered that even though they officially support searching Chinese characters, the implementation isn’t perfect. I found edge cases where characters weren’t detected properly, making the search results unreliable.&lt;/p&gt;
&lt;p&gt;
So, my last hope was Algolia. Despite being the most expensive option of the three, it does offer a free tier. It turns out their search results for Chinese characters were much better than Meilisearch’s. Luckily, re-implementing the search on Algolia didn’t take much effort - it was pretty much done in one night.&lt;/p&gt;
&lt;h3&gt;
Infrastructure&lt;/h3&gt;
&lt;p&gt;
Early on during the development I’d already determined I wanted to try &lt;a href=&quot;https://fly.io/&quot;&gt;Fly&lt;/a&gt; and &lt;a href=&quot;https://neon.tech/&quot;&gt;Neon&lt;/a&gt;, for web and DB, respectively.&lt;/p&gt;
&lt;p&gt;
I am in no way associated with either company. I was curious about Fly due to its ties to the Elixir community (Phoenix Framework’s author Chris McCord works there), and about Neon due to its serverless nature.&lt;/p&gt;
&lt;h4&gt;
Globally Distributed Infra&lt;/h4&gt;
&lt;p&gt;
With Fly, the infrastructure automatically became globally distributed as soon as I started provisioning servers in more than one region. As of the time of writing, Persumi is deployed to US West, Australia and the EU.&lt;/p&gt;
&lt;p&gt;
Despite Fly’s apparent simplicity, getting it working initially took quite a bit of finessing, due to incomplete official documentation and general flakiness. Some of their services had issues during the course of my MVP development. Worse, they don’t report (or sometimes even acknowledge) issues unless they are region-wide outages. To this day, I believe their recently introduced blue/green deployment strategy is still buggy - I often have to fall back to their rolling deployment strategy instead. Deployment logs were provided to Fly, but I think they’re too busy with other things…&lt;/p&gt;
&lt;p&gt;
Still, I’m sticking with them for now: once past the initial hurdle they are easy to use, and they offer globally distributed infrastructure without asking for my kidney.&lt;/p&gt;
&lt;p&gt;
To augment Fly’s web servers, I also use Cloudflare’s &lt;a href=&quot;https://www.cloudflare.com/application-services/products/cdn/&quot;&gt;CDN&lt;/a&gt; as well as &lt;a href=&quot;https://www.cloudflare.com/developer-platform/r2/&quot;&gt;R2&lt;/a&gt; to serve asset files and audio files.&lt;/p&gt;
&lt;p&gt;
Funny tangent: initially I used &lt;a href=&quot;https://bunny.net/&quot;&gt;Bunny&lt;/a&gt; for asset files and CDN, as I had misread Cloudflare’s terms and thought I couldn’t serve audio files from Cloudflare. Bunny worked okay, but for some reason their dashboard was painfully slow - not a good look for a CDN company. As with the search engine swap, it didn’t take me long to switch over to Cloudflare.&lt;/p&gt;
&lt;h4&gt;
Serverless Postgres&lt;/h4&gt;
&lt;p&gt;
There are a few options to run Postgres:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;
Run on a standard server for maximum portability, but it requires more server maintenance overhead  &lt;/li&gt;
  &lt;li&gt;
Run on AWS RDS/Aurora or a similar managed service, easy but can be costly  &lt;/li&gt;
  &lt;li&gt;
Run on a serverless option such as Aurora Serverless or Neon  &lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;
For my use case, options 2 and 3 are a better fit. As I mentioned earlier, I started the experiment with &lt;a href=&quot;https://neon.tech/&quot;&gt;Neon&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
Neon worked well initially - until I started deploying Fly instances in multiple regions. Because Neon is only available in a single region (I chose US West) and I live in Australia, the round trips between Fly’s Australian instance and Neon’s US instance were a show stopper - especially when complex DB transactions were involved. Actions sometimes took &lt;em&gt;seconds&lt;/em&gt; to complete. Yikes.&lt;/p&gt;
&lt;p&gt;
Although Fly doesn’t offer a managed Postgres service, I ended up trying its Postgres offering anyway due to its &lt;a href=&quot;https://fly.io/docs/postgres/advanced-guides/high-availability-and-global-replication/&quot;&gt;distributed nature&lt;/a&gt;. After incorporating &lt;a href=&quot;https://github.com/superfly/fly_postgres_elixir&quot;&gt;Fly Postgres&lt;/a&gt; into the app, all DB operations immediately became more responsive. Paired with LiveView, it feels like running the application locally.&lt;/p&gt;
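Conceptually, the trick is that only the primary region accepts writes, so each machine must know whether it is in the primary region. Below is a sketch of that routing decision in plain Elixir. It mirrors the idea behind Fly Postgres (Fly exposes the machine's region via the FLY_REGION environment variable, and the app is configured with a PRIMARY_REGION), but it is not the library's actual API:

```elixir
defmodule RegionRoutingSketch do
  # Conceptual sketch, not fly_postgres_elixir's real API.
  # Reads go to the nearest replica; writes must run in the
  # primary region, or be forwarded there.
  # (A real implementation would treat missing env vars more carefully.)
  def primary?(env \\ System.get_env()) do
    Map.get(env, "FLY_REGION") == Map.get(env, "PRIMARY_REGION")
  end

  def route(:read, _env), do: :local_replica

  def route(:write, env) do
    if primary?(env), do: :local_primary, else: :forward_to_primary
  end
end
```

With this split, the common case (reads) is always served from a nearby replica, which is what makes the app feel local in every region.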
&lt;p&gt;
The current Persumi infra looks like:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
1 x Fly instance in US West, always on  &lt;/li&gt;
  &lt;li&gt;
1 x Fly instance in Australia, auto-shutdown when there’s no traffic  &lt;/li&gt;
  &lt;li&gt;
1 x Fly instance in Netherlands, auto-shutdown when there’s no traffic  &lt;/li&gt;
  &lt;li&gt;
1 x Fly Postgres writer instance in US West, always on  &lt;/li&gt;
  &lt;li&gt;
1 x Fly Postgres read replica instance in Australia, always on  &lt;/li&gt;
  &lt;li&gt;
1 x Fly Postgres read replica instance in Netherlands, always on  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
With this setup, I’m quite happy with the balance of cost and scalability - it costs ~$20/month to run, with the potential for easy vertical and horizontal scaling.&lt;/p&gt;
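The always-on/auto-shutdown split above is plain Fly configuration. Here is a sketch of the relevant fly.toml knobs - the option names reflect my understanding of Fly's apps v2 configuration and the values are illustrative, so check their current docs:

```toml
# fly.toml (excerpt) - hypothetical values for illustration
[http_service]
  internal_port = 4000
  auto_stop_machines = true   # stop idle machines in quiet regions
  auto_start_machines = true  # wake them up on incoming traffic
  min_machines_running = 1    # keep one instance (US West) always on
```

The database machines simply skip the auto-stop settings, which is why the read replicas stay always on.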
&lt;h2&gt;
The Missteps&lt;/h2&gt;
&lt;p&gt;
The search engine and CDN swaps mentioned earlier certainly took away some of my time, but they were nothing compared to a major misstep I encountered.&lt;/p&gt;
&lt;p&gt;
And that was: the choice of how to do machine learning.&lt;/p&gt;
&lt;p&gt;
Let me explain.&lt;/p&gt;
&lt;h3&gt;
Machine Learning, and Inference&lt;/h3&gt;
&lt;p&gt;
Even before I wrote the first line of code, I had already painted a picture in my head of the machine learning setup I needed: a TTS (text-to-speech) model I could run inference with locally, on the instance itself.&lt;/p&gt;
&lt;p&gt;
My reasoning was that this would be the more flexible approach, letting me gradually improve the inference - and therefore the end result - by training my own AI models over time.&lt;/p&gt;
&lt;p&gt;
Given I didn’t want to rent expensive GPU instances, I opted for fast TTS models that could do near real-time inference on CPUs. I used &lt;a href=&quot;https://github.com/coqui-ai/TTS&quot;&gt;Coqui TTS&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
The resulting out-of-box audio wasn’t great, but I kept pressing on.&lt;/p&gt;
&lt;p&gt;
The show stopper came when it was time to deploy everything onto Fly. Due to Fly’s architecture (they deploy small-ish Docker images, &lt; 2GB each, onto their global network), I struggled to keep the Docker image small enough to deploy. With Coqui TTS, I needed Python plus all its dependencies, which resulted in a Docker image of around 4-5GB.&lt;/p&gt;
&lt;p&gt;
With my tunnel vision, I then chose to offload the entire Python and Coqui TTS dependency tree onto Fly’s &lt;a href=&quot;https://fly.io/docs/reference/volumes/&quot;&gt;persistent volumes&lt;/a&gt;. I knew it wasn’t a great option, as that meant my infrastructure (other than the database) was no longer immutable.&lt;/p&gt;
&lt;p&gt;
Sometimes it’s necessary to take a step back, re-evaluate, and then press on in a different direction. Which thankfully I did.&lt;/p&gt;
&lt;p&gt;
The new direction is quite simple, really: instead of performing inference locally, use an external service.&lt;/p&gt;
&lt;p&gt;
After doing a quick comparison between the offerings from AWS, Azure and GCP, I ended up using &lt;a href=&quot;https://cloud.google.com/text-to-speech&quot;&gt;Google’s TTS&lt;/a&gt;. Honestly I think I would’ve been happy with any of the options, they all seem to have decent neural based TTS.&lt;/p&gt;
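As a flavour of what the integration looks like, here is a sketch of the request payload for Google's text:synthesize endpoint, built in Elixir. This is my own illustration, not Persumi's code, and the voice name is an assumption - pick one from Google's published voice list:

```elixir
defmodule TTSSketch do
  # Builds the JSON-ready payload for Google Cloud Text-to-Speech's
  # `text:synthesize` endpoint. The response carries the synthesised
  # audio as base64 in its "audioContent" field.
  @endpoint "https://texttospeech.googleapis.com/v1/text:synthesize"

  def synthesize_payload(text, voice \\ "en-US-Neural2-C") do
    %{
      "input" => %{"text" => text},
      "voice" => %{"languageCode" => "en-US", "name" => voice},
      "audioConfig" => %{"audioEncoding" => "MP3"}
    }
  end

  def endpoint, do: @endpoint
end
```

Encode the payload as JSON, POST it with your credentials, then base64-decode "audioContent" and write it out as an MP3.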
&lt;p&gt;
In hindsight, these giant corporations have much more resources and expertise to train better models than I ever could on my own.&lt;/p&gt;
&lt;p&gt;
The end result:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
The TTS sounds significantly better than before  &lt;/li&gt;
  &lt;li&gt;
It’s just as cheap to run (Google offers a monthly allowance of free TTS API calls)  &lt;/li&gt;
  &lt;li&gt;
It no longer needs complex Python calls and FFmpeg calls to make local TTS work  &lt;/li&gt;
  &lt;li&gt;
The Fly infrastructure is simple and immutable again  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
Frankly, I never should have entertained the idea of running ML inference locally on CPUs, no matter how simple and efficient a model might be.&lt;/p&gt;
&lt;p&gt;
That said, with TTS it wasn’t as simple as calling the APIs and getting perfect audio back. Some pre- and post-processing was needed, but that’s a topic for another time.&lt;/p&gt;
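As one example of the kind of pre-processing involved (my own sketch, not Persumi's actual code): TTS APIs cap the size of each request, so long posts have to be split - ideally on sentence boundaries - and the resulting audio segments stitched back together afterwards:

```elixir
defmodule ChunkSketch do
  # Splits text into chunks that fit a per-request size budget,
  # breaking on sentence boundaries where possible. The 4,500-byte
  # default is an assumed safety margin, not an official limit.
  @max_bytes 4_500

  def chunk(text, max \\ @max_bytes) do
    text
    # Split after sentence-ending punctuation followed by whitespace.
    |> String.split(~r/(?<=[.!?])\s+/)
    |> Enum.reduce([""], fn sentence, [current | rest] ->
      candidate = if current == "", do: sentence, else: current <> " " <> sentence

      if byte_size(candidate) <= max do
        [candidate | rest]
      else
        # Start a new chunk; an oversized single sentence stays whole.
        [sentence, current | rest]
      end
    end)
    |> Enum.reverse()
  end
end
```

Each chunk is then synthesised separately and the audio segments are concatenated (e.g. with FFmpeg) into one file.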
&lt;h3&gt;
More Machine Learning&lt;/h3&gt;
&lt;p&gt;
The cherry on top: now that Google’s APIs were integrated into the app, I also ended up using Google’s &lt;a href=&quot;https://ai.google/discover/palm2/&quot;&gt;PaLM 2&lt;/a&gt; for text summarisation (initially also done locally) and for a ChatGPT-like AI prompt service, powering Persumi’s AI writing assistance feature.&lt;/p&gt;
&lt;h2&gt;
The Closing&lt;/h2&gt;
&lt;p&gt;
If you’ve read this far, thank you! I hope you enjoyed reading (or listening to) this post. Please look around and kick the tyres - I would love your feedback on how to improve Persumi.&lt;/p&gt;
&lt;p&gt;
Sign up for an account if you haven’t already, and leave a comment if you have any questions. Until next time!&lt;/p&gt;
]]&gt;</description>
      <link>http://persumi.com/c/persumi/u/fredwu/p/how-i-built-a-mostly-feature-complete-mvp-in-3-months-whilst-working-full-time</link>
      <title>How I Built a Mostly Feature-Complete MVP in 3 Months Whilst Working Full-Time</title>
    </item>
  </channel>
</rss>