<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <link>http://persumi.com/c/product-builders/t/ai</link>
    <generator>Persumi - Level up your writing and blogging with AI</generator>
    <category>ai</category>
    <category>Product Builders</category>
    <pubDate>Tue, 14 Apr 2026 23:10:30 +0000</pubDate>
    <description>Are you a product builder, an indie hacker or an entrepreneur? Join us, tell your stories around the campfire.</description>
    <title>Product Builders - Communities</title>
    <atom:link type="application/rss+xml" rel="self" href="http://persumi.com/c/product-builders/t/ai/feed/rss"></atom:link>
    <item>
      <pubDate>Sat, 06 Jan 2024 13:10:47 +0000</pubDate>
      <guid>http://persumi.com/c/product-builders/u/fredwu/p/introducing-rizz-farm-an-ai-assisted-lead-generation-tool-for-reddit</guid>
      <comments>http://persumi.com/c/product-builders/u/fredwu/p/introducing-rizz-farm-an-ai-assisted-lead-generation-tool-for-reddit</comments>
      <category>ai</category>
      <category>Product Builders</category>
      <author>ifredwu@gmail.com (Fred Wu)</author>
      <description>&lt;![CDATA[&lt;p&gt;
After soft launching &lt;a href=&quot;https://persumi.com&quot;&gt;Persumi&lt;/a&gt; in 2023, I quickly found myself wanting to explore better ways to promote the product. Unfortunately, most digital marketing products focus on three main areas:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;
Keyword bidding, e.g. Google Ads or Facebook Ads  &lt;/li&gt;
  &lt;li&gt;
Cold outreach, e.g. Apollo  &lt;/li&gt;
  &lt;li&gt;
Search Engine Optimisation (SEO)  &lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;
Each has its own pros and cons, but as an indie developer without a huge budget, none of these tools works effectively for me.&lt;/p&gt;
&lt;h2&gt;
The Challenges&lt;/h2&gt;
&lt;h3&gt;
Keyword bidding and targeted ads are expensive&lt;/h3&gt;
&lt;p&gt;
Bidding wars drive up the cost of targeted ads, sometimes so significantly they price out many small businesses.&lt;/p&gt;
&lt;p&gt;
Users are increasingly using adblockers, making targeted ads ineffective.&lt;/p&gt;
&lt;h3&gt;
Cold outreach is rarely answered&lt;/h3&gt;
&lt;p&gt;
Highly sought-after leads are bombarded with cold emails every day, making the tactic ineffective.&lt;/p&gt;
&lt;p&gt;
Most cold emails end up in users’ spam bin, never to be seen again.&lt;/p&gt;
&lt;h3&gt;
Search Engine Optimisation is a lost cause&lt;/h3&gt;
&lt;p&gt;
SEO’ed content is often produced hastily, making it low quality and untrustworthy.&lt;/p&gt;
&lt;p&gt;
Sophisticated users have learnt to ignore highly SEO’ed content from untrusted sources.&lt;/p&gt;
&lt;h2&gt;
The Epiphany&lt;/h2&gt;
&lt;p&gt;
Reflecting on my own behaviour when choosing a product or service, I realised that over the years I have developed a good sense for weeding out the noise. Namely, I am very skeptical of highly ranked websites, I always use an adblocker (uBlock) to block annoying ads, and both my personal and work email inboxes overflow with cold emails I never read.&lt;/p&gt;
&lt;p&gt;
So how do I do research, say when I’m buying a new TV, or subscribing to a new software package? I read user reviews. And not just questionable reviews on Amazon or review sites, but on Reddit, where (mostly) real users discuss and review products and services.&lt;/p&gt;
&lt;p&gt;
Of course, like any other place, Reddit can be filled with bots and fake content too. However, in my experience these are easily distinguishable, especially when paired with the content’s upvotes and the user’s post history. Combined with Reddit’s popularity, this makes it a highly desirable place for people both to look for answers and to provide them.&lt;/p&gt;
&lt;p&gt;
So I asked myself, what if there was a smart lead generation tool to help me promote my product on Reddit? What should the tool be capable of?&lt;/p&gt;
&lt;h2&gt;
The Solution&lt;/h2&gt;
&lt;p&gt;
In order to really stand out, and provide tangible value, I’ve focused on the following areas for the lead gen tool.&lt;/p&gt;
&lt;h3&gt;
Warm leads over cold leads&lt;/h3&gt;
&lt;p&gt;
By searching for and engaging with relevant content, the leads the tool finds for you are warm and targeted.&lt;/p&gt;
&lt;p&gt;
Many of the leads are already highly ranked on search engines, further boosting the content’s reach.&lt;/p&gt;
&lt;h3&gt;
Personalised and public&lt;/h3&gt;
&lt;p&gt;
Content the tool produces for the user is context-aware and personalised, making it highly relevant and trustworthy.&lt;/p&gt;
&lt;p&gt;
Even better, the content is made publicly available, boosting the user’s reputation and reach even further over time.&lt;/p&gt;
&lt;h3&gt;
Infinite scale at a fixed cost&lt;/h3&gt;
&lt;p&gt;
The tool finds leads and suggests responses for the user 24/7, non-stop, at a fraction of the usual marketing costs.&lt;/p&gt;
&lt;p&gt;
The pricing is fixed and transparent, so the user can scale their business without worrying about blowing the budget.&lt;/p&gt;
&lt;h2&gt;
The Bottom Line&lt;/h2&gt;
&lt;p&gt;
Ultimately, I want a lead generation tool that really provides value to not only its user (the marketer), but its user’s users (the target audience). Instead of focusing on &lt;strong&gt;selling&lt;/strong&gt;, it should focus on &lt;strong&gt;helping people&lt;/strong&gt;. This is where &lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; comes in.&lt;/p&gt;
&lt;h2&gt;
How &lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; Works&lt;/h2&gt;
&lt;p&gt;
With &lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt;, machine learning is put to good use. It’s not just another AI wrapper; it leverages many of the latest AI innovations to take lead generation to the next level.&lt;/p&gt;
&lt;h3&gt;
Smart search&lt;/h3&gt;
&lt;p&gt;
&lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; widens the search radius to find the most relevant or highly ranked content for the user. It may find leads a simple Google search couldn’t. Behind the scenes, keywords are expanded and multiple search queries are submitted to search engines.&lt;/p&gt;
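Behind-the-scenes query expansion of this kind might be sketched as follows. This is a hypothetical illustration only: Rizz.farm's actual implementation is not public, and the `expand_keywords` name and synonym-table approach are invented for this sketch.

```python
from itertools import product

def expand_keywords(seed_terms, synonyms):
    """Expand seed keywords into a wider set of search queries:
    each term plus its synonyms, and all two-term combinations."""
    queries = set()
    for term in seed_terms:
        queries.add(term)
        for alt in synonyms.get(term, []):
            queries.add(alt)
    # Pair up distinct terms to form more specific two-word queries.
    single_terms = sorted(queries)
    for a, b in product(single_terms, repeat=2):
        if a != b:
            queries.add(f"{a} {b}")
    return sorted(queries)
```

Each generated query would then be submitted as a separate search, widening the net beyond what a single hand-typed query could catch.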
&lt;h3&gt;
Continuous monitoring&lt;/h3&gt;
&lt;p&gt;
&lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; continuously monitors social media platforms (Reddit to start off with) to find new leads 24/7, delivering them right to the user’s inbox. Whether they are highly ranked popular posts or the latest posts in a particular subreddit, &lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; quickly delivers them to the inbox without fuss.&lt;/p&gt;
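At its core, a monitoring loop like the one described needs to deduplicate newly fetched posts against those already delivered. A minimal sketch of that step, where the `filter_new_posts` helper and the dict shape are assumptions (the fetching and scheduling layers are deliberately left out):

```python
def filter_new_posts(posts, seen_ids):
    """Return only the posts not seen in previous polling cycles,
    and record their ids so the next cycle skips them.

    `posts` is a list of dicts with at least an 'id' key (as in
    Reddit's public JSON listings).
    """
    fresh = [post for post in posts if post["id"] not in seen_ids]
    seen_ids.update(post["id"] for post in fresh)
    return fresh
```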
&lt;h3&gt;
Leads inbox, as simple as Gmail&lt;/h3&gt;
&lt;p&gt;
Leads are automatically sent to the user’s inbox, where they can review and decide whether to engage with them. It’s as simple as using email.&lt;/p&gt;
&lt;h3&gt;
“Rizz Score”&lt;/h3&gt;
&lt;p&gt;
As mentioned earlier, the tool should focus on helping people rather than hard selling. To make this easier for the user, &lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; calculates a score based on how relevant, helpful and spam-free the generated or user-edited response is. This ensures it always publishes high quality content for the target audience, making it a win-win.&lt;/p&gt;
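As an illustration only, a score along these lines could combine the three signals mentioned. The weights and the `rizz_score` name here are invented for this sketch and are not Rizz.farm's actual formula:

```python
def rizz_score(relevance, helpfulness, spamminess):
    """Combine three sub-scores (each in 0.0-1.0) into a 0-100 score.

    Invented weights for illustration: relevance and helpfulness
    raise the score, spamminess lowers it.
    """
    combined = 0.4 * relevance + 0.4 * helpfulness + 0.2 * (1.0 - spamminess)
    return round(100 * combined)
```

A response that is relevant and helpful with no spam signals would score 100; a purely spammy one would score 0, and could be held back from publishing.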
&lt;h3&gt;
Self-Learning AI&lt;/h3&gt;
&lt;p&gt;
Last but not least, by using state-of-the-art AI technologies that adapt to their users, &lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; is able to help them automate and scale their lead generation and reputation building with ease.&lt;/p&gt;
&lt;h2&gt;
Taking It For a Spin&lt;/h2&gt;
&lt;p&gt;
As soon as I deployed &lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; to production, I started using it to promote &lt;a href=&quot;https://persumi.com&quot;&gt;Persumi&lt;/a&gt;, and dare I say I am very happy and impressed with the results. Not only does it find relevant leads right away, it drafts responses that are genuinely helpful and relevant to the OP’s questions or posts, and it is very subtle in pushing the “promotion” agenda. A lot of subreddits shun or outright ban self-promotion, so it’s extremely important to publish helpful posts.&lt;/p&gt;
&lt;p&gt;
If you are looking for an affordable way to promote your product, service, event or anything really, please give &lt;a href=&quot;https://rizz.farm&quot;&gt;Rizz.farm&lt;/a&gt; a try. It comes with a 7-day free trial and a 30-day no-questions-asked money-back guarantee. For a limited time, you can also get 50% off for six months using the coupon code &lt;code class=&quot;inline&quot;&gt;LAUNCH2024&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;
Happy lead generation and growth hacking everyone!&lt;/p&gt;
]]&gt;</description>
      <link>http://persumi.com/c/product-builders/u/fredwu/p/introducing-rizz-farm-an-ai-assisted-lead-generation-tool-for-reddit</link>
      <title>Introducing Rizz.farm - An AI-Assisted Lead Generation Tool for Reddit</title>
    </item>
    <item>
      <pubDate>Sun, 17 Sep 2023 03:44:24 +0000</pubDate>
      <guid>http://persumi.com/c/product-builders/u/fredwu/p/comparison-of-ai-ocr-tools-microsoft-azure-ai-document-intelligence-google-cloud-document-ai-aws-textract-and-others</guid>
      <comments>http://persumi.com/c/product-builders/u/fredwu/p/comparison-of-ai-ocr-tools-microsoft-azure-ai-document-intelligence-google-cloud-document-ai-aws-textract-and-others</comments>
      <category>ai</category>
      <category>Product Builders</category>
      <author>ifredwu@gmail.com (Fred Wu)</author>
      <description>&lt;![CDATA[&lt;p&gt;
For a project I’m working on, we are looking at different AI OCR (Optical Character Recognition) options that would allow us to import documents with various layouts and extract relevant data from them with high enough accuracy. Given the nature of these documents and the information they contain, it is paramount that there be an easy way for us to train the AI models using our own documents.&lt;/p&gt;
&lt;p&gt;
Without revealing exactly what types of documents we are working with, as they’re commercially sensitive, the basic premise is to:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
Import a document of one or more pages; the document itself may be a high quality PDF, or a low quality scanned document that may or may not be skewed  &lt;/li&gt;
  &lt;li&gt;
Each page may or may not contain relevant information we want to extract  &lt;/li&gt;
  &lt;li&gt;
Relevant information may be structured in different ways, but usually in a tabular form (but each “line item” can either be horizontal or vertical)  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
In our case, a traditional OCR solution wouldn’t work, as the documents often contain heaps of irrelevant data, and for the relevant data, the solution needs to understand the context in order to work out which data elements to extract and how they relate to each other. Therefore, a machine learning based OCR solution that could adapt to our documents is highly desirable.&lt;/p&gt;
&lt;p&gt;
The success of the project is measured by the end results of imported data:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
  How much data is detected and recognised  &lt;/li&gt;
  &lt;li&gt;
  How accurate the recognised data elements are  &lt;/li&gt;
  &lt;li&gt;
  At a minimum, 80%+ accuracy is needed in order to save time on manual data entry, and to reduce the amount of incorrect data a human operator has to catch before it enters the system  &lt;/li&gt;
&lt;/ul&gt;
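The accuracy target above implies a simple field-level accuracy measure. It might be computed along these lines (a sketch; the project's actual metric is not specified, and the `field_accuracy` helper is hypothetical):

```python
def field_accuracy(expected, extracted):
    """Fraction of expected fields whose extracted value exactly
    matches the ground truth (1.0 when there is nothing to check)."""
    if not expected:
        return 1.0
    correct = sum(
        1 for field, value in expected.items()
        if extracted.get(field) == value
    )
    return correct / len(expected)
```

Averaging this over a held-out set of documents gives a single number to compare against the 80% threshold.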
&lt;h2&gt;
Available OCR Tools&lt;/h2&gt;
&lt;p&gt;
There are a number of OCR tools available, for our project the minimum criteria are:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
AI powered solution so it can be further developed and tweaked by us for our documents  &lt;/li&gt;
  &lt;li&gt;
A UI solution for data training, preferably easy to use so that non-technical team members can help with data training too  &lt;/li&gt;
  &lt;li&gt;
A quick and easy way to test and verify trained models  &lt;/li&gt;
  &lt;li&gt;
A solution that is compliant with the ISO standards relevant for us  &lt;/li&gt;
  &lt;li&gt;
Pay by usage pricing model that’s not cost prohibitive  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
The tools I have discovered and considered were:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
AWS Textract  &lt;/li&gt;
  &lt;li&gt;
Microsoft Azure Document Intelligence (formerly known as Azure Form Recognizer)  &lt;/li&gt;
  &lt;li&gt;
Google Cloud Document AI  &lt;/li&gt;
  &lt;li&gt;
Rossum.ai  &lt;/li&gt;
  &lt;li&gt;
Super.ai  &lt;/li&gt;
  &lt;li&gt;
Eden.ai  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
All six options are AI based OCR solutions. Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP) are the obvious “big three”, each with their own AI models, whereas Super and Eden are aggregators that farm out AI calls to providers like AWS, Azure and Google, and provide a middle layer that could potentially make the end result better (or worse). It’s unclear what model Rossum uses.&lt;/p&gt;
&lt;p&gt;
After some consideration, Rossum, Super and Eden were all taken off the table due to:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
  Rossum and Super not having transparent pricing  &lt;/li&gt;
  &lt;li&gt;
  Eden being an API aggregator built by a small team, making its usefulness and longevity unclear  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
With the longevity of the company and product in mind, as well as transparent pricing and commercial support, AWS, Azure and Google remained the leading choices.&lt;/p&gt;
&lt;h2&gt;
The Big Three: AWS, Azure and Google&lt;/h2&gt;
&lt;h3&gt;
AWS Textract&lt;/h3&gt;
&lt;p&gt;
Out of the big three, AWS is our preferred vendor due to its market leading position as a cloud service provider, as well as our existing projects already using AWS.&lt;/p&gt;
&lt;p&gt;
However, upon closer inspection, it appears Textract &lt;a href=&quot;https://stackoverflow.com/questions/68044070/how-to-customise-aws-textract&quot;&gt;does not allow custom data training&lt;/a&gt;. Textract is provided “as-is” using Amazon’s pre-trained models; it does not allow customers like us to provide our own training data to improve the detection and recognition of different types of documents.&lt;/p&gt;
&lt;p&gt;
AWS’ machine learning blog published an article on an “&lt;a href=&quot;https://aws.amazon.com/blogs/machine-learning/building-an-end-to-end-intelligent-document-processing-solution-using-aws/&quot;&gt;end-to-end intelligent document processing solution&lt;/a&gt;“ in 2020; however, it does not explain in detail exactly what the capabilities are, and the solution requires a complex architecture that takes a significant amount of time and effort to set up and maintain.&lt;/p&gt;
&lt;h3&gt;
Azure Document Intelligence&lt;/h3&gt;
&lt;p&gt;
Azure’s offering, formerly known as Form Recognizer, is an end-to-end solution that offers &lt;a href=&quot;https://learn.microsoft.com/en-au/azure/ai-services/document-intelligence/concept-custom&quot;&gt;custom data labelling and training&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
Google Cloud Document AI&lt;/h3&gt;
&lt;p&gt;
Google’s offering, similar to Azure’s, is also an end-to-end solution that offers &lt;a href=&quot;https://cloud.google.com/document-ai/docs/workbench/build-custom-processor&quot;&gt;custom data labelling and training&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
Comparison: Azure vs Google&lt;/h2&gt;
&lt;p&gt;
With the initial assessment done, we are left with Azure and Google, so let’s dive into the deep comparison of the two platforms.&lt;/p&gt;
&lt;p&gt;
We will compare the two platforms in several aspects:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
The ease of initial set up, i.e. how quickly can we get up and running right away  &lt;/li&gt;
  &lt;li&gt;
Whether a “base” pre-trained model is provided for documents with high enough accuracy so we save time in customisation and training  &lt;/li&gt;
  &lt;li&gt;
The detection and recognition – how much data is accurately detected and recognised  &lt;/li&gt;
  &lt;li&gt;
The custom training process, how easy is it to label custom data  &lt;/li&gt;
  &lt;li&gt;
How fast or slow the training is  &lt;/li&gt;
  &lt;li&gt;
The end result – provided with the same training dataset, which platform offers better results  &lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
The Initial Setup&lt;/h3&gt;
&lt;h4&gt;
Microsoft Azure&lt;/h4&gt;
&lt;p&gt;
Azure’s initial set up is relatively straightforward. There is a four-step dialog to set up the required Azure resources. After that, documents can simply be uploaded for immediate use in labelling and training, which will be covered in later sections.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/08d15780-8791-4671-b162-0261dc1bbf65.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/137edab2-4967-426a-b459-4f0760526274.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;h4&gt;
Google Cloud&lt;/h4&gt;
&lt;p&gt;
Google works slightly differently here. Setting up a project is very simple: there is literally only the “Processor name” to fill out. However, it does require a separate step to set up the required storage bucket.&lt;/p&gt;
&lt;p&gt;
Also, instead of allowing files to be uploaded within the same UI like Azure, Google requires files to be uploaded to the cloud storage bucket first, then imported into Document AI.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/123124bc-8fc5-4204-9cd8-50385496603f.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/a0eced5c-9c94-4a62-8322-3f9ca3eae8c1.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;h3&gt;
Auto Labelling Data&lt;/h3&gt;
&lt;h4&gt;
Microsoft Azure&lt;/h4&gt;
&lt;p&gt;
“Auto-labelling” refers to the AI OCR’s ability to do the initial labelling automatically, to save time and effort in the manual labelling work. It can be seen as the starting point of training the AI model.&lt;/p&gt;
&lt;p&gt;
In this case, the Azure experience is significantly better. It allows files to be uploaded, then auto-labelled by “current”, “unlabelled” or “all” documents. This gives users (i.e. us) the flexibility to choose when a document should be auto-labelled. Auto-labelling does take a bit of time, so if a document is not very well structured and would require more manual labelling anyway, it’s best to skip the auto-label step.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/380e737f-29ba-4b37-bdc6-809e2b308cc3.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;h4&gt;
Google Cloud&lt;/h4&gt;
&lt;p&gt;
Google, on the other hand, has a more cumbersome process. We first need to upload the files to the storage bucket, as mentioned before. Then we select the path/folder of the uploaded files, and “auto-labelling” is enabled or disabled for the entire import. The import also sometimes takes a long time, whereas on Azure it’s always instant.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/5e4f3304-92b0-42de-bde1-fc1fd4eb9a06.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
More on the accuracy of auto-labelling later.&lt;/p&gt;
&lt;h3&gt;
Text Detection and Recognition&lt;/h3&gt;
&lt;p&gt;
A big part of OCR is obviously text detection and recognition. Even without training, a good OCR solution should detect and recognise clear text effortlessly.&lt;/p&gt;
&lt;p&gt;
During our initial research, we’ve found &lt;a href=&quot;https://cazton.com/blogs/executive/handwriting-recognition-azure-aws-gcp&quot;&gt;this article&lt;/a&gt; that compared the text recognition accuracy between AWS, Azure and Google. Their finding was that Azure and Google were comparable with a slight edge to Azure (Azure performed better on 2 out of 3 documents, and Google performed better on 1 out of 3 documents), both were way ahead of AWS in terms of accuracy.&lt;/p&gt;
&lt;p&gt;
Our own testing somewhat mirrored their experience. We found that, in general, Azure performed better both in the number of detected text elements and in the correctness of that text.&lt;/p&gt;
&lt;p&gt;
On Azure, it’s very rare for text to go undetected, whereas on Google, we found several instances where the text was very clear, yet Google was unable to detect it at all.&lt;/p&gt;
&lt;p&gt;
Similarly on accuracy, Azure rarely recognised clear text incorrectly, whereas on Google, units such as &lt;code class=&quot;inline&quot;&gt;x10*9/L&lt;/code&gt; often get recognised incorrectly even though all instances look similarly clear, and sometimes dashes don’t get recognised at all (e.g. &lt;code class=&quot;inline&quot;&gt;10.0 - 12.50&lt;/code&gt; gets recognised as &lt;code class=&quot;inline&quot;&gt;10.0 12.50&lt;/code&gt;).&lt;/p&gt;
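Systematic errors like the dropped dash can be partially repaired in post-processing. A minimal sketch, assuming a reference range is always two numbers separated only by whitespace; the `repair_range` helper is hypothetical and not part of either platform:

```python
import re

# Two numbers separated only by whitespace, e.g. "10.0 12.50", are
# assumed to be a reference range whose dash the OCR dropped.
RANGE_RE = re.compile(r"^(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)$")

def repair_range(text):
    """Re-insert the dash when the text matches the two-number
    pattern; otherwise return the text unchanged."""
    match = RANGE_RE.match(text.strip())
    if match:
        return f"{match.group(1)} - {match.group(2)}"
    return text
```

This kind of rule only helps with predictable errors, of course; it cannot recover text the OCR failed to detect in the first place.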
&lt;h3&gt;
Custom Labelling&lt;/h3&gt;
&lt;p&gt;
During training, a big portion of time is spent on custom labelling, meaning we look for text and assign it to our pre-defined data elements such as name and occupation (these are just fictional examples).&lt;/p&gt;
&lt;p&gt;
Overall, Azure and Google each have their strengths and weaknesses. Let’s go through them in detail.&lt;/p&gt;
&lt;h4&gt;
Auto Layout&lt;/h4&gt;
&lt;p&gt;
On Azure, there is a “Run layout” step that recognises all the text elements, as well as tabular data, in a given document.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/dfcad327-d48d-4a66-be65-8a4661ffa5af.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
Once “run layout” is performed, all the recognised text is highlighted in yellow, giving a quick and easy overview of the usable data elements.&lt;/p&gt;
&lt;p&gt;
Google, on the other hand, does not offer a similar function, so labelling needs to be performed with more effort; more on this later.&lt;/p&gt;
&lt;h3&gt;
Schema vs Schema-less&lt;/h3&gt;
&lt;p&gt;
One key difference between Azure’s and Google’s solutions is that the Azure solution is schema-less, whereas the Google solution is based on defined schemas.&lt;/p&gt;
&lt;p&gt;
On Azure, we can add a “field”, and then assign the field a type (string, number, etc). A field can then be renamed or re-assigned with a different type at any time.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/940cc894-20ba-4d22-8ecc-3b156adf1859.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/1514d2c8-1ed0-49ca-8669-68318db39e1a.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
There is a special field type called “table field”; we can create tabular data using this field type.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/577d29d1-ff02-4eff-959c-70f5cd1a898b.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/9875641e-469a-4511-9230-fb949661f655.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
Google works very differently. Instead of adding fields on the fly, Google requires a schema (a.k.a. field definitions) to be created first. Once a field is created and data is trained, the field cannot be edited or deleted.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/00a08a47-6ff7-41a0-b970-9ece1642237a.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
One advantage of Google’s approach is a more definitive structure for OCR’ed data. When creating a label, not only do we have to choose a type (similar to Azure), we can also choose its occurrence logic; see the screenshot below.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/6390ae94-f3dc-4a96-8e5a-331f0e144032.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
Despite this, we found that the schema-less approach on Azure offers far better versatility without being confined to a pre-defined schema, making training new revisions of models far easier.&lt;/p&gt;
&lt;h3&gt;
Labelling Data&lt;/h3&gt;
&lt;p&gt;
Most of the time spent on training is the manual labelling of data elements on the documents. Azure’s approach here is significantly faster and more accurate than Google’s.&lt;/p&gt;
&lt;h4&gt;
Microsoft Azure&lt;/h4&gt;
&lt;p&gt;
On Azure, because of the “run layout” step, all text elements are already detected, so labelling them is very simple: you click on the yellow elements to select the text you want, then click on the table cell on the right to assign them.&lt;/p&gt;
&lt;p&gt;
However, if Azure detects the data incorrectly, there is no way for us to provide a correction. In this case, we can choose to either skip the cell, or simply assign the cell the incorrect data; if more training data is provided over time, these edge cases will not have a big impact on the final accuracy of the model.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/d0a08e48-73a1-4757-9028-ef566773abe8.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
In the screenshot above, sensitive data elements are blacked out for the purpose of this blog post.&lt;/p&gt;
&lt;h4&gt;
Google Cloud&lt;/h4&gt;
&lt;p&gt;
On Google, this process is significantly more involved. Without a “run layout”-like step, we need to draw over the text to select it, and there’s no way to tell whether Google has detected the text beforehand, so sometimes you’ll draw over some text and get an error saying, “Cannot create labels with empty values”.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/f858e169-3f1b-47b5-a9cf-f7c5dc836d83.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
Again, in the screenshot above, sensitive data elements are blacked out for the purpose of this blog post.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/f057079c-bd70-490d-9e8f-3c72bdd5fd8f.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
And because we can’t simply click on already-detected highlighted text, as on Azure, the whole process takes significantly longer: we have to carefully draw over the text to ensure the boundary is accurate (for training purposes), and sometimes we have to zoom in and reposition the document for accurate drawing.&lt;/p&gt;
&lt;p&gt;
The schema assignment is also less intuitive than Azure’s. Instead of having a table to easily see tabular data, Google’s schema is more like a series of key-value pairs, making it much harder to see assigned data at a glance; and the key-value elements are ordered alphabetically, instead of logically as on Azure, making assigning data unnatural.&lt;/p&gt;
&lt;p&gt;
The one advantage of Google’s approach, though, is that when the recognised data is incorrect, we can manually override it with the correct data. However, after using both solutions for a while, we found that Google’s has a much higher error rate than Azure’s to begin with.&lt;/p&gt;
&lt;h3&gt;
Auto-Label Accuracy&lt;/h3&gt;
&lt;p&gt;
In order to assess the auto-labelling accuracy, we trained on several of our documents with varying quality and layouts. We then uploaded a new document with a layout that had not been trained.&lt;/p&gt;
&lt;p&gt;
The results are very telling.&lt;/p&gt;
&lt;h4&gt;
Microsoft Azure&lt;/h4&gt;
&lt;p&gt;
Azure labelled things almost perfectly: all the data elements were detected correctly and within the correct contexts, and the second comment block was also detected correctly.&lt;/p&gt;
&lt;p&gt;
In this case, the Azure auto-label has only missed the first comment block.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/27360181-ec22-46df-abc1-215124f568cf.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
The screenshot above is blurred to protect the sensitive nature of it.&lt;/p&gt;
&lt;h4&gt;
Google Cloud&lt;/h4&gt;
&lt;p&gt;
Google’s result unfortunately falls short of expectation.&lt;/p&gt;
&lt;p&gt;
As seen in the screenshot below, Google has missed several data elements, and incorrectly detected elements that should not be part of the result. Again, the screenshot is blurred to protect the sensitive nature of it.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/b0fdd124-69a1-4df6-8008-401c03cc02eb.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
This suggests two possible things about Google’s OCR solution:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
  The default text recognition is poor, so it misses data elements even though they are presented clearly, in the same way as other elements  &lt;/li&gt;
  &lt;li&gt;
  The underlying language model is poor, so it incorrectly detects text that shouldn’t be part of any result  &lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
Auto-Label Result Verification&lt;/h3&gt;
&lt;h4&gt;
Microsoft Azure&lt;/h4&gt;
&lt;p&gt;
Once Azure has performed the auto-labelling, it’s very easy to check the results: as seen in the blurred screenshot below, we simply look at the table to verify correctness. However, as mentioned before, we can only correct the detected regions of text, not the content itself.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/28acf5ba-e055-4269-989b-bc144a97f0a0.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;h4&gt;
Google Cloud&lt;/h4&gt;
&lt;p&gt;
On Google, it is more involved. As seen in the blurred screenshot below, it requires a lot of scrolling to find the auto-labelled data. Alternatively, we can hover over the highlighted text to inspect the auto-labelled text, although the pop-up then gets in the way of other data, which makes it an annoyance at times.&lt;/p&gt;
&lt;p&gt;
Curiously, despite having a schema with certain elements set to “required once”, Google still went ahead and falsely detected multiple results for those fields.&lt;/p&gt;
&lt;p&gt;
But as mentioned before, Google does offer the ability to manually correct the text content, which is a big plus.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/6602041a-1c8a-4eb4-8ccd-d0c6008ca98c.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;h3&gt;
Data Training Speed&lt;/h3&gt;
&lt;p&gt;
We have found training to be significantly faster on Azure.&lt;/p&gt;
&lt;p&gt;
With about 45 documents, training takes about 30 minutes on Azure.&lt;/p&gt;
&lt;p&gt;
Google takes about twice as long: about an hour to train, plus another several minutes to deploy the trained model. Azure does not need a separate deployment step.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/9ef5c376-5136-4e88-a989-0c1e5e244833.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;
Interestingly, on Google, training a model requires having enough test data, so documents need to be split into training and test groups. It is unclear whether the test data is used for actual training.&lt;/p&gt;
&lt;p&gt;
Google also requires a minimum number of labelled instances per element (10 minimum, 50 recommended) before a model can be trained. Azure has no such limitation.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&quot;https://cdn.persumi.com/uploads/images/posts/1ee22517-8bfc-676a-b1f2-ce61dc92750f/images/0b1b9f24-76d5-4b5b-8997-6982e0d17fe5.png&quot; alt=&quot;&quot; /&gt;
&lt;/p&gt;
&lt;h3&gt;
Data Regions and Compliance&lt;/h3&gt;
&lt;p&gt;
While both Azure and Google comply with many standards (ISO, GDPR, etc.), Google’s Document AI can only be hosted in their US or EU regions, whereas Azure has no such limitation and can be hosted in any of its available regions.&lt;/p&gt;
&lt;p&gt;
This has a side effect on UI performance. Azure feels quick and snappy because it is deployed in our local AU region, whereas Google, deployed in the US region, is a little slow every time you open or close a document, for example.&lt;/p&gt;
&lt;h2&gt;
Conclusion&lt;/h2&gt;
&lt;p&gt;
With the in-depth analysis done, it should come as no surprise that we ultimately went with Microsoft Azure’s Document Intelligence for our AI OCR needs. I hope these findings are useful to others too.&lt;/p&gt;
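For anyone wanting to try the winning option, here is a minimal sketch of what calling a custom-trained Azure AI Document Intelligence model can look like from Elixir. This is my own illustration, not Persumi's code: the endpoint, model ID and key are placeholders, and the API version string should be checked against Azure's current documentation.

```elixir
defmodule AzureOCRSketch do
  # Sketch only: builds the pieces of an "analyze" request for a
  # custom-trained Azure AI Document Intelligence model, ready to
  # hand to any HTTP client. All identifiers here are placeholders.
  @api_version "2023-07-31"

  def build_analyze_request(endpoint, model_id, api_key, document_url) do
    url =
      "#{endpoint}/formrecognizer/documentModels/#{model_id}:analyze?api-version=#{@api_version}"

    headers = [
      {"Ocp-Apim-Subscription-Key", api_key},
      {"Content-Type", "application/json"}
    ]

    # Azure also accepts a base64Source field for inline documents.
    body = ~s({"urlSource": "#{document_url}"})

    {url, headers, body}
  end
end
```

The analyze call is asynchronous: the response's Operation-Location header points at a result URL to poll for the extracted fields.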
]]&gt;</description>
      <link>http://persumi.com/c/product-builders/u/fredwu/p/comparison-of-ai-ocr-tools-microsoft-azure-ai-document-intelligence-google-cloud-document-ai-aws-textract-and-others</link>
      <title>Comparison of AI OCR Tools: Microsoft Azure AI Document Intelligence, Google Cloud Document AI, AWS Textract and Others</title>
    </item>
    <item>
      <pubDate>Wed, 09 Aug 2023 09:28:12 +0000</pubDate>
      <guid>http://persumi.com/c/persumi/u/fredwu/p/how-i-built-a-mostly-feature-complete-mvp-in-3-months-whilst-working-full-time</guid>
      <comments>http://persumi.com/c/persumi/u/fredwu/p/how-i-built-a-mostly-feature-complete-mvp-in-3-months-whilst-working-full-time</comments>
      <category>ai</category>
      <category>Product Builders</category>
      <author>ifredwu@gmail.com (Fred Wu)</author>
      <description>&lt;![CDATA[&lt;p&gt;
A few weeks ago I &lt;a href=&quot;welcome-to-persumi-a-modern-platform-for-content-creation&quot;&gt;soft launched&lt;/a&gt; an MVP - you are looking at it right now.&lt;/p&gt;
&lt;p&gt;
In this post I’ll talk about the features, the tech stack and the globally distributed infrastructure behind this MVP - with, of course, a sprinkle of learnings too.&lt;/p&gt;
&lt;h2&gt;
The “MVP”&lt;/h2&gt;
&lt;p&gt;
Deciding what makes up an “MVP” is always interesting - I’ve heard some say that if the product &lt;em&gt;isn’t&lt;/em&gt; embarrassing, you’re releasing it too late, and also that if the product &lt;em&gt;is&lt;/em&gt; embarrassing, you’re not gonna make it.&lt;/p&gt;
&lt;p&gt;
For me, I’ve always had the idea of building all the essential features as part of the MVP, with one or two “hero” features to differentiate the product from the competition, and then building out more premium features over time.&lt;/p&gt;
&lt;p&gt;
If you can’t already tell, Persumi is a content creation platform with some social networking features. It may sound bland, but what I believe makes it stand out is the desire to put the focus back on the content, rather than on the VC-fuelled, ever-increasing appetite for more ads and user-hostile features.&lt;/p&gt;
&lt;h2&gt;
The Long Nights and Weekends&lt;/h2&gt;
&lt;p&gt;
There are no magic beans for productivity - especially when you have a full-time job. Working on a side hustle means giving up almost all social and entertainment activities. It’s not for everyone, but I didn’t mind it too much. Being an introvert definitely helped - I was happy to watch, night by night, the MVP gradually take shape and become more and more real.&lt;/p&gt;
&lt;p&gt;
Looking back, I spent about three months building out most of the MVP, then another week or two on infrastructure, and another week or two on polishing - all whilst holding down a full-time job.&lt;/p&gt;
&lt;p&gt;
It’s been a journey. I’m glad it “only” took me 3-4 months to get to this stage, as I had initially estimated a 6+ month MVP build.&lt;/p&gt;
&lt;h2&gt;
The Features&lt;/h2&gt;
&lt;p&gt;
With all that in mind, I set out to build the essential features that make up a blogging and social networking platform:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
Short form content like a tweet  &lt;/li&gt;
  &lt;li&gt;
Long form content like a blog post or a book chapter  &lt;/li&gt;
  &lt;li&gt;
RSS feeds  &lt;/li&gt;
  &lt;li&gt;
Communities similar to forums and sub-reddits  &lt;/li&gt;
  &lt;li&gt;
Direct messaging between users  &lt;/li&gt;
  &lt;li&gt;
A voting (like/dislike) system  &lt;/li&gt;
  &lt;li&gt;
A bunch of CRUD glue pieces to make all these things work  &lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
The “Hero” Features&lt;/h3&gt;
&lt;p&gt;
Beyond these seemingly unremarkable features, I also had in mind two key features that would differentiate the platform from the rest:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
The “persona” concept, whereby each user can have multiple personas holding different content or topics of interest - e.g. a persona for professional stuff, one for gaming and one for travel  &lt;/li&gt;
  &lt;li&gt;
AI generated audio content for text (also known as Text-to-Speech)  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
These two “hero” features are what drove me to build Persumi in the first place. Together, they solve some very real pain points for me, namely:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
Following specific topics of interest from people is difficult: with algorithms taking over people’s home feeds, there is simply way too much noise, thanks to VC-fuelled “user engagement” metrics  &lt;/li&gt;
  &lt;li&gt;
Content consumption on the go (e.g. during commutes or workouts) is becoming more and more prevalent, but the traditional platforms haven’t adapted to this new lifestyle other than shoving short-form content down our throats  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
There’s also a third “hero” feature: the Aura system. Unlike the upvote/downvote or like/dislike buttons on many social platforms, which only serve the algorithm to push more content at you, Persumi’s Aura system tracks each user’s content quality over time, punishing low-quality content and promoting high-quality content via visual cues - lower-quality content is rendered with much lower contrast, making it easy to ignore. In the age of social media, self-curating content is essential to keeping a platform healthy, engaging and usable.&lt;/p&gt;
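To make the visual-cue idea concrete, here is a tiny sketch of how an aura score could map to contrast. The thresholds and Tailwind opacity classes are made up for illustration - Persumi's real Aura scoring is not public:

```elixir
defmodule AuraSketch do
  # Illustrative thresholds only. Maps a rolling content-quality
  # score (0..100) to a Tailwind opacity class, so lower-quality
  # content literally fades into the background.
  def contrast_class(aura) when aura >= 75, do: "opacity-100"
  def contrast_class(aura) when aura >= 50, do: "opacity-80"
  def contrast_class(aura) when aura >= 25, do: "opacity-60"
  def contrast_class(_aura), do: "opacity-40"
end
```

The nice property of this approach is that nothing is hidden outright - readers can still see everything, but their attention is steered toward quality.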
&lt;h3&gt;
The Non-MVP Features, a.k.a. The Future&lt;/h3&gt;
&lt;p&gt;
There are many features that didn’t make the MVP cut; most of them are value-added features that will eventually make their way into paid subscriptions - if Persumi gains enough traction to attract users who don’t mind paying for premium features.&lt;/p&gt;
&lt;p&gt;
Prime examples of such paid features are ones that help users monetise their content, e.g. ad revenue sharing and paid subscriptions (like Patreon).&lt;/p&gt;
&lt;p&gt;
I also have the ambition of building out Persumi’s features so it can eventually compete against the likes of LinkedIn and Tinder.&lt;/p&gt;
&lt;p&gt;
Wouldn’t it be better for the world to have a platform like Persumi that doesn’t focus on dark patterns and exploiting users? 😉&lt;/p&gt;
&lt;h2&gt;
The Tech Stack&lt;/h2&gt;
&lt;p&gt;
Over the past decade or so I’ve mainly worked with two tech stacks: Ruby and Elixir. So naturally, Persumi was going to be built using one of them.&lt;/p&gt;
&lt;p&gt;
After some consideration, I’ve decided to go ahead with Elixir, the main reasons were:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
Elixir and Erlang/OTP support distributed systems out of the box  &lt;/li&gt;
  &lt;li&gt;
I’ve been writing more Elixir than Ruby lately, so I’m more productive in Elixir  &lt;/li&gt;
  &lt;li&gt;
I really wanted to try and use &lt;a href=&quot;https://github.com/phoenixframework/phoenix_live_view&quot;&gt;LiveView&lt;/a&gt; in production  &lt;/li&gt;
  &lt;li&gt;
I prefer Phoenix’s application architecture to Rails’  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
On top of Elixir, I decided early on a few other things to go with it:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
Tailwind for CSS  &lt;/li&gt;
  &lt;li&gt;
Postgres for the database, preferably a serverless option  &lt;/li&gt;
  &lt;li&gt;
A search engine  &lt;/li&gt;
  &lt;li&gt;
An easy-to-maintain infrastructure that doesn’t cost an arm and a leg  &lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
Elixir&lt;/h3&gt;
&lt;p&gt;
I first discovered Elixir in 2014 while I was still actively involved in the Ruby and Rails communities, but it was two years later that I had the opportunity to really dive into it. I built a few open source libraries to help me learn Elixir and OTP:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
&lt;a href=&quot;https://github.com/fredwu/crawler&quot;&gt;Crawler&lt;/a&gt; - a high performance web scraper.  &lt;/li&gt;
  &lt;li&gt;
&lt;a href=&quot;https://github.com/fredwu/opq&quot;&gt;OPQ: One Pooled Queue&lt;/a&gt; - a simple, in-memory FIFO queue with back-pressure support, built for Crawler.  &lt;/li&gt;
  &lt;li&gt;
&lt;a href=&quot;https://github.com/fredwu/simple_bayes&quot;&gt;Simple Bayes&lt;/a&gt; - a Naive Bayes machine learning implementation. Hey, I was doing machine learning before it was mainstream! 😆  &lt;/li&gt;
  &lt;li&gt;
&lt;a href=&quot;https://github.com/fredwu/stemmer&quot;&gt;Stemmer&lt;/a&gt; - an English (Porter2) stemming implementation, built for Simple Bayes.  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
Despite a decent amount of experience building Phoenix web apps over the years, I unfortunately never had the opportunity to use LiveView…&lt;/p&gt;
&lt;p&gt;
I guess that’s finally changed now.&lt;/p&gt;
&lt;p&gt;
LiveView has been amazing: not only does it drastically reduce the amount of front-end code you have to write and make the entire web app feel super snappy, it also virtually eliminates duplication of code and logic between the front end and the back end. It is an awesome piece of technology that improves both the user experience and the developer experience.&lt;/p&gt;
&lt;h4&gt;
Petal Pro&lt;/h4&gt;
&lt;p&gt;
During my initial tech research, I came across &lt;a href=&quot;https://petal.build/&quot;&gt;Petal Pro&lt;/a&gt;, a boilerplate starter template built on top of Phoenix. It handles things like user authentication, which almost every web app needs but which is somewhat tedious to build.&lt;/p&gt;
&lt;p&gt;
Petal Pro isn’t free, but it ended up saving me a great deal of time. I’ve also started contributing small bug fixes and features to it. If you are about to build something in Phoenix, check it out!&lt;/p&gt;
&lt;h3&gt;
Tailwind CSS&lt;/h3&gt;
&lt;p&gt;
As my career progressed, there were fewer and fewer opportunities for me to write front-end and CSS code. The last time I rebuilt my blog, back in 2019, I used &lt;a href=&quot;https://bulma.io/&quot;&gt;Bulma&lt;/a&gt;. Since then Tailwind has gained a lot more traction, so I wanted an excuse to finally give it a shot.&lt;/p&gt;
&lt;p&gt;
There is a debate about how many new things you should try when building your MVP - the more you have to learn, the slower your MVP progresses. That said, given CSS is reasonably straightforward, I figured Tailwind wouldn’t slow me down too much; if anything, its flexibility might eventually make up for any time lost learning it.&lt;/p&gt;
&lt;p&gt;
I’m happy to report that this was indeed the case - Tailwind made it significantly easier for me to customise my components and elements. I can see why it became so popular. It’s not for everyone, but I like it.&lt;/p&gt;
&lt;h3&gt;
Postgres&lt;/h3&gt;
&lt;p&gt;
Choosing Postgres as the database was a no-brainer, given how popular and versatile it is. I did briefly consider NoSQL options like DynamoDB, but quickly wrote them off: I needed an RDBMS to get things off the ground quickly, and the DB is unlikely to be the bottleneck for a long time anyway.&lt;/p&gt;
&lt;p&gt;
In Elixir, the &lt;a href=&quot;https://hexdocs.pm/ecto/Ecto.html&quot;&gt;Ecto library&lt;/a&gt; works wonders for Postgres.&lt;/p&gt;
&lt;p&gt;
Later in the post I’ll touch on how I deploy and run Postgres in production.&lt;/p&gt;
&lt;h3&gt;
Search Engine&lt;/h3&gt;
&lt;p&gt;
For a search engine, my requirements were:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
The ability to search across multiple fields of a schema  &lt;/li&gt;
  &lt;li&gt;
The ability to rank them  &lt;/li&gt;
  &lt;li&gt;
The ability to have typo tolerance, word stemming and other similar language features to make search more intuitive  &lt;/li&gt;
  &lt;li&gt;
The ability to search multiple languages, including CJK (Chinese/Japanese/Korean) characters  &lt;/li&gt;
  &lt;li&gt;
Simple to run  &lt;/li&gt;
  &lt;li&gt;
Cheap to run  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
Postgres’ full-text search would probably have been the simplest option, but it doesn’t offer all the functionality I need without a fair amount of setup and manual SQL queries, so I didn’t pursue it.&lt;/p&gt;
&lt;p&gt;
Elasticsearch, on the other hand, offers good search functionality, but takes a fair bit of effort to set up and maintain, and can be costly to run.&lt;/p&gt;
&lt;p&gt;
After doing some more research, I found the following three options that would fit my needs:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
&lt;a href=&quot;https://www.algolia.com/&quot;&gt;Algolia&lt;/a&gt;  &lt;/li&gt;
  &lt;li&gt;
&lt;a href=&quot;https://www.meilisearch.com/&quot;&gt;Meilisearch&lt;/a&gt;  &lt;/li&gt;
  &lt;li&gt;
&lt;a href=&quot;https://typesense.org/&quot;&gt;Typesense&lt;/a&gt;  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
Both Meilisearch and Typesense are open source, with commercial SaaS offerings, whilst Algolia is SaaS-only.&lt;/p&gt;
&lt;p&gt;
It’s been an interesting journey. I started with Typesense as I liked what I read, but I quickly discovered that it doesn’t search Chinese characters properly.&lt;/p&gt;
&lt;p&gt;
I then turned to Meilisearch. I especially liked that they offered a generous SaaS free tier to get you up and running. Spoiler: partway through my implementation, they pulled a bait-and-switch and removed the free tier.&lt;/p&gt;
&lt;p&gt;
At the time the Elixir support for Meilisearch wasn’t up to date, so I ended up &lt;a href=&quot;https://github.com/nutshell-lab/meilisearch-ex/pull/5&quot;&gt;contributing to a community library&lt;/a&gt; to add the features I needed.&lt;/p&gt;
&lt;p&gt;
In a case of curious timing, after Meilisearch removed their free tier I discovered that even though they officially support searching Chinese characters, the implementation isn’t perfect. I found edge cases where characters weren’t detected properly, making the search results unreliable.&lt;/p&gt;
&lt;p&gt;
So, my last hope was Algolia. Despite being the most expensive option of the three, it does offer a free tier. It turns out their search results for Chinese characters were much better than Meilisearch’s. Luckily, re-implementing the search on Algolia didn’t take much effort - it was pretty much done in one night.&lt;/p&gt;
&lt;h3&gt;
Infrastructure&lt;/h3&gt;
&lt;p&gt;
Early on during the development I’d already determined I wanted to try &lt;a href=&quot;https://fly.io/&quot;&gt;Fly&lt;/a&gt; and &lt;a href=&quot;https://neon.tech/&quot;&gt;Neon&lt;/a&gt;, for web and DB, respectively.&lt;/p&gt;
&lt;p&gt;
I am in no way associated with either company. I was curious about Fly due to its ties to the Elixir community (Phoenix Framework’s author Chris McCord works there), and about Neon due to its serverless nature.&lt;/p&gt;
&lt;h4&gt;
Globally Distributed Infra&lt;/h4&gt;
&lt;p&gt;
With Fly, the infrastructure automatically became globally distributed as soon as I started provisioning servers in more than one region. As of the time of writing, Persumi is deployed to US West, Australia and the EU.&lt;/p&gt;
&lt;p&gt;
Despite Fly’s apparent simplicity, getting it working initially took quite a bit of finessing, due to incomplete official documentation and general flakiness. Some of their services had issues during the course of my MVP development. Worse, they don’t report (or sometimes even acknowledge) issues unless they are region-wide outages. To this day, I believe their recently introduced blue/green deployment strategy is still buggy - I often have to fall back to their rolling deployment strategy instead. Deployment logs were provided to Fly, but I think they’re too busy with other things…&lt;/p&gt;
&lt;p&gt;
Still, I’m sticking with them for now: once past the initial hurdle they are easy to use, and they offer globally distributed infrastructure without asking for my kidney.&lt;/p&gt;
&lt;p&gt;
To augment Fly’s web servers, I also use Cloudflare’s &lt;a href=&quot;https://www.cloudflare.com/application-services/products/cdn/&quot;&gt;CDN&lt;/a&gt; as well as &lt;a href=&quot;https://www.cloudflare.com/developer-platform/r2/&quot;&gt;R2&lt;/a&gt; to serve asset files and audio files.&lt;/p&gt;
&lt;p&gt;
Funny tangent: initially I used &lt;a href=&quot;https://bunny.net/&quot;&gt;Bunny&lt;/a&gt; for asset files and CDN, as I had misread Cloudflare’s terms and thought I couldn’t serve audio files from Cloudflare. Bunny worked okay, but for some reason their dashboard was painfully slow - not a good look for a CDN company. As with the search engine swap, it didn’t take me long to switch over to Cloudflare.&lt;/p&gt;
&lt;h4&gt;
Serverless Postgres&lt;/h4&gt;
&lt;p&gt;
There are a few options to run Postgres:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;
Run on a standard server for maximum portability, but it requires more server maintenance overhead  &lt;/li&gt;
  &lt;li&gt;
Run on AWS RDS/Aurora or a similar managed service, easy but can be costly  &lt;/li&gt;
  &lt;li&gt;
Run on a serverless option such as Aurora Serverless or Neon  &lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;
For my use case, options 2 and 3 are a better fit. As I mentioned earlier, I started the experiment with &lt;a href=&quot;https://neon.tech/&quot;&gt;Neon&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
Neon worked well initially - until I started deploying Fly instances in multiple regions. Because Neon is only available in a single region (I chose US West) and I live in Australia, the round trips between Fly’s Australian instance and Neon’s US instance were a show stopper - especially when complex DB transactions were involved. Actions sometimes took &lt;em&gt;seconds&lt;/em&gt; to complete. Yikes.&lt;/p&gt;
&lt;p&gt;
Although Fly doesn’t offer a managed Postgres service, I ended up trying its Postgres offering anyway due to its &lt;a href=&quot;https://fly.io/docs/postgres/advanced-guides/high-availability-and-global-replication/&quot;&gt;distributed nature&lt;/a&gt;. After incorporating &lt;a href=&quot;https://github.com/superfly/fly_postgres_elixir&quot;&gt;Fly Postgres&lt;/a&gt; into the app, all DB operations immediately became more responsive. Paired with LiveView, it feels like running the application locally.&lt;/p&gt;
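Conceptually, the trick is that only the primary region accepts writes, so each machine must know whether it is in the primary region. Below is a sketch of that routing decision in plain Elixir. It mirrors the idea behind Fly Postgres (Fly exposes the machine's region via the FLY_REGION environment variable, and the app is configured with a PRIMARY_REGION), but it is not the library's actual API:

```elixir
defmodule RegionRoutingSketch do
  # Conceptual sketch, not fly_postgres_elixir's real API.
  # Reads go to the nearest replica; writes must run in the
  # primary region, or be forwarded there.
  # (A real implementation would treat missing env vars more carefully.)
  def primary?(env \\ System.get_env()) do
    Map.get(env, "FLY_REGION") == Map.get(env, "PRIMARY_REGION")
  end

  def route(:read, _env), do: :local_replica

  def route(:write, env) do
    if primary?(env), do: :local_primary, else: :forward_to_primary
  end
end
```

With this split, the common case (reads) is always served from a nearby replica, which is what makes the app feel local in every region.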
&lt;p&gt;
The current Persumi infra looks like:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
1 x Fly instance in US West, always on  &lt;/li&gt;
  &lt;li&gt;
1 x Fly instance in Australia, auto-shutdown when there’s no traffic  &lt;/li&gt;
  &lt;li&gt;
1 x Fly instance in Netherlands, auto-shutdown when there’s no traffic  &lt;/li&gt;
  &lt;li&gt;
1 x Fly Postgres writer instance in US West, always on  &lt;/li&gt;
  &lt;li&gt;
1 x Fly Postgres read replica instance in Australia, always on  &lt;/li&gt;
  &lt;li&gt;
1 x Fly Postgres read replica instance in Netherlands, always on  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
With this setup, I’m quite happy with the balance of cost and scalability - it costs ~$20/month to run, with the potential for easy vertical and horizontal scaling.&lt;/p&gt;
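The always-on/auto-shutdown split above is plain Fly configuration. Here is a sketch of the relevant fly.toml knobs - the option names reflect my understanding of Fly's apps v2 configuration and the values are illustrative, so check their current docs:

```toml
# fly.toml (excerpt) - hypothetical values for illustration
[http_service]
  internal_port = 4000
  auto_stop_machines = true   # stop idle machines in quiet regions
  auto_start_machines = true  # wake them up on incoming traffic
  min_machines_running = 1    # keep one instance (US West) always on
```

The database machines simply skip the auto-stop settings, which is why the read replicas stay always on.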
&lt;h2&gt;
The Missteps&lt;/h2&gt;
&lt;p&gt;
The search engine and CDN swaps mentioned earlier certainly took away some of my time, but they were nothing compared to a major misstep I encountered.&lt;/p&gt;
&lt;p&gt;
And that was: the choice of how to do machine learning.&lt;/p&gt;
&lt;p&gt;
Let me explain.&lt;/p&gt;
&lt;h3&gt;
Machine Learning, and Inference&lt;/h3&gt;
&lt;p&gt;
Even before I wrote the first line of code, I had already painted a picture in my head of the machine learning setup I needed: a TTS (text-to-speech) model I could run inference with locally, on the instance itself.&lt;/p&gt;
&lt;p&gt;
My reasoning was that this would be the more flexible approach, letting me gradually improve the inference - and therefore the end result - by training my own AI models over time.&lt;/p&gt;
&lt;p&gt;
Given I didn’t want to rent expensive GPU instances, I opted for fast TTS models that could do near real-time inference on CPUs. I used &lt;a href=&quot;https://github.com/coqui-ai/TTS&quot;&gt;Coqui TTS&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
The resulting out-of-box audio wasn’t great, but I kept pressing on.&lt;/p&gt;
&lt;p&gt;
The show stopper came when it was time to deploy everything onto Fly. Due to Fly’s architecture (they deploy small-ish Docker images, &lt; 2GB each, onto their global network), I struggled to keep the Docker image small enough to deploy. With Coqui TTS, I needed Python plus all its dependencies, which resulted in a Docker image of around 4-5GB.&lt;/p&gt;
&lt;p&gt;
With my tunnel vision, I then chose to offload the entire Python and Coqui TTS dependency tree onto Fly’s &lt;a href=&quot;https://fly.io/docs/reference/volumes/&quot;&gt;persistent volumes&lt;/a&gt;. I knew it wasn’t a great option, as that meant my infrastructure (other than the database) was no longer immutable.&lt;/p&gt;
&lt;p&gt;
Sometimes it’s necessary to take a step back, re-evaluate, and then press on in a different direction. Which thankfully I did.&lt;/p&gt;
&lt;p&gt;
The new direction is quite simple, really: instead of performing inference locally, use an external service.&lt;/p&gt;
&lt;p&gt;
After doing a quick comparison between the offerings from AWS, Azure and GCP, I ended up using &lt;a href=&quot;https://cloud.google.com/text-to-speech&quot;&gt;Google’s TTS&lt;/a&gt;. Honestly I think I would’ve been happy with any of the options, they all seem to have decent neural based TTS.&lt;/p&gt;
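As a flavour of what the integration looks like, here is a sketch of the request payload for Google's text:synthesize endpoint, built in Elixir. This is my own illustration, not Persumi's code, and the voice name is an assumption - pick one from Google's published voice list:

```elixir
defmodule TTSSketch do
  # Builds the JSON-ready payload for Google Cloud Text-to-Speech's
  # `text:synthesize` endpoint. The response carries the synthesised
  # audio as base64 in its "audioContent" field.
  @endpoint "https://texttospeech.googleapis.com/v1/text:synthesize"

  def synthesize_payload(text, voice \\ "en-US-Neural2-C") do
    %{
      "input" => %{"text" => text},
      "voice" => %{"languageCode" => "en-US", "name" => voice},
      "audioConfig" => %{"audioEncoding" => "MP3"}
    }
  end

  def endpoint, do: @endpoint
end
```

Encode the payload as JSON, POST it with your credentials, then base64-decode "audioContent" and write it out as an MP3.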
&lt;p&gt;
In hindsight, these giant corporations have much more resources and expertise to train better models than I ever could on my own.&lt;/p&gt;
&lt;p&gt;
The end result:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
The TTS sounds significantly better than before  &lt;/li&gt;
  &lt;li&gt;
It’s just as cheap to run (Google offers a monthly allowance of free TTS API calls)  &lt;/li&gt;
  &lt;li&gt;
It no longer needs complex Python calls and FFmpeg calls to make local TTS work  &lt;/li&gt;
  &lt;li&gt;
The Fly infrastructure is simple and immutable again  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
Frankly, I never should have entertained the idea of running ML inference locally on CPUs, no matter how simple and efficient a model might be.&lt;/p&gt;
&lt;p&gt;
That said, with TTS it wasn’t as simple as calling the APIs and getting perfect audio back. Some pre- and post-processing was needed, but that’s a topic for another time.&lt;/p&gt;
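As one example of the kind of pre-processing involved (my own sketch, not Persumi's actual code): TTS APIs cap the size of each request, so long posts have to be split - ideally on sentence boundaries - and the resulting audio segments stitched back together afterwards:

```elixir
defmodule ChunkSketch do
  # Splits text into chunks that fit a per-request size budget,
  # breaking on sentence boundaries where possible. The 4,500-byte
  # default is an assumed safety margin, not an official limit.
  @max_bytes 4_500

  def chunk(text, max \\ @max_bytes) do
    text
    # Split after sentence-ending punctuation followed by whitespace.
    |> String.split(~r/(?<=[.!?])\s+/)
    |> Enum.reduce([""], fn sentence, [current | rest] ->
      candidate = if current == "", do: sentence, else: current <> " " <> sentence

      if byte_size(candidate) <= max do
        [candidate | rest]
      else
        # Start a new chunk; an oversized single sentence stays whole.
        [sentence, current | rest]
      end
    end)
    |> Enum.reverse()
  end
end
```

Each chunk is then synthesised separately and the audio segments are concatenated (e.g. with FFmpeg) into one file.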
&lt;h3&gt;
More Machine Learning&lt;/h3&gt;
&lt;p&gt;
The cherry on top: now that Google’s APIs were integrated into the app, I also ended up using Google’s &lt;a href=&quot;https://ai.google/discover/palm2/&quot;&gt;PaLM 2&lt;/a&gt; for text summarisation (initially also done locally) and for a ChatGPT-like AI prompt service, powering Persumi’s AI writing assistance feature.&lt;/p&gt;
&lt;h2&gt;
The Closing&lt;/h2&gt;
&lt;p&gt;
If you’ve read this far, thank you! I hope you enjoyed reading (or listening to) this post. Please look around and kick the tyres - I would love your feedback on how to improve Persumi.&lt;/p&gt;
&lt;p&gt;
Sign up for an account if you haven’t already, and leave a comment if you have any questions. Until next time!&lt;/p&gt;
]]&gt;</description>
      <link>http://persumi.com/c/persumi/u/fredwu/p/how-i-built-a-mostly-feature-complete-mvp-in-3-months-whilst-working-full-time</link>
      <title>How I Built a Mostly Feature-Complete MVP in 3 Months Whilst Working Full-Time</title>
    </item>
  </channel>
</rss>