The Gmail importer now updates incrementally: it checks for each message whether it has already been imported, and only downloads and processes the new ones. This is especially important for a message importer, since you want to update your email frequently, which is also why we will keep improving the efficiency of this component in the weeks to come. We also switched from batch import to importing one mail at a time. Although this is slightly slower, it shows your latest email without delay, and the progress bar fills up smoothly.
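To make the idea concrete, here is a minimal sketch of an incremental, one-message-at-a-time import loop. It is not the actual importer code: the function name `incremental_import` and the `fetch` and `store` callbacks are hypothetical stand-ins for downloading a message and writing it to the database.

```python
from __future__ import annotations

from typing import Callable, Iterable


def incremental_import(
    remote_ids: Iterable[str],
    imported_ids: set[str],
    fetch: Callable[[str], dict],
    store: Callable[[dict], None],
) -> int:
    """Import only messages whose ID is not yet known; return how many were new.

    All names here are illustrative assumptions, not the real importer API.
    """
    new_count = 0
    for message_id in remote_ids:
        if message_id in imported_ids:
            continue  # already imported, so skip the download entirely
        # Single-mail import: each message is stored as soon as it is fetched,
        # so it becomes visible immediately and progress advances per message.
        store(fetch(message_id))
        imported_ids.add(message_id)
        new_count += 1
    return new_count
```

Running the function a second time with the same set of known IDs downloads nothing, which is what makes frequent updates cheap.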
Face clustering ML
Furthermore, we have implemented a face clustering algorithm, similar to the one you see on the iPhone, for instance. This allows you to easily find images of your contacts. We know that this is a feature people value, but it is also one that is very intrusive in terms of privacy. It's software meant to identify the people most important to you, and you don't want that information to be misused. Although it is not a trivial feature to build, we think it's worth the effort, as there are no open-source, privacy-friendly alternatives that work without doing a bunch of programming yourself.
The pipeline consists of four parts:
1) Detecting faces and drawing boxes around them;
2) Cropping these boxes and normalizing them: scaling and rotating to be oriented upwards;
3) Extracting a vector - a numerical representation - for each face crop;
4) Clustering the faces based on the distance between the face vectors, where each cluster is a different person.
Here's one example of how it works:
We managed to stitch these parts together and tested them on a small dataset of our own images. The results look very promising and we expect that this model will be ready in the next release, two weeks from now. Together with the integrator, we will also release the documentation that will explain how the pipeline works in detail.
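As an illustration of the last step of the pipeline, here is a minimal sketch of clustering face vectors by distance. It uses simple threshold-based single-linkage clustering on Euclidean distance, which is a stand-in chosen for brevity and not necessarily the algorithm used in our model. The function name `cluster_by_distance` and the `threshold` parameter are assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence


def cluster_by_distance(
    vectors: Sequence[Sequence[float]], threshold: float
) -> List[int]:
    """Single-linkage clustering: vectors closer than `threshold` are linked.

    Returns one cluster label per input vector. Illustrative sketch only.
    """
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of face vectors that lie within the distance threshold.
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if math.dist(vectors[i], vectors[j]) < threshold:
                parent[find(j)] = find(i)

    # Relabel roots as consecutive cluster ids in order of first appearance.
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(vectors))]
```

Each resulting label then stands for one person. In practice the vectors would come from the face crops of step 3 and have many more dimensions than the toy inputs you might test this with.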
WhatsApp moved to pyIntegrators
Lastly, we decided to merge rustIntegrators, home of the WhatsApp importer, into our pyIntegrators repository. By doing so we were able to get rid of many duplicated parts of our code base. One repository and one programming language fewer means less duplication of PodClient, schema definitions, documentation, etc. Although we might reintroduce rustIntegrators in the future, for now this will enable us to move faster and have fewer moving parts to consider when making changes to our overall design.
Visibility and marketing tools
When we started our journey, you could only find us on our own GitLab and this blog. When we had our first code release, we set up memri.io, and two weeks ago we launched a consumer-focused website, memri.cloud. We need marketing tools to make sure we can reach our audience and effectively communicate our message. However, many of those tools are not as privacy-minded as we would like. Luckily, after some research, we did manage to find solutions that meet our standards. On memri.cloud you can subscribe to our mailing list, which we manage with Mailchimp. It gives us the control to not track user data, and to not share our mailing list data with Mailchimp itself. For website analytics we use Plausible, which gives us insight into our traffic while preserving our visitors’ privacy. If you're interested in how they protect your privacy, check out this in-depth explanation.
As you may have read in our previous sprint update, we are redesigning how we define the database schema. We have not finished our design yet, and we plan to explain it in detail when we do. However, we can already share some decisions we made to give you an idea of what we are aiming for:
- We want the schema to be diffable, which means you can easily see what changes a developer made when merging new features into the product. This will make it easy to prevent conflicts and validate decisions as the schema grows.
- Instead of a schema repository that defines the whole schema, we now have integrators define their own 'part' of the schema they need to function. This way, we have datatypes defined together with the code that needs them.
- We will create some sort of plugin repository to keep track of all (validated) integrators. This will allow us to prevent conflicts between integrators, and to easily see all schema definitions in one place.
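Since the design is not finished, the following is only a rough sketch of how per-integrator schema parts could be combined and checked for conflicts. The `Schema` layout (item type mapped to property names and types), the function `merge_schemas`, and the example type names are all hypothetical and do not describe the final design.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: item type -> {property name -> property type}
Schema = Dict[str, Dict[str, str]]


def merge_schemas(parts: Dict[str, Schema]) -> Tuple[Schema, List[str]]:
    """Merge the schema parts of all integrators and report conflicts.

    A conflict here means two integrators define the same property of the
    same item type with different types. Illustrative sketch only.
    """
    merged: Schema = {}
    conflicts: List[str] = []
    # Sort integrator names so the result is deterministic and easy to diff.
    for integrator in sorted(parts):
        for item_type, props in parts[integrator].items():
            target = merged.setdefault(item_type, {})
            for name, prop_type in props.items():
                # The first definition wins; later mismatches are reported.
                existing = target.setdefault(name, prop_type)
                if existing != prop_type:
                    conflicts.append(
                        f"{item_type}.{name}: {existing} vs {prop_type} ({integrator})"
                    )
    return merged, conflicts
```

A plugin repository could run a check like this whenever an integrator is added or changed, and refuse to validate it while the conflict list is non-empty.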
On the financial side, we are close to closing our first investment round. With the bank account set up and most of the negotiated investments signed, we expect the round to close within the next week or two. This will allow us to focus on building new features without distractions for a year or two, and to grow our team where we lack skills, which is definitely good news for the project! More details will follow when we finalize the round!