Fast SQL from Schemaless ingestion

This is a note on Louis Brandy and Nathan Bronson's talk at Systems @scale - https://www.facebook.com/atscaleevents/videos/5279535778820968/. Data generated from the customers are often schemaless (e.g. JSON), but fast SQL for analytical queries are desired out of this unstructured data. Having fast query performance…

Notes on Photon - Databricks' query engine over data lakes

This is a note on Databricks' SIGMOD '22 paper - Photon: A Fast Query Engine for Lakehouse Systems, which won the best industrial paper award. At a high level, the paper describes Photon, a vectorized query engine (written in C++) that adapts to the underlying unstructured data at run-time to…

When and How to Invalidate Cache

You can leave comments on HN – https://news.ycombinator.com/item?id=31709501. In this post, I will talk about one way to figure out when to invalidate cache entries. I will use a specific setup as an example, which should still be general enough, and at the end of…

Cache Invalidation

Cache made consistent: Meta’s cache invalidation solutionWhen it comes to cache invalidation, we believe we now have an effective solution to bridge the gap between theory and practice.Engineering at MetaLu Pan [https://engineering.fb.com/2022/06/08/core-data/cache-invalidation/]We wrote a post on Meta/FB's engineering…

Feedback Loops

Any complex systems e.g. an economy seem to have feedback loops. Digital systems are no exceptions. Feedback loops add significant amount of complexity to a system, which makes it harder to manage. However, a philosophical point might be that it's exactly the presence of feedback loops that makes a…

Little's Law and Service Latency

Queueing Theory is very relevant to Computer Science as queues are all over the places – instructions to CPUs, packets to NICs or switches, requests to servers, etc. Little's Law (L=λW) is a really cool theorem because it requires very few conditions which implies its wide application. It works on…

Don't point out something wrong immediately

This is not a post on a technical topic, but rather on a common pitfall for engineers – we like to point out something wrong the moment we see it. Understandably, we are very good at it. We are basically trained for it. When it comes to code reviews, or design…

How and why the Relational Model works for databases

This is a note on, the Turing Award laureate, Ted Codd's revolutionary paper — A Relational Model of Data for Large Shared Data Banks [https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwjv9YXhpbf1AhW_kIkEHazED9cQFnoECAYQAQ&url=https%3A%2F%2Fwww.…