Caching Partially Materialized Views Consistently

Cache and Materialized View: According to the PostgreSQL wiki, a materialized view is a table that actually contains rows, but behaves like a view. That is, the data in the table changes when the data in the underlying tables changes. According to Wikipedia, ... a materialized view is a database object…

Project Management is a Concurrency Control Problem

The target audience of this post is the database community, and especially those who take an interest in the OLTP workload. When multiple people or multiple teams are working on the same service, the same code base, even the same code path, we need project management to keep track of…

Notes on the Spanner: becoming a SQL system paper

I have wanted to read the paper Spanner: becoming a SQL system (SIGMOD 2017) for a while and finally got a chance to finish it. I have mixed feelings. The paper mostly focuses on the DQL aspect of SQL and only briefly mentions DML and locking. I like…

IP as distributed data in the cloud

I previously wrote a post on reasoning about DNS as a distributed database. In the same spirit, today, let's take a look at IP as distributed data (the Internet would be a distributed database in this analogy). This is inspired by a very interesting blog post from Cloudflare, https://blog.cloudflare.…

Notes on Amazon's DynamoDB USENIX ATC'22 Paper

This is a very practical paper. It focuses on practical matters such as admission control, non-uniform access patterns, metastability introduced by caches, etc. You won't find fancy distributed system algorithms in this paper. But it's an important paper which covers critical topics nonetheless. A system only delivers real value to…

Fast SQL from Schemaless ingestion

This is a note on Louis Brandy and Nathan Bronson's talk at Systems @scale - https://www.facebook.com/atscaleevents/videos/5279535778820968/. Data generated by customers is often schemaless (e.g. JSON), but fast SQL for analytical queries over this unstructured data is still desired. Having fast query performance…

Notes on Photon - Databricks' query engine over data lakes

This is a note on Databricks' SIGMOD '22 paper - Photon: A Fast Query Engine for Lakehouse Systems, which won the best industrial paper award. At a high level, the paper describes Photon, a vectorized query engine (written in C++) that adapts to the underlying unstructured data at run-time to…

How and why the Relational Model works for databases

This is a note on the revolutionary paper by Turing Award laureate Ted Codd, A Relational Model of Data for Large Shared Data Banks [https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwjv9YXhpbf1AhW_kIkEHazED9cQFnoECAYQAQ&url=https%3A%2F%2Fwww.…