database - Lu's blog

Caching Partially Materialized Views Consistently

By Lu Pan in cache on 05 Apr 2023

Cache and Materialized View. According to the PostgreSQL wiki, a materialized view is a table that actually contains rows, but behaves like a view. That is, the data in the table changes when the data in the underlying tables changes. According to Wikipedia, ... a materialized view is a database object…

Project Management is a Concurrency Control Problem

By Lu Pan in database on 30 Jan 2023

The target audience of this post is the database community, and especially those who take an interest in the OLTP workload. When multiple people or multiple teams are working on the same service, the same code base, even the same code path, we need project management to keep track of…

Notes on the Spanner: becoming a SQL system paper

By Lu Pan in database on 10 Jan 2023

I have wanted to read the paper Spanner: becoming a SQL system, from SIGMOD 2017, for a while and finally got a chance to finish it. I have mixed feelings. The paper mostly focuses on the DQL aspect of SQL and only briefly mentions DML and locking. I like…

IP as distributed data in the cloud

By Lu Pan in ip on 25 Nov 2022

I previously wrote a post on reasoning about DNS as a distributed database. In the same spirit, today, let's take a look at IP as distributed data (the Internet would be a distributed database in this analogy). This is inspired by a very interesting blog post from Cloudflare, https://blog.cloudflare.…

Notes on Amazon's DynamoDB USENIX ATC'22 Paper

By Lu Pan in database on 13 Aug 2022

This is a very practical paper. It focuses on practical matters such as admission control, non-uniform access patterns, metastability introduced by caches, etc. You won't find fancy distributed system algorithms in this paper. But it's an important paper which covers critical topics nonetheless. A system only delivers real value to…

Fast SQL from Schemaless ingestion

By Lu Pan in rockset on 01 Jul 2022

This is a note on Louis Brandy and Nathan Bronson's talk at Systems @scale - https://www.facebook.com/atscaleevents/videos/5279535778820968/. Data generated by customers is often schemaless (e.g. JSON), but fast SQL for analytical queries over this unstructured data is desired. Having fast query performance…

Notes on Photon - Databricks' query engine over data lakes

By Lu Pan in olap on 01 Jul 2022

This is a note on Databricks' SIGMOD '22 paper - Photon: A Fast Query Engine for Lakehouse Systems, which won the best industrial paper award. At a high level, the paper describes Photon, a vectorized query engine (written in C++) that adapts to the underlying unstructured data at run-time to…

How and why the Relational Model works for databases

By Lu Pan in database on 16 Jan 2022

This is a note on Turing Award laureate Ted Codd's revolutionary paper, A Relational Model of Data for Large Shared Data Banks [https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwjv9YXhpbf1AhW_kIkEHazED9cQFnoECAYQAQ&url=https%3A%2F%2Fwww.…