A summary of the Google-published paper, F1: A Distributed SQL Database That Scales.

  • F1 is a distributed database from Google, built on top of Spanner. F1 was devised when MySQL could no longer service Google AdWords traffic. MySQL offers many features that any replacement must still support, such as traditional SQL analysis, indexing, and optimistic and pessimistic transactions. Some of the required features were already supported by Spanner, but others, such as optimistic transactions, automatic change history generation, and transactionally consistent secondary indexes, were not. F1 builds on Spanner to provide those features.
  • F1 offers consistency, scalability, high availability, and SQL-like syntax for traditional query analysis. The design choices that provide the features AdWords requires (automatic change history generation, automatic indexing, consistency, etc.) came at the cost of higher read/write latency.
  • Global consistency is backed by Spanner’s strongly consistent model.
  • F1 tables are comprised of columns, and a column value can be a protobuf. Each row is an instance of the schema. This allows reads to be very granular, fetching only what is needed: some columns of some rows. For large tables, loading all columns by default would mean fetching mostly unneeded data, with a read latency impact.
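    • As a rough illustration, here is a minimal sketch of column-granular reads in Python (a hypothetical API, not the real F1 client; plain dicts stand in for protobuf messages):

      class Table:
          def __init__(self):
              self.rows = {}  # primary key -> {column name: protobuf-like value}

          def read(self, key, columns):
              # Analogous to SELECT col_a FROM t WHERE pk = key: return only
              # the requested columns of the requested row.
              row = self.rows[key]
              return {c: row[c] for c in columns if c in row}

      t = Table()
      t.rows[1] = {"Info": {"name": "acme"}, "BigBlob": {"payload": "...huge..."}}
      print(t.read(1, ["Info"]))  # {'Info': {'name': 'acme'}}; BigBlob never fetched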
  • F1 locks at row-level granularity by default, and also supports finer column-level lock granularity.
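    • A sketch of what column-level granularity buys (assumed, simplified semantics; the lock-group names and columns are made up): two writers touching disjoint column groups of the same row need not conflict:

      # Each hypothetical lock column guards a group of data columns.
      LOCK_GROUPS = {
          "_lock_budget": {"Budget", "BudgetCurrency"},
          "_lock_status": {"Status"},
      }

      def lock_columns_for(write_columns):
          # A write must take every lock column whose group it touches.
          return {lock for lock, group in LOCK_GROUPS.items() if group & write_columns}

      def conflicts(write_a, write_b):
          return bool(lock_columns_for(write_a) & lock_columns_for(write_b))

      print(conflicts({"Budget"}, {"Status"}))          # False: different lock groups
      print(conflicts({"Budget"}, {"BudgetCurrency"}))  # True: same lock group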
  • The read latency impact is mitigated by making tables hierarchical, so that related data is explicitly clustered together. The implication is that F1 is less suitable for access patterns involving joins across many tables that are not hierarchically related. Heavy batching, parallelism, and asynchronous reads also help. The F1 client library also systematically prevents common ORM library issues, including serial reads and implicit traversals (loading everything “just in case”).
    • For instance, here is the example from the paper, extended with some hypothetical tables:
      • Customer
        • Campaign
          • CampaignData
          • CampaignOtherData
        • Payment
      • MyOtherData
      • SomeOtherData
        • DetailedInfo
    • Then, these joins should be fast: CampaignData with CampaignOtherData; Payment with Campaign; Payment with CampaignData (should still be fast).
    • However, these joins should be slow: MyOtherData with Payment; DetailedInfo with CampaignData; DetailedInfo with MyOtherData.
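    • To see why, here is a sketch (hypothetical keys) of how hierarchical clustering lays rows out: child rows are stored under their root Customer row, sorted by primary-key prefix, so joining siblings like CampaignData and CampaignOtherData scans one contiguous key range:

      rows = [
          (("cust1",),                "Customer"),
          (("cust1", "camp1"),        "Campaign"),
          (("cust1", "camp1", "cd1"), "CampaignData"),
          (("cust1", "camp1", "co1"), "CampaignOtherData"),
          (("cust1", "pay1"),         "Payment"),
          (("cust2",),                "Customer"),
      ]
      rows.sort(key=lambda r: r[0])  # physical order follows the key prefix

      # Everything needed to join CampaignData with CampaignOtherData for
      # ("cust1", "camp1") sits inside a single key range:
      prefix = ("cust1", "camp1")
      print([t for k, t in rows if k[:len(prefix)] == prefix])
      # Joining non-hierarchically related tables (e.g. MyOtherData with
      # Payment) instead touches distant key ranges.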
  • There are 3 transaction types:
    • Snapshot: read-only transactions at a fixed Spanner snapshot timestamp ST. The transaction only sees data up to ST, but remains consistent. The default snapshot read timestamp is the F1 global safe timestamp, typically 5-10 seconds old, which is the timestamp at or below which reads in any F1 cluster are consistent.
    • Pessimistic: a pessimistic transaction as we know it, mapped directly to Spanner transactions. The F1 server needs to maintain state to hold locks, and the transaction is handled by a single F1 server, which can become a bottleneck. Latency is high, but this mode can be useful when write conflicts are frequent.
    • Optimistic: an optimistic transaction as we know it; no Spanner locks are taken. To detect row-level conflicts, each row carries a last modification timestamp that is returned to the client along with the data. When the write is passed to F1, F1 compares the last modification timestamps the client saw against the ones re-fetched from Spanner; if any of them disagree, it returns a conflict.
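    • A minimal sketch of the optimistic conflict check (simplified; the per-row timestamp follows the description above, everything else is made up):

      import time

      store = {"row1": {"value": "a", "last_modified": 100}}

      def read(key):
          row = store[key]
          # The client keeps the timestamp it saw alongside the data.
          return row["value"], row["last_modified"]

      def commit(key, new_value, ts_seen):
          # At commit time the stored timestamp is re-fetched and compared;
          # a mismatch means another writer got in first.
          if store[key]["last_modified"] != ts_seen:
              return False  # conflict: caller may retry
          store[key] = {"value": new_value, "last_modified": int(time.time())}
          return True

      value, ts = read("row1")
      store["row1"]["last_modified"] = 101    # a concurrent writer sneaks in
      print(commit("row1", value + "!", ts))  # False: conflict detected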
  • The read latency impact can also be reduced by co-locating F1 and Spanner instances. However, F1 servers and F1 slaves can talk to Spanner servers in other data centers (presumably to balance load when busy).
  • A read flow in code is:
    • Code uses the F1 client to issue a query to read data.
    • A load balancer assigns the query to an F1 cluster.
    • The query hits an F1 server; in distributed execution (see below), this server also acts as the query coordinator.
    • The F1 server decides whether centralized or distributed execution gives better query processing latency. Distributed execution means the query is distributed to F1 slaves, which handle joins and read requests.
    • If centralized execution:
      • The F1 query is forwarded to Spanner, presumably converted to something that Spanner understands.
      • The “Spanner query” gets served, and Spanner returns a data stream back to the F1 server.
      • The F1 server streams the returned data on to the F1 client.
    • If distributed execution:
      • The F1 query is repartitioned and forwarded to N slaves, which handle joins and aggregations.
      • Each F1 slave issues its part of the F1 query to Spanner.
      • The “Spanner query” gets served, and Spanner returns data streams back to the F1 slaves.
      • The query coordinator aggregates the streamed results from the F1 slaves. The implication is that ordering properties of the input data are lost. The query coordinator can be a bottleneck, so F1 query results can also be distributed for consumption by more than one consumer in parallel.
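    • A toy sketch of the distributed path (all names hypothetical): the coordinator hash-repartitions input rows across N slaves, each slave aggregates its partition, and the coordinator merges the partial results. The merge step is where input ordering is lost and where the coordinator can bottleneck:

      from collections import Counter

      def repartition(rows, n):
          # Hash-partition on the group key so equal keys land on one slave.
          parts = [[] for _ in range(n)]
          for row in rows:
              parts[hash(row["campaign"]) % n].append(row)
          return parts

      def slave_aggregate(part):
          # Each slave issues its own reads to Spanner and aggregates locally.
          c = Counter()
          for row in part:
              c[row["campaign"]] += row["clicks"]
          return c

      rows = [{"campaign": "camp1", "clicks": 3},
              {"campaign": "camp2", "clicks": 1},
              {"campaign": "camp1", "clicks": 4}]
      partials = [slave_aggregate(p) for p in repartition(rows, n=4)]
      print(dict(sum(partials, Counter())))  # coordinator merge: {'camp1': 7, 'camp2': 1}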
  • For writes, latency is about 50-150 ms, because the Paxos algorithm is used to finally determine, across the different F1 replicas, whether a write is allowed to commit. Web application read latency still averages about 200 ms after moving to F1, largely due to explicit clustering, heavy data streaming, asynchronous reads, and the systematic prevention of ORM anti-patterns (e.g. serial reads, implicit traversals).
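    • A sketch of why asynchronous, parallel reads help hide that latency (assumed client behavior; the real F1 client library is not public): N independent reads cost roughly one round trip instead of N:

      import asyncio

      async def fetch(key):
          await asyncio.sleep(0.05)  # stand-in for a ~50 ms remote read
          return key, f"value-of-{key}"

      async def main():
          keys = ["Customer/1", "Campaign/1", "Payment/1"]
          # Serial reads (the classic ORM anti-pattern) would take ~150 ms;
          # issuing them concurrently takes ~50 ms total.
          results = await asyncio.gather(*(fetch(k) for k in keys))
          print(dict(results))

      asyncio.run(main())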

 

Meta points:

  • Know about more than just relational databases, and how databases are designed. Most of us just use databases.
  • A Google-ish view on how distributed systems work, and ideas on how to attack latency and bottlenecks.
  • Perspectives on dealing with ORM anti-patterns in ways that work in practice.

 

Why did I write this? On this Thanksgiving weekend I was sick, and in a nightmarish dream I was asked to explain F1 to an audience in some university lecture. So here I am, my I-don’t-know-who audience, with what you asked for; hopefully this answers your questions. Time to take meds and rest.