## Authors

### “The Case for Learned Index Structures”

Paper code
arXiv:1712.01208v3
Date
<2018-04-30 Mon>
URL
https://www.arxiv-vanity.com/papers/1712.01208/

#### Jeff

• 2017, “Outrageously Large Neural Networks”.

## Preliminaries

### Background Knowledge

Deep learning models
are function approximators.

#### Search engine vs Database

• Relational Databases use a B-Tree index.
• Search engines mostly use an inverted index.
• Relational Databases give you what you asked for.
• Search engines give you what you wanted.

#### Terminology

Indices
= indexes. Indexes just sounds wrong to me.
Model
The set of functions that describe the relations between variables.

“Probabilistic and information theoretic methods are used to make results better anyway. Compromises are made anyway. Query reformulation, drift, etc. So it is just a natural progression to use NNs for some of these components? Am I right?” – A quote from myself.

### More Background Knowledge

#### The research works under the premise that

• Indices are models (set of functions). For example,
• B-Tree-Index: $$f: key \mapsto pos$$
• $$pos$$ is the position of a record, within a sorted array
• Hash-Index: $$f: key \mapsto pos$$
• $$pos$$ is the position of a record, within an unsorted array
• BitMap-Index: indicates if a data record exists or not
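The "indices are models" premise can be made concrete with a toy sketch (my own illustration, not code from the paper): each index is literally a function from key to position or existence.

```python
import bisect

# Each index is a function f: key -> pos (or key -> exists).
sorted_records = [3, 7, 9, 14, 21]                 # records sorted by key

def btree_like(key):
    """B-Tree analogue: position of the key within a *sorted* array."""
    return bisect.bisect_left(sorted_records, key)

unsorted_records = [9, 3, 21, 7, 14]               # records in arrival order
hash_index = {k: i for i, k in enumerate(unsorted_records)}

def hash_like(key):
    """Hash-index analogue: position within an *unsorted* array."""
    return hash_index[key]

present = set(unsorted_records)

def bitmap_like(key):
    """BitMap-index analogue: does a record with this key exist?"""
    return key in present
```

The names `btree_like`, `hash_like`, and `bitmap_like` are mine; the point is only that all three share the shape $$f: key \mapsto pos$$.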

#### A new term is introduced!

Learned Index
A deep-learning model with the function of an index structure. Auto-magically synthesised.

## Overview

### The Argument of the Paper

#### The researchers hypothesise…

that all existing index structures can be replaced with learned indices.

• Paper does not argue that you should necessarily.

It’s a novel approach to building indexes, complementing existing work.

• Specifically, a model can

1. Learn the sort order/structure of keys,
2. and use this to predict the position/existence of records.

#### They explore…

• The extent to which learned models (including NNs) can replace traditional index structures for efficient data access.

#### They speculate…

• This could fundamentally change the way database systems are developed in the future.

### Investigations / Case studies

The studies performed in the paper are:

• Evaluating learned models for efficient data access, i.e. the role traditional indices play today.
• Done on CPUs rather than G/TPUs for a fairer comparison with existing methods, despite new hardware being the biggest reason to use learned indices.

#### Theme 1: Can learned models speed up indices?

tested for write-heavy workloads (Briefly covered)

#### Theme 2: Can replacing individual components speed up indices?

Study 1 / 3 B-Tree (Evaluated)
Study 2 / 3 Hash-index (Evaluated)
Study 3 / 3 Bloom-filter (Evaluated)
other components (sorting, joins) (Briefly covered)

### Debunking the Myths

#### Myths or soon to become myths

1. Machine learning cannot provide the same semantic guarantees.

• B-Trees predict record position.
• A Bloom filter is a binary classifier (like our Delta Rule network). It’s a space-efficient probabilistic data structure. See: BitFunnel.

2. NNs are thought of as being very expensive to evaluate.

• Huge benefits, especially on the next generation of hardware.

#### Trends

• GPUs and TPUs in phones

The main reason to adopt learned indices (page 4).

• Scaling NNs is trivial; the marginal cost approaches zero.

#### Benefits for databases

• Remove the branch-heavy index structures and add Neural Networks

### Results and Conclusions sneak peek

#### Results

1. Learned indices can be up to 70% faster than cache-optimized B-Trees while saving an order of magnitude in memory.
• Tested over several real-world datasets.

#### Conclusions

1. Replacing components of a data management system with learned models has far-reaching implications.
• This work only provides a glimpse of what might be possible…

## Introduction

#### Some examples

Covered in this paper by 3 separate studies:

1. B-Trees
• Great for range requests (retrieve all in a..b)
2. Hash-Maps
• key-based lookups
3. Bloom-filters
• Set membership
• May give false positives, but no false negatives
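The "false positives, but no false negatives" property can be seen in a minimal Bloom filter sketch (my own toy version, not the paper's; the bit count and hash scheme are arbitrary choices):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: may give false positives, never false negatives."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m, self.k = m_bits, k_hashes
        self.bits = 0                        # one big int as the bit array

    def _positions(self, item):
        # Derive k bit positions by salting a SHA-256 hash with the index.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        # All k bits set => "probably present"; any bit clear => definitely absent.
        return all(self.bits >> p & 1 for p in self._positions(item))

bf = BloomFilter()
for word in ["alpha", "beta"]:
    bf.add(word)
```

Every added item is guaranteed to test positive; an absent item tests negative unless its bits happen to collide, which is the (tunable) false-positive rate.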

#### Solidly built

• Highly Optimised
• Memory
• Cache
• CPU
• Assume worst case

#### It works because…

• Knowing the exact data distribution enables optimisation of the index.

…but that means we must know the distribution, and we don’t always.

### Benefits of replacing B-Trees with Learned Indices

#### Benefits of replacing B-Trees with Learned Indices

1. B-Tree lookup $$O(\log n) \Longrightarrow O(1)$$ (if SLM)
• Simple Linear [Regression] Model as predictor: 1 mul, 1 add…
2. ML accelerators (GPU/TPU). If the entire learned index can fit into the GPU’s memory, that’s 1M NN ops every 30 cycles with current technology.
3. Mixture of Models (builds upon Jeff’s paper from last year). ReLU nets at the top, learning a wide range of complex data distributions; SLRMs at the bottom because they are inexpensive. Or use B-Trees at the bottom stage if the data is hard to learn.

## Case Studies

### Study 1 of 3: B-Tree $$\Rightarrow$$ Learned Range Index [Model]

Replacing a B-tree with a Learned [Range] Index [Model].

#### Theory

• $$\therefore$$ B-Tree $$\approx$$ Regression Tree $$\approx$$ CDF $$\equiv$$ Learned Range Index.

#### Plan

• Experiment with a Naïve Learned Index … to see how bad it is.
• Experiment with a much better Learned Index, the RM-Index.

### Study 1 of 3: B-Tree $$\Rightarrow$$ Learned Range Index [Model]

Why can we replace B-Trees with DL again?

#### B-Tree is-a model

B-Tree-Index
$$f: key \mapsto pos$$
• $$pos$$ is the position of a record, within a sorted array

#### B-Tree $$\approx$$ Regression Tree

Regression Tree
A decision tree with $$\mathbb{R}$$ targets.
• Maps a key to a position with a min and max error.

#### Range Index Model is-a Cumulative Distribution Function (CDF)

A model which predicts the position given a key inside a sorted array effectively approximates a CDF (page 5).

• $$\therefore$$ B-Tree $$\approx$$ Regression Tree $$\approx$$ CDF $$\equiv$$ Learned Range Index.
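The identity behind this (page 5) is $$pos \approx F(key) \cdot N$$ for $$N$$ sorted records. A toy empirical check (my own sketch; the function names are mine):

```python
import bisect

keys = sorted([5, 1, 9, 3, 7])               # N = 5 keys, sorted

def empirical_cdf(x):
    """F(x) = P(K <= x), estimated from the keys themselves."""
    return bisect.bisect_right(keys, x) / len(keys)

def predicted_rank(key):
    """pos ~ F(key) * N (page 5); exact 1-based rank on the training keys."""
    return empirical_cdf(key) * len(keys)
```

On the keys it was built from, the scaled empirical CDF reproduces each key’s rank exactly; a learned model approximates this same function.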

### Study 1 of 3: B-Tree $$\Rightarrow$$ RT/RIM $$\Rightarrow$$ CDF $$\Rightarrow$$ Learned R.I.

#### Analogs

• Rebalanced vs Retrained

$$\therefore$$ min/max error guarantee only needed for training.

#### Cumulative Distribution Function (CDF)

$$F_X(x) = P(X \leq x)$$

A range index needs to be able to provide:

• point queries $$\checkmark$$
• range queries (sort order(records) $$\equiv$$ sort order(sorted look-up keys)) $$\checkmark$$
• guarantees on min-/max error.

CDF is good to go. It can be used as our Learned Range Index.
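The min-/max-error guarantee is what makes the model usable as an index: predict a position, then binary-search only the window the error bounds allow. A minimal sketch (my own illustration; the paper’s actual search is also a bounded binary search):

```python
import bisect

def model_lookup(keys, key, predict, min_err, max_err):
    """Predict a position, then binary-search only the error window."""
    guess = predict(key)
    lo = max(0, guess - min_err)
    hi = min(len(keys), guess + max_err + 1)
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < hi and keys[i] == key else -1

keys = [1, 3, 5, 7, 9, 11]
# Hypothetical model: always guesses position 3, with error bounds of 3.
pos = model_lookup(keys, 7, predict=lambda k: 3, min_err=3, max_err=3)
```

A better model means a smaller window, hence fewer comparisons than a full B-Tree traversal.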

#### $$\therefore$$

Can replace the index with other models, including DL, so long as the min and max errors are similar to a B-Tree’s.

### Study 1 of 3: B-Tree $$\Rightarrow$$ Learned Range Index [Model]

#### Experiment 1.1 - Naïve Learned Index with TensorFlow

Objective
Evaluate the technical requirements to replace B-Trees.
Architecture
• Two-layer fully connected neural network (32:32).
• 32 neurons/units per layer.
• ReLU activation function.
Data
• The timestamps of messages from web server logs (keys).
• The positions of the messages (the actual line number?) (targets).
Loss
• Not simply error minimisation; the min-/max error matters too.
Purpose
Build a secondary index over timestamps. Test performance.

### Study 1 of 3: B-Tree $$\Rightarrow$$ Learned Range Index [Model]

#### Critique

This is a very naïve learned index, and that’s how we want it. The researchers want to see how much faster a B-Tree is than a naïve neural network substitution. The answer is 300x faster.

ReLU activation function
$$f(x) = \max(0, x)$$

The ReLU activation function is the new sigmoid in that it’s now the go-to activation function for deep learning.

It’s typically used for hidden layers as it avoids the vanishing-gradient problem. Here the network is tiny (two 32-unit layers), so each unit is just a clipped line. It’s so basic, it’s perfect.

Also, the researchers are after a sparse representation, matching one key to one position, so this property of the ReLU makes it an even better candidate.

I assume that 32 neurons are used because that is the max string length of the timestamp / record position.

### Study 1 of 3: B-Tree $$\Rightarrow$$ Learned Range Index [Model]

#### Experiment 1.1 - Results

The researchers came to these findings:

• B-Trees are 2 orders of magnitude faster. TensorFlow is designed for larger models; there is a lot of overhead with Python.
• A single neural network requires significantly more space and CPU time for the last mile of error minimisation.
• B-Trees, or decision trees in general, are really good at overfitting the data (adding new data after balancing) with a few operations. They just divide up the space cheaply, using an if-statement.
• Other models can be significantly more efficient to approximate the general shape of a CDF.
• So models like NNs might be more CPU and space efficient to narrow down the position for an item from the entire data set to a region of thousands.
• But usually requires significantly more space and CPU time for the last mile.

These ideas are taken into account when designing the next model, the RM-Index.

### Study 1 of 3: Learned Range Index [Model] $$\approx$$ B-Tree

#### Challenges to replacing B-Trees

1. Main challenge: balance model complexity with accuracy.
2. Bounded cost for inserts and lookups, taking advantage of the cache.
3. Map keys to pages (memory or disk?).
4. Last-mile accuracy. This is the main reason why the Naïve Learned Model was so slow. Overcome by using the Recursive Model (RM) Index.

New term: ‘last-mile accuracy’.

### Study 1 of 3: Learned Range Index [Model] $$\approx$$ B-Tree

#### Recursive Model (RM) Index

Also known as the Recursive Regression Model.

One of the key contributions of this research paper.

A hierarchy of models.

At each stage the model takes the key as an input and based on it picks another model, until the final stage predicts the position.

Each prediction as you go down the hierarchy is picking an expert that has better knowledge about certain keys.

Solves the ‘Last mile accuracy’ problem.
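The hierarchy described above can be sketched as a two-stage RMI with linear models as the "experts" (my own toy sketch, not the paper's code; class and method names are mine):

```python
class TwoStageRMI:
    """Toy two-stage Recursive Model Index: linear models as 'experts'."""

    def __init__(self, keys, fanout=4):
        self.keys, self.n, self.fanout = keys, len(keys), fanout
        # Stage 1: one linear model over the whole key range.
        self.top = self._fit(keys, range(self.n))
        # Stage 2: partition keys by the stage-1 prediction, one expert each.
        buckets = [[] for _ in range(fanout)]
        for pos, k in enumerate(keys):
            buckets[self._pick(k)].append((k, pos))
        self.experts = [self._fit([k for k, _ in b], [p for _, p in b])
                        for b in buckets]

    @staticmethod
    def _fit(xs, ys):
        """Least-squares line y = a*x + b; returns (a, b)."""
        xs, ys = list(xs), list(ys)
        if len(xs) < 2:
            return (0.0, float(ys[0]) if ys else 0.0)
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
        return (a, my - a * mx)

    def _pick(self, key):
        """The stage-1 prediction selects which stage-2 expert to consult."""
        a, b = self.top
        est = a * key + b                     # rough position estimate
        return min(self.fanout - 1, max(0, int(est * self.fanout / self.n)))

    def predict(self, key):
        a, b = self.experts[self._pick(key)]
        return int(round(a * key + b))

# Perfectly linear toy data: key k lives at position k / 2.
keys = list(range(0, 200, 2))
rmi = TwoStageRMI(keys)
```

Each expert only has to be accurate on its own sub-range, which is exactly the "picking an expert with better knowledge about certain keys" idea.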

### Study 1 of 3: B-Tree $$\Rightarrow$$ Learned Range Index [Model]

#### Experiment 1.2 - Hybrid Recursive Model Index

Method
• The number of stages and the number of models per stage are hyperparameters.
• Each net:
• 0 to 2 fully connected hidden layers.
• Up to 32 neurons/units per layer.
• ReLU activation functions.
• B-Trees as fallback models.
Data
• The timestamps of messages from web server logs (keys).
• The positions of the messages (the actual line number?) (targets).
• Datasets: Weblogs, Maps, web documents, lognormal (synthetic).
Loss
• Not simply error minimisation.
• After training, the index is optimised by replacing NN models with B-Trees if the absolute min-/max error is above a predefined threshold value.
Conclusions
• Allows us to bound the worst-case performance of learned indexes to the performance of B-Trees.
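The hybrid optimisation step can be sketched as a post-training pass (my own illustration; `hybridize` is a name I made up, and the exact-lookup dictionary is just a stand-in for the paper's real B-Tree fallback):

```python
def hybridize(experts, data_by_expert, threshold):
    """Keep each expert whose max absolute error is within the threshold;
    otherwise swap in an exact lookup (stand-in for a B-Tree fallback)."""
    hybrid = []
    for predict, pairs in zip(experts, data_by_expert):
        max_err = max((abs(predict(k) - pos) for k, pos in pairs), default=0)
        if max_err > threshold:
            exact = dict(pairs)                       # "B-Tree" stand-in
            hybrid.append(lambda key, t=exact: t.get(key, -1))
        else:
            hybrid.append(predict)
    return hybrid

# One accurate expert (identity) and one hopeless expert (always 0).
experts = [lambda k: k, lambda k: 0]
data = [[(1, 1), (2, 2)], [(10, 5), (11, 6)]]
hybrid = hybridize(experts, data, threshold=1)
```

This is what bounds the worst case: any sub-range the models cannot learn degrades gracefully to ordinary B-Tree performance.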

### Study 1 of 3: B-Tree $$\Rightarrow$$ Learned Range Index [Model]

#### Results of Experiment 1.2

Was the data used obtained ethically? Who knows.

## Testing

• They developed what they call the ‘Learning Index Framework’ (LIF), an index synthesis system. It accelerates the process of index synthesis and testing.

## Aim of review

### Questions

1. What is the specific problem or topic that this research addresses?

   1. Optimisation of an index requires knowledge of the data distribution. There is no guarantee of this. But it can be learned.
   2. Learned indices provide new ways to further optimise search engines.

2. If the paper presents a new network, algorithm, or technique, how does it work? Is it suited to the task?

• A new model architecture, the Recursive Regression Model.

Task: A substitute for a B-Tree.

Inspired by work done in the paper “Outrageously Large Neural Networks”.

Constitution: Build a hierarchy of models. At each stage the model takes the key as an input and based on it picks another model, until the final stage predicts the position.

Each model makes a prediction with a certain error about the position for the key and that the prediction is used to select the next model.

Recursive Model Indices are not trees.

The architecture divides the space into smaller sub-ranges, like a B-Tree/decision tree, to make it easier to achieve the required last-mile accuracy with fewer operations.

• Is it suited to the task? The model divides the space into smaller sub-ranges like a B-Tree to make it easier to achieve the required “last mile” accuracy with fewer operations. This solves one of the aforementioned complications of replacing a B-Tree.

The entire index can be represented as a sparse matrix-multiplication for a TPU/GPU.

3. Has it been well tested, and does it really work as claimed? What are the limitations? What are the innovations?

   1. This could change the way database systems are developed.
   2. Learned indices can be up to 70% faster than cache-optimized B-Trees while saving an order of magnitude in memory.
      • Tested over several real-world datasets.

4. Did they choose the right architecture? Why or why not? Is it clearly described (all parameters, settings etc.)? What strengths and/or weaknesses of the NN approach does it illustrate?

5. Is the paper well structured and well written?

## Q&A

### Evaluation

#### Was the paper well organised?

It is well structured and well written.

#### Problem and solution

problem
Real-world data does not perfectly follow known patterns. Specialised solutions are expensive.
solution
ML: learn the model -> synthesise a specialised index. Low cost.

#### Strengths and/or weaknesses of the NN approach

The paper illustrated that…

#### Did they choose the right architectures? Why or why not?

Is it clearly described (all parameters, settings etc.)?

### Own Questions

#### Research question defined?

What is the research question?

#### Generalization

Does the study allow generalization?

#### Consistency

The discussion and conclusions should be consistent with the study’s results.

Were the results in accordance with the researchers’ expectations, or not?

Do the authors of the article at hand do the same?