Normal vs Vectorized Instructions

Vectorized Queries

Vectorized execution operates on batches of rows at a time instead of individual rows.

It allows queries to be sped up by reducing the number of method calls, improving cache efficiency, and potentially enabling CPU SIMD instructions.

Scope

Supported Vectorized Query Types

As of Druid 0.20.2, the following query types can be vectorized:

  1. GroupBy
  2. Timeseries

Vectorization has a few important requirements:

  • Query-level filters must either run on bitmap indexes or offer vectorized row matchers.
  • Supported filters include selector, bound, in, like, regex, search, and, or, and not.
  • Filters in filtered aggregators must offer vectorized row matchers.
  • Aggregators must offer vectorized implementations, including count, numeric sum/min/max, any-value aggregators, hyperUnique, filtered, approxHistogram, approxHistogramFold, and fixedBucketsHistogram.
  • Virtual columns must offer vectorized implementations.
  • GroupBy dimension specs must be default specs, without extraction functions, filtered dimension specs, or multi-value dimensions.
  • Timeseries queries cannot use descending order.
  • Only immutable segments and table datasources are supported.

Unsupported Query Types

As of Druid 0.20.2, these query types cannot be vectorized:

  1. TopN
  2. Scan
  3. Select
  4. Search

Vectorization Parameters

PropertyDefaultDescription
vectorizetrueEnables or disables vectorized query execution.
vectorSize512Sets the row batching size for a query and overrides druid.query.default.context.vectorSize when set.
vectorizeVirtualColumnsfalseControls vectorized processing for virtual columns. Values are false, true, and force.

How To Enable Vectorization

Set vectorization defaults in common config:

druid.query.default.context.vectorize=true
druid.query.default.context.vectorSize=1024
druid.query.default.context.vectorizeVirtualColumns=true

Or pass vectorization parameters in query context:

{
  "context": {
    "vectorize": true,
    "vectorSize": 512,
    "vectorizeVirtualColumns": false
  }
}

References

  1. Vectorization parameters
  2. Query vectorization proposal
  3. Druid 0.20.0 release notes
  4. Vectorization benchmarks