This is a prerelease version.

Create data pipelines with the Jet engine

The Jet engine is a batch and stream processing system that allows Hazelcast members to do both stateless and stateful computations over large amounts of data with consistent low latency.

The Jet engine processes data in pipelines. Pipelines allow you to take data stored in one location (source), apply processing, and send it to another location (sink). If you only want to process data without moving it, you can use the same location for source and sink.

Pipelines are executed as jobs. Jobs are run in parallel across all Hazelcast cluster members for maximum speed.

For more information, see Data pipelines.

Configure the Jet engine

The Jet engine is enabled by default and has the following configuration options. For more detailed information, see the API reference.

| Option | Default | Description |
|---|---|---|
| enabled | true | Set to false to disable the Jet engine. |
| resourceUploadEnabled | false | Set to true to enable the uploading of resources for Jet jobs. |
| instance/cooperativeThreadCount | | The number of threads Jet creates in its cooperative multithreading pool. |
| instance/flowControlPeriodMillis | 100 | The interval between flow control packets, in milliseconds. |
| instance/backupCount | 1 | The number of synchronous backups to configure on the IMap that Jet uses internally to store job metadata and snapshots. |
| instance/scaleUpDelayMillis | 10000 | The delay, in milliseconds, after which auto-scaled jobs restart if a new member joins the cluster. |
| instance/losslessRestartEnabled | false | Whether the lossless cluster restart feature is enabled. Lossless restart requires Persistence to be enabled. See Persistence, backup, and restore. |
| instance/maxProcessorAccumulatedRecords | | The maximum number of records that can be accumulated by any single processor instance. |
| edgeDefaults/queueSize | | The capacity of processor-to-processor concurrent queues. |
| edgeDefaults/packetSizeLimit | | The maximum packet size in bytes. |
| edgeDefaults/receiveWindowMultiplier | | The scaling factor used by the adaptive receive window sizing function. |
| bucketConfig | | Makes JAR files from an external bucket accessible to cluster members. Requires two parameters: secretName, the name of the Secret object that holds the credentials for your cloud provider; and bucketURI, the full path of the external bucket, for example gs://your-bucket/path/to/jars. |
| configMaps | | List of ConfigMap names. The files in each ConfigMap are downloaded. |
| remoteURLs | | List of URLs from which files are downloaded. |
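
The three file-source options (bucketConfig, configMaps, and remoteURLs) can be combined in one resource. The following is a minimal sketch, assuming they sit directly under spec.jet like the other options. The Secret name, ConfigMap name, and remote URL are hypothetical placeholders, not values from this page.

```yaml
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  jet:
    enabled: true
    resourceUploadEnabled: true
    # JAR files from an external bucket.
    # "my-bucket-secret" is a hypothetical Secret holding cloud credentials.
    bucketConfig:
      secretName: my-bucket-secret
      bucketURI: "gs://your-bucket/path/to/jars"
    # Files stored in ConfigMaps ("jet-job-jars" is a hypothetical name).
    configMaps:
      - jet-job-jars
    # Files fetched over HTTP(S) (hypothetical URL).
    remoteURLs:
      - "https://example.com/path/to/pipeline.jar"
```

Only the sources you actually use need to be listed.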

Example configuration

The following example creates a three-member Hazelcast cluster and sets the Jet engine options explicitly.

Example configuration
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  version: '5.6.0-slim'
  licenseKeySecretName: hazelcast-license-key
  jet:
    enabled: true
    resourceUploadEnabled: true
    instance:
      cooperativeThreadCount: 4
      flowControlPeriodMillis: 100
      backupCount: 1
      scaleUpDelayMillis: 10000
      losslessRestartEnabled: false
      maxProcessorAccumulatedRecords: 1000000000
    edgeDefaults:
      queueSize: 1024
      packetSizeLimit: 16384
      receiveWindowMultiplier: 3
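
Because lossless restart requires Persistence, setting losslessRestartEnabled to true only makes sense alongside a persistence configuration. The fragment below is a sketch under assumptions: the persistence block and its field names (pvc, accessModes, requestStorage) do not appear on this page and should be checked against Persistence, backup, and restore before use.

```yaml
spec:
  # Assumed shape of the persistence settings; verify the field names
  # in the Persistence, backup, and restore documentation.
  persistence:
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 8Gi
  jet:
    enabled: true
    instance:
      # Requires Persistence to be enabled.
      losslessRestartEnabled: true
```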

Configuration properties

You can also configure the Jet engine with Java system properties, passed to the members as JVM arguments. See Set arbitrary JVM arguments for more information.

Example configuration
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  version: '5.6.0-slim'
  licenseKeySecretName: hazelcast-license-key
  jet:
    enabled: true
  jvm:
    args:
    # Minimum time, in microseconds, that idle cooperative worker threads sleep.
    - "-Dhazelcast.jet.idle.cooperative.min.microseconds=50"

Next steps

Once you have configured the Jet engine, you can submit Jet jobs.
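
As a rough illustration of job submission, a JetJob resource might look like the sketch below. It assumes the operator's JetJob kind with the fields name, hazelcastResourceName, state, and jarName, none of which are described on this page. The job name and JAR file name are hypothetical, and the JAR is assumed to have been made available through bucketConfig, configMaps, or remoteURLs.

```yaml
apiVersion: hazelcast.com/v1alpha1
kind: JetJob
metadata:
  name: jet-job-sample
spec:
  # Hypothetical job name.
  name: my-jet-job
  # Must match metadata.name of the Hazelcast resource configured above.
  hazelcastResourceName: hazelcast
  state: Running
  # Hypothetical JAR, already downloaded to the members.
  jarName: my-pipeline.jar
```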

For a worked example, see the Run a data pipeline using Jet tutorial.