Kafka, NiFi, Schema Registry … all in Docker

Martin Hynar
4 min read · Nov 11, 2020

This article describes a setup for a working lab environment with the following components:

Apache Kafka/ Confluent Platform

Apache Kafka is a distributed streaming platform. You publish data on one end and consume on the other end. Kafka takes care of persistence, replication, retention, …

  • Apache Kafka Project: the vanilla open source project under the umbrella of the Apache Software Foundation
  • Confluent Platform: a commercial product delivering Apache Kafka with extra tooling. A community edition is still available.

Apache NiFi

Apache NiFi is a platform for distributing and processing data.

Confluent Schema Registry

One of the components in the Confluent Platform suite. Schema Registry provides an interface for serving versioned data schemas (e.g. Avro, JSON Schema).

Confluent Kafka REST

A component that allows communicating with Kafka through a REST API. Kafka REST can be a good alternative for simple clients that do not need to implement full-fledged Kafka communication.

Kafka REST documentation

Apache ZooKeeper

ZooKeeper is a hierarchical key-value store that distributed systems use to coordinate and synchronize their operations.

  • Apache ZooKeeper Project: the vanilla open source project under the umbrella of the Apache Software Foundation
  • Confluent Platform: a commercial product delivering Apache Kafka with extra tooling, including the ZooKeeper image used in this setup. A community edition is still available.

Docker Compose File

The source for the Docker Compose file is available in martinhynar/docker-kafka-nifi-schemaregistry. Use the GitHub version; it may contain fixes or additions that are not reflected in this article.

Common network

All components deployed in this Docker Compose configuration need to talk to each other. For this purpose, use a dedicated network definition.

networks:
  docker-kafka-nifi-schemaregistry:
    driver: bridge

ZooKeeper

ZooKeeper is the essential part of this Docker Compose configuration because the other services depend on it.

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:6.0.0
    container_name: zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    networks:
      - docker-kafka-nifi-schemaregistry

  • Start zookeeper service
  • Listen on port 2181
  • Use image cp-zookeeper maintained by Confluent

Kafka

  kafka:
    image: confluentinc/cp-kafka:6.0.0
    container_name: kafka
    depends_on:
      - zookeeper
    networks:
      - docker-kafka-nifi-schemaregistry
    hostname: kafka
    ports:
      - 9092:9092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  • Start kafka service
  • Use image cp-kafka maintained by Confluent
  • Publish port 9092 outside of the Docker environment
  • Service kafka is dependent on zookeeper service. Service kafka connects to zookeeper:2181.
  • Service kafka also listens on port 29092. This port is used by the other components in this Docker Compose configuration. It is not published outside.
  • There is no authentication or transport security enabled.
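The two advertised listeners are the part of this service that most often confuses clients, so here is a minimal sketch of how the value can be read. It only parses the string from the configuration above; the helper name bootstrap_for is hypothetical and not part of any Kafka client library.

```python
# Advertised listeners copied from the kafka service definition above.
ADVERTISED_LISTENERS = "PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092"


def parse_listeners(value: str) -> dict:
    """Map each listener name to the host:port it advertises."""
    listeners = {}
    for entry in value.split(","):
        name, address = entry.strip().split("://", 1)
        listeners[name] = address
    return listeners


def bootstrap_for(inside_compose_network: bool) -> str:
    """Pick the bootstrap address for a client (hypothetical helper)."""
    listeners = parse_listeners(ADVERTISED_LISTENERS)
    # Containers on the shared network use PLAINTEXT, host clients use PLAINTEXT_HOST.
    name = "PLAINTEXT" if inside_compose_network else "PLAINTEXT_HOST"
    return listeners[name]


print(bootstrap_for(inside_compose_network=True))   # kafka:29092
print(bootstrap_for(inside_compose_network=False))  # localhost:9092
```

In other words, Schema Registry, Kafka REST and NiFi processors should point at kafka:29092, while a client started on the host should point at localhost:9092.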

Schema Registry

  schema-registry:
    image: confluentinc/cp-schema-registry:6.0.0
    container_name: schema-registry
    depends_on:
      - kafka
    networks:
      - docker-kafka-nifi-schemaregistry
    hostname: schema-registry
    ports:
      - 8085:8085
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8085
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: kafka:29092

  • Start schema-registry service
  • Use image cp-schema-registry maintained by Confluent
  • Publish port 8085 outside of the Docker environment. This port can be used to inspect and manage the stored schemas from outside (e.g. in a browser).
  • Service schema-registry is dependent on kafka service. Service schema-registry connects to kafka:29092. Schema Registry uses Kafka as the persistent store for the schemas it maintains.
  • There is no authentication or transport security enabled.

Kafka REST

  rest-proxy:
    image: confluentinc/cp-kafka-rest:6.0.0
    container_name: rest-proxy
    depends_on:
      - kafka
      - schema-registry
    networks:
      - docker-kafka-nifi-schemaregistry
    hostname: rest-proxy
    ports:
      - 8082:8082
    environment:
      KAFKA_REST_HOST_NAME: rest-proxy
      KAFKA_REST_LISTENERS: "http://0.0.0.0:8082"
      # The Schema Registry address is a URL, so it includes the http:// scheme
      KAFKA_REST_SCHEMA_REGISTRY_URL: http://schema-registry:8085
      KAFKA_REST_BOOTSTRAP_SERVERS: kafka:29092
      KAFKA_REST_SECURITY_PROTOCOL: "PLAINTEXT"
      KAFKA_REST_CLIENT_SECURITY_PROTOCOL: "PLAINTEXT"

  • Start rest-proxy service
  • Use image cp-kafka-rest maintained by Confluent
  • Publish port 8082 outside of the Docker environment. This port can be used to publish messages to Kafka or consume messages from Kafka using HTTP.
  • Service rest-proxy is dependent on kafka service. Service rest-proxy connects to kafka:29092. Kafka REST acts as an HTTP proxy in front of Kafka.
  • Service rest-proxy is dependent on schema-registry service. Service rest-proxy connects to schema-registry:8085. Kafka REST uses Schema Registry when working with messages that reference a schema (e.g. Avro) maintained by Schema Registry.
  • There is no authentication or transport security enabled.
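To illustrate what publishing over HTTP looks like, here is a minimal sketch that builds a produce request for the v2 API of Kafka REST using only the Python standard library. The topic name lab-events and the record content are made up for illustration, and depending on broker settings the topic may have to exist before you send the request.

```python
import json
import urllib.request

REST_PROXY_URL = "http://localhost:8082"  # port published by the rest-proxy service


def build_produce_request(topic: str, values: list) -> urllib.request.Request:
    """Build a POST that publishes JSON values to a topic (Kafka REST API v2)."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{REST_PROXY_URL}/topics/{topic}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
    )


# "lab-events" is a hypothetical topic name used only for illustration.
request = build_produce_request("lab-events", [{"sensor": "a1", "reading": 21.5}])
print(request.get_method(), request.full_url)

# Sending the request requires the stack to be running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the URL, headers and body before anything reaches Kafka.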

NiFi

  nifi:
    image: apache/nifi:latest
    container_name: nifi
    depends_on:
      - zookeeper
      - kafka
      - schema-registry
    networks:
      - docker-kafka-nifi-schemaregistry
    hostname: nifi
    ports:
      - 8080:8080
    environment:
      - NIFI_WEB_HTTP_PORT=8080
      - NIFI_CLUSTER_IS_NODE=true
      - NIFI_CLUSTER_NODE_PROTOCOL_PORT=8082
      - NIFI_ZK_CONNECT_STRING=zookeeper:2181
      - NIFI_ELECTION_MAX_WAIT=1 min

  • Start nifi service
  • Use image apache/nifi maintained by the Apache Software Foundation.
  • Publish port 8080 outside of the Docker environment. This port can be used to access the NiFi user interface.
  • Service nifi is dependent on zookeeper service. Service nifi connects to zookeeper:2181. This is a direct dependency that NiFi needs to start correctly.
  • Service nifi is dependent on kafka and schema-registry services. These are indirect dependencies, declared only to make sure that these services are running and usable by NiFi processors in user-defined flows.
  • There is no authentication or transport security enabled.

Running Docker Compose

Running this Docker Compose configuration starts the described services. Once all services are up and running, several ports are available outside of the Docker environment.
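Since the services take a while to come up, it can help to poll the published ports before pointing clients at them. The following is a rough sketch using only the Python standard library; the function names are hypothetical, and an open port only shows that the container accepts connections, not that the service inside has finished starting.

```python
import socket
import time

# Ports published to the host by the services described in this article.
PUBLISHED_PORTS = {"kafka": 9092, "rest-proxy": 8082, "schema-registry": 8085, "nifi": 8080}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_services(host: str = "localhost", ports: dict = PUBLISHED_PORTS,
                      attempts: int = 30, delay: float = 2.0) -> dict:
    """Poll until every port accepts connections or the attempts run out."""
    status = {name: False for name in ports}
    for _ in range(attempts):
        for name, port in ports.items():
            status[name] = status[name] or port_is_open(host, port)
        if all(status.values()):
            break
        time.sleep(delay)
    return status


# Requires the stack to be running:
# print(wait_for_services())
```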

Kafka — port 9092

You can connect clients running on the host system to port 9092 and communicate with Kafka.

Kafka REST — port 8082

Instead of connecting to Kafka directly, you can use the HTTP proxy provided by Kafka REST. The full set of operations is listed in the Kafka REST API documentation.

To list existing topics: http://localhost:8082/topics
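The same call can be made from a script. This is a small sketch with the Python standard library, assuming the v2 API returns a JSON array of topic names; the user_topics helper is hypothetical and relies on the convention that internal topics such as __consumer_offsets and _schemas start with an underscore.

```python
import json
import urllib.request

REST_PROXY_URL = "http://localhost:8082"  # port published by the rest-proxy service


def list_topics(base_url: str = REST_PROXY_URL) -> list:
    """GET /topics and return the topic names reported by Kafka REST."""
    request = urllib.request.Request(
        f"{base_url}/topics",
        headers={"Accept": "application/vnd.kafka.v2+json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def user_topics(topics: list) -> list:
    """Drop internal topics, whose names start with an underscore by convention."""
    return sorted(name for name in topics if not name.startswith("_"))


# Requires the stack to be running:
# print(user_topics(list_topics()))
```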

Schema Registry — port 8085

Manage schemas in Schema Registry using REST API operations. The complete list of operations is available in the Schema Registry API documentation.

To list existing subjects: http://localhost:8085/subjects
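Registering a schema is a good first write operation to try. The sketch below builds the request with the Python standard library; the Reading schema and the subject name lab-events-value are invented for illustration (the "-value" suffix follows the default naming of subjects after topics).

```python
import json
import urllib.request

SCHEMA_REGISTRY_URL = "http://localhost:8085"  # port published by schema-registry

# Hypothetical Avro schema, used only for illustration.
READING_SCHEMA = {
    "type": "record",
    "name": "Reading",
    "fields": [
        {"name": "sensor", "type": "string"},
        {"name": "value", "type": "double"},
    ],
}


def build_register_request(subject: str, schema: dict) -> urllib.request.Request:
    """Build the POST that registers a new schema version under a subject."""
    # The API expects the schema as a JSON-encoded string inside the JSON body.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{SCHEMA_REGISTRY_URL}/subjects/{subject}/versions",
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


request = build_register_request("lab-events-value", READING_SCHEMA)
print(request.get_method(), request.full_url)

# Sending the request requires the stack to be running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

After a successful registration, the subject should show up in the list returned by the /subjects URL above.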

NiFi — port 8080

Use this port to access the NiFi user interface and start defining your data flows.

Open NiFi user interface: http://localhost:8080/nifi

Summary

The Docker Compose configuration described in this article sets up an environment in which you can start experimenting with NiFi data flows that consume data from Kafka topics and produce data back to Kafka.

The configuration is easily extensible with other services (e.g. Elasticsearch), so that data can be loaded from and stored to more data repositories.

In later posts, I will show which practical use cases can be solved with this setup.
