Apache Druid 25.0 Delivers Multi-Stage Query Engine and Kubernetes Task Management

Published on Jan 19, 2023

Apache Druid 25.0, the project's most recent release, brings a number of improvements to its high-performance real-time datastore. The main new features are: the multi-stage query (MSQ) task engine used for SQL-based ingestion is now production ready; Kubernetes can launch and manage tasks, eliminating the need for MiddleManagers; deployment is simplified; and there is a new dedicated binary for Hadoop 3.x users.

Druid’s design incorporates concepts from data warehouses, time-series databases, and search systems in order to produce real-time analytics and reduce time to insight.

The architecture is based on microservices and is cloud-ready. It consists of several types of services, including the Coordinator service, which maintains data availability on the cluster; the Overlord service, which assigns data ingestion workloads; the Broker service, which handles queries from external clients; and the MiddleManager service, which ingests data into the cluster.

As part of the ingestion phase, Druid reads the data from the source system and stores it in data files called segments. A typical segment file contains a few million rows. Each segment file is partitioned by time and organized into a columnar structure in which every column is stored separately. This decreases query latency, because a query scans only the columns it actually needs.
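The segment layout described above can be illustrated with a small sketch. This is not Druid's actual file format or code; it is a toy in-memory model, assuming hourly time buckets and hypothetical column names (`__time`, `page`, `clicks`), that shows why time partitioning plus per-column storage lets a query touch only the data it needs.

```python
from collections import defaultdict
from datetime import datetime


def build_segments(rows, time_key="__time"):
    """Group rows into hourly buckets and store each bucket column by column.

    Toy model only: real Druid segments are compressed, indexed files.
    Assumes every row carries the same set of columns.
    """
    segments = defaultdict(lambda: defaultdict(list))
    for row in rows:
        ts = datetime.fromisoformat(row[time_key])
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        for column, value in row.items():
            segments[bucket][column].append(value)
    return segments


def sum_column(segments, column, start, end):
    """Sum one column over [start, end), skipping buckets outside the range."""
    total = 0
    for bucket, columns in segments.items():
        if start <= bucket < end:
            # Only the requested column is read; other columns are untouched.
            total += sum(columns[column])
    return total


# Hypothetical clickstream events used purely for illustration.
rows = [
    {"__time": "2023-01-19T10:15:00", "page": "a", "clicks": 3},
    {"__time": "2023-01-19T10:45:00", "page": "b", "clicks": 2},
    {"__time": "2023-01-19T11:05:00", "page": "a", "clicks": 7},
]
segments = build_segments(rows)
clicks_10_to_11 = sum_column(
    segments,
    "clicks",
    datetime(2023, 1, 19, 10),
    datetime(2023, 1, 19, 11),
)
```

Here the query over the 10:00 to 11:00 interval reads a single bucket and a single column, which is the effect the columnar, time-partitioned design is meant to achieve at much larger scale.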

Druid supports both streaming and batch data ingestion. Typically, it connects to a source of raw data: a message bus such as Apache Kafka for streaming loads, or, for batch loads, a distributed file system such as HDFS or cloud-based storage such as Amazon S3 and Azure Blob Storage. It then converts the raw data into a more read-optimized format (segments) through a process known as “indexing.” Druid can ingest denormalized data in JSON, CSV, Parquet, Avro, and other custom formats.

Druid SQL can be used to query Druid data sources. Druid translates SQL queries into its native query language.
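As a minimal sketch of what querying with Druid SQL can look like from a client, the snippet below builds an HTTP request for Druid's SQL endpoint (`/druid/v2/sql`) using only the Python standard library. The host and port (`http://localhost:8888`, the default Router address in a local quickstart), the `clickstream` datasource, and the `page` column are assumptions for illustration, not details from the release.

```python
import json
import urllib.request


def build_sql_request(sql, base_url="http://localhost:8888"):
    """Build a POST request for Druid's SQL HTTP API.

    base_url is an assumed local Router address; adjust it for a real cluster.
    """
    payload = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/druid/v2/sql",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical datasource and column names.
sql = (
    "SELECT page, COUNT(*) AS views "
    "FROM clickstream "
    "WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR "
    "GROUP BY page ORDER BY views DESC LIMIT 5"
)
request = build_sql_request(sql)

# Sending the request requires a running Druid cluster:
# with urllib.request.urlopen(request) as response:
#     rows = json.loads(response.read())
```

Prefixing the statement with `EXPLAIN PLAN FOR` should return the plan Druid generates for the query, which is one way to inspect the translation into a native query.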

Druid comes with a web console through which you can load data, manage data sources and tasks, and view server status and segment information. You can also execute SQL queries and native Druid queries from the console.

Apache Druid is frequently used when real-time ingest, fast query performance, and high uptime are essential.

As a result, Druid is commonly used as a backend for APIs that require fast aggregation or to power analytical applications. It works best with event-oriented data.

Applications typically include clickstream analytics (web and mobile analytics), risk/fraud analysis, network telemetry analytics (network performance monitoring), application performance metrics, and business intelligence / OLAP.
