Real Reasons behind Apache Kafka’s Popular Use
Web-scale companies and enterprises face many kinds of problems, and Kafka fits one particular class of them very well. By assembling and retaining all of the "facts" or
"events" in a system, Kafka can serve as the source of truth, which makes it a strong foundation for building resilient data
services and applications. Apache Kafka is a robust and
increasingly popular asynchronous messaging technology.
Kafka can be described as a scalable, fault-tolerant,
publish-subscribe messaging system that lets you design distributed
applications, and it powers web-scale Internet companies such as LinkedIn,
Twitter, Airbnb, and many others. Better still, Kafka has made a remarkable
positive impact not only on Internet unicorns but also on traditionally
slower-to-adopt enterprises.
Kafka was developed around 2010 at LinkedIn by a team including
Jay Kreps, Jun Rao, and Neha Narkhede. It was built to address issues such as:
- Low-latency ingestion of voluminous event data from the LinkedIn website and infrastructure into a lambda architecture that combined Hadoop with real-time event processing systems.
- Ingesting data into offline batch systems without exposing implementation details to downstream users, while avoiding a push model that could easily overwhelm a consumer.
- Getting data out of source systems and moving it around reliably, so that teams could build sophisticated machine-learning algorithms on top of it.
Kafka looks and feels like a publish-subscribe system that is
capable of delivering in-order, persistent, scalable messaging. It has
publishers, topics, and subscribers. It can also partition topics to enable massively
parallel consumption. All messages written to Kafka are persisted and
replicated to peer brokers for fault tolerance, and those messages are retained
for a configurable period of time (e.g., 7 days, 30 days).
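Retention is configured on the broker (and can be overridden per topic). As an illustrative sketch, the following broker settings in `server.properties` keep messages for seven days; the values shown here are examples, not recommendations:

```properties
# Retain log segments for 7 days (168 hours) before they become eligible for deletion
log.retention.hours=168
# Optionally also cap retention by total partition size (-1 = no size limit)
log.retention.bytes=-1
```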
The key to Kafka is the log data structure: a time-ordered,
append-only sequence of data inserts in which each entry is simply an array of bytes. In
Kafka, messages are written to a topic that maintains this log (or multiple
logs — one per partition) from which subscribers can read and derive their
own representations of the data.
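As a minimal sketch of this idea in plain Python (not the Kafka client API — the class and method names here are invented for illustration), a partitioned topic can be modeled as a set of append-only lists, with each message routed to a partition by its key and read back by offset:

```python
# Minimal in-memory model of a Kafka-style partitioned log.
# Illustrative only: real Kafka persists and replicates these logs on brokers.

class Topic:
    def __init__(self, name, num_partitions=3):
        self.name = name
        # One append-only log (a list of byte strings) per partition.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: bytes, value: bytes):
        """Route by key hash to a partition, append, and return (partition, offset)."""
        partition = hash(key) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        return partition, len(log) - 1

    def read(self, partition: int, offset: int):
        """Return all messages in a partition from the given offset onward, in order."""
        return self.partitions[partition][offset:]

topic = Topic("page-views")
p, off = topic.append(b"user-1", b"clicked /home")
topic.append(b"user-1", b"clicked /about")  # same key -> same partition, ordered
print(topic.read(p, off))  # both events for user-1, in write order
```

Routing by key is what gives Kafka per-key ordering: all messages with the same key land in the same partition's log.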
- Kafka does not assign individual message IDs; messages are addressed by their offset in the log.
- Because Kafka is not a traditional message broker, it can be very fast.
- Kafka does not keep track of which consumers a topic has or which messages each has consumed; consumers track their own offsets. This lightens the broker's load and allows Kafka to make optimizations.
- Kafka keeps all log segments for a specified time frame rather than deleting individual messages.
- It streams messages to consumers efficiently using kernel-level I/O, without buffering messages in user space.
- It leverages the operating system's file page cache and efficient write-back/write-through to disk.
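Because the broker does not track its consumers, each consumer (or consumer group) keeps its own offset into the log and advances it independently. A rough sketch of that idea, again in plain Python with invented names rather than the real client API:

```python
# Sketch: two consumer groups read the same append-only log at their own pace.
# In real Kafka, committed offsets live in an internal topic on the broker
# side, but the broker never pushes messages or tracks delivery per message.

log = [b"event-0", b"event-1", b"event-2", b"event-3"]  # one partition's log
offsets = {"analytics": 0, "billing": 0}  # each group's next offset to read

def poll(group: str, max_records: int):
    """Return up to max_records messages for a group and advance its offset."""
    start = offsets[group]
    batch = log[start:start + max_records]
    offsets[group] += len(batch)
    return batch

print(poll("analytics", 2))  # analytics reads events 0 and 1
print(poll("billing", 4))    # billing independently reads all four events
print(poll("analytics", 2))  # analytics resumes where it left off, at event 2
```

Since the log is immutable and every consumer just holds an integer offset, a consumer can also rewind and replay history simply by resetting that offset.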
Because of these characteristics, Kafka is widely
used in the big data space as a reliable way to ingest and move massive amounts of data
very quickly. Netflix, for example, uses Kafka as the primary backbone for ingestion via Java
APIs or REST APIs. Kafka also supports microservices well, offering the benefit of
multiple publish/subscribe patterns and an array of tooling.
Enroll in Apache Spark and Scala training in Bangalore at NPN Training, led by
qualified industry leaders.