Real Reasons behind Apache Kafka’s Popular Use


Web-scale companies and enterprises face many problems, and Kafka fits only a particular class of them. By assembling and retaining all of the "facts" or "events" in a system, Kafka can serve as that system's source of truth, which makes it a strong foundation for building resilient data services and applications. Apache Kafka is a robust and increasingly popular asynchronous messaging technology.

Kafka can be described as a scalable, fault-tolerant, publish-subscribe messaging system that lets you design distributed applications; it powers web-scale Internet companies such as LinkedIn, Twitter, Airbnb, and many others. Notably, Kafka has not only satisfied the needs of Internet unicorns but has also made a remarkable positive impact on traditional enterprises, which are generally slower to adopt new technology.

Kafka was developed around 2010 at LinkedIn by a team including Jay Kreps, Jun Rao, and Neha Narkhede. It was built to address issues such as:

  • Low-latency ingestion of voluminous event data from the LinkedIn website and infrastructure into a lambda architecture that combined Hadoop with real-time event-processing systems.
  • Ingesting data into offline batch systems without exposing implementation details to downstream users, while avoiding a push model that could easily overwhelm a consumer.
  • Getting data out of source systems and moving it around reliably, so that teams could build sophisticated machine-learning algorithms on top of it.

Kafka looks and feels like a publish-subscribe system that delivers messages in order and provides persistent, scalable messaging. It has publishers, topics, and subscribers. It can also partition topics to enable massively parallel consumption. All messages written to Kafka are persisted and replicated to peer brokers for fault tolerance, and they remain available for a configurable period of time (e.g., 7 days, 30 days).
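To make the partitioning idea concrete, here is a minimal in-memory sketch (not the Kafka API; all names and the hash function are illustrative) of how records with the same key always land in the same partition, preserving per-key ordering while allowing parallel consumption across partitions:

```python
# Illustrative sketch of topic partitioning. Kafka's real default partitioner
# hashes the key with murmur2; a simple byte sum shows the same idea.

NUM_PARTITIONS = 3

def partition_for(key: bytes) -> int:
    """Map a record key to a partition; same key -> same partition."""
    return sum(key) % NUM_PARTITIONS

# A topic is one append-only log per partition.
topic = [[] for _ in range(NUM_PARTITIONS)]

events = [(b"user-1", b"login"), (b"user-2", b"click"), (b"user-1", b"logout")]
for key, value in events:
    topic[partition_for(key)].append((key, value))

# All of user-1's events sit in a single partition, so a consumer of that
# partition sees them in the order they were produced.
print(topic[partition_for(b"user-1")])
```

Because ordering is only guaranteed within a partition, choosing a good key (e.g., a user ID) is what preserves the per-entity event order while still spreading load across partitions.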

The key to Kafka is the log data structure: a time-ordered, append-only sequence of data inserts where each record is simply an array of bytes. In Kafka, messages are written to a topic that maintains this log (or multiple logs, one per partition), from which subscribers read and derive their own representations of the data.
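The log abstraction can be sketched in a few lines. This is a toy model for illustration, not Kafka's implementation; class and method names are invented:

```python
class PartitionLog:
    """A time-ordered, append-only sequence of byte records, addressed by offset."""

    def __init__(self):
        self._records = []  # index in this list is the record's offset

    def append(self, value: bytes) -> int:
        """Append a record to the end of the log and return its offset."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Read forward from an offset; reading never mutates the log."""
        return self._records[offset:offset + max_records]


log = PartitionLog()
log.append(b"user-signup")
log.append(b"page-view")
print(log.read(0))  # records come back in the order they were appended
```

The important properties are that writes only ever go to the end, and a record's offset is a stable address that any number of readers can use independently.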
  • Kafka does not assign individual message IDs; a message is addressed by its offset within the log.
  • Because it is not a traditional message broker, Kafka can be very fast.
  • Kafka does not track which consumers a topic has or who has consumed which messages. Removing that bookkeeping lightens the broker's load and lets Kafka make aggressive optimizations.
  • Kafka retains all log segments for a configured time frame; there are no per-message deletes.
  • It streams messages to consumers efficiently using kernel-level I/O (zero-copy) rather than buffering them in user space.
  • It leverages the operating system's page cache and efficient write-back/write-through to disk.
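The consequence of the broker keeping no per-consumer state is that each consumer simply remembers its own offset and polls forward from it. A hedged sketch (illustrative names, not the Kafka client API) of two consumers reading the same log independently:

```python
class Consumer:
    """A consumer that tracks its own read position; the 'broker' stays stateless."""

    def __init__(self, log):
        self._log = log      # shared, append-only list of records
        self._offset = 0     # per-consumer state, held by the consumer itself

    def poll(self, max_records: int = 10) -> list:
        batch = self._log[self._offset:self._offset + max_records]
        self._offset += len(batch)  # the consumer, not the broker, advances the offset
        return batch


log = [b"event-1", b"event-2", b"event-3"]
fast, slow = Consumer(log), Consumer(log)
print(fast.poll())   # reads all available records
print(slow.poll(1))  # independently reads only the first record
```

Because each consumer owns its offset, a slow consumer never blocks a fast one, and replaying history is as simple as rewinding an offset.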

Because of the above-mentioned features, Kafka is popular in the big data space as a trustworthy way to ingest and move massive volumes of data very quickly. Netflix, for example, uses Kafka as the primary backbone for its ingestion pipeline via Java or REST APIs. Kafka also suits microservice architectures, which benefit from its publish/subscribe model and its broad ecosystem of tools.

Enroll in Apache Spark and Scala training in Bangalore at NPN Training, led by qualified industry leaders.
