Why Did Google Stop Using MapReduce and Start Encouraging Cloud Dataflow?
First of all, it is
important to know that the conventional MapReduce model is still used for
certain batch computing jobs. Nevertheless, some tasks cannot be handled well
with MapReduce, and for this reason Google has moved away from MapReduce for
those tasks and has started using Cloud Dataflow.
For example, updating
Google’s web search index involves a huge amount of data, yet it requires small
incremental updates on a constant basis. Reprocessing the whole data set as a
batch job for every change would be wasteful. As per reports, Google has come up
with an incremental computing mechanism known as Percolator for carrying out
this workload.
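The difference between a batch recomputation and an incremental update can be sketched in a few lines of Python. This is only a toy illustration on an in-memory word count; it is not Percolator's API, and the names `batch_index` and `incremental_update` are hypothetical, made up for this example.

```python
from collections import Counter


def batch_index(documents):
    """Recompute word counts from scratch over every document (batch style)."""
    counts = Counter()
    for text in documents.values():
        counts.update(text.split())
    return counts


def incremental_update(counts, documents, doc_id, new_text):
    """Apply a single document change by touching only that document's words."""
    old_text = documents.get(doc_id, "")
    counts.subtract(old_text.split())  # remove the old contribution
    counts.update(new_text.split())    # add the new contribution
    documents[doc_id] = new_text
    # Drop entries that fell to zero so the result matches a fresh batch run.
    for word in [w for w, c in counts.items() if c <= 0]:
        del counts[word]
    return counts


# Hypothetical sample data, for illustration only.
documents = {"a": "map reduce batch", "b": "batch index"}
counts = batch_index(documents)

# One document changes: update the index without rescanning document "a".
incremental_update(counts, documents, "b", "stream index")
```

The incremental path does work proportional to the changed document, while the batch path rescans everything; both should end up with the same counts.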
It is also important to
note that Google has come up with a streaming computing mechanism known as
MillWheel for carrying out low-latency computing jobs. Different services, such
as Google Map views, reportedly now depend on MillWheel. One thing worth noting
is that Google’s applications are not created out of nowhere.
The applications rely
heavily on distributed systems for their basic functionality. Google has
distributed storage systems such as Google File System and its successor
Colossus, as well as BigTable and the Chubby lock service, which back services
such as Gmail. Mesa is a geo-replicated data warehousing system used in
Google’s ads business.
Therefore, it can
clearly be said that Google, one of the leading internet companies, is not only
making effective use of MapReduce but is also encouraging the use of Cloud
Dataflow. Further papers published on this subject should provide more details
on Google’s encouragement of the use of Cloud Dataflow.
Google’s Cloud Dataflow
service competes with Amazon’s streaming data processing service, Kinesis, and
with other big data products such as Hadoop. It stands out because Cloud
Dataflow is built on technology that Google claims is replacing the algorithms
behind Hadoop.
A closer look at this
mechanism suggests that Cloud Dataflow is a better-thought-out tool. This is
because Google users can use it to enrich the applications they develop, and
the data they store, with analytics. Therefore, Google’s Cloud Dataflow can be
described as a MapReduce killer. It could eventually replace MapReduce and
various other big data processing mechanisms.