Posts

Showing posts from May, 2018

In the Hadoop Ecosystem, will Spark and SparkSQL Replace Pig and Hive?

Apache Hive has long been one of the most widely used solutions in the Hadoop ecosystem, and it keeps improving. Its latency, however, has always been a concern, and there are several ways of making Hive queries run faster, for example by using Impala or Tez. Similarly, many people still rely on Pig to pull raw data out of HDFS, because Pig makes working with raw data easy and convenient. However, with the number of tools growing every day, the use of Pig is steadily declining. Apache Spark, on the other hand, is phenomenal, but it has its own limitations: its in-memory capacity is often insufficient, and users still end up processing large amounts of data on disk, much as Hadoop does. In practice, Spark and Hive are often used in parallel, as the sketch below suggests. Anyone with some knowledge of the Lambda Architecture will agree that Hive and Spark serve at very different levels. Hive is best in
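To make that parallel use concrete, here is a minimal sketch in Scala of querying a Hive-managed table from Spark SQL. The table name web_logs and the query itself are hypothetical; the point is simply that a Spark session with Hive support enabled can run SQL directly against tables registered in the Hive metastore.

    import org.apache.spark.sql.SparkSession

    object HiveOnSparkSketch {
      def main(args: Array[String]): Unit = {
        // Enable Hive support so Spark SQL can see tables in the Hive metastore.
        val spark = SparkSession.builder()
          .appName("HiveOnSparkSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical Hive table "web_logs"; any existing table would work here.
        val topPages = spark.sql(
          "SELECT page, COUNT(*) AS hits FROM web_logs GROUP BY page ORDER BY hits DESC LIMIT 10")

        topPages.show()
        spark.stop()
      }
    }

The same table remains queryable from Hive itself, which is one reason teams keep both engines around and pick whichever suits the workload.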

Can Apache Flink Replace Apache Spark?

Apache Flink and Apache Spark are both open-source distributed processing frameworks built to reduce the latency of Hadoop MapReduce and speed up data processing. A common misconception in this field is that Apache Flink will soon replace Apache Spark. Is it really possible for these two big data technologies to co-exist and serve the requirements of fast, fault-tolerant processing? Flink and Spark may look quite similar to people who have worked with neither but are familiar with Hadoop, and such people will probably feel that the progress of Apache Flink is superfluous. Nevertheless, Flink has managed to stay ahead in the competition mainly because of its native stream processing, which lets it manage and process large volumes of data in real time, as the sketch below illustrates. This is something that is not possible with Apache Spark, which takes the batch processing
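As a rough illustration of that streaming model, here is a minimal sketch using Flink's Scala DataStream API: it keeps a continuously updated word count over an unbounded stream of text read from a socket, with the host and port as placeholders. Results are emitted as data arrives rather than after a batch completes.

    import org.apache.flink.streaming.api.scala._

    object StreamingWordCountSketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        // Unbounded source: lines of text arriving on a socket (placeholder host/port).
        val lines: DataStream[String] = env.socketTextStream("localhost", 9999)

        // Running word count, updated for every incoming line.
        val counts = lines
          .flatMap(_.toLowerCase.split("\\W+"))
          .filter(_.nonEmpty)
          .map((_, 1))
          .keyBy(_._1)
          .sum(1)

        counts.print()
        env.execute("Streaming word count sketch")
      }
    }

A comparable Spark job would typically read a bounded dataset, compute the counts once, and terminate, which is the batch-oriented model being contrasted here.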

Technical and Non-Technical Skills to Become a Data Scientist?

With Big Data taking over every industry across the globe, Data Scientists have also started gaining the world's attention. Whether the goal is to increase customer retention, mine data for new business opportunities, or improve products, an expert data scientist can show any business the path to success. This constantly increasing demand for Data Scientists has made employers keen to hire the best. So, for those who want to become expert Data Scientists, here are the technical and non-technical skills required to become a Data Scientist. If you are an aspiring data scientist, these skills will help your data science career shine. Technical skills required to become a successful Data Scientist: knowledge of data mining, bulk data processing, big data processing, and statistical analysis are some of the vital technical skills required to become a professional data