In the Hadoop Ecosystem, Will Spark and Spark SQL Replace Pig and Hive?
Apache Hive has long been one of the most widely used solutions for querying data in Hadoop, and it keeps getting better. Hive's latency has improved considerably, and it can be reduced further by running Hive on Tez or by querying the same tables with Impala. Similarly, many people still rely on Pig to work with raw data stored in HDFS, a task it makes easy and convenient. However, with the number of tools growing every day, the use of Pig is said to be declining.

Apache Spark, on the other hand, is phenomenal, but it has its own limitations. Its available memory is often not sufficient to hold the whole dataset, so users still have to process large amounts of data on disk, as they would with Hadoop.

In practice, Spark and Hive are often used in parallel. Users who have some knowledge of the Lambda Architecture will agree that Hive and Spark serve at very different levels. Hive is best...