Introduction to Apache Spark, RDDs (Using PySpark)

Introduction

Industry estimates suggest that we are creating more than 2.5 quintillion bytes of data every year. Let’s give this a thought: 1 quintillion = 1 billion billion (10^18)! It is hard to even imagine how many drives, CDs, or Blu-ray discs would be required to store it all. It is difficult to imagine this scale of data …