Category Archives: Spark – pyspark

PySpark – dev set up – Eclipse – Windows

For the purposes of this example, we will set up Spark in the following location: C:\Users\Public\Spark_Dev_set_up

Note: I am running Eclipse Neon.

Prerequisites:
- Python 3.5
- JRE 8
- JDK 1.8
- Eclipse plugins: PyDev

Steps to set up:
Download Spark from https://spark.apache.org/downloads.html
1. Choose a Spark release: 2.1.0
2. Choose a package type: Pre-built for Apache Hadoop 2.6
3. Download below…
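After unpacking the download, Python still has to be able to find the `pyspark` sources that ship inside the Spark folder. The sketch below shows one possible way to do this from Python itself, as an alternative to adding the paths in PyDev's interpreter settings. It assumes the standard layout of a pre-built Spark package (a `python` folder plus a `python/lib/py4j-*-src.zip` archive); the function name `configure_pyspark_paths` is hypothetical and not part of Spark.

```python
import glob
import os
import sys


def configure_pyspark_paths(spark_home):
    """Hypothetical helper: set SPARK_HOME and make the Python sources bundled
    with an unpacked Spark distribution importable. Returns the added paths."""
    python_dir = os.path.join(spark_home, "python")
    # The bundled py4j version differs between Spark releases,
    # so match the archive by pattern instead of hard-coding it.
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))
    if not py4j_zips:
        raise FileNotFoundError("no py4j-*-src.zip found under " + python_dir)

    os.environ["SPARK_HOME"] = spark_home
    added = [python_dir, sorted(py4j_zips)[-1]]
    for path in added:
        if path not in sys.path:
            # Prepend so these sources win over any other pyspark on the path.
            sys.path.insert(0, path)
    return added
```

Call it with the folder you extracted under C:\Users\Public\Spark_Dev_set_up before running `import pyspark`; the exact subfolder name depends on the package you downloaded.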

Pyspark – getting started – useful stuff

Example to create a DataFrame:

```python
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

def create_dataframe():
    """ Example to create dataframe """
    headers = ("id", "name")
    data = [
        (1, "puneetha"),
        (2, "bhoomika")
    ]
    df = spark.createDataFrame(data, headers)
    # Show at most 1 row without truncating long values
    df.show(1, False)
    # Output:
    # +---+--------+
    # |id |name    |
    # +---+--------+
    # …
```