AWS DynamoDB – Write Spark (Scala) Dataframe to DynamoDB

By puneethabm | January 3, 2019


  1. Create DynamoDB table
    Table Name: pbm-simple-datatype-test1
    Primary partition key: id (String)
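    The table can be created from the AWS console; alternatively, here is a minimal programmatic sketch using the same AWS SDK v1 classes as step 5 (the provisioned throughput values are illustrative assumptions):

    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder
    import com.amazonaws.services.dynamodbv2.model.{ AttributeDefinition, CreateTableRequest, KeySchemaElement, KeyType, ProvisionedThroughput, ScalarAttributeType }

    // Create the table with "id" (String) as the hash/partition key
    val client = AmazonDynamoDBClientBuilder.standard().withRegion("eu-west-1").build()

    client.createTable(new CreateTableRequest()
      .withTableName("pbm-simple-datatype-test1")
      .withAttributeDefinitions(new AttributeDefinition("id", ScalarAttributeType.S))
      .withKeySchema(new KeySchemaElement("id", KeyType.HASH))
      .withProvisionedThroughput(new ProvisionedThroughput(5L, 5L))) // illustrative capacity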
  2. Create an S3 bucket named “temp-test-jsons”
  3. Upload the two JSON files below to the bucket (a scripted alternative for steps 2 and 3 is sketched after the file contents)
    1.json
    {"id": "111", "name": "Puneetha", "balance": {"balance_amount": 1000.59}}
    {"id": "112", "name": "Denis", "balance": {"balance_amount": -3089.40}}
    
    2.json
    {"id": "113", "name": "Bhoomika", "balance": {"balance_amount": 3000.59}}
    {"id": "114", "name": "Swathi", "balance": {"balance_amount": 4000.59}}
    
  4. Read the JSON files into a DataFrame
    val input_df = spark.read.format("json").load("s3://temp-test-jsons/*")
    input_df.printSchema()
    input_df.show(5, false)
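    
    With the two files above, the inferred schema should look roughly like this (Spark sorts JSON fields alphabetically during inference):

    root
     |-- balance: struct (nullable = true)
     |    |-- balance_amount: double (nullable = true)
     |-- id: string (nullable = true)
     |-- name: string (nullable = true)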
    
  5. Write the DataFrame items to the DynamoDB table
    val region = "eu-west-1"
    val output_dynamo_table = "pbm-simple-datatype-test1"
    
    import com.amazonaws.services.dynamodbv2.document.{ DynamoDB, Table, Item }
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder
    
    
    // Create dynamoDB client
    val dynamoDBClient = AmazonDynamoDBClientBuilder.standard()
         .withRegion(region)
         .build()
    
    // DynamoDB connection
    val dynamoDBConn = new DynamoDB(dynamoDBClient)
    
    // Convert the Spark DataFrame to an array of JSON strings.
    // NOTE: collect() brings every row back to the driver, so this
    // only suits small datasets (see the distributed sketch below).
    val dfJsonArr = input_df.toJSON.collect()
    
    val dynamoTable = dynamoDBConn.getTable(output_dynamo_table)
    
    for (element <- dfJsonArr) {
      val item = Item.fromJSON(element)
      dynamoTable.putItem(item)
    }
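    
    Because toJSON.collect() materializes the whole DataFrame on the driver, the loop above only suits small datasets. A minimal distributed sketch, assuming Spark 2.x and the same SDK classes, writes from the executors instead:

    // Distributed variant: build one client per partition, since AWS SDK
    // clients are not serializable and cannot be shipped from the driver.
    input_df.toJSON.foreachPartition { partition: Iterator[String] =>
      val client = AmazonDynamoDBClientBuilder.standard()
        .withRegion(region)
        .build()
      val table = new DynamoDB(client).getTable(output_dynamo_table)
      partition.foreach(json => table.putItem(Item.fromJSON(json)))
      client.shutdown()
    }

    For higher throughput, batching writes (DynamoDB batchWriteItem accepts up to 25 items per call) or the awslabs emr-dynamodb-connector would be the usual next steps.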
    
     
     
Category: AWS

2 thoughts on “AWS DynamoDB – Write Spark (Scala) Dataframe to DynamoDB”

  1. Stephen Boesch

    You’re doing a collect() on the driver! That’s completely not scalable.

    Reply
    1. puneethabm Post author

      Agreed! This was used for a stream which had few records per batch. Wouldn’t recommend this as a scalable solution!

      Reply
