AWS DynamoDB – Write Spark (Scala) Dataframe to DynamoDB

By | January 3, 2019

  1. Create DynamoDB table
    Entity                 Detail
    Table name             pbm-simple-datatype-test1
    Primary partition key  id (String)
  2. Create an S3 bucket named “temp-test-jsons”
  3. Upload the 2 JSON files below to that bucket
    {"id": "111", "name": "Puneetha", "balance": {"balance_amount": 1000.59}}
    {"id": "112", "name": "Denis", "balance": {"balance_amount": -3089.40}}
    {"id": "113", "name": "Bhoomika", "balance": {"balance_amount": 3000.59}}
    {"id": "114", "name": "Swathi", "balance": {"balance_amount": 4000.59}}
  4. Read the JSON files into a DataFrame
    val input_df = spark.read.format("json").load("s3://temp-test-jsons/*")
    input_df.printSchema()
    input_df.show(false)
  5. Write dataframe items to DynamoDB table
    val region = "eu-west-1"
    val output_dynamo_table = "pbm-simple-datatype-test1"
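    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder
    import com.amazonaws.services.dynamodbv2.document.{ DynamoDB, Table, Item }
    // Create the DynamoDB client for the target region
    val dynamoDBClient = AmazonDynamoDBClientBuilder.standard().withRegion(region).build()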
    // DynamoDB connection
    val dynamoDBConn = new DynamoDB(dynamoDBClient)
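    // Regardless of how the DataFrame was constructed,
    // convert it to an array of JSON strings.
    // Note: collect() pulls every row to the driver, so this only suits small batches.
    val dfJsonArr = input_df.toJSON.collect()
    val dynamoTable = dynamoDBConn.getTable(output_dynamo_table)
    for (element <- dfJsonArr) {
      // Build a DynamoDB item from each JSON string and write it to the table
      val item = Item.fromJSON(element)
      dynamoTable.putItem(item)
    }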
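
Because collect() moves every row to the driver, larger DataFrames may be better served by writing from each partition on the executors. The sketch below is only one possible shape for that, not a tested production setup. It isolates the per-partition logic in a hypothetical helper, writePartition, which takes the iterator of JSON strings that Spark would hand to each task plus a hypothetical put callback. An in-memory buffer (sink) stands in for DynamoDB so the helper can be tried without a cluster. The commented wiring at the end assumes the same region and output_dynamo_table values used in step 5.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: sends every JSON row of one partition through `put`
// and returns the number of rows written.
// In a real job `put` would wrap table.putItem(Item.fromJSON(json)).
def writePartition(rows: Iterator[String])(put: String => Unit): Int = {
  var written = 0
  rows.foreach { json =>
    put(json)
    written += 1
  }
  written
}

// Local stand-in for the DynamoDB table: an in-memory buffer.
val sink = ArrayBuffer.empty[String]
val sample = Iterator(
  """{"id": "111", "name": "Puneetha"}""",
  """{"id": "112", "name": "Denis"}"""
)
val count = writePartition(sample)(json => sink += json)

// Possible wiring inside the Spark job (assumes Spark and the AWS SDK are on
// the classpath; the client is created per partition because it is not serializable):
//   input_df.toJSON.rdd.foreachPartition { rows =>
//     val client = AmazonDynamoDBClientBuilder.standard().withRegion(region).build()
//     val table  = new DynamoDB(client).getTable(output_dynamo_table)
//     writePartition(rows)(json => table.putItem(Item.fromJSON(json)))
//   }
```

With this layout no rows pass through the driver; each executor opens its own connection and writes only the rows of its partition. Write throughput is then limited by the table's provisioned capacity rather than by driver memory.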
  1. Stephen Boesch

    You’re doing a collect() on the driver! That’s completely not scalable.

    1. puneethabm Post author

      Agreed! This was used for a stream that had few records per batch. I wouldn’t recommend it as a scalable solution!


