I recently had the opportunity to develop a Proof of Concept (PoC) for an idea in which data from PDF and Excel files is consumed, analyzed, and presented in a responsive web application running in the AWS cloud. The scope included development of a responsive UI (User Interface) to reflect the idea, back-end APIs supporting the user interface, database storage, a data loading process, and a Continuous Integration/Continuous Deployment (CI/CD) approach for deploying it on the cloud. I will share some details about the architecture, techniques, tooling, libraries, and code snippets used in this effort.

Development: Atom IDE, AWS SDK for Node.js, AWS CLI

Source code Repository: Bitbucket

Language: Node.js with npm

AWS Services used:

Data extraction and loading: AWS Step-functions, AWS Lambda, Amazon S3, Amazon DynamoDB

Back-end: Amazon API Gateway, AWS Lambda, Amazon DynamoDB

Front-end: Amazon EC2, Amazon EC2 Auto Scaling, Elastic Load Balancing, Amazon Route 53

CI/CD: AWS CodePipeline, AWS CodeCommit, AWS CodeBuild, AWS CloudFormation, Amazon S3

Containers: Amazon Elastic Container Service (ECS), Amazon Elastic Container Registry (ECR)

Data extraction and loading component: The raw data was available in PDF and Excel files; Node.js was used in conjunction with AWS Lambda and Step Functions to extract and load it.

  1. Data loading from Excel – the exceljs Node library was used.

Some key statements are presented here to process spreadsheets:

var Excel = require('exceljs');
var wb = new Excel.Workbook();
var filename = 'filename.xlsx';

wb.xlsx.readFile(filename).then(function() {
  var ws = wb.getWorksheet('Sheet1'); // Read the sheet
  // Iterate over all rows that have values in a worksheet
  ws.eachRow(function(row, rowNumber) {
    // Iterate over all cells in a row (including empty cells)
    row.eachCell({ includeEmpty: true }, function(cell, colNumber) {
      console.log(cell.value);
    });
  });
});
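As a possible next step after iterating the rows, each row can be shaped into a DynamoDB PutItem request. The following is a hedged sketch only: the table name (PocData), the key scheme, and the column positions are assumptions for illustration, and the actual write would go through the AWS SDK's DynamoDB client.

```javascript
// Hypothetical sketch: turning an exceljs row into a DynamoDB PutItem request.
// Table name, key names, and column positions are assumptions for illustration.
// The input mirrors the row.values array exceljs exposes (index 0 is unused).
function rowToPutParams(values, rowNumber) {
  return {
    TableName: 'PocData',                 // assumed table name
    Item: {
      id: { S: 'row-' + rowNumber },      // partition key derived from the row number
      name: { S: String(values[1] || '') },   // column 1: item name
      amount: { N: String(values[2] || 0) }   // column 2: numeric value (DynamoDB numbers are strings)
    }
  };
}

// Example usage with a row.values-shaped array
var params = rowToPutParams([undefined, 'Widget', 42], 2);
console.log(JSON.stringify(params));
```

In the real loader, this params object would be passed to the DynamoDB client's putItem call inside the eachRow callback.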


  2. Data extraction from PDF – the pdf-text Node library was used.

Some key statements are presented here to process PDF files:

  • When the PDF file is stored in S3 object storage:

var pdfText = require('pdf-text');
var AWS = require('aws-sdk');
var s3 = new AWS.S3();
var params = { Bucket: 'bucketfolder.pdffiles', Key: 'filename.pdf' };

s3.getObject(params, function(error, data) {
  var buf = Buffer.from(data.Body, 'binary');
  pdfText(buf, function(err, chunks) {
    chunks.forEach(function(value) {
      console.log(value);
    });
  });
});

  • When the PDF URL is accessed directly from a website location:

var pdfText = require('pdf-text');
var request = require('request');
var url = 'https://validurlvalue.pdf'; // Substitute with the Web URL pointing to the PDF file

request({ url: url, encoding: null, strictSSL: false }, function(error, response, body) {
  if (!error && response.statusCode === 200) {
    pdfText(body, function(err, chunks) {
      chunks.forEach(function(value) {
        console.log(value);
      });
    });
  }
});

  3. Target DB: DynamoDB

Some of the justifications behind choosing DynamoDB:

  • The data requires the flexibility to store and retrieve dynamic content: different attributes for different rows, metadata for images, and web page URL references. This points to a NoSQL database that scales easily and meets low-latency requirements.
  • A fully managed database with built-in durability, availability, replication, and scalability.
  • The ability to make the complete solution serverless in the future by using this serverless database.
  • Proven both in enterprise solutions and in our prior projects.
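To illustrate the schema-flexibility point above, here is a hedged sketch (all attribute names are invented for illustration) of two items that could coexist in the same DynamoDB table:

```javascript
// Two hypothetical items in the same table: only the partition key 'id' is
// shared; every other attribute can differ from item to item.
var pdfItem = {
  id: 'doc-001',
  sourceType: 'pdf',
  pageCount: 12,
  sourceUrl: 'https://validurlvalue.pdf'
};

var excelItem = {
  id: 'sheet-001',
  sourceType: 'excel',
  sheetName: 'Sheet1',
  imageMetadata: { width: 640, height: 480 } // attribute the PDF item does not have
};

// A relational table would force both rows into one rigid column set;
// DynamoDB stores each item with exactly the attributes it has.
console.log(Object.keys(pdfItem), Object.keys(excelItem));
```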
  4. Workflow using Step Functions and Lambda:

A variety of data needs to be loaded from Excel and PDF into the database, refreshed periodically from the data sources, and processed in a specific sequence. The individual extraction and loading components are implemented as AWS Lambda functions, while AWS Step Functions provides a simple, natural way to orchestrate those Lambdas into a workflow with both sequential and parallel steps. Both services are part of the AWS serverless platform, supporting a fully managed, completely serverless solution.
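The workflow just described can be sketched as an Amazon States Language definition. This is a minimal, hypothetical example (state names, function names, and account/ARN values are invented); in practice the definition would be deployed through the Step Functions API or CloudFormation:

```javascript
// Hypothetical state machine: extract Excel and PDF data in parallel,
// then load the combined result into DynamoDB. ARNs are placeholders.
var definition = {
  Comment: 'Data extraction and loading workflow (illustrative only)',
  StartAt: 'ExtractInParallel',
  States: {
    ExtractInParallel: {
      Type: 'Parallel',
      Branches: [
        { StartAt: 'LoadExcel',
          States: { LoadExcel: { Type: 'Task',
            Resource: 'arn:aws:lambda:us-east-1:123456789012:function:loadExcel',
            End: true } } },
        { StartAt: 'ExtractPdf',
          States: { ExtractPdf: { Type: 'Task',
            Resource: 'arn:aws:lambda:us-east-1:123456789012:function:extractPdf',
            End: true } } }
      ],
      Next: 'LoadDynamoDB'
    },
    LoadDynamoDB: {
      Type: 'Task',
      Resource: 'arn:aws:lambda:us-east-1:123456789012:function:loadDynamoDB',
      End: true
    }
  }
};

// The Step Functions API expects the definition as a JSON string
console.log(JSON.stringify(definition, null, 2));
```

The Parallel state lets the two extraction Lambdas run concurrently, while the Next field enforces that the database load happens only after both branches finish.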

UI Layer:

  1. Rationale and advantages behind choosing React.js for user interface development:
  • The UI required highly responsive behavior, including tables and tree structures where a change in one field automatically updates other fields through formulas and calculations.
  • Component-based development, with reusable components from the React libraries and from a rapidly evolving development community.
  • Powerful state management that can be leveraged both for pure UI interactions and for those requiring a round trip to a backend API or database, seamlessly blending the results into the React components.
  • Successful adoption by many enterprises, and prior experience with other projects in the company.
  • The same framework supports both web applications and mobile apps as sophisticated as Facebook and Instagram.
  2. Components and packages:

React 15.5.4 (react, react-dom, react-scripts)

Less dynamic stylesheet language ("less": "^2.5.3", "less-loader": "^2.2.1")

.jsx (or .js) files define the components that serve as building blocks for the different layers:

  • Presentation components such as header, footer, node, and node row: any sort of content can be broken into smaller components or composed at higher levels, depending on the reusability perceived.
  • Interaction components such as node click, expand node, and collapse node.
  • Backend interaction components such as getData, saveData, and calculateData that invoke the required APIs or backend.

Basic steps:

// CLI tool to get started with React

npm install -g create-react-app

create-react-app app-name

cd app-name

npm install   (To install the dependencies)

We intend to leverage Storybook to browse a component library, view the different states of each component, and interactively develop and test components. Initial steps to set it up in the project will be:

npm i -g @storybook/cli

cd my-react-app

getstorybook


Packaging was accomplished through webpack, which bundles JavaScript files for use in a browser. During development, we tested the application locally against the backend layer without needing to deploy to a server through the CI/CD process.

Backend/API layer:

The back-end APIs required to support the UI layer are enabled through API Gateway, AWS Lambda, and DynamoDB.

The creation and integration of backend resources is achieved through AWS Mobile CLI to expedite the development of the required AWS components. This project uses the AWS Amplify JavaScript library to add cloud support to the application.

We used a separate project for the backend APIs. The prerequisites include installing the AWS Mobile CLI, configuring AWS credentials, and creating a React Native project:

npm install -g awsmobile-cli

awsmobile configure

npm install -g create-react-native-app

Create the project and execute init command for the backend project for your app:

create-react-native-app BackendProject

cd BackendProject

awsmobile init

awsmobile cloud-api enable

awsmobile cloud-api configure

NOTE: If you go to the AWS Mobile Hub, you can see this project. However, the AWS Mobile Hub is required only to support the mobile clients while leveraging the API Gateway and Lambda functions generated as part of this process.

To create a new API and path (/items), use the following command, which can also be used to edit APIs and manage different paths.

awsmobile cloud-api configure

  • Select from one of the choices below. # Create a new API
  • API name # abtestAPI (name can be anything)
  • HTTP path name # /items (Path can be changed to whatever)
  • Lambda function name (This will be created if one does not already exist)
  • Add another HTTP path # No

NOTE: This will create the required path and actions (GET, POST, PUT, DELETE, ...) under the awsmobilejs/backend folder. The generated app.js can be modified with the required backend database access or business logic code.

Save and push to cloud.

awsmobile push

This will create the required API in API Gateway, the Lambda function, and the corresponding entry under Mobile Hub -> Cloud Logic featuring the APIs. It can be tested there as well.


This node example can be used to test the APIs launched in the API Gateway:

NOTE: aws-exports.js can be referenced to find the parameter values used in the code below.

var apigClientFactory = require('aws-api-gateway-client').default;

var apigClient = apigClientFactory.newClient({
  invokeUrl: '', // The invoke URL of the deployed API (see aws-exports.js)
  accessKey: process.env.AWS_ACCESS_KEY_ID,
  secretKey: process.env.AWS_SECRET_ACCESS_KEY,
  region: 'us-east-1', // OPTIONAL: The region where the API is deployed
  systemClockOffset: 0, // OPTIONAL: An offset value in milliseconds to apply to signing time
  retries: 4, // OPTIONAL: Number of times to retry before failing. Uses axios-retry plugin.
  retryCondition: (err) => { // OPTIONAL: Callback to further control if request should be retried.
    return err.response.status === 500;
  }
});

var params = {
  // This is where any header, path, or querystring request params go.
  // The key is the parameter name as defined in the API.
  // userId: '1234',
};

// Template syntax follows url-template
var pathTemplate = '/items';
var method = 'GET';
var additionalParams = {
  // Any extra headers or query parameters not modeled in the API go here
};
var body = {
  // This is where you define the body of the request
};

apigClient.invokeApi(params, pathTemplate, method, additionalParams, body)
  .then(function(result) {
    console.log(result);
  })
  .catch(function(result) {
    console.log(result);
  });

The recommended approach to interact with cloud services from the UI is aws-amplify.

import Link from 'link-react';

import { Table } from 'semantic-ui-react';

import awsmobile from './configuration/aws-exports';

import Amplify, { API } from 'aws-amplify';



To make calls to API Gateway through AWS Amplify, you need your IdentityPoolId in aws-exports.js. For further documentation, refer to AWS Amplify. Modify the App component like this:

class App extends Component {

  state = { data: [] }

  fetch = async () => {
    this.setState(() => {
      return { loading: true };
    });
    API.get('abtestAPI', '/items') // API name and path created earlier
      .then(resp => {
        this.setState({ data: resp });
        console.log('response is : ', resp);
      })
      .catch(err => console.log(err));
  }
}

// The data array in state will reflect the returned data

CI/CD approach:

Repository – A Bitbucket repository is used for managing the application source code. In addition, the CloudFormation template files and the configuration files required to integrate with the AWS platform are maintained there, so the infrastructure is managed as code.

Bitbucket offers an integrated CI/CD environment, Pipelines, that can automate the build, test, and deploy process, managing the entire workflow from checked-in code to deployment into target environments. Since our target build, test, and deployment platform is AWS, the AWS CodeCommit, CodeBuild, and CodePipeline CI/CD services are leveraged to accomplish application and infrastructure updates in a faster, more reliable, and native manner. We use Bitbucket Pipelines to push code to AWS CodeCommit on each commit, making the integration between Bitbucket and AWS seamless to the developer.

A "Templates" folder is maintained in the Bitbucket repository with the CloudFormation template stacks.

The Bitbucket pipeline (configured in bitbucket-pipelines.yml) performs the following tasks to integrate with AWS:


  • Push the updated CloudFormation templates in the templates folder to S3
  • Create or update the CloudFormation stack for CodePipeline, guaranteeing the pipeline is current before CodePipeline is triggered by an update to its source (CodeCommit)
  • Push the code from the Bitbucket repository to AWS CodeCommit


NOTE: AWS CodeBuild supports Bitbucket as a source; however, CodePipeline does not. That is why we chose AWS CodeCommit, as opposed to GitHub or S3.

AWS CodePipeline integrates CodeCommit, AWS CodeBuild, and code deployment using CloudFormation to make the application available.

AWS CodeBuild compiles the source code using Docker and creates the output files for deployment. The webpack build is included in the Docker CodeBuild step, so CodeBuild generates fresh bundles on every run.

Infrastructure deployment occurs via a CloudFormation script that creates the required infrastructure, after which the code runs in an ECS container (Fargate). The application can be started or restarted on each successful commit/build/test/deployment, with ECS pulling and running the specific image created during that run of the pipeline.

The major advantage of this architecture is that it allows us to go completely serverless by simply modifying the CI/CD process in place. Currently, S3 is used only as storage to support other services. To go fully serverless, S3 would host the front-end website behind Amazon CloudFront, with the same API Gateway/Lambda/DynamoDB backend replacing ECS in the current solution.

This article should give you some useful ideas, tools, tips, and techniques for architecting and developing a PoC or an application spanning several layers on the AWS platform.

References :

Getting started with React

Storybook on Github

AWS Amplify on Github

AWS Mobile React sample

Currently a CTO Architect with Sungard Availability Services, I have over 28 years of IT experience, including full life cycle application delivery, consulting services, technical project management, and solutions architecture. As a cross-sector, cross-platform architect, I have been exposed to many IT technologies through their evolution and have worked with a number of enterprise clients.

