Case Study

Auto

Our client was a comprehensive online automotive digital retailing and marketing platform that allowed dealers to manage their operations and marketing from a single location. The platform ran on AWS infrastructure in the N. Virginia region, and Terraform templates provisioned the VPC infrastructure across multiple availability zones for high availability. Four isolated environments (dev, qa, beta, prod) were provisioned in separate VPCs, each with public subnets for services that needed to be exposed to the internet and private subnets for data services such as RDS instances and Redis, which were accessible only from within the VPC. Outbound rules on the security groups were blocked to avoid data leaks. Bitbucket stored the source code, and Jenkins ran the CI/CD pipeline, with Groovy scripts configured as backup. Team members worked on forked repositories, and merges to the master branch were controlled through pull requests. The Bitbucket Pull Request Builder plugin triggered builds from the forked repositories so that developers could verify their code before it was merged to master.
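The subnet layout described above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical 10.N.0.0/16 ranges per environment, two availability zones, and /24 subnets. The actual CIDR blocks and zone count are not stated in this case study, and the real provisioning was done with Terraform rather than Python.

```python
import ipaddress

# Environment names come from the case study; everything else here
# (CIDR ranges, AZ names, subnet sizes) is an illustrative assumption.
ENVIRONMENTS = ["dev", "qa", "beta", "prod"]
AZS = ["us-east-1a", "us-east-1b"]  # N. Virginia is the us-east-1 region
PRIVATE_OFFSET = 100  # private /24 blocks start at x.x.100.0


def plan_vpc(env_index, azs=AZS):
    """Return one VPC CIDR with a public and a private /24 per AZ."""
    vpc = ipaddress.ip_network(f"10.{env_index}.0.0/16")
    blocks = list(vpc.subnets(new_prefix=24))
    plan = {"cidr": str(vpc), "public": {}, "private": {}}
    for i, az in enumerate(azs):
        plan["public"][az] = str(blocks[i])
        plan["private"][az] = str(blocks[PRIVATE_OFFSET + i])
    return plan


def plan_all():
    """Map each environment to its own non-overlapping VPC plan."""
    return {env: plan_vpc(i) for i, env in enumerate(ENVIRONMENTS)}


if __name__ == "__main__":
    for env, plan in plan_all().items():
        print(env, plan["cidr"], plan["public"], plan["private"])
```

Giving each environment its own address range keeps the four VPCs from overlapping, which matters if they are ever peered or reached through a shared VPN.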

SonarQube was integrated into the dev build jobs to check code quality and application security. Developers received an email whenever the quality gate status changed and could review code smells, security issues, and vulnerabilities in the SonarQube dashboard. The project manager then reviewed and approved the pull request for merging to the master branch. Bitbucket webhook events triggered QA builds whenever a merge landed on master. Protractor, an end-to-end test automation tool, was integrated into the QA build jobs; its test reports were emailed to the QA team and uploaded to an S3 bucket for later reference. The frontend web applications were developed in AngularJS and the backend services in Node.js. After the Protractor tests passed, the QA team performed manual testing, and the project manager created beta tags. The beta tags triggered builds in the beta environment for final review and testing. The confirmed beta version was then tagged according to the production naming convention. Production builds were triggered manually from production tags, and the same tags were used to roll back production code. Finally, multi-factor authentication was enforced for AWS user logins, using a one-time password with a limited lifetime.
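The tag-driven promotion and rollback flow above can be illustrated with a small sketch. The tag format used here (`beta-1.4.0`, `prod-1.4.0`) is a hypothetical convention chosen for the example; the project's actual naming convention is not given in this case study.

```python
import re

# Hypothetical tag convention: "<env>-<major>.<minor>.<patch>".
TAG_PATTERN = re.compile(r"^(beta|prod)-(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag):
    """Return (environment, version tuple) or None for unrelated tags."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    env = match.group(1)
    version = tuple(int(part) for part in match.groups()[1:])
    return env, version


def rollback_target(tags, current):
    """Pick the newest production tag that is older than `current`."""
    parsed = parse_tag(current)
    if parsed is None or parsed[0] != "prod":
        raise ValueError(f"not a production tag: {current}")
    older = []
    for tag in tags:
        candidate = parse_tag(tag)
        if candidate and candidate[0] == "prod" and candidate[1] < parsed[1]:
            older.append((candidate[1], tag))
    # Versions compare as integer tuples, so 1.10.0 sorts after 1.9.3.
    return max(older)[1] if older else None


if __name__ == "__main__":
    tags = ["prod-1.2.0", "prod-1.9.3", "prod-1.10.0", "beta-1.11.0"]
    print(rollback_target(tags, "prod-1.10.0"))
```

Comparing versions as integer tuples rather than strings avoids the common mistake of treating `1.9.3` as newer than `1.10.0`.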

Users received an email alert for each login, and password policies enforced complexity requirements and mandatory rotation periods for IAM user passwords. Lambda functions integrated with API Gateway were used where the scenario called for them, and they were deployed using CloudFormation and Serverless. DynamoDB served as the NoSQL database, and Redis was used for caching. Elasticsearch was used for analytics, with Kibana used to monitor the Elasticsearch clusters. MySQL RDS instances were provisioned in the corresponding VPCs to store user data, and those in the production environment were configured with Multi-AZ for high availability. The RDS instances were tuned to match traffic, with read replicas placed in different regions and availability zones to improve read query performance. Finally, Route 53 health checks were configured to route network traffic based on load.

Bash and Python scripts were used to fulfill specific automation requirements. Frontend sites were hosted in S3 buckets, with CloudFront used as a CDN to cache the website globally and serve its content faster. A CloudFront origin access identity was used to prevent unauthorized public access to the S3 buckets. We achieved high availability for various components of the application by scaling backend services. Jenkins containerized the backend services and pushed the resulting Docker images to ECR (Elastic Container Registry). Project-based matrix authorization was used in Jenkins to prevent unauthorized access. ECR images were tagged with the application version and retained for rollbacks in critical situations. The images were pulled and deployed as ECS services, with cluster autoscaling and service autoscaling configured through CloudWatch alarms and Auto Scaling groups, based on the network throughput, CPU, and memory usage of the application.
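To make the autoscaling behavior above concrete, here is a minimal sketch of a proportional scaling rule driven by CPU and memory utilization. The target values and task limits are assumptions for illustration, and the formula is a simplified model rather than the exact algorithm AWS applies.

```python
import math


def desired_task_count(current, metrics, targets, min_tasks=1, max_tasks=10):
    """Scale the task count by the most over-target metric.

    `metrics` and `targets` map metric names (for example "cpu") to
    utilization percentages. All thresholds here are hypothetical.
    """
    ratios = [metrics[name] / targets[name] for name in targets]
    wanted = math.ceil(current * max(ratios))
    # Clamp to the configured service limits.
    return max(min_tasks, min(max_tasks, wanted))


if __name__ == "__main__":
    targets = {"cpu": 60, "memory": 70}
    print(desired_task_count(4, {"cpu": 90, "memory": 50}, targets))
```

Using the largest ratio means the service scales out as soon as any one metric exceeds its target, and scales in only when all of them are below target.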

Backend services were monitored using CloudWatch, integrated with SNS for notifications. To handle SSL requirements and HTTPS redirects, we used an Application Load Balancer integrated with ACM. We also automated the provisioning of frontend websites for newly onboarded dealerships using shell scripts. Database credentials and other variables were stored in AWS Parameter Store, encrypted with customer-managed KMS keys. During deployment, the backend services were configured to fetch and decrypt the credentials from Parameter Store. Encryption was implemented in all components, including RDS databases, S3, and EBS volumes, using customer-managed KMS keys. S3 buckets were used for file storage, and bucket access was restricted to the VPC. Jenkins was integrated with Slack, and build notifications were sent to the relevant channels on success or failure.

For data warehousing and large-scale dataset storage and analysis, we used Amazon Redshift. We also used Kapow, a robotic automation tool, to grab images and store them in S3 buckets. ETL jobs were automated using Jenkins and shell scripts. For error tracking and application stability monitoring, we used Bugsnag, while New Relic served as a third-party monitoring solution that alerted us based on application endpoint performance. We used the Apdex goal feature in the New Relic dashboard to maintain application performance and consistency. For authentication, we used Auth0, an easy-to-implement and adaptable authentication and authorization platform. To improve search, we used Algolia, a hosted search API that provides tools for building fast and relevant search. Finally, we used Jira for project management and ticketing.
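The Parameter Store lookup mentioned above might look roughly like the following sketch. It assumes a hypothetical `/<env>/<service>/...` naming hierarchy and takes the SSM client as an argument; `get_parameters_by_path` with `WithDecryption=True` is the boto3 call that returns decrypted SecureString values, but the parameter names and layout are illustrative, not the project's actual configuration.

```python
def load_config(ssm_client, env, service):
    """Fetch and decrypt every parameter under /<env>/<service>.

    `ssm_client` is expected to behave like boto3.client("ssm").
    The path layout is a hypothetical convention for this example.
    """
    path = f"/{env}/{service}"
    prefix = path + "/"
    config = {}
    kwargs = {"Path": path, "Recursive": True, "WithDecryption": True}
    while True:
        page = ssm_client.get_parameters_by_path(**kwargs)
        for param in page["Parameters"]:
            # Strip the common prefix so keys read like "db/password".
            key = param["Name"][len(prefix):]
            config[key] = param["Value"]
        token = page.get("NextToken")
        if not token:
            return config
        kwargs["NextToken"] = token  # keep paging until exhausted
```

In a real service this would be called once at startup, for example `load_config(boto3.client("ssm"), "prod", "api")`, with the task role granted permission to read the path and use the KMS key.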

 

Challenges

 

  1. We encountered performance problems with read queries against the databases when high volumes of simultaneous read and write operations were running. Increasing the IOPS and upgrading the RDS instance type did not resolve the issue.

Solution:

To improve database performance, we created read replicas in the same region and provisioned an additional RDS instance with read replicas in another region. A Route 53 private hosted zone with health checks was used to route traffic between these databases. Read query performance improved, and write queries were no longer affected by the reads.
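The idea behind this fix, sending reads to healthy replicas while writes stay on the primary, can be shown with a small application-side sketch. In the project the routing was handled by Route 53 rather than in code, so the class below is only an illustration of the principle, and the endpoint names are hypothetical.

```python
class ReadWriteRouter:
    """Send writes to the primary and spread reads over healthy replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self.healthy = set(self.replicas)
        self._next = 0

    def mark(self, endpoint, healthy):
        """Record a health-check result for one replica."""
        if healthy:
            self.healthy.add(endpoint)
        else:
            self.healthy.discard(endpoint)

    def endpoint_for(self, sql):
        """Return the endpoint that should serve this statement."""
        is_read = sql.lstrip().lower().startswith("select")
        candidates = [r for r in self.replicas if r in self.healthy]
        if not is_read or not candidates:
            # Writes, and reads with no healthy replica, use the primary.
            return self.primary
        choice = candidates[self._next % len(candidates)]
        self._next += 1
        return choice


if __name__ == "__main__":
    router = ReadWriteRouter("primary.db", ["replica-1.db", "replica-2.db"])
    print(router.endpoint_for("SELECT * FROM dealers"))
    print(router.endpoint_for("UPDATE dealers SET active = 1"))
```

Keeping reads off the primary is what frees its capacity for writes, which is the effect observed after the replicas were introduced.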

 

  2. The monthly cost of running the entire infrastructure as four environments was considerably high. We needed to reduce the running cost without affecting performance.

Solution:

We analyzed the entire infrastructure and listed the components that would not require an upgrade in the near future. We purchased 1-year reserved instances for those components, which reduced the cost to a certain extent. We used shell scripts to automate the scheduled shutdown of the dev and QA resources at night and on weekends, when they were unused. We also optimized all resources based on current usage and removed unwanted resources that were incurring costs. Together, these measures significantly reduced the monthly running cost.
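The scheduled shutdown logic can be sketched as a simple time check. The project used shell scripts for this; the Python version below only illustrates the decision rule, and the working hours (08:00 to 20:00, Monday to Friday) are assumed values, since the actual schedule is not stated.

```python
from datetime import datetime

# Assumed schedule; the real hours are not given in the case study.
ALWAYS_ON = {"beta", "prod"}
WORK_START_HOUR = 8
WORK_END_HOUR = 20


def should_be_running(env, now):
    """Decide whether an environment's resources should be up at `now`."""
    if env in ALWAYS_ON:
        return True
    is_weekday = now.weekday() < 5  # Monday is 0, Saturday is 5
    in_hours = WORK_START_HOUR <= now.hour < WORK_END_HOUR
    return is_weekday and in_hours


if __name__ == "__main__":
    for env in ("dev", "qa", "beta", "prod"):
        print(env, should_be_running(env, datetime.now()))
```

A scheduler such as cron could evaluate this rule periodically and stop or start the dev and QA resources whenever the result changes.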

 

Tools/Technologies

  • Cloud Platform: AWS
  • Infra Provisioning Tools & Configuration Management: Terraform, CloudFormation, AWS CLI
  • Source Code Management: Bitbucket
  • Continuous Integration & Deployment: Jenkins
  • Databases: Redis, MySQL, DynamoDB, Elasticsearch, Redshift
  • Containerization & Deployment: Docker, ECS
  • Serverless Deployment: Lambda, CloudFormation
  • REST API Management: AWS API Gateway
  • Message Queuing: SQS
  • Authentication: Auth0
  • Search Optimization: Algolia
  • Encryption: AWS KMS
  • Secret Storage: AWS Parameter Store
  • Process Automation: Kapow, Shell Scripts
  • Logging, Monitoring & Alerting: New Relic, AWS CloudWatch, Kibana
  • Code Testing: Protractor, SonarQube
  • Load Balancing: AWS ALB
  • Content Delivery Network: AWS CloudFront
  • Web Hosting: AWS S3, AWS Lightsail
  • DNS Management: AWS Route 53
  • Storage: S3, EBS
  • Virtual Private Network: AWS VPN Gateway
  • Collaboration and Ticketing: Jira, Slack