Intro to Kafka - Producer & Consumer

Kafka is one of the most popular open-source distributed event streaming platforms. It was designed specifically to handle a continuous flow (i.e. stream) of data in a centralized way. Being a distributed system, Kafka relies on a distributed commit log, just like databases do.

Read More

AWS Glue Internal Working

AWS Glue is one of the major offerings of AWS, providing support for running ETL (Extract-Transform-Load) jobs reliably. Its main objective is to let different query processors act as analytical engines on top of data stored in a variety of formats across different types of storage, all in a serverless way. In this blog post, we will mainly focus on how AWS Glue manages to do so, especially how it processes data stored on different storage systems in a variety of formats.

Read More

Introduction to NoSQL Databases

Databases are one of the most used components of any web service. It’s safe to say that most web services serve requests by processing data fetched from databases and storing the updated information back in them. Even though a database is used as persistent storage, its primary benefit lies in the efficient retrieval of the stored data. The current generation of databases can be broadly categorized into RDBMS and NoSQL. In this blog post, we will get an overview of NoSQL databases: how they gained popularity, their use cases, and their drawbacks. So off we go…

Read More

Introduction to Bitcoin Protocol

Bitcoin is one of the most popular cryptocurrencies. However, we’re more interested in it as one of the first peer-to-peer electronic cash systems proposed without the use of any trusted entity. Bitcoin uses a blockchain, where transactions are grouped into blocks and a chain of such blocks is maintained.
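
To make the "chain of blocks" idea concrete, here is a minimal sketch of hash-linked blocks. It is a simplification for illustration, not Bitcoin's actual block format: it omits proof of work, Merkle trees, and signatures, and the names `block_hash`, `append_block`, and `is_valid` are hypothetical helpers.

```python
import hashlib
import json


def block_hash(block):
    # Hash a canonical JSON encoding of the block's contents.
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain, transactions):
    # Each new block records the hash of the previous block,
    # which is what links the blocks into a chain.
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    block = {
        "index": len(chain),
        "prev_hash": prev_hash,
        "transactions": transactions,
    }
    chain.append(block)
    return block


def is_valid(chain):
    # The chain is valid only if every stored prev_hash still matches
    # the recomputed hash of the preceding block.
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True
```

Because each block stores the hash of its predecessor, changing a transaction in an old block changes that block's hash and breaks every link after it, so `is_valid` returns `False`.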

Read More

Amazon Aurora DB Design Considerations

In this blog post, we will discuss the design considerations behind Amazon Aurora DB, a fully managed, scalable relational DB service built on MySQL- and Postgres-compatible engines. We will go through the history of database support on the AWS platform and how it evolved into Aurora DB. It provides a recent example of how the design principles we’ve explored earlier come together to support a managed, scalable service. The selling point of this architecture is that it handles 77x more load compared to traditional replicated DB systems.

Read More

Frangipani - A Scalable Distributed FileSystem

Frangipani is one of the initial attempts to create a scalable distributed file system. It was primarily designed for an earlier generation of computers used on a time-sharing basis, where a system is connected to multiple terminals and different users access the same file system through these terminals. Even though systems are used in a completely different way today, Frangipani was designed with certain techniques that paved the way for distributed transactions.

Read More

Properties of Distributed Transaction

Most programming languages provide threading support to improve application performance by running multiple threads concurrently. However, this can produce incorrect results when multiple transactions touching overlapping objects run in parallel. This is solved with the help of locks and semaphores. Note that even though locks and semaphores work similarly, a lock allows only one thread to proceed at a time, whereas a semaphore restricts concurrent processing to a specific number of threads. These primitives are supported by all the major programming languages, and they work well for applications running on a single node. However, due to the distributed nature of databases, a transaction needs to communicate with multiple worker nodes where the actual data lies, and coordination is needed among them to handle the overall transaction processing. In this blog post, we will discuss different aspects of distributed transactions.
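
The lock versus semaphore distinction can be sketched on a single node with Python's standard `threading` module. This is only an illustration under assumed parameters; the helper names `run_with_lock` and `run_with_semaphore`, the thread counts, and the sleep duration are made up for the example.

```python
import threading
import time


def run_with_lock(num_threads=8, increments=1000):
    # A lock lets exactly one thread at a time update the shared counter.
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


def run_with_semaphore(num_threads=8, limit=3):
    # A semaphore admits up to `limit` threads into the section at once.
    semaphore = threading.Semaphore(limit)
    state_lock = threading.Lock()  # protects the bookkeeping below
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        with semaphore:
            with state_lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)  # simulate some work inside the section
            with state_lock:
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak  # highest number of threads seen inside at the same time
```

`run_with_lock` always returns `num_threads * increments`, while the peak reported by `run_with_semaphore` never exceeds `limit`. Neither primitive helps once the data lives on several nodes, which is where distributed transactions come in.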

Read More

Chain Replication

The majority of distributed systems rely on consensus algorithms for fault tolerance. There’s another simple approach to designing distributed systems through chain replication, which supports high throughput and availability without sacrificing strong consistency guarantees.

Read More

Intro to Zookeeper

Apache Zookeeper provides a highly reliable distributed coordination service. It helps maintain configuration information, synchronize distributed processes, and provide group membership services. Till now we’ve explored the Raft consensus algorithm, a low-level building block that provides consensus for operations. Zookeeper is a well-known application built on top of a similar mechanism (its own protocol, Zab). In this blog post, we will learn the basic usage of Zookeeper along with its internal design and some performance aspects.

Read More

Testing Distributed Systems for Linearizability

Designing a distributed system poses a significant challenge, as we need to keep track of a lot of constraints, including but not restricted to consistency, fault tolerance, and performance expectations. Once such a system is designed, the question arises of how to prove its correctness. Formal methods are widely used to provide a foolproof demonstration of correctness; however, writing specifications for an entire distributed system requires dedicated effort, as formal methods are essentially mathematical models equivalent to the coded system, and they need expertise in formal specification. The use of formal methods is the best approach for finding subtle bugs beforehand. In contrast, a lower-effort way is to test the system for linearizability.

Read More

Fault-Tolerant Virtual Machines

While designing large-scale systems, we need to take care of disaster scenarios as well, so that if the primary server goes down, the backup server can take its place. Disaster recovery services can be broadly categorized into two types: warm DR, where the backup server is fully functional alongside the primary server, and cold DR, where the backup server is started and configured only when a disaster strikes. In this blog post, we will mainly explore a research paper that proposes a fault-tolerance mechanism for virtual machines, called VMware FT.

Read More

Distributed Filesystem Design

Last week we explored a distributed file system, Google File System (GFS), and got an overview of how chunks are managed by a master node and distributed over a set of chunk servers. However, that only provides a thousand-foot view and not the internal details. In this blog post, we will explore the lower-level details of a file system, such as how the directory structure is managed and how the data is replicated across different systems.

Read More

Introduction to Google File System

Google File System (GFS) is one of the first attempts to create a distributed file system built with fault tolerance in mind. The file system is expected to run on top of commodity hardware, so it is practically guaranteed that some machines in the network won’t respond, and some might never recover. GFS is the underlying file system for many different products, including Google’s MapReduce, which makes it a file system that handles a major load. The paper mentions that GFS has already supported a load of billions of objects with sizes of around a couple of KBs.

Read More

Introduction to Map-Reduce

MapReduce is a programming model designed to process large amounts of data distributed across thousands of systems. The issues related to distributed computing, such as how to parallelize the computation, distribute the data, and handle failures, are abstracted away from the programmer.
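
As a rough illustration of the programming model, here is a single-process word count written in the map, shuffle, and reduce style. It only sketches what the programmer writes; the parallelization, data distribution, and fault tolerance that a real MapReduce framework provides are deliberately absent, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the document.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(documents):
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `word_count(["a b a", "b c"])` counts `a` and `b` twice and `c` once. In a real deployment the map and reduce calls would run on different machines, with the framework handling the shuffle between them.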

Read More

CAP Theorem

CAP Theorem is an interesting theorem proposed by Eric Brewer, which states that any distributed system can provide only two of the following three guarantees:

  1. Consistency: Every read request should return the most recent write or an error. Please note that there’s a subtle difference between consistency as defined in the CAP theorem and in the ACID properties of an RDBMS. In ACID, consistency means that the integrity of the data is maintained before and after each transaction. There shouldn’t be new data created.
  2. Availability: Every request receives a non-error response. However, the response isn’t guaranteed to contain the most recent data.
  3. Partition Tolerance: The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between the nodes.

Read More

HTTP Cookies

HTTP cookies are small pieces of information generated by the web server and stored in the user’s browser. The browser in turn sends the cookies back with each request to the same domain.
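
The round trip can be sketched with Python's standard `http.cookies` module: the server builds the value of a `Set-Cookie` header, and later parses the `Cookie` header the browser sends back. The cookie name, value, and attributes below are assumptions chosen for the example, and the two helper functions are hypothetical.

```python
from http.cookies import SimpleCookie


def build_set_cookie_header(name, value, max_age):
    # Server side: produce the value of a Set-Cookie response header.
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = max_age  # lifetime in seconds
    cookie[name]["path"] = "/"         # send for every path on the domain
    cookie[name]["httponly"] = True    # hide from client-side scripts
    return cookie[name].OutputString()


def parse_cookie_header(header_value):
    # Server side: parse the Cookie request header sent by the browser.
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}
```

For example, `build_set_cookie_header("session_id", "abc123", 3600)` yields a header value that starts with `session_id=abc123`, and `parse_cookie_header("session_id=abc123; theme=dark")` recovers both cookies as a dictionary.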

Read More

Scaling on Queued Messages

Queue-based systems are one of the most prominent ways of propagating events and notifications. Due to their widespread usage, the majority of cloud service providers offer enterprise-grade queueing mechanisms. Apart from that, many organizations also have internally hosted queue applications like Kafka and ActiveMQ, which multiple services utilize. The primary concern when designing distributed systems with a queueing mechanism arises when there’s a requirement to absorb a burst of messages without delay: the worker nodes processing the messages need to be scaled without delay.

Read More

Designing Distributed Systems - Batch Computational Patterns

In the last couple of blog posts, we’ve explored patterns of distributed systems, and the majority of those patterns work on top of long-running server applications. In contrast, batch processes are expected to handle large volumes of data in a short span of time. They are active for a very small amount of time and perform a repetitive task at a regular interval. There are a couple of batch processing patterns, like MapReduce, which has spawned an industry of its own. In this blog post, we will explore batch computational patterns like this and get hands-on experience by building prototypes.

Read More

Designing Distributed Systems - Ownership Election

In the last couple of blog posts, we’ve explored serving patterns, which distribute work in terms of requests per second, the size of the data being loaded, or the time to process. In this blog post, we will explore scaling in terms of assignment: determining which task owns a specific resource. For a single application, a well-established in-process lock ensures that only one actor owns a resource at a particular moment; however, this becomes complicated in distributed systems due to their inherent nature. In a distributed system, where multiple servers can access the same resource simultaneously, certain scenarios require determining which one owns the resource and can perform restricted operations without hindrance from other nodes.

Read More

Designing Distributed Systems - Scatter-gather & FaaS with event-driven pattern

In the last blog post, we discussed two of the major patterns for designing distributed systems. The replicated load-balanced pattern scales the system in terms of requests processed per second, and the sharded data pattern scales the system in terms of the size of the data. Here we will discuss two more such patterns: the scatter/gather pattern, which helps scale the system in terms of the computation time needed, and the event-driven pattern.

Read More

Designing Distributed Systems - Replicated & Sharded Patterns

In the last blog post, we discussed the generic patterns used to create co-scheduled containers. However, that is only a small part of designing distributed systems. With the help of container orchestrators, and API contracts that define a clear surface area the microservices agree upon, microservices can be scaled rapidly across nodes. Here we will discuss two of the most used serving (i.e. multi-node) patterns: replicated load-balanced services and sharded services.

Read More

Designing Distributed Systems - Single Node Patterns

Containers and container orchestrators have introduced a great deal of flexibility in designing distributed systems. Earlier, programs were developed to be distributed over a handful of nodes; with the help of container orchestrators, however, scaling has become quite straightforward, and within a couple of seconds a program can be scaled to tens of thousands of instances. Along with this, certain patterns and practices have found usage across organizations. Identifying such patterns provides a common vocabulary for discussing repetitive sets of problems and encourages reuse. In this blog post, we will discuss the single-node patterns, where multiple containers need to be co-scheduled on the same node because they share resources with each other. Different orchestrators have different names for this type of tightly grouped container. In Kubernetes, it’s called a Pod; other orchestrators also have native support, though the term differs.

Read More

Amazon DynamoDB - Architecture

DynamoDB is a fully-managed serverless NoSQL database service provided by Amazon to provide consistent performance across any scale. What makes DynamoDB truly unique is the advertised single-digit millisecond latency withstanding issues like traffic imbalance, monitoring, and automated system-related operations. Here in this blog post, we would explore the architecture of DynamoDB, would understand where this

Read More

Amazon DynamoDB - Distributed Transactions

Amazon DynamoDB is one of the most prominent fully managed NoSQL database service offerings, advertising predictable performance with high availability and high scalability. Recently, DynamoDB added support for distributed transactions with the help of a timestamp ordering protocol. The transactions are atomic in nature, and combining this with isolation ensures that developers don’t have to worry about concurrent requests or partial execution of transactions. In this blog post, we will explore the technique that Amazon has employed to support transactions on DynamoDB.

Read More

Introduction to Behavioral Design Patterns

In the earlier blog posts, the creational and structural design patterns were explored, and here we would explore the behavioral design patterns. Behavioral design patterns are concerned with algorithms and the arrangement of responsibilities among different sets of classes and objects. The primary objective here is to have the interaction distributed over different sets of objects in such a way that they can communicate with each other while still being loosely coupled.

Read More

Introduction to Structural Design Patterns

In the last blog post, we explored the creational design patterns. In this blog post, we will explore a different set of design patterns: the structural ones. Structural design patterns are concerned with how classes and objects are composed to form larger structures. The primary objective is to compose the components in a flexible and extensible way, so that changes can be made in specific parts of the structure without changing the entire structure.

Read More

Introduction to Creational Design Patterns

Creational design patterns handle the object instantiation process. They provide an abstract way to make the client independent of how objects are instantiated, which objects are instantiated, and how they’re represented. In smaller applications, objects are instantiated in a hard-coded way, as the types of objects remain fixed. However, as the application evolves, the same application or client is required to handle different sets of objects in a similar way, and thus the need arises to define a smaller set of behaviors that can then be extended into different types of objects. Here, creational design patterns try to achieve two goals:

  1. To encapsulate the knowledge of which concrete classes the application uses
  2. To hide how the instances of these concrete classes are created and composed

Read More

Design Patterns Case Study

Design patterns are blueprints for solving commonly encountered problems in software design and architecture. Each pattern provides a template tailored to solve a specific type of problem with a specific intent. In this blog post, we will explore some of the preliminary design patterns with the intention of creating a WYSIWYG (What-You-See-Is-What-You-Get) editor. Note that there are completely different sets of patterns for different aspects of programming, such as concurrent, distributed, and real-time programming. From an object-oriented programming perspective, design patterns describe communicating classes and objects that are customized to solve a generic design problem in a particular context.

Read More

Introduction to AWS Secrets Manager

AWS Secrets Manager is a secrets management service whose primary goal is to store secrets securely and to provide ways for authorized users to retrieve and rotate them. This enables applications to retrieve secrets at runtime rather than having them hardcoded in the codebase or supplied at deployment time, thus improving security. There are different types of secrets an application can use: application-specific keys, database credentials, OAuth tokens, and API keys are some of them, and each can be stored in Secrets Manager in the form of key-value pairs.

Read More

Introduction to Hash Table Internals

The hash table is one of the most important data structures, used for fast insertion, retrieval, and deletion of key-value pairs. We get introduced to this data structure pretty early in our engineering journey; however, the internal design of a hash table is quite involved, with multiple designs to choose from depending upon application-specific behavior. The importance of the hash table implementation can be grasped from the fact that even programming language runtimes depend on it: it’s used extensively for class and member attribute lookup in languages supporting OOP, and for variable lookup tables in procedural languages. In this blog post, we will explore the fundamental concepts behind the hash table, how collisions are resolved, performance, and certain optimizations.

Read More

Hashicorp Vault

With the rapid adoption of microservice architectural patterns, one major issue that has arisen is secret management. With the microservices pattern, teams have become quite independent: they interact through a set of well-defined endpoints, and apart from the API contract, each service is treated as a black box by the downstream services. This independence is reflected in different aspects, from the choice of tech stack to the design patterns used, and it increases the complexity of following uniform core security principles. One major concern here is the leaking of security credentials. Storing security credentials in the codebase or setting them as environment variables from the deployment pipeline may seem to let teams build features quickly, but it should be highly discouraged. To solve this kind of credential sprawl, an enterprise-level centralized credential management system is needed, and Hashicorp provides one such solution with Hashicorp Vault.

Read More

Consul Overview

With the advent of microservices, networking got more complicated. Earlier, with the monolithic approach, vertical scaling was fairly simple. Applications were viewed as 3-tier applications, and the majority of the traffic flow was north-south, i.e. most of the data flow consisted of client requests and responses, whether in the form of browser requests or direct API requests. Even the networking configuration was simpler due to the nature of these 3 tiers: network flow was allowed from the downstream layer to the immediate upstream layer, and a load balancer was usually placed there, working as a reverse proxy server for external users.

Read More

Raft Consensus Algorithm

One of the foundational problems of designing a distributed system is how to store data in it. One basic approach is to share or duplicate all the data between the nodes. However, complexity arises when multiple nodes can operate independently, and it grows manyfold when some node goes down. In this blog post, we will learn about Raft, one of the most commonly used consensus algorithms. We will also briefly touch on generic database replication techniques.

Read More

Nginx Configuration

NGINX is one of the most used web servers on the Internet. Besides being a web server, NGINX can be used as a proxy server, load balancer, and cache as well. In this blog post, we will explore such use cases of NGINX, looking not only at the theory behind them but also at the configuration.

Read More

Video Streaming Protocols

Video streaming has become one of the most popular forms of consuming content on the Internet. Because of this, even though the Internet started as a method of transferring textual data, the audio-visual segment is currently responsible for the majority of Internet traffic. To handle such traffic efficiently, different protocols for video streaming have been developed. In this blog post, we will learn about some of them.

Read More

Introduction To VPC

In this blog post, we’re going to learn about VPC, a basic building block for creating network infrastructure in cloud environments. Even though we’re going to see some examples specific to the AWS platform, VPCs are supported by all the major cloud service providers, with largely similar terminology and nomenclature. Depending on the provider, a few restrictions are imposed and a few conditions relaxed here and there, but the basic understanding of VPC remains the same across all the platforms.

Read More

Introduction To Authorization

Authentication and authorization are among the cornerstones of designing services. As the two terms are often used together, they give the impression of meaning exactly the same thing. However, that’s far from the truth; they handle two different aspects of security. Authentication verifies the user’s identity, whereas authorization validates the access level. This blog post provides an introduction to authorization and an overview of OAuth, one of the most popular authorization mechanisms available today.
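
The difference between the two checks can be sketched in a few lines. This is a toy illustration with hypothetical in-memory stores and function names; it is not OAuth and not how credentials should be stored in a real service.

```python
import hmac

# Hypothetical in-memory stores, for illustration only.
API_TOKENS = {"alice": "token-alice", "bob": "token-bob"}
PERMISSIONS = {"alice": {"read", "write"}, "bob": {"read"}}


def authenticate(username, token):
    # Authentication: is this caller really who they claim to be?
    expected = API_TOKENS.get(username)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(expected, token)


def authorize(username, action):
    # Authorization: is this (already identified) user allowed to do this?
    return action in PERMISSIONS.get(username, set())
```

Here `bob` can authenticate with the right token and is authorized to `read`, but an authorization check for `write` fails even though his identity has been verified.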

Read More

Asynchronous Communication

The most common way to design loosely coupled components in microservice architectures is to expose a well-designed, standardized set of APIs, which will be invoked by the upstream service. REST APIs are generally designed with a synchronous communication approach, where the communication between the services is real-time and the upstream caller expects a response for each and every request triggered.

Read More

Web API Design Best Practices

In this blog post, we are going to learn about RESTful API design, a principle so widely adopted by microservices for designing interfaces that it has become the de facto standard. We will learn the basics of REST, standard practices, and how to utilize REST to create an intuitive interface that clients can consume in a meaningful way.

Read More

Introduction To Microservice Architecture

In this blog post, we will learn about microservices. We will learn why microservices evolved and why they are preferred nowadays over monolithic architecture. We will also learn about a few patterns and useful ideas on microservices.

Read More

Interesting Concepts Of Docker

This blog post jots down several concepts of Docker that are generally not needed for day-to-day tasks but that provide an overview of the working principles of Docker.

Read More

Introduction To Docker

Docker provides a platform to build and deploy applications in an isolated environment. The main technology behind Docker is containerization, in which each container is a self-sufficient runtime for the application, with its own filesystem and network stack. This post covers the fundamentals of Docker; the goal is to understand the basics and get some experience using the Docker client to create, run, and inspect containers.

Read More

Introduction To Go

Go (Golang) is developed and supported by Google. It aims for high performance through static typing, just like C, along with a simpler approach to concurrent programming for achieving high throughput. As a result, Golang became easier for newcomers to grasp, and the standard libraries took care of the basic HTTP request-response model. More than 80% of the codebase of Docker and Kubernetes is written in Go, which popularized it, and Go became the de facto choice for developing services and products on top of Docker and Kubernetes.

Read More

Introduction To Kubernetes

Kubernetes provides an option to manage containerized workloads. In this document, we will study the components of Kubernetes to get an overview of it, and how it enables us to serve a large number of requests.

Read More

12 Factor App

Twelve-Factor App provides an overview of how a modern distributed system is supposed to work. It provides guidelines to standardize the process to ensure the distributed nature of the application.

Read More