Architecture Creation


Availability is a measure of system uptime, defined as the proportion of time that a system is functional and working.
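The proportion described above can be worked through with a little arithmetic. The following is a minimal sketch; the function names and the sample figures (a 30-day month with 43 minutes of downtime) are hypothetical and chosen only for illustration.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Return availability as the fraction of observed time the system was up."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def allowed_downtime_minutes(target: float, period_hours: float) -> float:
    """Return the downtime budget in minutes for a target availability."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be a fraction between 0 and 1")
    return (1.0 - target) * period_hours * 60.0


if __name__ == "__main__":
    # Hypothetical figures: a 30-day month with 43 minutes of downtime.
    month_hours = 30 * 24
    downtime = 43 / 60
    measured = availability(month_hours - downtime, downtime)
    print(f"measured availability: {measured:.4%}")
    budget = allowed_downtime_minutes(0.999, month_hours)
    print(f"downtime budget at 99.9%: {budget:.1f} minutes")
```

Expressing a target as a downtime budget makes it easier to compare against a recovery time: a single outage that takes longer to restore than the budget allows already breaks the target for that period.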

Recovery Time Objective (RTO): the amount of time required to restore business processes to a specific level of service after any disruption or disaster.

Affected by:

  • System errors
  • Infrastructure problems
  • Malicious attacks
  • System load

High Availability Best Practices

  • Clustering
    • Deployment Stamps: Deploy multiple independent copies of application components, including data stores.
    • Geodes: Deploy services into a set of geographical nodes, each of which can service any client request in any region.
  • Backup and recovery strategy: ensures that valuable and sensitive data is stored with proper backup, replication, and restoration capabilities.


Scalability is a system’s ability to handle the workload as the number of users or requests increases.


  • Horizontal scaling (scaling out): scaling by adding more machines to your pool of resources.
  • Vertical scaling (scaling up): scaling by adding more resources to an existing machine.

Factors for choosing scalability type:

  • Cost
  • Performance
  • Flexibility
  • Regularity of Upgrades
  • Redundancy
  • Geographical Distribution
  • Traffic Patterns
    • Diurnal Pattern: Traffic increases in the morning and decreases in the evening for a particular region.
    • Global / Regional: Heavy usage of the application concentrated in particular regions.
    • Thundering Herd: A sudden, high burst of traffic.
      • peak time
      • densely populated areas


  • Replication: involves sharing information to ensure consistency between redundant resources to improve reliability, fault-tolerance, or accessibility.
  • Fault Tolerance: The property that enables a system to continue operating correctly when one or more of its components fail.
  • Archivability: The ability to archive data.

Architectural Structure


Extensibility measures the ability to extend a system and the effort required to implement the extension. The extension can be through adding new functionality or modifying existing functionality.

  • Modular / Reusability
  • Pluggability


  • Accessibility
  • Learnability
  • API Contract


Resilience is the ability to handle and recover from accidental and malicious failures.

Recoverability: the preparatory processes and functionality that enable you to return your services to an initial functioning state after an unintended change.

Disaster recovery (DR): practices designed to prevent or minimize data loss and business disruption resulting from catastrophic events.

Design Patterns

  • Bulkhead: Isolate elements of an application into pools so that if one fails, the others will continue to function.
  • Circuit Breaker: Handle faults that might take a variable amount of time to fix when connecting to a remote service or resource.
  • Leader Election: Coordinate the actions performed by a collection of collaborating task instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the other instances.
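As an illustration of one of the patterns above, here is a minimal Circuit Breaker sketch. The class name, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are assumptions made for this example, not part of any particular library. It is single-threaded and leaves out concerns such as locking and per-exception filtering.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so the timeout can be tested
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # timeout elapsed: allow a trial call
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            # Fail fast instead of waiting on a dependency known to be failing.
            raise CircuitOpenError("circuit is open; call rejected")
        try:
            result = func()
        except Exception:
            self._failures += 1
            # Trip at the threshold, or at once if a half-open trial call fails.
            if self._failures >= self.failure_threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(lambda: fetch_profile(user_id))`, where `fetch_profile` is a hypothetical function standing in for the remote service, and treat `CircuitOpenError` as a signal to fall back or retry later.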


It is a measure of how quickly software can be modified during development.

  • Maintainability: The ability to modify the software to improve it, correct it, or adapt it to changes in environment and requirements.
    • Testability
    • Ease of development
  • Deployability: The time it takes to get code into production after it has been committed.
    • Installability
    • Upgradeability
    • Portability
  • Configurability
  • Compatibility

Architectural Behaviour


Consistency guarantees that every read returns the most recent write. This means that after every operation, the data is consistent across all nodes, so all clients see the same data at the same time, no matter which node they connect to. Consistency improves data freshness.


Observability is the ability to collect data about program execution, internal states of modules, and communication between components. To improve observability, use various logging and tracing techniques and tools.

  • Logging
  • Alerts & Monitoring
  • 3-layer support process (L1, L2, L3): L1 interacts with customers, L2 manages the tickets routed to them by L1, and L3 is the last line of support and usually comprises a development team that addresses the technical issues.
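The logging item above can be sketched with Python's standard `logging` module. The `JsonFormatter` class, the `build_logger` and `timed_call` helpers, and the `context` field are hypothetical names introduced for this example. It assumes one JSON object per log line is a convenient format for whatever monitoring tooling reads the output.

```python
import json
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge the optional per-call context passed through `extra`.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Create a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def timed_call(logger: logging.Logger, operation: str, func: Callable[[], T]) -> T:
    """Run `func`, logging its outcome and duration under `operation`."""
    start = time.perf_counter()
    try:
        result = func()
    except Exception:
        logger.error("operation failed", extra={"context": {"operation": operation}})
        raise
    elapsed_ms = round((time.perf_counter() - start) * 1000, 3)
    logger.info(
        "operation completed",
        extra={"context": {"operation": operation, "elapsed_ms": elapsed_ms}},
    )
    return result
```

For example, `timed_call(build_logger("orders"), "load_order", lambda: 1 + 2)` emits one JSON line carrying the operation name and its duration, which an alerting rule could then match on level or latency.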


  • Auditability
  • Legality
    • Compliance
    • Privacy
    • Certification
  • Authentication
  • Authorisation