10 Key Factors of Software System Scalability


As part of a series on sound software design principles, this post focuses on scalability, one of the most critical elements of building robust, future-proof applications.


In today’s world of ever-increasing data and users, software needs to be ready for higher loads. Ignoring scalability is like building a beautiful house on a weak foundation: it might look great at first, but it will eventually crumble under the stress.

Whether you’re building enterprise systems, mobile apps, or even something for personal use, how do you ensure your software can handle growth? A scalable system provides a great user experience even during traffic spikes and heavy usage. Applications that don’t scale are frustrating at best and unusable at worst, crashing completely under increased load.

In this post, we’ll explore 10 key areas for designing highly scalable architectures. By mastering these concepts, you can develop software that can be deployed at scale without costly rework. Your users will thank you for building apps that keep them happy today and tomorrow (when your user base grows 10x).

Horizontal and vertical scaling


One of the first key concepts of scalability is understanding the difference between horizontal scaling and vertical scaling. Horizontal scaling means increasing capacity by adding more machines or nodes to the system. For example, adding more servers to support the application’s increased traffic.

Vertical scaling involves increasing the capacity of existing nodes, such as upgrading to a server with a faster CPU, more RAM, or increased storage capacity.

In general, horizontal scaling is preferred because it provides greater reliability through redundancy. If one node fails, other nodes can take over the workload. Horizontal scaling also provides greater flexibility to scale incrementally as needed. With vertical scaling, you need to fully upgrade your hardware to handle the increased load.

However, vertical scaling can be useful when specific tasks, such as CPU-intensive data processing, require increased computing power. In general, scalable architectures employ a combination of vertical and horizontal scaling approaches to adjust system resource requirements over time.

Load Balancing


Once you scale horizontally by adding servers, you need a way to distribute requests and traffic evenly across these nodes. This is where load balancing comes in. A load balancer sits in front of the servers and efficiently routes incoming requests to the servers.

This prevents any single server from becoming overwhelmed. A load balancer can implement different algorithms such as round robin, least connections, or IP hashing to determine how to distribute the load. More advanced load balancers can detect server health and adaptively divert traffic away from failed nodes.

Load balancing maximizes resource utilization and improves performance. It also provides high availability and reliability: if a server fails, the load balancer redirects traffic to the remaining online servers, and this redundancy lets your system tolerate the failure of any single server.

Implementing load balancing along with autoscaling allows your system to scale smoothly and easily. Your application can easily handle large traffic changes without running into capacity issues.

Database scaling

As application usage grows, the database backing the system can become a bottleneck. There are various techniques to scale a database to meet high read/write loads. However, the database is one of the most difficult components to scale in most systems.

Database selection

Choosing the right database is critical to effectively scaling your database system. It depends on a variety of factors, including the type of data being stored and expected query patterns. Different types of data, such as metrics data, logs, enterprise data, graph data, and key/value stores, have different characteristics and requirements that require tailored database solutions.

For metrics data, high write throughput is critical for recording time-series data, and a time-series database like InfluxDB or Prometheus may be a better fit because of their optimized storage and query mechanisms. On the other hand, to handle large volumes of unstructured data such as logs, NoSQL databases such as Elasticsearch can provide efficient indexing and searching capabilities.

For enterprise data that requires strict ACID (Atomicity, Consistency, Isolation, Durability) transactions and complex relational queries, a traditional SQL database like PostgreSQL or MySQL may be the right choice. In contrast, for scenarios that require simple read and write operations, key/value stores such as Redis or Cassandra can provide low-latency data access.

Before selecting a database, the specific requirements of the application and its data characteristics must be thoroughly evaluated. Sometimes a combination of databases (polyglot persistence) may be the most effective strategy, using different databases for different parts of the application based on their strengths. Ultimately, the right database choice can significantly affect the scalability, performance, and overall success of a system.

Vertical scaling

Increased load can often be alleviated, at least temporarily, simply by giving a single database server more resources, such as CPU, memory, and storage. This is worth trying before reaching for more advanced database scaling techniques, and it keeps your database stack simple.

However, there is a physical upper limit to how far a single server can scale. Additionally, a single database remains a single point of failure: if that beefed-up server goes down, so does access to your data.

This is why it is critical to employ horizontal scaling techniques in addition to vertical scaling of database server hardware.

Replication

Replication improves performance and provides redundancy by copying data across multiple database instances. Writes go to the leader node and are propagated to read replicas; reads can then be served from the replicas, reducing load on the leader. Because the data exists on multiple servers, replication also removes the single point of failure.


Sharding

Sharding, or partitioning, splits a database into multiple smaller databases by a specific criterion, such as customer ID or geographic region. This allows you to scale horizontally: as load grows, you fluidly add more database servers, and each shard handles only a fraction of the data.

Additionally, you should focus on other areas that can help scale your database:

  • Schema denormalization involves duplicating data in the database to reduce the need for complex joins in queries, thereby improving query performance.

  • Caching frequently accessed data in a fast in-memory cache can reduce database queries. A cache hit avoids fetching data from a slower database.

Asynchronous processing

Synchronous request-response cycles can create bottlenecks that hinder scalability, especially for long-running or IO-intensive tasks. Asynchronous processing queues work to be processed in the background, releasing resources immediately for use by other requests.

For example, submitting a video transcoding job could directly block web requests, negatively impacting the user experience. Instead, transcoding tasks can be posted to a queue and processed asynchronously. Users get an immediate response, while transcoding tasks are handled separately.

Figure: asynchronous video upload and transcoding

Asynchronous tasks can be executed concurrently by background workers that scale horizontally across multiple servers. The queue size can be monitored to dynamically add more workers. Evenly distributed loads prevent any single worker from being overwhelmed.

Shifting workloads from synchronous to asynchronous enables applications to handle traffic spikes without bogging down. The system uses powerful queue-based asynchronous processing to remain responsive under load.

Stateless systems

Stateless systems are easier to scale horizontally than stateful designs. When application state is kept in external storage such as a database or distributed cache rather than on a local server, new instances can be started as needed.

In contrast, stateful systems require sticky sessions or data replication across instances. Stateless applications do not depend on a specific server. Requests can be routed to any available resource.

Saving state externally also improves fault tolerance. Losing any single stateless application server has little impact, since it holds no critical data that isn’t persisted elsewhere; other servers can seamlessly take over processing.

Stateless architecture improves reliability and scalability: resources can be scaled elastically while remaining decoupled from individual instances. However, keeping state externally adds the overhead of cache or database lookups, a tradeoff that needs careful evaluation when designing web-scale applications.

Caching

Caching frequently accessed data in fast in-memory storage is a powerful technique for optimizing scalability. By serving read requests from a low-latency cache, you can significantly reduce the load on your backend database and improve performance.

For example, product catalog information that rarely changes is a good candidate for caching. Subsequent product page requests can fetch data from Redis or Memcached without overloading the MySQL storage. Cache invalidation strategies help keep data consistent.

Caching is also beneficial for computationally intensive processes such as template rendering. You can cache rendered output and bypass redundant rendering on every request. CDNs like Cloudflare cache and serve static resources like images, CSS, and JS globally.

Go caching example using Redis and MySQL:

package main

import (
 "database/sql"
 "encoding/json"
 "fmt"
 "log"
 "net/http"
 "time"

 "github.com/go-redis/redis"
 _ "github.com/go-sql-driver/mysql"
)

const (
 dbUser = "your_mysql_username"
 dbPassword = "your_mysql_password"
 dbName = "your_mysql_dbname"
 redisAddr = "localhost:6379"
)

type Product struct {
 ID int `json:"id"`
 Name string `json:"name"`
 Price int `json:"price"`
}

var db *sql.DB
var redisClient *redis.Client

func init() {
 // Initialize MySQL connection
 dbSource := fmt.Sprintf("%s:%s@/%s", dbUser, dbPassword, dbName)
 var err error
 db, err = sql.Open("mysql", dbSource)
 if err != nil {
  log.Fatalf("Error opening database: %s", err)
 }

 // Initialize Redis client
 redisClient = redis.NewClient(&redis.Options{
  Addr: redisAddr,
  Password: "", // No password set
  DB: 0, // Use default DB
 })

 //Test the Redis connection
 _, err = redisClient.Ping().Result()
 if err != nil {
  log.Fatalf("Error connecting to Redis: %s", err)
 }

 log.Println("Connected to MySQL and Redis")
}

func getProductFromMySQL(id int) (*Product, error) {
 query := "SELECT id, name, price FROM products WHERE id = ?"
 row := db.QueryRow(query, id)
 var product Product
 err := row.Scan(&product.ID, &product.Name, &product.Price)
 if err != nil {
  return nil, err
 }
 return &product, nil
}

func getProductFromCache(id int) (*Product, error) {
 productJSON, err := redisClient.Get(fmt.Sprintf("product:%d", id)).Result()
 if err == redis.Nil {
  // Cache miss
  return nil, nil
 } else if err != nil {
  return nil, err
 }

 var product Product
 err = json.Unmarshal([]byte(productJSON), &product)
 if err != nil {
  return nil, err
 }

 return &product, nil
}

func cacheProduct(product *Product) error {
 productJSON, err := json.Marshal(product)
 if err != nil {
  return err
 }

 key := fmt.Sprintf("product:%d", product.ID)
 return redisClient.Set(key, productJSON, 10*time.Minute).Err()
}

func getProductHandler(w http.ResponseWriter, r *http.Request) {
 productID := 1 // For simplicity, we are assuming product ID 1 here. You can pass it as a query parameter.

 // Try getting the product from the cache first
 cachedProduct, err := getProductFromCache(productID)
 if err != nil {
  http.Error(w, "Failed to retrieve product from cache", http.StatusInternalServerError)
  return
 }

 if cachedProduct == nil {
  // Cache miss, get the product from MySQL
  product, err := getProductFromMySQL(productID)
  if err != nil {
   http.Error(w, "Failed to retrieve product from database", http.StatusInternalServerError)
   return
  }

  if product == nil {
   http.Error(w, "Product not found", http.StatusNotFound)
   return
  }

  // Cache the product for future requests
  err = cacheProduct(product)
  if err != nil {
   log.Printf("Failed to cache product: %s", err)
  }

  // Respond with the product details
  json.NewEncoder(w).Encode(product)
 } else {
  // Cache hit, respond with the cached product details
  json.NewEncoder(w).Encode(cachedProduct)
 }
}

func main() {
 http.HandleFunc("/product", getProductHandler)
 log.Fatal(http.ListenAndServe(":8080", nil))
}

Using caching strategically can reduce the strain on your infrastructure and allow you to scale out as you add more caching servers. Caching is best suited for read-intensive workloads with repeated access patterns. It provides scalability gains along with database sharding and asynchronous processing.

Network bandwidth optimization

For distributed architectures spread across multiple servers and regions, optimizing network bandwidth utilization is key to scalability. Network calls can become a bottleneck, imposing limits on throughput and latency.

Bandwidth optimization techniques such as compression and caching reduce the number of network hops and the amount of data transferred. Compressing API and database responses minimizes bandwidth requirements.

Persistent connections over HTTP/2 allow multiple requests to be multiplexed over one open channel. This reduces round-trip overhead, improves resource utilization, and avoids HTTP head-of-line blocking. HTTP/2 still suffers from TCP head-of-line blocking, however; HTTP/3, which runs over QUIC instead of TCP and TLS, avoids that as well.

CDN distribution brings data closer to users by caching assets at edge locations. By delivering content from nearby, less data travels over expensive long-distance lines.

Go gzip compression example using Echo:

package main

import (
 "github.com/labstack/echo/v4"
 "github.com/labstack/echo/v4/middleware"
)

func main() {
 e := echo.New()

 // Middleware
 e.Use(middleware.Logger())
 e.Use(middleware.Recover())
 e.Use(middleware.Gzip()) // Add gzip compression middleware

 // Routes
 e.GET("/", helloHandler)

 // Start server
 e.Logger.Fatal(e.Start(":8080"))
}

func helloHandler(c echo.Context) error {
 return c.String(200, "Hello, Echo!")
}

Overall, scaling requires a holistic view that includes not only compute and storage, but also network connectivity. Optimizing bandwidth usage by minimizing hops, compression, caching, etc. is valuable in building large systems with high throughput and low latency.

Progressive enhancement

Progressive enhancement is a strategy that helps increase the scalability of web applications. The idea is to build core functionality first, then gradually enhance the experience across browsers and devices.

For example, you can develop a basic HTML/CSS website to ensure accessibility on any browser. You can then add advanced CSS and JavaScript to incrementally improve interaction on modern browsers with JS support.

Serving the basic HTML first provides a fast “time to interact” and works on all platforms. Enhancements are subsequently loaded to optimize the experience without blocking. This balanced approach leverages capabilities while expanding reach. For example, Qwik bakes this concept into the foundation of the framework.

Phased incremental enhancements also help with scalability. Simple pages require fewer resources and scale better. You can add more advanced functionality when you need it, rather than over-engineering for every possible use case ahead of time.

Overall, progressive enhancement allows web applications to scale efficiently from basic to advanced functionality based on device capabilities and user needs.

Graceful degradation

In contrast to progressive enhancement, graceful degradation involves starting with a high-level experience and scaling back functionality when constraints are detected. This allows applications to scale down gracefully when faced with resource constraints.

For example, a graphics-rich application might detect a low-power mobile device and adapt to downgrade advanced visuals to a more basic presentation. Alternatively, backend systems may limit non-essential operations during peak loads to maintain core functionality.

Graceful degradation preserves critical user workflows even under suboptimal conditions. Errors due to limitations such as bandwidth, device capabilities, or traffic spikes are minimized. The experience still works, rather than failing catastrophically.

Feature degradation is a valuable tool that should be incorporated and planned for during the initial development of product features. The ability to automatically or manually disable features is critical to keeping the system up and running in a variety of situations, such as system overload, migration, or unexpected performance issues.

When a system experiences high load or becomes overwhelmed by too much traffic, dynamically disabling non-critical functions can reduce stress and prevent complete service failures. This clever use of functional degradation ensures that core functionality remains operational and prevents cascading failures throughout the application.

Functional degradation can help maintain system stability during database migrations or updates. The complexity of the migration process can be reduced by temporarily disabling certain features, thereby minimizing the risk of data inconsistency or corruption. Once the migration is complete and validated, these features can be seamlessly reactivated.

Additionally, feature downgrades can be a useful mechanism in cases where critical bugs or security holes are found in a particular feature. Promptly turning off the affected functionality prevents any further damage while the problem is resolved, ensuring the integrity of the overall system.

Overall, incorporating functional degradation as part of product design and development strategies enables systems to gracefully handle challenging situations, enhance resilience, and maintain an uninterrupted user experience under adverse conditions.

Building graceful degradation mechanisms such as device detection, performance monitoring, and throttling can improve the resiliency of applications as they scale up or down. Resources can be dynamically adjusted to optimal levels based on real-time constraints and priorities.

Code scalability

Scalability best practices focus primarily on infrastructure and architecture. But well-written and optimized code is also key to scaling. Even on robust infrastructure, sub-optimal code can hinder performance and resource utilization.

Tight loops, inefficient algorithms, and poor data structure access can bog down a server. Architectures like microservices increase parallelism, but also exacerbate these inefficiencies.

Profilers and code analyzers help identify hotspots and bottlenecks. Refactor code to scale better by optimizing CPU, memory, and I/O usage. Distributing processing across threads can also improve the utilization of multi-core servers.

Non-scalable code example (thread-per-request):

Even on robust infrastructure, inefficient code can hinder scalability. For example, allocating a thread per request does not scale well – the server will run out of threads under high load.

Better approaches like asynchronous/event-driven programming and non-blocking I/O provide higher scalability. Node.js uses this model to efficiently handle many concurrent requests on a single thread.

Virtual threads or goroutines are also more scalable than thread pools. Virtual threads are lightweight and managed by the runtime. Examples include goroutines in Go and green threads in Python.

Hundreds of thousands of goroutines can run concurrently, compared with a limited number of operating system threads. The runtime automatically multiplexes goroutines onto real threads, which removes the thread-lifecycle overhead and resource constraints of a thread pool.

Regardless of the infrastructure, well-structured code that favors asynchronous processing and virtual threads while minimizing overhead is critical for large-scale applications.

Java example of virtual threads per task:

import java.io.*;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadServer {

    public static void main(String[] args) {
        final int portNumber = 8080;
        try {
            ServerSocket serverSocket = new ServerSocket(portNumber);
            System.out.println("Server started on port " + portNumber);

            ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();

            while (true) {
                // Wait for a client connection
                Socket clientSocket = serverSocket.accept();
                System.out.println("Client connected: " + clientSocket.getInetAddress());

                // Submit the request handling task to the virtual thread executor
                executor.submit(() -> handleRequest(clientSocket));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    static void handleRequest(Socket clientSocket) {
        try (
            BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
            PrintWriter out = new PrintWriter(clientSocket.getOutputStream(), true)
        ) {
            // Read the request from the client
            String request = in.readLine();

            // Process the request (you can add your custom logic here)
            String response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\nHello, this is a virtual thread server!";

            // Send the response back to the client
            out.println(response);

            // Close the connection
            clientSocket.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Note: to run the code above, make sure you have Java 20 installed, save it as VirtualThreadServer.java, and run it with java --source 20 --enable-preview VirtualThreadServer.java. (Virtual threads became a stable feature in Java 21, where the preview flag is no longer needed.)

Just as infrastructure needs to scale, so does code. Efficient code ensures that the server performs optimally under load. Overloaded servers cripple scalability, regardless of the surrounding architecture. Optimize code and scale infrastructure for best results.

Conclusion

Scaling a software system to handle growth is critical to long-term success. We explored key techniques for designing highly scalable architectures: horizontal scaling, load balancing, database sharding, asynchronous processing, caching, and optimized code.

While scaling requires ongoing effort, investing in scalability early on will prevent painful bottlenecks later. Think about your capacity needs ahead of time, not as an afterthought. Build in redundancy, monitor usage, scale incrementally, and distribute load across multiple nodes.

With a strong, scalable design, your software can keep customers happy even if usage spikes 10 or 100 times. Scalability will differentiate your application from the multitude that collapse as they grow. As long as your platform remains fast, available, and reliable despite increasing demand, your users will stay.

