Reverse proxy cache servers: Squid, Nginx, and CDN deployment explained

Table of Contents

  • 1. Squid reverse proxy
    • 1.1 Concept
    • 1.2 Working mechanism
    • 1.3 Setup
  • 2. Nginx reverse proxy cache
  • 3. CDN
    • 3.1 CDN concept
    • 3.2 Advantages of CDN
    • 3.3 CDN-related technologies
      • 3.3.1 Load balancing technology
      • 3.3.2 Dynamic content distribution and replication technology
      • 3.3.3 Caching technology
    • 3.4 CDN working process

1. Squid reverse proxy

1.1 Concept

If the requested resource is already cached on the Squid reverse proxy server, it is returned to the client directly; otherwise, the reverse proxy server requests the resource from the backend web server, returns the response to the client, and caches a copy locally for the next requester.
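The cache-or-fetch decision described above can be sketched in a few lines of shell. This is purely illustrative: the cache directory, the md5-of-URL key scheme, and the HIT/MISS labels are invented for the sketch, and Squid's real cache is far more sophisticated.

```shell
#!/bin/sh
# Minimal sketch of reverse-proxy cache logic: serve from cache on a HIT,
# otherwise "fetch" from the backend and store the response for next time.
CACHE_DIR=$(mktemp -d)

fetch() {
    url=$1
    key=$(printf '%s' "$url" | md5sum | cut -d' ' -f1)  # cache key = md5 of URL
    file="$CACHE_DIR/$key"
    if [ -f "$file" ]; then
        echo "HIT"        # resource is cached: return it directly
    else
        echo "MISS"       # not cached: would request it from the backend server
        : > "$file"       # store a placeholder locally for the next requester
    fi
}

first=$(fetch http://www.kgc.com/)
second=$(fetch http://www.kgc.com/)
echo "$first $second"     # → MISS HIT
```

The second request for the same URL finds the stored copy and never touches the backend, which is exactly the load reduction the proxy provides.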

1.2 Working mechanism

  • Caches web page objects to reduce repeated requests
  • Distributes Internet requests among the intranet web servers by round-robin or by weight
  • Proxies user requests so that users never access the web servers directly, improving security

1.3 Setup

vim /etc/squid.conf
#Around line 60, modify / insert:
http_port 192.168.80.10:80 accel vhost vport
cache_peer 192.168.80.11 parent 80 0 no-query originserver round-robin max_conn=30 weight=1 name=web1
cache_peer 192.168.80.12 parent 80 0 no-query originserver round-robin max_conn=30 weight=1 name=web2
cache_peer_domain web1 web2 www.kgc.com
#For a request to www.kgc.com, Squid forwards it to port 80 of 192.168.80.11 and 192.168.80.12

-------------------------------------------------------------------------------------------------------------
http_port 80 accel vhost vport #Switches Squid from an ordinary cache into reverse-proxy acceleration mode for a web server. Squid listens for requests on port 80 and is bound to the web server's request port (vhost vport). When a request arrives at Squid, no further forwarding is needed: Squid either serves the data from its cache or fetches it directly from the bound origin server.
accel : reverse proxy acceleration mode
vhost : allows a domain name or host name to identify the proxied node
vport : allows an IP address and port to identify the proxied node

parent : marks the node as a parent (an upstream-downstream relationship, not a sibling relationship)
80 : proxy port 80 of the internal web server
0 : ICP port; 0 disables ICP queries (used when there is only one Squid server)
no-query : do not send ICP queries; fetch the data directly
originserver : marks the peer as an origin server
round-robin : Squid distributes requests to the parent nodes in turn
max_conn : maximum number of connections
weight : weight of the node
name : alias for the node
-------------------------------------------------------------------------------------------------------------

//Clear the iptables rules configured in transparent mode before
iptables -F
iptables -t nat -F

systemctl stop httpd #Prevent port 80 used by the httpd service from conflicting with the listening port configured by the squid reverse proxy
systemctl restart squid


#Backend node server settings
yum install -y httpd
systemctl start httpd

#Node 1:
echo "this is test01" >> /var/www/html/index.html
#Node 2:
echo "this is test02" >> /var/www/html/index.html


#Client domain name mapping configuration
Modify the C:\Windows\System32\drivers\etc\hosts file
192.168.80.10 www.kgc.com

In a browser with no proxy configured, visit http://www.kgc.com

View cache hits
tailf /usr/local/squid/var/logs/access.log
1631164427.547      0 192.168.80.200 TCP_MEM_HIT/200 381 GET http://www.kgc.com/ - HIER_NONE/- text/html
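From log lines like this you can compute a rough hit ratio: field 4 holds the cache result code (e.g. TCP_MEM_HIT/200 or TCP_MISS/200). A sketch over two sample lines follows; the second line is invented for illustration, and only the first matches the log shown above.

```shell
#!/bin/sh
# Count HIT vs MISS entries in a squid access.log sample (field 4 = result/status).
stats=$(awk '{split($4, a, "/"); if (a[1] ~ /HIT/) hit++; else miss++}
             END {printf "hits=%d misses=%d", hit, miss}' <<'EOF'
1631164427.547      0 192.168.80.200 TCP_MEM_HIT/200 381 GET http://www.kgc.com/ - HIER_NONE/- text/html
1631164431.212     15 192.168.80.200 TCP_MISS/200 381 GET http://www.kgc.com/ - FIRSTUP_PARENT/192.168.80.11 text/html
EOF
)
echo "$stats"   # → hits=1 misses=1
```

Run the same awk program against the real log (`awk '...' /usr/local/squid/var/logs/access.log`) to see how much traffic the cache is absorbing.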


2. Nginx reverse proxy cache

http {
    proxy_cache_path /data/nginx/cache levels=1:2 keys_zone=my_cache:10m max_size=10g inactive=60m use_temp_path=off;
#######################################################
●path : mandatory; the storage path for cache files.
●levels : defines the cache directory hierarchy. Each level is named with 1 hex character (16 choices, 0-f) or 2 (256 choices, 00-ff), separated by ":".
proxy_cache_path /data/nginx/cache; puts all cache files in a single directory, e.g. /data/nginx/cache/d7b6e5978e3f042f52e875005925e51b
proxy_cache_path /data/nginx/cache levels=1:2; uses a two-level directory tree (16*256=4096 directories), e.g. /data/nginx/cache/b/51/d7b6e5978e3f042f52e875005925e51b
●keys_zone : mandatory; defines the name and size of the shared memory zone. The shared memory holds the metadata of cached items (all active keys and information about the cached data), so nginx can quickly decide whether a request is a cache hit or miss. 1m can hold about 8,000 keys, so 10m can hold about 80,000 keys.
●inactive : cache files not accessed within this time are deleted; the default is 10 minutes.
●max_size : upper limit on cache storage; if unspecified, the cache can use all available disk space.
●use_temp_path : off puts temporary files directly in the cache directory instead of a separate temp path.
#######################################################
    
    upstream cache_server {
        server 192.168.80.20:80;
        server 192.168.80.30:80;
    }
    
    server {
        listen 80;
        server_name www.kgc.com;
        location / {
            proxy_cache my_cache; #Specify shared memory for page cache, zone name is defined by proxy_cache_path directive
            proxy_cache_valid 200 5m; #Set different cache times for different response status codes, this is a request with a cache status code of 200, and the cache time is 5 minutes
            proxy_cache_key $request_uri; #Specify the key of the cache file as the requested URI
            add_header Nginx-Cache-Status $upstream_cache_status; #Add the cache status as a response header sent to the client
            proxy_pass http://cache_server; #Set the protocol and address of the backend server forwarded by the proxy
        }
    }
}
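Given the levels=1:2 explanation above, the on-disk location of a cached item can be worked out by hand: the file name is the md5 of proxy_cache_key, the last hex character names the first-level directory, and the two characters before it name the second-level directory. The sketch below uses the example hash from the text.

```shell
#!/bin/sh
# Derive the nginx cache file path for levels=1:2 from an md5 hash.
hash=d7b6e5978e3f042f52e875005925e51b      # example hash from the text above
l1=$(printf '%s' "$hash" | cut -c32)       # last character      -> first-level dir
l2=$(printf '%s' "$hash" | cut -c30-31)    # preceding two chars -> second-level dir
echo "/data/nginx/cache/$l1/$l2/$hash"
# → /data/nginx/cache/b/51/d7b6e5978e3f042f52e875005925e51b
```

Spreading files over 4096 directories this way keeps any single directory from accumulating so many entries that lookups slow down.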



#For pages or data with very high real-time requirements, caching should be disabled. The following shows how to exclude such content from the cache.
proxy_cache_path /data/nginx/cache levels=1:2 keys_zone=my_cache:10m max_size=10g inactive=60m use_temp_path=off;
server {
  listen 80;
  server_name cache.lion.club;
  #For URIs ending in .txt or .text, set the variable value to "no cache"
  if ($request_uri ~ \.(txt|text)$) {
   set $cache_name "no cache";
  }
  
  location / {<!-- -->
    proxy_no_cache $cache_name; #If the variable has a value, do not cache; if it is empty, cache as usual
    proxy_cache my_cache; #Set cache memory
    proxy_cache_valid 200 5m; #The cache status is 200 requests, and the cache duration is 5 minutes
    proxy_cache_key $request_uri; #The key of the cache file is the requested URI
    add_header Nginx-Cache-Status $upstream_cache_status; #Add the cache status as a response header sent to the client
    proxy_pass http://cache_server; #proxy forwarding
  }
}


Restart the service, then test from a client by accessing the nginx reverse-proxy cache server.
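One way to verify the cache from a client is to inspect the Nginx-Cache-Status header set by the add_header directive above. The sketch parses a canned response (the header values are illustrative, not captured from a real run) the same way you would parse the output of `curl -sI http://www.kgc.com/`.

```shell
#!/bin/sh
# Extract the cache status from response headers. On a live setup the first
# request typically shows MISS, and a repeat within 5 minutes shows HIT.
headers='HTTP/1.1 200 OK
Server: nginx
Nginx-Cache-Status: HIT
Content-Type: text/html'

status=$(printf '%s\n' "$headers" | awk -F': ' '/^Nginx-Cache-Status/ {print $2}')
echo "$status"   # → HIT
```

A request for a .txt URI under the no-cache configuration above should keep showing MISS, since proxy_no_cache suppresses caching for it.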

3. CDN

3.1 CDN concept

CDN stands for Content Delivery Network (also called a content distribution network). Its purpose is to add a new caching layer to the existing Internet and publish a website's content to the network "edge" nodes closest to users, so that users can fetch the content they need nearby (the proximity principle). Technically, it works around problems such as limited network bandwidth, heavy user traffic, and unevenly distributed points of presence, improving the response speed of website access.

3.2 Advantages of CDN

  • CDN nodes solve the problem of cross-carrier and cross-region access, greatly reducing access latency;
  • Most requests are completed at the CDN's edge nodes, so the CDN offloads traffic and reduces the load on the origin site.

3.3 CDN-related technologies

The implementation of a CDN depends on several network technologies, chiefly load balancing, dynamic content distribution and replication, and caching. The sections below take a brief look at each.

3.3.1 Load balancing technology

In a CDN, **load balancing is divided into server load balancing and global server load balancing (sometimes called server-wide load balancing).** Server load balancing means distributing tasks among servers of different capacities, which both prevents a weaker server from becoming the system's bottleneck and lets the resources of stronger servers be fully utilized. Global server load balancing allows web hosts, portals, and enterprises to distribute content and services by geographic location, improving fault tolerance and availability: multi-site content and services protect against failures caused by local or regional network outages, power failures, or natural disasters. In a CDN solution, global server load balancing plays a central role, and its performance directly affects the performance of the entire CDN.

3.3.2 Dynamic content distribution and replication technology

The response speed of website access depends on many factors: whether there is a bandwidth bottleneck in the network, whether routes suffer congestion and delay in transit, the processing capacity of the web server, the access distance, and so on. In most cases it is closely related to the distance between the visitor and the website server. If the visitor is too far from the website, traffic between them must pass through many hops of routing, forwarding, and processing, and network delay is unavoidable. An effective remedy is content distribution and replication: push the static web pages, images, and streaming media data that make up the bulk of a site out to acceleration nodes everywhere. Dynamic content distribution and replication is therefore another major technology a CDN requires.

3.3.3 Caching technology

Caching is not a new technology. Web caching services improve user response time in several forms, such as proxy caching, transparent proxy caching, and transparent proxy caching with redirection. Through a web caching service, users minimize WAN traffic when accessing pages: for corporate intranet users this means content is cached locally instead of being retrieved across a dedicated WAN; for Internet users it means content is stored in their ISP's cache instead of being retrieved over the Internet. Either way, access is faster. Since the core function of a CDN is to speed up network access, caching is another main technology a CDN adopts.

3.4 CDN working process

  1. After the user enters the URL and presses Enter, the local DNS system resolves the domain; via a CNAME record, the final resolution is handed over to the CDN's dedicated DNS server.

  2. The CDN's DNS server returns the IP address of the CDN's global load balancing device to the browser.

  3. The user sends a content URL request to the CDN's global load balancing server.

  4. Based on the user's IP address, the requested URL, and other information, the CDN global load balancing server selects a regional load balancing device in the user's area and tells the user to send the request to that device.

  5. The CDN regional load balancing server selects a suitable cache server for the user, based mainly on: proximity to the user, whether the cache server holds the requested content, and the current load of each cache server. It picks the IP address of an optimal cache server.

  6. The global load balancing server returns the cache server's IP address to the user.

  7. The user sends the request to the cache server, which responds and delivers the content to the user's terminal. If the cache server does not have the requested content, it requests it from its upstream cache server, tracing back as far as the website's origin server if necessary, and pulls the content locally.
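Steps 1-2 above can be observed with dig: the site's domain is CNAMEd to the CDN's DNS, which ultimately answers with an edge node's IP. The sketch below parses a canned dig-style ANSWER section; the domain names and address are invented for illustration.

```shell
#!/bin/sh
# Pull the CNAME target and the final edge IP out of a dig-style answer section
# (fields per line: name, TTL, class, type, value).
answer='www.example-site.com.             300  IN  CNAME  www.example-site.com.cdndns.net.
www.example-site.com.cdndns.net.  60   IN  A      203.0.113.10'

cname=$(printf '%s\n' "$answer" | awk '$4 == "CNAME" {print $5}')
edge_ip=$(printf '%s\n' "$answer" | awk '$4 == "A" {print $5}')
echo "$cname $edge_ip"
```

Running the same extraction on live `dig` output from different networks would show different edge IPs, which is the global load balancing at work.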

Detailed flow chart: (figure omitted)