Does putting an httpd server in front of a Vert.x application defeat the purpose of Vert.x?


I'm watching a talk about Vert.x (not in English) in which the speaker says that "the whole architecture has to be asynchronous from end to end", cites putting an Apache httpd server in front of the Vert.x application as an example of what not to do, and then explains that using Nginx would be a better idea.

I understand that Nginx follows an event-driven, asynchronous, single-threaded architecture like Vert.x, as opposed to Apache's synchronous thread-per-connection model, but I don't see how this is a problem for the Vert.x application behind it.

Httpd will create a thread per request and forward that request to Vert.x. The two are independent, and as long as httpd can keep up with the incoming traffic, why would its inner workings matter?
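For context, the Vert.x side is just a non-blocking handler on an event loop. A minimal sketch (Vert.x 3.x API; the port and response body are made up for illustration) of what would sit behind the proxy:

    import io.vertx.core.Vertx;

    public class Main {
        public static void main(String[] args) {
            Vertx vertx = Vertx.vertx();
            // A single event loop serves many connections concurrently;
            // the request handler must never block.
            vertx.createHttpServer()
                 .requestHandler(req -> req.response().end("hello from Vert.x"))
                 .listen(8080);  // the reverse proxy would forward to this port
        }
    }

However the request was proxied, it lands on this event loop without blocking it.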

asked by garci560 on 20 Nov 2017 at 22:39

1 Answer


The problem is the C10K problem:

Apache creates processes and threads to handle additional connections. The administrator can configure the server to control the maximum number of allowable processes. This configuration varies depending on the available memory on the machine. Too many processes exhaust memory and can cause the machine to swap memory to disk, severely degrading performance. Plus, when the limit of processes is reached, Apache refuses additional connections.

The limiting factor in tuning Apache is memory and the potential for deadlocked threads contending for the same CPU and memory. If a thread stalls, the user waits for the web page until the process frees the thread up so it can send the page back. A deadlocked thread cannot recover, so it stays stuck.

Nginx does not create new processes for each web request; instead, the administrator configures how many worker processes to create for the main Nginx process. (One rule of thumb is to have one worker process for each CPU.) Each of these processes is single-threaded. Each worker can handle thousands of concurrent connections. It does this asynchronously with one thread, rather than using multi-threaded programming.
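To make the quoted contrast concrete, here is a rough Java sketch of the thread-per-connection model attributed to Apache above; this is an illustration of the model, not Apache's actual code:

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class ThreadPerConnection {
        public static void main(String[] args) throws IOException {
            ServerSocket server = new ServerSocket(8080);
            while (true) {
                Socket client = server.accept();
                // One thread (with its own stack) per connection:
                // 10K concurrent clients means 10K live threads.
                new Thread(() -> handle(client)).start();
            }
        }

        static void handle(Socket client) {
            try (Socket s = client) {
                // Blocking I/O: the thread sits parked while the client is slow,
                // holding its memory the whole time.
                s.getOutputStream().write(
                    "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok".getBytes());
            } catch (IOException ignored) {
            }
        }
    }

Every slow client parks an entire thread and its stack, which is exactly why memory is the limiting factor named above.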

Distinguishing between performance and scalability is key:

The problem with Apache: the more connections, the worse the performance.

Key insight: performance and scalability are orthogonal concepts. They don't mean the same thing. When people talk about scale they are often really talking about performance, but there's a difference between the two, as we'll see with Apache.

With short-term connections that last a few seconds, say a quick transaction, if you are executing 1000 TPS then you'll only have about 1000 concurrent connections to the server.

Change the length of the transactions to 10 seconds, and now at 1000 TPS you'll have 10K connections open. Apache's performance drops off a cliff, though, which opens you up to DoS attacks: just do a lot of slow downloads and Apache falls over.
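The arithmetic behind those figures is Little's law: the number of open connections equals the request rate times how long each connection stays open.

    L = \lambda \times W
    1000~\text{req/s} \times 1~\text{s} = 1000~\text{open connections}
    1000~\text{req/s} \times 10~\text{s} = 10\,000~\text{open connections}

The throughput is identical in both cases; only the connection lifetime grew, which is why scale and performance are different axes.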

If you are handling 5,000 connections per second and you want to handle 10K, what do you do? Let's say you upgrade the hardware and double the processor speed. What happens? You get double the performance, but you don't get double the scale. The scale may only go to 6K connections per second. The same thing happens if you keep doubling: 16x the performance is great, but you still haven't reached 10K connections. Performance is not the same as scalability.

The problem was Apache would fork a CGI process and then kill it. This didn’t scale.

Why? Servers could not handle 10K concurrent connections because of O(n^2) algorithms used in the kernel.

Two basic problems in the kernel:

Connection = thread/process. As a packet came in, the kernel would walk down all 10K processes to figure out which thread should handle the packet.

Connections = select/poll (single thread). Same scalability problem. Each packet had to walk a list of sockets.
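One way to read the O(n^2) figure (my gloss, not from the quote): each packet cost a walk over all n connections, and the packet rate itself grows with the number of connections, so

    \underbrace{O(n)}_{\text{list walk per packet}} \times \underbrace{O(n)}_{\text{packets} \,\propto\, \text{connections}} = O(n^2)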

Solution: fix the kernel to do lookups in constant time.

Threads now context switch in constant time, regardless of the number of threads.

This came with a new, scalable epoll()/IOCompletionPort mechanism offering constant-time socket lookup.

Thread scheduling still didn't scale, so servers scaled using epoll with sockets, which led to the asynchronous programming model embodied in Node and Nginx. This shifted software onto a different performance graph: even on a slower server, adding more connections doesn't make performance drop off a cliff. At 10K connections, a laptop can even be faster than a 16-core server.
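In Java terms, this epoll-backed readiness model is what java.nio exposes as a Selector (the JDK implements it with epoll on Linux): one thread is told only about the sockets that are ready instead of scanning all of them. A minimal sketch, not production code:

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    public class EventLoop {
        public static void main(String[] args) throws Exception {
            Selector selector = Selector.open();            // epoll-backed on Linux
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(8080));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            while (true) {
                selector.select();                          // wakes with only the ready sockets
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        ByteBuffer buf = ByteBuffer.allocate(1024);
                        if (client.read(buf) < 0) {
                            client.close();                 // peer hung up
                        }
                        // ...parse and respond without ever blocking this thread...
                    }
                }
            }
        }
    }

One thread, thousands of sockets: the cost of each select() call is driven by how many sockets are ready, not how many exist, which is the constant-time lookup the quote describes.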


answered 20 Sep 2018 at 14:31
