Sooos uses a form of IPC similar to URPC (user-level RPC) and attempts to be fully asynchronous, non-locking, deadlock free and non-blocking, which is not easy.
Lockless
Non-locking operation can be achieved with point-to-point ring buffers; the solution is obvious for two asynchronous user-level applications talking to each other. The main difference from URPC is the use of an indirect tail pointer, which prevents the same cache line from being accessed by both sender and receiver (and avoids the resulting cross-CPU hardware messages). If we limit the queue to 64K, so that an index fits in 16 bits, we can scan 8 queues (XMM) or 16 queues (YMM) with a single poll to see whether anything has changed, which makes polling cheap. This matters because some services are likely to have hundreds or thousands of queues.
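The ring buffer described above can be sketched in C as follows. This is a minimal illustration, not the Sooos implementation: the names (ring, tail_table, SLOT_COUNT, SLOT_SIZE) are hypothetical, and reading "indirect tail pointer" as "the tail index lives in a separate table reached through a pointer" is an assumption. It uses only acquire/release loads and stores, with no read-modify-write operations.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SLOT_COUNT 256u   /* hypothetical; power of two, below 65536 */
#define SLOT_SIZE  64u    /* hypothetical fixed message size in bytes */

/* Tail indices for several queues, kept apart from the queues themselves
 * (assumed layout; it also lets one poll look at many tails together). */
static alignas(64) _Atomic uint16_t tail_table[8];

/* One point-to-point queue: exactly one sender, exactly one receiver. */
struct ring {
    _Atomic uint16_t *tail;            /* indirect: points into tail_table */
    alignas(64) _Atomic uint16_t head; /* written only by the receiver */
    alignas(64) uint8_t slots[SLOT_COUNT][SLOT_SIZE];
};

static void ring_init(struct ring *r, _Atomic uint16_t *tail)
{
    r->tail = tail;
    atomic_store_explicit(r->tail, 0, memory_order_relaxed);
    atomic_store_explicit(&r->head, 0, memory_order_relaxed);
}

/* Sender side. Returns false when the queue is full or the message is
 * too large; it never blocks. */
static bool ring_send(struct ring *r, const void *msg, size_t len)
{
    uint16_t t = atomic_load_explicit(r->tail, memory_order_relaxed);
    uint16_t h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (len > SLOT_SIZE || (uint16_t)(t - h) == SLOT_COUNT)
        return false;
    memcpy(r->slots[t % SLOT_COUNT], msg, len);
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(r->tail, (uint16_t)(t + 1), memory_order_release);
    return true;
}

/* Receiver peek: pointer to the oldest message in place, or NULL if empty. */
static const void *ring_peek(struct ring *r)
{
    uint16_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint16_t t = atomic_load_explicit(r->tail, memory_order_acquire);
    return (t == h) ? NULL : r->slots[h % SLOT_COUNT];
}

/* Receiver releases the slot it has finished processing. */
static void ring_pop(struct ring *r)
{
    uint16_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    atomic_store_explicit(&r->head, (uint16_t)(h + 1), memory_order_release);
}
```

The receiver calls ring_peek, works on the message where it sits, then calls ring_pop. A false return from ring_send is the point at which a back pressure mechanism would take over.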
Enhancements / key features here
- Queues collapse or expand according to use
- Active queues are polled, but inactive queues are inserted in a list.
- Lazy scheduling for idle tasks: they remain on the run list at low priority until they have received no message for 2 polls.
- Indirect tail pointer, as mentioned.
- The system makes full use of SSE (128-bit) or AVX (256-bit) move instructions.
- Non-temporal moves for large messages, and messages constructed from registers directly into the buffer, to prevent cache pollution.
- Receiver peek, so the receiver can process a message without copying it.
Note that we do not use interlocked instructions, which on some architectures can place a hold on the system memory bus.
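The single-poll scan of several queues can be sketched as follows, assuming the 16-bit tail indices of 8 queues sit next to each other in memory. The function name poll_changed8 and the "last seen" array are hypothetical. For simplicity the sketch takes plain uint16_t arrays and treats the load as a snapshot; a YMM variant would handle 16 queues the same way. A scalar fallback is included so the code also builds where SSE2 is not available.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Compare 8 current tail indices with the 8 values seen on the previous
 * poll. Returns a bitmask in which bit i is set when queue i has changed;
 * zero means there is nothing to do. */
static unsigned poll_changed8(const uint16_t tails[8], const uint16_t seen[8])
{
#if defined(__SSE2__)
    __m128i now  = _mm_loadu_si128((const __m128i *)tails);
    __m128i old  = _mm_loadu_si128((const __m128i *)seen);
    __m128i same = _mm_cmpeq_epi16(now, old);   /* 0xFFFF in each equal lane */
    /* Narrow each 16-bit lane to one byte so movemask yields 1 bit per queue. */
    __m128i packed = _mm_packs_epi16(same, _mm_setzero_si128());
    unsigned equal = (unsigned)_mm_movemask_epi8(packed) & 0xFFu;
    return ~equal & 0xFFu;
#else
    unsigned mask = 0;
    for (int i = 0; i < 8; i++)
        if (tails[i] != seen[i])
            mask |= 1u << i;
    return mask;
#endif
}
```

A service with hundreds of queues would call this once per group of 8 and only visit the queues whose bit is set.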
Non Blocking
The major issue is back pressure. For example, consider a task that reads a file from disk and then writes it via another task: it is possible that the entire file will be in memory before it gets written. A better example is a typical synchronous program, where all the printf calls are buffered and then block on screen IO; as anyone who has done benchmarking knows, the time spent in these blocks is significant. In an asynchronous system those printf calls simply become cheap async calls and the program continues.
The problem is that we want to allow as much work as possible to be done without blocking, but without generating huge memory pressure. Solving this will assist the scheduler and Service Manager in scheduling and in spawning more servers to handle hot spots.
Several mechanisms are used to help with this, including:
- Grant priority scheduling
- System buffer allocation (especially IO) based on priority
- Queue overflow into work queues, which fires events the client can use to block, yield or sleep.
Grant Priority Scheduling
We adjust scheduling priorities based on the queue size. For example, a task whose queue to another process hits the high water mark all the time will have its priority reduced, and that priority will be added to the receiving task. In the file example above, the task that writes the file (the receiver) will get the higher priority. In addition, on a single core the sender will yield after the priorities are adjusted.
Note that the receiver may spawn more processes (STP) to handle the work, but this is done by the Service Manager and is independent of scheduling. It will be based on the total work from all queues.
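The priority transfer can be sketched roughly as below. All names (task, grant_queue, GRANT_STEP, MIN_PRIORITY) are hypothetical, "larger number means more CPU" is an assumption, and handing the priority back once the queue drains (grant_release) is an assumption of this sketch rather than something stated above.

```c
#include <stdint.h>

#define GRANT_STEP   1   /* hypothetical amount moved per adjustment */
#define MIN_PRIORITY 0   /* hypothetical floor for the sender */

struct task { int priority; };   /* assumed: larger means more CPU */

struct grant_queue {
    struct task *sender;
    struct task *receiver;
    uint16_t depth;        /* messages currently queued */
    uint16_t high_water;   /* threshold that triggers a grant */
    int granted;           /* priority currently lent to the receiver */
};

/* Called by the sender after enqueueing. Returns 1 when priority was moved
 * from sender to receiver; on a single core the sender would then yield. */
static int grant_on_high_water(struct grant_queue *q)
{
    if (q->depth < q->high_water)
        return 0;
    if (q->sender->priority - GRANT_STEP < MIN_PRIORITY)
        return 0;                          /* nothing left to give */
    q->sender->priority   -= GRANT_STEP;
    q->receiver->priority += GRANT_STEP;
    q->granted            += GRANT_STEP;
    return 1;
}

/* Assumed counterpart: give the lent priority back once the queue drains. */
static void grant_release(struct grant_queue *q)
{
    if (q->depth != 0)
        return;
    q->sender->priority   += q->granted;
    q->receiver->priority -= q->granted;
    q->granted = 0;
}
```

Because priority is moved rather than created, the total across the two tasks stays constant, so a busy sender cannot raise the overall load it places on the scheduler.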
System Buffer allocation (especially IO) based on priority
A worse problem arises when generating large amounts of data and writing it to a slow IO process, e.g. a file system: even at the lowest priority it is likely that large amounts of memory will be used, possibly consuming all memory (though if paging is used it will eventually block on the file system).
To handle this we use buffers similar to fbufs for IO. These buffers come out of a global system pool, and the amount a process can allocate from the pool is adjusted by its priority. The number of buffers allocated can be increased via a capability if system policy requires a single task to do huge amounts of IO. When a buffer cannot be allocated, the caller will block. Hence we block a task that generates huge amounts of memory pressure due to IO, but allow tasks that do small amounts frequently to never block.
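A rough sketch of the quota check follows. The text does not say how priority maps to a quota, so the linear formula here is purely an assumption, as are all the names and constants (io_pool, io_task, BUFFERS_PER_LEVEL, capability_bonus). A false return marks the point where the caller would block.

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_BUFFERS      1024u  /* hypothetical size of the global pool */
#define BUFFERS_PER_LEVEL 4u     /* hypothetical: quota gained per priority level */

struct io_pool { size_t free_buffers; };

struct io_task {
    unsigned priority;        /* assumed: larger means more important */
    size_t   held;            /* buffers currently charged to this task */
    size_t   capability_bonus;/* extra buffers granted through a capability */
};

/* Assumed policy: quota grows linearly with priority, plus any bonus. */
static size_t io_quota(const struct io_task *t)
{
    return (size_t)(t->priority + 1u) * BUFFERS_PER_LEVEL + t->capability_bonus;
}

/* Returns true when a buffer was charged to the task. Returns false when
 * the pool is empty or the task is at its quota, i.e. the caller would
 * block until a buffer is released. */
static bool io_buf_try_alloc(struct io_pool *p, struct io_task *t)
{
    if (p->free_buffers == 0 || t->held >= io_quota(t))
        return false;
    p->free_buffers--;
    t->held++;
    return true;
}

/* Return one buffer to the pool and credit the task. */
static void io_buf_free(struct io_pool *p, struct io_task *t)
{
    if (t->held == 0)
        return;
    t->held--;
    p->free_buffers++;
}
```

A service allocating on behalf of a client would pass the client's io_task here, so the buffer is charged to the original requester.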
It is important to know that services that create buffers on behalf of a task will use the capability of the original requester. They may also create their own buffers, e.g. for a config file, but these count against their own quota.
Note that while we could send 1K packets through IPC relatively efficiently, the IO buffers allow better back pressure management.
We envisage separate IO scheduling later, but at present it will be slaved to the system priority.
Queue overflow into work queue
The system has a work queue (also used for asynchronous events in some languages, such as OCaml) where messages are placed when an IPC queue is full. This queue is read by the application itself and fires events that allow the app to handle back pressure. Common options:
- Sleep the thread when reading an entry from the queue.
- Sleep with an exponentially increasing delay after every n entries of growth in the work queue.
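The second option can be sketched as a small function that turns the work queue depth into a sleep time. This reads "exponential" as a delay that doubles for every n entries of growth, which is an interpretation; the name backoff_us and the constants are hypothetical.

```c
#include <stdint.h>

#define BACKOFF_BASE_US   100u  /* hypothetical first sleep, in microseconds */
#define BACKOFF_MAX_SHIFT 10u   /* hypothetical cap on the number of doublings */

/* Sleep time for a given work queue depth. No sleep until the queue holds
 * n entries; after that the sleep doubles for each further n entries,
 * up to the cap. */
static uint32_t backoff_us(uint32_t depth, uint32_t n)
{
    if (n == 0 || depth < n)
        return 0;
    uint32_t shift = depth / n - 1;      /* 0 for the first n entries */
    if (shift > BACKOFF_MAX_SHIFT)
        shift = BACKOFF_MAX_SHIFT;
    return BACKOFF_BASE_US << shift;
}
```

An overflow event handler would call this with the current depth and sleep for the returned time, so a sender that keeps outrunning its receiver slows down progressively instead of filling memory.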
Deadlock free (low deadlock count)
We do not allow waiting for a specific message; this prevents many deadlocks. It creates some latency issues for important messages on single-core systems, but these are mostly mitigated by yielding: a yield allows the remaining time slice to be given to an important receiver.
It is worth noting that the above system will have few possible deadlocks in the IPC system. The only possible blocks are in waiting for IO buffers and in interactions with the scheduler.
We can remove most of the buffer-related ones by never blocking the buffer manager and the block IO drivers; since these rarely, if ever, allocate blocks for themselves, they should not block in the first place.
We also reduce the chance of deadlocks around scheduling, since the IPC is mostly divorced from scheduling.
It is worth restating that we are talking about IPC deadlocking: programs can still grab locks and deadlock themselves, though the IPC architecture should make such locks much less frequent.