| field     | value                                                        |
|-----------|--------------------------------------------------------------|
| author    | Andy Wingo <wingo@pobox.com>, 2016-09-04 21:58:24 +0200      |
| committer | Andy Wingo <wingo@pobox.com>, 2016-09-04 21:58:24 +0200      |
| commit    | 36ecf69d8002efb997a776e8479e81a315724341                     |
| tree      | a066712b1e70675256b5c315eca0ed56e4acf7e0                     |
| parent    | Add support for FD finalization                              |
| download  | guile-fibers-36ecf69d8002efb997a776e8479e81a315724341.tar.gz |
Add fibers.texi
* fibers.texi: New file.
* Makefile.am: Add new file.
| mode       | file        | lines |
|------------|-------------|-------|
| -rw-r--r-- | Makefile.am | 2     |
| -rw-r--r-- | fibers.texi | 299   |
2 files changed, 301 insertions, 0 deletions
diff --git a/Makefile.am b/Makefile.am
index 9f33f38..7f74726 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -22,6 +22,8 @@ include build-aux/guile.am
 moddir=$(prefix)/share/guile/site/$(GUILE_EFFECTIVE_VERSION)
 godir=$(libdir)/guile/$(GUILE_EFFECTIVE_VERSION)/ccache
 
+info_TEXINFOS=fibers.texi
+
 SOURCES = \
 	fibers/epoll.scm \
 	fibers.scm \
diff --git a/fibers.texi b/fibers.texi
new file mode 100644
index 0000000..5277d45
--- /dev/null
+++ b/fibers.texi
@@ -0,0 +1,299 @@

\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename fibers.info
@settitle Fibers
@c %**end of header

@set VERSION 0.1
@set UPDATED 17 July 2016

@copying
This manual is for Fibers (version @value{VERSION}, updated
@value{UPDATED}).

Copyright 2016 Andy Wingo

@quotation
@c For more information, see COPYING.docs in the fibers
@c distribution.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
@end quotation
@end copying

@dircategory The Algorithmic Language Scheme
@direntry
* Fibers: (fibers.info).   Lightweight concurrency for Guile.
@end direntry

@titlepage
@title Fibers
@subtitle version @value{VERSION}, updated @value{UPDATED}
@author Andy Wingo
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@ifnottex
@node Top
@top Fibers

@insertcopying

@menu
* Introduction::          What's this all about?
@end menu

@end ifnottex

@iftex
@shortcontents
@end iftex

@node Introduction
@chapter Introduction

Fibers is a facility for lightweight concurrency in Guile.

@section A brief history of language facilities for concurrency

Modern machines have the raw capability to serve hundreds of thousands
of simultaneous long-lived connections, but it's often hard to manage
this at the software level.  Fibers tries to solve this problem in a
nice way.  Before discussing the approach taken in Fibers, it's worth
spending some time on history to see how we got here.

One of the most dominant patterns for concurrency these days is
``callbacks'', notably in the Twisted library for Python and the
Node.js run-time for JavaScript.  The basic observation in the
callback approach to concurrency is that the efficient way to handle
tens of thousands of connections at once is with low-level operating
system facilities like @code{poll} or @code{epoll}.  You add all of
the file descriptors that you are interested in to a ``poll set'' and
then ask the operating system which ones are readable or writable, as
appropriate.  Once the operating system says ``yes, file descriptor
7145 is readable'', you can do something with that socket; but what?
With callbacks, the answer is ``call a user-supplied closure'': a
callback, representing the continuation of the computation on that
socket.

Building a network service with a callback-oriented concurrency system
means breaking the program into little chunks that can run without
blocking.  Wherever a program could block, instead of just continuing
the program, you register a callback.  Unfortunately this requirement
permeates the program, from top to bottom: you always pay the mental
cost of inverting your program's control flow by turning it into
callbacks, and you always incur the run-time cost of closure creation,
even when the particular I/O could proceed without blocking.  It's a
galling requirement, because this contortion is demanded of the
programmer when it could be done by the compiler.  We Schemers demand
better abstractions than manual, obligatory
continuation-passing-style conversion.
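To make the inversion concrete, here is a sketch of the same echo
logic in direct style and in callback style.  Neither is the Fibers
API; @code{when-readable} is a hypothetical event-loop primitive
invented for this illustration.

@lisp
(use-modules (ice-9 rdelim))  ; for read-line

;; Direct style: read a line, write it back.  Control flow is just
;; Scheme's control flow, even if read-line blocks.
(define (echo-line port)
  (let ((line (read-line port)))
    (unless (eof-object? line)
      (display line port)
      (newline port))))

;; Callback style: the same logic, inverted.  when-readable is a
;; hypothetical primitive that invokes its callback once the port's
;; file descriptor polls as readable.
(define (echo-line/callbacks port)
  (when-readable port
    (lambda ()
      (let ((line (read-line port)))
        (unless (eof-object? line)
          (display line port)
          (newline port))))))
@end lisp

Each additional blocking operation adds another level of nesting;
this is the permeating cost described above.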
Callback-based systems also encourage unstructured concurrency, as in
practice callbacks are not the only path for data and control flow in
a system: usually there is mutable global state as well.  Without
strong patterns and conventions, callback-based systems often exhibit
bugs caused by concurrent reads and writes to global state.

Some of the problems of callbacks can be mitigated by using
``promises'' or other library-level abstractions; if you're a Haskell
person, you can think of this as lifting all possibly-blocking
operations into a monad.  If you're not a Haskeller, that's cool,
neither am I!  But if your typey spidey senses are tingling, it's for
good reason: with promises, your whole program has to be transformed
to return promises-for-values instead of values anywhere it would
block.

An obvious solution to the control-flow problem of callbacks is to use
threads.  In the most generic sense, a thread is a language feature
which denotes an independent computation.  Threads are created by
other threads, but fork off and run independently instead of returning
to their caller.  In a system with threads, there is implicitly a
scheduler somewhere that multiplexes the threads so that when one
suspends, another can run.

In practice, the concept of threads is often conflated with a
particular implementation, @dfn{kernel threads}.  Kernel threads are
very low-level abstractions that are provided by the operating system.
The nice thing about kernel threads is that they can use any CPU that
the kernel knows about.  That's an important factor in today's
computing landscape, where Moore's law seems to have ended and we are
getting more cores instead of more gigahertz.

However, as a building block for a highly concurrent system, kernel
threads have a few important problems.

One is that kernel threads simply aren't designed to be allocated in
huge numbers, and instead are more optimized to run in a
one-per-CPU-core fashion.  Their memory usage is relatively high for
what should be a lightweight abstraction: some 10 kilobytes at least
and often some megabytes, in the form of the thread's stack.  There
are ongoing efforts to reduce this for some systems but we cannot
expect wide deployment in the next 5 years, if ever.  Even in the best
case, a hundred thousand kernel threads will take at least a gigabyte
of memory, which seems a bit excessive for book-keeping overhead.

Kernel threads can be a bit irritating to schedule, too: when one
thread suspends, it's for a reason, and it can be that user-space
knows a good next thread that should run.  However, because kernel
threads are scheduled in the kernel, user space rarely gets a chance
to supply that information, so the kernel cannot make an informed
decision.  There are ``user-mode scheduling'' facilities in
development, but again only for some systems.

The other significant problem is that building non-crashy systems on
top of kernel threads is hard to do, not to mention ``correct''
systems.  It's an embarrassing situation.  For one thing, the
low-level synchronization primitives that are typically provided with
kernel threads, mutexes and condition variables, are not composable.
Also, as with callback-oriented concurrency, one thread can silently
corrupt another via unstructured mutation of shared state.  It's worse
with kernel threads, though: a kernel thread can be interrupted at any
point, not just at I/O.  And though callback-oriented systems can
theoretically operate on multiple CPUs at once, in practice they
don't.  This restriction is sometimes touted as a benefit by
proponents of callback-oriented systems, because in such a system, the
callback invocations have a single, sequential order.  With multiple
CPUs, this is not the case, as multiple threads can run at the same
time, in parallel.

Kernel threads can work.  The Java virtual machine does at least
manage to prevent low-level memory corruption and to do so with high
performance, but still, even Java-based systems that aim for maximum
concurrency avoid using a thread per connection because threads use
too much memory.

In this context it's no wonder that there's a third strain of
concurrency: shared-nothing message-passing systems like Erlang.
Erlang isolates each thread (called a @dfn{process} in the Erlang
world), giving each its own heap and ``mailbox''.  Processes can
spawn other processes, and the concurrency primitive is
message-passing.  A process that tries to receive a message from an
empty mailbox will ``block'', from its perspective.  In the meantime
the system will run other processes.  Message sends never block,
oddly; instead, sending to a process with many messages pending makes
it more likely that Erlang will pre-empt the sending process.  It's a
strange tradeoff, but it makes sense when you realize that Erlang was
designed for network transparency: the same message send/receive
interface can be used to send messages to processes on remote machines
as well.

No network is truly transparent, however.  At the most basic level,
network sends are necessarily much slower than local sends.  Whereas
a message sent to a remote process has to be written out byte-by-byte
over the network, there is no need to copy immutable data within the
same address space.  The complexity of a remote message send is O(n)
in the size of the message, whereas a local immutable send is O(1).
This suggests that hiding the different complexities behind one
operator is the wrong thing to do.  And indeed, given byte read and
write operators over sockets, it's possible to implement remote
message send and receive as a process that serializes and parses
messages between a channel and a byte sink or source.  In this way we
get cheap local channels, and network shims are under the programmer's
control.  This is the approach that the Go language takes, and is the
one we use in Guile.
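To see how local sends can stay O(1), consider writing the remote
case as an ordinary loop over a channel.  This is a minimal sketch;
the channel operation @code{get-message} and the wire format (plain
@code{write} plus a newline) are illustrative assumptions, not a
fixed Fibers API.

@lisp
;; A network shim: drain a local channel and serialize each message
;; to a byte sink.  Local senders just enqueue a reference; only this
;; loop pays the O(n) serialization cost, and only for messages that
;; actually cross the network.
(define (network-sender channel port)
  (let loop ()
    (let ((message (get-message channel)))  ; assumed channel API
      (write message port)                  ; serialize, O(n)
      (newline port)
      (force-output port)
      (loop))))
@end lisp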
Channels and message send/receive are a great start, but they are not
sufficient for all purposes.  At the minimum, a program should be able
to wait for a message send or receive over multiple channels at once.
Go supports receive over multiple channels via its @code{select}
primitive, but there is a more general solution invented by Concurrent
ML, which is an abstraction over messages.  In Concurrent ML, you can
get an event directly from a channel; actually synchronizing on the
reception of the event is a separate operation.  Together with a
primitive to synchronize on one of a number of events, Concurrent ML
allows programs to build @code{select} in user-space.  It looks like
the right direction for Guile to go too.
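As a sketch of the Concurrent ML style: the names
@code{receive-event}, @code{choice}, and @code{sync} below are
hypothetical, chosen to illustrate the idea rather than to name a
Fibers API.

@lisp
;; Concurrent ML separates describing a communication from performing
;; it.  An event is a first-class description; sync performs one.
;; All names here are hypothetical, for illustration only.
(define event-a (receive-event channel-a))  ; describe: receive on A
(define event-b (receive-event channel-b))  ; describe: receive on B

;; choice combines events; sync blocks until one of them can complete
;; and returns its result -- a user-space select.
(define message (sync (choice event-a event-b)))
@end lisp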
@section Fibers design

In Fibers, the unit of computation is the @dfn{fiber}, a lightweight
thread managed by Guile.  A fiber communicates with the world via
normal Guile ports: @code{get-bytevector}, @code{put-string}, and all
that.  Between themselves, fibers send and receive Scheme values over
@dfn{channels}, lightweight queues.

Whenever a fiber tries to read but no data is available, or tries to
write but no data can be written, Guile will suspend the fiber and
arrange for it to be resumed when the port or channel operation can
proceed.  In the meantime, Guile will run other fibers.  When no fiber
is runnable, Guile will use efficient system facilities to sleep until
input or output can proceed.

When a fiber would block, it suspends to the scheduler of the current
thread.  The scheduler will arrange to restart the fiber when the
port or channel becomes readable or writable, as appropriate.  For
ports, the scheduler adds the file descriptor associated with the port
to an @code{epoll} set.  In either case, the scheduler remembers which
fibers are waiting and for what, so that the user can inspect the
state of their system.

If no scheduler has been installed in the current thread, the fiber
will... well, we don't know yet!  Either it blocks its thread, or it
aborts.  We don't know.

On the Scheme level, a fiber is a delimited continuation.  When a
scheduler runs a fiber, it does so within a prompt; when the fiber
suspends, it suspends to the prompt.  The scheduler saves the
resulting continuation as part of the fiber's state.  In this way the
per-fiber computational state overhead is just the size of the pending
stack frames of the fiber, which can be just a handful of words.
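The suspend/resume mechanism can be distilled with Guile's built-in
delimited continuation primitives.  This is only a model of the
mechanics; the real scheduler adds run queues, epoll integration, and
re-installs the prompt when it resumes a fiber.

@lisp
;; A distilled model of fiber suspension using Guile's real prompt
;; primitives: make-prompt-tag, call-with-prompt, abort-to-prompt.
(define tag (make-prompt-tag "fiber"))

(define (run-fiber-step thunk)
  ;; Run THUNK until it returns or suspends.  On suspend, the handler
  ;; receives the continuation K -- the fiber's pending stack frames,
  ;; which a scheduler would save and resume later.
  (call-with-prompt tag
    thunk
    (lambda (k reason)
      (list 'suspended reason k))))

(define (suspend reason)
  ;; Called by a fiber when an operation would block.
  (abort-to-prompt tag reason))

(define state
  (run-fiber-step (lambda () (+ 1 (suspend 'waiting-for-input)))))
;; state is (suspended waiting-for-input #<continuation>); resuming
;; the continuation with 41 finishes the computation, yielding 42.
((caddr state) 41)  ; => 42
@end lisp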
Currently fibers are pinned to the kernel thread in which they are
created.  We should probably implement some kind of work-stealing
behavior so that if you choose to devote multiple CPU cores to
servicing fibers, they can share the workload.

Ports are how fibers communicate with the world; channels are how
fibers communicate with each other.  When a channel is created, it is
created with a maximum number of enqueued messages (by default one).
Sending to a channel with space in the queue will not suspend the
calling fiber.  A fiber that tries to write to a full queue or read
from an empty queue will add itself to a list of ``waiters'' for the
channel and suspend itself to the scheduler.  A read from a full queue
or a write to an empty queue will unblock waiters on the channel.

Unlike Erlang's message-passing, channels in Fibers are purely local
and do not attempt to provide the illusion of network transparency.
This has the advantage that we are able to provide better backpressure
support than Erlang, blocking when a channel's queue is full instead
of letting the sender keep sending many messages.

On the other hand, currently fibers are not preemptively scheduled.  A
fiber will only suspend when it would block on channel receive or
send, or on read or write on a port.  This would be an OK point in the
design space if only one kernel thread could be running fibers at
once, as in Node.js.  However given that this is not the case, Fibers
does not have many of the concurrency invariants that such systems
enjoy, so perhaps we should support preemption in the future.

To avoid starvation, a fiber can only run once within a ``turn''.
Each turn starts with a poll on the file descriptors of interest and
marks the associated fibers as runnable.  If no fiber is runnable at
the start of the turn, the poll call will ask the kernel to wait for a
runnable descriptor.  Otherwise the poll call will ask the kernel to
return immediately.  There is an additional file descriptor added to
the poll set that is used to interrupt a blocking poll, for example
when a fiber becomes runnable due to a channel operation from a
separate kernel thread while the scheduler is still polling.

To enable expressive cross-kernel-thread communication, channel sends
and receives are atomic and thread-safe.

To start scheduling fibers, user code will typically create a
scheduler, instate it on the current thread, add some fibers, then run
the scheduler.  That call to run the scheduler will only return when
there are no more fibers waiting to be scheduled.
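As a sketch of that startup sequence: the procedure names
@code{make-scheduler}, @code{spawn-fiber}, and @code{run-scheduler}
below are placeholders illustrating the steps described above, not a
settled API.

@lisp
;; Hypothetical startup sequence; all names are placeholders.
(define sched (make-scheduler))          ; create a scheduler
(define ch (make-channel))               ; a queue of capacity 1

(spawn-fiber sched                       ; producer fiber
  (lambda () (put-message ch 'ping)))
(spawn-fiber sched                       ; consumer fiber
  (lambda ()
    (display (get-message ch))
    (newline)))

;; Runs fibers in turns; returns once no fiber is waiting to run.
(run-scheduler sched)
@end lisp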
@c @node Concept Index
@c @unnumbered Concept Index
@c @printindex cp

@c @node Function Index
@c @unnumbered Function Index
@c @printindex fn

@bye