Build-Time Options
TooManyCooks will work out-of-the-box without any configuration.
However, several build-time options are recommended for best performance.
Most of them are preprocessor definitions that should be defined globally for your application, so that every translation unit sees the same configuration.
The best place to define them is in your build script, by passing -DTMC_FLAG_NAME (or -DTMC_FLAG_NAME=value) to the compiler.
For an example of a build script that contains all of these configurations, see the tmc-examples CMakeLists.txt.
These options are listed roughly in order of performance impact, with the most important ones first.
Recommended Build-Time Options
Link to tcmalloc / mimalloc / jemalloc
Since each new coroutine requires an allocation for its coroutine frame, coroutine-heavy applications are sensitive to allocator performance. Any of tcmalloc, mimalloc, or jemalloc provides much better performance than the default system allocator. In synthetic benchmarks on my machines, tcmalloc appears to perform best, but you should benchmark your own application.
This doesn’t require a TMC compilation flag; you only need to link your application to one of these libraries to replace your global allocator.
TMC_USE_HWLOC
Enables TMC to use the Portable Hardware Locality (hwloc) library to optimize the thread layout and work-stealing groups of tmc::ex_cpu according to your processor architecture. This yields noticeable improvements on systems with non-uniform cache architectures, such as modern AMD Ryzen/Epyc or Intel Hybrid Core processors.
Default Behavior (off)
tmc::ex_cpu will create threads according to std::thread::hardware_concurrency(), and these threads will not be assigned to any particular core. All threads will be part of the same work-stealing group.
TMC_USE_HWLOC (on)
tmc::ex_cpu will create one thread per logical core reported by hwloc. These threads are organized into work-stealing groups of cores that share an L3 cache. Each thread is pinned to its L3 cache group (not to a specific core, but to any core in that group), and prefers to steal work from other threads in its group before checking other groups for work. To use this option, you must make <hwloc.h> available on your include path and link libhwloc to your application.
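As a rough illustration of the stealing preference described above, the hypothetical function below (steal_order is an illustrative name, not TMC's implementation) takes the L3 cache group of each worker thread, which is the kind of information hwloc provides, and returns the order in which one thread checks other threads for work: threads in the same group first, then everyone else. A real scheduler may randomize or rotate these orders; this sketch keeps index order for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: given the L3 cache group of each worker thread, build
// the order in which thread `self` looks for work to steal.
inline std::vector<std::size_t> steal_order(
    const std::vector<int>& l3_group_of_thread, std::size_t self) {
  assert(self < l3_group_of_thread.size());
  const int my_group = l3_group_of_thread[self];
  std::vector<std::size_t> order;
  order.reserve(l3_group_of_thread.size() - 1);
  // Pass 1: victims that share this thread's L3 cache.
  for (std::size_t t = 0; t < l3_group_of_thread.size(); ++t) {
    if (t != self && l3_group_of_thread[t] == my_group) order.push_back(t);
  }
  // Pass 2: all threads in other L3 groups.
  for (std::size_t t = 0; t < l3_group_of_thread.size(); ++t) {
    if (l3_group_of_thread[t] != my_group) order.push_back(t);
  }
  return order;
}
```

For example, with two L3 groups of four threads each (groups {0, 0, 0, 0, 1, 1, 1, 1}), thread 5 checks threads 4, 6 and 7 before threads 0 through 3.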
Other Build-Time Options
TMC_PRIORITY_COUNT
Specifies the number of priority levels at compile time, which allows certain runtime checks to be optimized away.
Default Behavior
The number of priority levels is specified at runtime with tmc::ex_cpu::set_priority_count(). If it is not specified, the default priority count is 1.
TMC_PRIORITY_COUNT=1
All operations will run at the same priority, and some priority checking and tracking logic is completely removed from the code.
TMC_PRIORITY_COUNT=<N between 2 and 16>
Observable behavior is the same as calling tmc::ex_cpu::set_priority_count() with N, but the compiler can inline or unroll certain checks.
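To illustrate why a compile-time count helps, here is a hypothetical set of per-priority queues (prioritized_queues is an illustrative name, not a TMC type). For this sketch only, priority 0 is assumed to be the highest. Because the count is a template constant, the scan loop has a constant bound that the compiler may unroll, and with a count of 1 the priority argument is never inspected.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical sketch: one FIFO per priority level, 0 = highest priority.
template <std::size_t PriorityCount>
struct prioritized_queues {
  static_assert(PriorityCount >= 1 && PriorityCount <= 16);
  std::array<std::deque<int>, PriorityCount> queues;

  void push(int item, std::size_t priority) {
    if constexpr (PriorityCount == 1) {
      (void)priority;  // a single level: the priority is never inspected
      queues[0].push_back(item);
    } else {
      assert(priority < PriorityCount);
      queues[priority].push_back(item);
    }
  }

  // Checks the highest priority first. The loop bound is a compile-time
  // constant, so the compiler may fully unroll it.
  bool try_pop(int& out) {
    for (std::size_t p = 0; p < PriorityCount; ++p) {
      if (!queues[p].empty()) {
        out = queues[p].front();
        queues[p].pop_front();
        return true;
      }
    }
    return false;
  }
};
```

With a count that is only known at runtime, a structure like this would have to read a variable for the loop bound and the priority check on every call.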
TMC_TRIVIAL_TASK
By default, tasks are rvalue-only awaitables / linear types, like most other TMC awaitables: they must be passed around by move operations and then consumed by awaiting them exactly once. This prevents accidental memory leaks and use-after-free issues. However, since tmc::task is the size of a single pointer, the linear type checks (in the move constructor and destructor) prevent optimizations that would be possible if it were a trivial type, such as passing it in a register.
This flag disables the linear type checks to improve performance. Enabling it does not change any behavior within TMC; it simply replaces the copy and move constructors and the destructor of tmc::task with defaulted ones.
In doing so, it removes the guardrails that would alert you if you violated the linear type rules (for example, by leaking a coroutine or awaiting it twice). If you enable this flag, do so only in final release builds, and always build and run your application at least once without it.
Default Behavior (off)
tmc::task is a move-only type. There is an assert in the destructor that checks that the task was executed to completion.
TMC_TRIVIAL_TASK (on)
tmc::task is a trivial type. It can be freely copied and has no runtime checks.
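The trade-off can be sketched with two hypothetical pointer-sized handle types (checked_handle, trivial_handle and consume() are illustrative names; neither is tmc::task). The first mimics the default linear-type checks with a move-only interface and a destructor assert; the second has defaulted special members, similar in spirit to tmc::task with TMC_TRIVIAL_TASK defined.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical linear handle: move-only, and the destructor checks that the
// handle was consumed exactly once.
class checked_handle {
  void* ptr_ = nullptr;

 public:
  explicit checked_handle(void* p) noexcept : ptr_(p) {}
  checked_handle(const checked_handle&) = delete;
  checked_handle& operator=(const checked_handle&) = delete;
  checked_handle(checked_handle&& other) noexcept
      : ptr_(std::exchange(other.ptr_, nullptr)) {}
  checked_handle& operator=(checked_handle&& other) noexcept {
    assert(ptr_ == nullptr && "overwrote a handle that was never consumed");
    ptr_ = std::exchange(other.ptr_, nullptr);
    return *this;
  }
  ~checked_handle() { assert(ptr_ == nullptr && "handle was never consumed"); }

  // Consuming is only allowed on an rvalue, and only once.
  void* consume() && noexcept {
    assert(ptr_ != nullptr && "handle consumed twice");
    return std::exchange(ptr_, nullptr);
  }
};

// Hypothetical trivial handle: defaulted copy/move/destructor, no checks.
struct trivial_handle {
  void* ptr = nullptr;
};

static_assert(!std::is_trivially_copyable_v<checked_handle>);
static_assert(std::is_trivially_copyable_v<trivial_handle>);
static_assert(sizeof(checked_handle) == sizeof(trivial_handle));
```

Both types are the size of one pointer, but only trivial_handle is trivially copyable. Under the Itanium C++ ABI used by GCC and Clang on most non-Windows platforms, a class with a non-trivial copy constructor, move constructor, or destructor is passed to functions by hidden reference rather than in a register.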
TMC_WORK_ITEM
Controls the type used to submit work items to TMC executors and to store them in their internal work queues. This type alias is tmc::work_item; it is not directly exposed in most cases, because TMC's public APIs are templated to convert their inputs into this type internally.
Each of the types below can store either a coroutine or a functor, but their performance characteristics differ.
Default Behavior (TMC_WORK_ITEM=CORO)
tmc::work_item is an alias for std::coroutine_handle<>. This type is 8 bytes in size on 64-bit platforms.
Coroutines can be stored directly in it, but functors are wrapped in a coroutine trampoline.
This option yields the best performance for coroutines and the worst performance for functors.
TMC_WORK_ITEM=FUNCORO
tmc::work_item is an alias for tmc::coro_functor. This type is 16 bytes in size.
This is a custom type provided by this library that can directly store both coroutines and functors.
It provides excellent performance for both coroutines and functors.
It supports move-only functors, and has pointer/lvalue/rvalue reference constructor overloads that make the ownership semantics clear.
It does not support small-object optimization.
TMC_WORK_ITEM=FUNC
tmc::work_item is an alias for std::function<void()>. This type is 32 bytes in size on most systems.
Both coroutines and functors can be stored directly in it.
This option yields the best performance for functors (when they fit in the small-object buffer) and the worst performance for coroutines.
This type doesn't support move-only functors: it requires your functor to be copy-constructible and may make copies of it, which can block certain use cases.
Because this type requires its target to be copyable, this option also requires you to define TMC_TRIVIAL_TASK.
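The copyability requirement can be seen with two hypothetical handle types (move_only_handle, copyable_handle, make_job and to_work_item are illustrative names, not TMC APIs). A lambda that captures a move-only handle is itself move-only, so it cannot be stored in a std::function<void()>, while one that captures a copyable handle can. Presumably the same constraint is why a move-only tmc::task cannot be used with this option.

```cpp
#include <functional>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for a task handle with and without linear-type checks.
struct move_only_handle {
  void* ptr = nullptr;
  move_only_handle() = default;
  move_only_handle(move_only_handle&& other) noexcept
      : ptr(std::exchange(other.ptr, nullptr)) {}
  move_only_handle(const move_only_handle&) = delete;
};

struct copyable_handle {
  void* ptr = nullptr;
};

// A lambda that captures a handle is copyable only if the handle is.
template <typename Handle>
auto make_job(Handle handle) {
  return [h = std::move(handle)]() { (void)h; /* submit or await h here */ };
}

// std::function<void()> requires a copy-constructible target.
static_assert(!std::is_copy_constructible_v<decltype(make_job(move_only_handle{}))>);
static_assert(std::is_copy_constructible_v<decltype(make_job(copyable_handle{}))>);

inline std::function<void()> to_work_item(copyable_handle h) {
  return make_job(h);  // OK; a job capturing a move_only_handle would not compile
}
```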
class coro_functor
Public Functions
- inline void operator()() const noexcept
  Resumes the provided coroutine, or calls the provided function/functor.
- inline bool is_coroutine() noexcept
  Returns true if this was constructed with a coroutine type.
- inline std::coroutine_handle<> as_coroutine() noexcept
  Returns the pointer as a coroutine handle. This is only valid if this was constructed with a coroutine type. as_coroutine() will not convert a regular function into a coroutine.
- template<typename T> inline coro_functor(T &&CoroutineHandle) noexcept
  Coroutine handle constructor.
- inline coro_functor(void (*FreeFunction)()) noexcept
  Free function void() constructor.
- template<typename T> inline coro_functor(T *Functor) noexcept
  Pointer to function object constructor. The caller must manage the lifetime of the parameter and ensure that the pointer remains valid until operator() is called.
- template<typename T> inline coro_functor(const T &Functor) noexcept
  Lvalue function object constructor. Copies the parameter into a new allocation owned by the coro_functor.
- template<typename T> inline coro_functor(T &&Functor) noexcept
  Rvalue function object constructor. Moves the parameter into a new allocation owned by the coro_functor.
- inline coro_functor() noexcept
  Default constructor, provided for use with data structures that initialize the passed-in type by reference.
- inline coro_functor(const coro_functor &Other) = delete
  Not copy-constructible. Holds an owning pointer to the functor object.