Commit Graph

5 Commits

Author SHA1 Message Date
Paolo Valente
b72b02b26f block/bfq: Improve and refactor throughput-boosting logic
When a queue associated with a process remains empty, there are cases
where throughput gets boosted if the device is idled to await the
arrival of a new I/O request for that queue. Currently, BFQ assumes
that one of these cases is when the device has no internal queueing
(regardless of the properties of the I/O being served). Unfortunately,
this condition has proved to be too general. So, this commit refines it
as "the device has no internal queueing and is rotational".

This refinement provides a significant throughput boost with random
I/O, on flash-based storage without internal queueing. For example, on
a HiKey board, throughput increases by up to 125%, growing, e.g., from
6.9MB/s to 15.6MB/s with two or three random readers in parallel.

This commit also refactors the code related to device idling, for the
following reason. Finding the change that provides the above large
improvement has been slightly more difficult than it had to be,
because the logic that decides whether to idle the device is still
scattered across three functions. Almost all of the logic is in the
function bfq_bfqq_may_idle, but (1) part of the decision is made in
bfq_update_idle_window, and (2) the function bfq_bfqq_must_idle may
switch off idling regardless of the output of bfq_bfqq_may_idle. In
addition, both bfq_update_idle_window and bfq_bfqq_must_idle make
their decisions as a function of parameters that are used, for similar
purposes, also in bfq_bfqq_may_idle. This commit addresses this issue
by moving all the logic into bfq_bfqq_may_idle.
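
For orientation only, the consolidated decision could take roughly the
following shape; this is a sketch under assumed field and flag names
(hw_tag, wr_coeff, bfq_bfqq_sync()), not the code added by this commit:

    static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq)
    {
            struct bfq_data *bfqd = bfqq->bfqd;
            bool idling_boosts_thr, idling_needed_for_guarantees;

            /* throughput: idle only without internal queueing, on
             * rotational media (the condition sketched above) */
            idling_boosts_thr = !bfqd->hw_tag &&
                                !blk_queue_nonrot(bfqd->queue);

            /* service guarantees: e.g. weight-raised (low-latency)
             * queues still need idling to be protected */
            idling_needed_for_guarantees = bfqq->wr_coeff > 1;

            return bfq_bfqq_sync(bfqq) &&
                   (idling_boosts_thr || idling_needed_for_guarantees);
    }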

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
Signed-off-by: djb77 <dwayne.bakewell@gmail.com>
2020-04-08 11:17:15 +02:00
Paolo Valente
72c098e1e1 block/bfq: Consider also in_service_entity to state whether an entity is active
Groups of BFQ queues are represented by generic entities in BFQ. When
a queue belonging to a parent entity is deactivated, the parent entity
may need to be deactivated too, in case the deactivated queue was the
only active queue for the parent entity. This deactivation may need to
be propagated upwards if the entity belongs, in its turn, to a further
higher-level entity, and so on. In particular, the upward propagation
of deactivation stops at the first parent entity that remains active
even if one of its child entities has been deactivated.

To decide whether this stop condition holds for a parent entity, BFQ
checks whether the field next_in_service is still non-NULL for the
parent entity after the deactivation of one of its child entities. If
it is non-NULL, then there are certainly other active entities in the
parent entity, and the upward propagation of deactivation can stop.

Unfortunately, this check misses a corner case: if in_service_entity
is not NULL, then next_in_service may happen to be NULL, although the
parent entity is evidently active. This happens if: 1) the entity
pointed to by in_service_entity is the only active entity in the parent
entity, and 2) according to the definition of next_in_service, the
in_service_entity cannot be considered as next_in_service. See the
comments on the definition of next_in_service for details on this
second point.

Hitting the above corner case causes crashes.

To address this issue, this commit:
1) Extends the above check, so far performed only on next_in_service,
to both next_in_service and in_service_entity: if either of them is
not NULL, then no further deactivation is performed (see the sketch
after this list)
2) Improves the (important) comments on how next_in_service is defined
and updated; in particular it fixes a few rather obscure paragraphs
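
A minimal sketch of the check in point 1; the structure and helper
names are assumptions, only the two fields are named in this message:

    /*
     * Sketch for illustration: a parent entity is still active if it
     * has either a candidate entity to serve next or an entity
     * currently in service.
     */
    static bool bfq_sched_data_is_active(struct bfq_sched_data *sd)
    {
            return sd->next_in_service || sd->in_service_entity;
    }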

Reported-by: Eric Wheeler <bfq-sched@lists.ewheeler.net>
Reported-by: Rick Yiu <rick_yiu@htc.com>
Reported-by: Tom X Nguyen <tom81094@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Eric Wheeler <bfq-sched@lists.ewheeler.net>
Tested-by: Rick Yiu <rick_yiu@htc.com>
Tested-by: Laurentiu Nicola <lnicola@dend.ro>
Tested-by: Tom X Nguyen <tom81094@gmail.com>
Signed-off-by: djb77 <dwayne.bakewell@gmail.com>
2020-04-08 11:17:15 +02:00
Paolo Valente
afb148019a block/bfq: Update BFQ to v8r12
Signed-off-by: djb77 <dwayne.bakewell@gmail.com>
2020-04-08 11:17:12 +02:00
Mauro Andreolini
11cb842942 block, bfq: add Early Queue Merge (EQM) to BFQ-v7r11 for 4.4.0
A set of processes may happen to perform interleaved reads, i.e.,
requests whose union would give rise to a sequential read pattern.
There are two typical cases: in the first case, processes read
fixed-size chunks of data at a fixed distance from each other, while
in the second case processes may read variable-size chunks at variable
distances. The latter case occurs for example with QEMU, which splits
the I/O generated by the guest into multiple chunks, and lets these
chunks be served by a pool of cooperating processes, iteratively
assigning the next chunk of I/O to the first available process. CFQ
uses actual queue merging for the first type of processes, whereas it
uses preemption to get a sequential read pattern out of the read
requests performed by the second type of processes. In the end it uses
two different mechanisms to achieve the same goal: boosting the
throughput with interleaved I/O.

This patch introduces Early Queue Merge (EQM), a unified mechanism to
get a sequential read pattern with both types of processes. The main
idea is to check newly arrived requests against the next request of
the active queue, both in the case of an actual request insert and in
the case of a request merge. By doing so, both types of processes can
be handled by just merging their queues. EQM is then simpler and more
compact than the pair of mechanisms used in CFQ.
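
As a minimal, purely illustrative sketch of the kind of check involved
(names and the threshold value are hypothetical, not the actual EQM
code), two requests can be treated as part of one sequential stream
when their sectors are close enough:

    /* hypothetical closeness threshold, in sectors */
    #define EQM_CLOSE_THR_SECTORS   ((sector_t)(8 * 1024))

    /*
     * Sketch: decide whether two requests are close enough to be
     * considered part of the same sequential stream, and hence
     * whether their queues are candidates for merging.
     */
    static bool eqm_requests_are_close(sector_t s1, sector_t s2)
    {
            sector_t dist = s1 > s2 ? s1 - s2 : s2 - s1;

            return dist <= EQM_CLOSE_THR_SECTORS;
    }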

Finally, EQM also preserves the typical low-latency properties of BFQ,
by properly restoring the weight-raising state of a queue when it gets
back to a non-merged state.

Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini@google.com>
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Luca Grifo <lg@linux.com>
Signed-off-by: djb77 <dwayne.bakewell@gmail.com>
2020-04-08 11:17:11 +02:00
Paolo Valente
f7efdfa441 block: introduce the BFQ-v7r11 I/O sched for 4.4.0
The general structure is borrowed from CFQ, as is much of the code for
handling I/O contexts. Over time, several useful features have been
ported from CFQ as well (details in the changelog in README.BFQ). A
(bfq_)queue is associated with each task doing I/O on a device, and
each time a scheduling decision has to be made, a queue is selected
and served until it expires.

    - Slices are given in the service domain: tasks are assigned
      budgets, measured in number of sectors. Once it has been granted
      access to the device, a task must however consume its assigned
      budget within a configurable maximum time (by default, the
      maximum possible value of the budgets is automatically computed
      to comply with this timeout). This allows the desired tradeoff
      between latency and throughput boosting to be set.

    - Budgets are scheduled according to a variant of WF2Q+, implemented
      using an augmented rb-tree to take eligibility into account while
      preserving an O(log N) overall complexity.

    - A low-latency tunable is provided; if enabled, both interactive
      and soft real-time applications are guaranteed a very low latency.

    - Latency guarantees are preserved also in the presence of NCQ.

    - Also with flash-based devices, a high throughput is achieved
      while still preserving latency guarantees.

    - BFQ features Early Queue Merge (EQM), a sort of fusion of the
      cooperating-queue-merging and the preemption mechanisms present
      in CFQ. EQM is in fact a unified mechanism that tries to get a
      sequential read pattern, and hence a high throughput, with any
      set of processes performing interleaved I/O over a contiguous
      sequence of sectors.

    - BFQ supports full hierarchical scheduling, exporting a cgroups
      interface.  Since each node has a full scheduler, each group can
      be assigned its own weight.

    - If the cgroups interface is not used, only I/O priorities can be
      assigned to processes, with ioprio values mapped to weights with
      the relation weight = IOPRIO_BE_NR - ioprio (see the sketch after
      this list).

    - ioprio classes are served in strict priority order, i.e.,
      lower-priority queues are not served as long as there are
      higher-priority queues. Among queues in the same class, the
      bandwidth is distributed in proportion to the weight of each
      queue. A small amount of extra bandwidth is however guaranteed
      to the Idle class, to prevent it from starving.
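
A worked instance of the ioprio-to-weight mapping mentioned above,
assuming the usual value IOPRIO_BE_NR == 8 (the helper name is
illustrative):

    /*
     * Sketch: with IOPRIO_BE_NR == 8, ioprio 0 (highest priority)
     * maps to weight 8 and ioprio 7 (lowest priority) to weight 1.
     */
    static int ioprio_to_weight(int ioprio)
    {
            return IOPRIO_BE_NR - ioprio;
    }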

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini@google.com>
Signed-off-by: Luca Grifo <lg@linux.com>
Signed-off-by: djb77 <dwayne.bakewell@gmail.com>
2020-04-08 11:17:11 +02:00