- extra checks related to ioprioi-class changes
- specific check on st->idle in __bfq_requeue_entity
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: djb77 <dwayne.bakewell@gmail.com>
A set of processes may happen to perform interleaved reads, i.e.,requests
whose union would give rise to a sequential read pattern. There are two
typical cases: in the first case, processes read fixed-size chunks of
data at a fixed distance from each other, while in the second case processes
may read variable-size chunks at variable distances. The latter case occurs
for example with QEMU, which splits the I/O generated by the guest into
multiple chunks, and lets these chunks be served by a pool of cooperating
processes, iteratively assigning the next chunk of I/O to the first
available process. CFQ uses actual queue merging for the first type of
rocesses, whereas it uses preemption to get a sequential read pattern out
of the read requests performed by the second type of processes. In the end
it uses two different mechanisms to achieve the same goal: boosting the
throughput with interleaved I/O.
This patch introduces Early Queue Merge (EQM), a unified mechanism to get a
sequential read pattern with both types of processes. The main idea is
checking newly arrived requests against the next request of the active queue
both in case of actual request insert and in case of request merge. By doing
so, both the types of processes can be handled by just merging their queues.
EQM is then simpler and more compact than the pair of mechanisms used in
CFQ.
Finally, EQM also preserves the typical low-latency properties of BFQ, by
properly restoring the weight-raising state of a queue when it gets back to
a non-merged state.
Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini@google.com>
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Luca Grifo <lg@linux.com>
Signed-off-by: djb77 <dwayne.bakewell@gmail.com>
The general structure is borrowed from CFQ, as much of the code for
handling I/O contexts. Over time, several useful features have been
ported from CFQ as well (details in the changelog in README.BFQ). A
(bfq_)queue is associated to each task doing I/O on a device, and each
time a scheduling decision has to be made a queue is selected and served
until it expires.
- Slices are given in the service domain: tasks are assigned
budgets, measured in number of sectors. Once got the disk, a task
must however consume its assigned budget within a configurable
maximum time (by default, the maximum possible value of the
budgets is automatically computed to comply with this timeout).
This allows the desired latency vs "throughput boosting" tradeoff
to be set.
- Budgets are scheduled according to a variant of WF2Q+, implemented
using an augmented rb-tree to take eligibility into account while
preserving an O(log N) overall complexity.
- A low-latency tunable is provided; if enabled, both interactive
and soft real-time applications are guaranteed a very low latency.
- Latency guarantees are preserved also in the presence of NCQ.
- Also with flash-based devices, a high throughput is achieved
while still preserving latency guarantees.
- BFQ features Early Queue Merge (EQM), a sort of fusion of the
cooperating-queue-merging and the preemption mechanisms present
in CFQ. EQM is in fact a unified mechanism that tries to get a
sequential read pattern, and hence a high throughput, with any
set of processes performing interleaved I/O over a contiguous
sequence of sectors.
- BFQ supports full hierarchical scheduling, exporting a cgroups
interface. Since each node has a full scheduler, each group can
be assigned its own weight.
- If the cgroups interface is not used, only I/O priorities can be
assigned to processes, with ioprio values mapped to weights
with the relation weight = IOPRIO_BE_NR - ioprio.
- ioprio classes are served in strict priority order, i.e., lower
priority queues are not served as long as there are higher
priority queues. Among queues in the same class the bandwidth is
distributed in proportion to the weight of each queue. A very
thin extra bandwidth is however guaranteed to the Idle class, to
prevent it from starving.
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini@google.com>
Signed-off-by: Luca Grifo <lg@linux.com>
Signed-off-by: djb77 <dwayne.bakewell@gmail.com>
Update Kconfig.iosched and do the related Makefile changes to include
kernel configuration options for BFQ. Also increase the number of
policies supported by the blkio controller so that BFQ can add its
own.
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini@google.com>
Signed-off-by: Luca Grifo <lg@linux.com>
Signed-off-by: djb77 <dwayne.bakewell@gmail.com>
[ Upstream commit 1a67356e9a4829da2935dd338630a550c59c8489 ]
It is wrong to use bio->bi_vcnt to figure out how many segments
there are in the bio even though CLONED flag isn't set on this bio,
because this bio may be splitted or advanced.
So always use bio_segments() in blk_recount_segments(), and it shouldn't
cause any performance loss now because the physical segment number is figured
out in blk_queue_split() and BIO_SEG_VALID is set meantime since
bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting").
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: 76d8137a3113 ("blk-merge: recaculate segment if it isn't less than max segments")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit ad6bf88a6c19a39fb3b0045d78ea880325dfcf15 upstream.
Logical block size has type unsigned short. That means that it can be at
most 32768. However, there are architectures that can run with 64k pages
(for example arm64) and on these architectures, it may be possible to
create block devices with 64k block size.
For exmaple (run this on an architecture with 64k pages):
Mount will fail with this error because it tries to read the superblock using 2-sector
access:
device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
EXT4-fs (dm-0): unable to read superblock
This patch changes the logical block size from unsigned short to unsigned
int to avoid the overflow.
Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2c0fcd28772f99236d261509bcd242135677965 upstream.
These were added to blkdev_ioctl() in linux-5.5 but not
blkdev_compat_ioctl, so add them now.
Cc: <stable@vger.kernel.org> # v4.4+
Fixes: bbd3e064362e ("block: add an API for Persistent Reservations")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fold in followup patch from Arnd with missing pr.h header include.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
commit d2c9be89f8ebe7ebcc97676ac40f8dec1cf9b43a upstream.
8962842ca5ab ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
avoids sysfs buffer overflow, and reserves one character for line break.
However, the last snprintf() doesn't get correct 'size' parameter passed
in, so fixed it.
Fixes: 8962842ca5ab ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8962842ca5abdcf98e22ab3b2b45a103f0408b95 upstream.
It is reported that sysfs buffer overflow can be triggered if the system
has too many CPU cores(>841 on 4K PAGE_SIZE) when showing CPUs of
hctx via /sys/block/$DEV/mq/$N/cpu_list.
Use snprintf to avoid the potential buffer overflow.
This version doesn't change the attribute format, and simply stops
showing CPU numbers if the buffer is going to overflow.
Cc: stable@vger.kernel.org
Fixes: 676141e48af7("blk-mq: don't dump CPU -> hw queue map on driver load")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 229b53c9bf4e1132a4aa6feb9632a7a1f1d08c5c ]
all other drivers recognizing those ioctls are very much *not*
biarch.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a3761c3c91209b58b6f33bf69dd8bb8ec0c9d925 upstream.
When bio_add_pc_page() fails in bio_copy_user_iov() we should free
the page we just allocated otherwise we are leaking it.
Cc: linux-block@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>