If someone holds /proc/drbd open, previously rmmod would
"succeed" in starting the unload, but then block on remove_proc_entry,
leading to a situation where the lsmod does not show drbd anymore,
but /proc/drbd being still there (but no longer accessible).
I'd rather have rmmod fail up front in this case.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Given low-enough network bandwidth combined with a IO
pattern that hammers onto a single RS-extent, side-stepping
might be necessary for much longer times.
Changed the code to print a single informal message after
20 seconds, but it keeps on stepping aside forever.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This patch is acutally a necessary addendum to the patch
"fix for spurious full sync (becoming sync target looked like invalidate)"
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
* C_STARTING_SYNC_S, C_STARTING_SYNC_T In these states the bitmap gets
written to disk. Locking out of app-IO is done by using the
drbd_queue_bitmap_io() and drbd_bitmap_io() functions these days.
It is no longer necessary to lock out app-IO based on the connection
state.
App-IO that may come in after the BITMAP_IO flag got cleared before the
state transition to C_SYNC_(SOURCE|TARGET) does not get mirrored, sets
a bit in the local bitmap, that is already set, therefore changes nothing.
* C_WF_BITMAP_S In this state we send updates (P_OUT_OF_SYNC packets).
With that we make sure they have the same number of bits when going
into the C_SYNC_(SOURCE|TARGET) connection state.
* C_UNCONNECTED: The receiver starts, no need to lock out IO.
* C_DISCONNECTING: in drbd_disconnect() we had a wait_event()
to wait until ap_bio_cnt reaches 0. Removed that.
* C_TIMEOUT, C_BROKEN_PIPE, C_NETWORK_FAILURE
C_PROTOCOL_ERROR, C_TEAR_DOWN: Same as C_DISCONNECTING
* C_WF_REPORT_PARAMS: IO still possible since that is still
like C_WF_CONNECTION.
And we do not need to send barriers in C_WF_BITMAP_S connection state.
Allow concurrent accesses to the bitmap when receiving the bitmap.
Everything gets ORed anyways.
A drbd_free_tl_hash() is in after_state_chg_work(). At that point
all the work items of the last connections must have been processed.
Introduced a call to drbd_free_tl_hash() into drbd_free_mdev()
for paranoia reasons.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The relevant change is that the state change to C_FW_BITMAP_S should
implicitly change pdsk to C_CONSISTENT. (Think of it as C_OUTDATED, only
without the guarantee that the peer has the outdated written to its
meta data)
At that opportunity I restructured the switch statement so that it
gets evaluated every time. (Has declarative character)
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
May only test for ap_bio_cnt == 0 under req_lock. It can increase
only under req_lock.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The condition must be checked after perpare_to_wait(). The old
implementaion could loose wakeup events. Never observed in real
life.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Since inc_ap_bio() might sleep already
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Before:
drbd_rs_begin_io() locked app-IO out of an RS extent, and
waited then until all previous app-IO in that area finished.
(But not only until the disk-IO was finished but until the
barrier/epoch ack came in for that == round trip time latency ++)
After:
As soon as a new app-IO waits wants to start new IO on that
RS extent, drbd_rs_begin_io() steps aside (clearing the
BME_NO_WRITES flag again). It retries after 100ms.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We only issue resync requests if there is no significant application IO
going on. = Application IO has higher priority than resnyc IO.
If application IO can not be started because the resync process locked
an resync_lru entry, start the IO operations necessary to release the
lock ASAP.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This one should be replaced with moving this cleanup to the
'right' position.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
In this connection mode, the ahead node no longer replicates
application IO. The behind's disk becomes out dated.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
With commit
drbd: further converge progress display of resync and online-verify
accidentally an u64/u64 div was introduced, causing an unresolvable
symbol __udivdi3 to be reference. Actually for that division, 32bit are
still suficient for now, so we can revert to unsigned long instead.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
To ease tracking of bios in some hash tables, we want it to
not cross certain boundaries (128k, used to be 32k).
We limit the maximum bio size using queue parameters.
Historically some defines and variables we use there have been named
max_segment_size, which was misguided. Rename them to max_bio_size,
and use [blk_]queue_max_hw_sectors where appropriate.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We used to be limited to 32k requests,
but have increased that limit to 128k now.
This part of the code can only deal with 32k,
it would scramble arbitrary pages for larger requests.
As it is used for debugging only anyways,
it is ok to simply truncate the dumped data here.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
With data-integrity digest enabled, double-check on the sending side
for modifications by upper layers of buffers under write back,
so we can tell it appart from corruption on the "wire".
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Show progressbar and ETA always, with proc_details >= 1 also show the
current sector position for both resync and online-verify on both nodes.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When converting bits (4k resolution, still) to kB, we shift left. If it
was a large number of bits on a 32bit box (>= 4 TiB storage), we may
wrap the 32bit unsigned long base type, resulting in incorrect display.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Preparation patch to be able to use the auto-throttling resync controller
for online-verify requests as well.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Preparation patch to be able to use the auto-throttling resync controller
for online-verify requests as well.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This is in preparation to unify progress reporting of
online-verify and resync requests.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
For partial (resumed) online verify, initialize the resync step marks
once we know what the online verify start sector is.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
For a partial (resumed) online-verify, initialize rs_total not to total
bits, but to number of bits to check in this run, to match the meaning
rs_total has for actual resync.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
For network hickups during online-verify, on the next verify
triggered, we by default want to resume where it left off.
After any replication link interruption, there will be a (possibly
empty) resync. Do not reset online-verify start sector if some resync
completed, that would defeats the purpose.
Only reset the start sector once a verify run is completed.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
With the plugging now being explicitly controlled by the
submitter, callers need not pass down unplugging hints
to the block layer. If they want to unplug, it's because they
manually plugged on their own - in which case, they should just
unplug at will.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Convert from ->media_changed() to ->check_events().
pktcdvd needs to forward all event related operations to the
underlying device. Forward ->check_events() instead of
->media_changed() and inherit disk->[async_]events.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Peter Osterlund <petero2@telia.com>
umem doesn't implement media changed detection and there's no need to
implement dummy callback anymore. Remove it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Convert from ->media_changed() to ->check_events().
xsysace buffers media changed state and clears it on revalidation. It
will behave correctly with kernel event polling.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Convert from ->media_changed() to ->check_events().
ub buffers media changed state and clears it on revalidation. It will
behave correctly with kernel event polling.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Convert from ->media_changed() to ->check_events().
Both swim and swim3 buffer media changed state and clear it on
revalidation. They will behave correctly with kernel event polling.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Laurent Vivier <laurent@lvivier.info>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Convert from ->media_changed() to ->check_events().
DAC960 media change notification seems to be one way (once set, never
cleared) and will generate spurious events when polled once the
condition triggers.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Convert paride drivers from ->media_changed() to ->check_events().
pcd and pd buffer and clear events after reporting; however, pf
unconditionally reports MEDIA_CHANGE and will generate spurious events
when polled.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Tim Waugh <tim@cyberelk.net>
Convert the floppy drivers from ->media_changed() to ->check_events().
Both floppy and ataflop buffer media changed state bit and clear them
on revalidation and will behave correctly with kernel event polling.
I can't tell how amiflop clears its event and it's possible that it
may generate spurious events when polled.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Prepare for extending the block device ring to allow request
specific fields, by moving the request specific fields for
reads, writes and barrier requests to a union member.
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Owen Smith <owen.smith@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This merge creates two set of conflicts. One is simple context
conflicts caused by removal of throtl_scheduled_delayed_work() in
for-linus and removal of throtl_shutdown_timer_wq() in
for-2.6.39/core.
The other is caused by commit 255bb490c8 (block: blk-flush shouldn't
call directly into q->request_fn() __blk_run_queue()) in for-linus
crashing with FLUSH reimplementation in for-2.6.39/core. The conflict
isn't trivial but the resolution is straight-forward.
* __blk_run_queue() calls in flush_end_io() and flush_data_end_io()
should be called with @force_kblockd set to %true.
* elv_insert() in blk_kick_flush() should use
%ELEVATOR_INSERT_REQUEUE.
Both changes are to avoid invoking ->request_fn() directly from
request completion path and closely match the changes in the commit
255bb490c8.
Signed-off-by: Tejun Heo <tj@kernel.org>
Netlink message processing in the kernel is synchronous these days,
capabilities can be checked directly in security_netlink_recv() from
the current process.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Reviewed-by: James Morris <jmorris@namei.org>
[chrisw: update to include pohmelfs and uvesafb]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David S. Miller <davem@davemloft.net>