Add ROW I/O Scheduler.

Squashed commit of the following:

commit f49e14ccdcb6694ed27754e020057d27a8fcca07
Author: Andrei F <luxneb@gmail.com>
Date:   Thu Nov 26 22:40:38 2015 +0100

    elevator: Fix a race in elevator switching

    commit d50235b7bc upstream.

    There's a race between elevator switching and normal I/O operation:
    the allocation of struct elevator_queue and of its ->elevator_data is
    not done as one atomic operation, so there is a window in which a NULL
    ->elevator_data can be observed and used.
    For example:
        Thread A:                               Thread B
        blk_queue_bio                           elevator_switch
        spin_lock_irq(q->queue_lock)            elevator_alloc
        elv_merge                               elevator_init_fn

    elevator_alloc is called without holding queue_lock, and at that point
    ->elevator_data is still NULL. If thread A calls elv_merge at the same
    time and needs information from elevator_data, the kernel crashes.

    Move elevator_alloc into the elevator_init_fn callback, making the two
    allocations effectively one atomic operation.
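
    The fixed flow, sketched from the upstream change (the noop scheduler
    is shown as the simplest example; other schedulers follow the same
    pattern):

        static int noop_init_queue(struct request_queue *q, struct elevator_type *e)
        {
                struct noop_data *nd;
                struct elevator_queue *eq;

                /* allocation now happens inside the scheduler's init callback */
                eq = elevator_alloc(q, e);
                if (!eq)
                        return -ENOMEM;

                nd = kmalloc_node(sizeof(*nd), GFP_KERNEL, q->node);
                if (!nd) {
                        kobject_put(&eq->kobj);
                        return -ENOMEM;
                }
                eq->elevator_data = nd;
                INIT_LIST_HEAD(&nd->queue);

                /* ->elevator_data is set before q->elevator becomes visible */
                spin_lock_irq(q->queue_lock);
                q->elevator = eq;
                spin_unlock_irq(q->queue_lock);
                return 0;
        }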

    The bug is easily reproduced with the following:
    1: dd if=/dev/sdb of=/dev/null
    2: while true; do echo noop > scheduler; echo deadline > scheduler; done

    The fix was verified with the same method.

    Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Cc: Jonghwan Choi <jhbird.choi@samsung.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit daf22a727e64f1277b074442efb821366015ca72
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Thu Jul 25 13:45:21 2013 +0300

    block: row: Remove warning message from add_request

    A regular priority queue is marked as "starved" if it skipped a dispatch
    due to being empty. When a new request is added to a "starved" queue
    it will be marked as urgent.
    The removed WARN_ON warned about a supposedly impossible case in which
    a regular priority (read) queue was marked as starved but wasn't empty.
    This case is in fact possible, as follows:
    if the device driver fetched a read request that is pending for
    transmission and an URGENT request arrives, the fetched read will be
    reinserted back into the scheduler. It's possible that the queue it is
    reinserted into was marked as "starved" in the meanwhile, due to being
    empty.

    CRs-fixed: 517800
    Change-Id: Iaae642ea0ed9c817c41745b0e8ae2217cc684f0c
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit dca47e75f1413d58e4f97ef638e5d4456c55bdce
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Tue Jul 2 14:43:13 2013 +0300

    block: row: change hrtimer_cancel to hrtimer_try_to_cancel

    Calling hrtimer_cancel with interrupts disabled can result in a livelock.
    When flushing the plug list in the block layer, interrupts are disabled
    and an hrtimer is used when adding requests from that plug list to the
    scheduler. In this code flow, if the hrtimer (which is used for idling)
    is set, it is canceled by calling hrtimer_cancel. hrtimer_cancel performs
    the following in an endless loop:
    1. try to cancel the timer
    2. if that fails, busy-wait and retry
    The cancellation fails if the timer callback has already started; since
    interrupts are disabled, it can never complete.
    This patch reduces the number of times the hrtimer lock is taken while
    interrupts are disabled by calling hrtimer_try_to_cancel instead. The
    latter tries to cancel the timer just once and returns an error code if
    it fails.
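
    A minimal sketch of the resulting pattern (rd and idle_timer are
    illustrative names for the scheduler's private data; the hrtimer API
    itself is real):

        #include <linux/hrtimer.h>

        /* Called with the queue lock held and interrupts disabled. */
        static void row_stop_idling(struct row_data *rd)
        {
                /*
                 * hrtimer_try_to_cancel() makes exactly one attempt and
                 * returns -1 if the timer callback is currently running.
                 * Unlike hrtimer_cancel() it never loops waiting for the
                 * callback, so it can't livelock with interrupts disabled.
                 */
                if (hrtimer_try_to_cancel(&rd->idle_timer) == -1)
                        pr_debug("%s: timer callback is running\n", __func__);
        }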

    CRs-fixed: 499887
    Change-Id: I25f79c357426d72ad67c261ce7cb503ae97dc7b9
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit a6047b9d808eaa787e4df3107bea7536334856cd
Author: Lee Susman <lsusman@codeaurora.org>
Date:   Sun Jun 23 16:27:40 2013 +0300

    block: row-iosched idling triggered by readahead pages

    In the current implementation idling is triggered only by request
    insertion frequency. This heuristic is not very accurate and may
    trigger idling on random requests that shouldn't cause it. This patch
    uses the PG_readahead flag in struct page's flags (which indicates that
    the page is part of a readahead window) to start idling upon dispatch
    of a request associated with a readahead page.

    The readahead flag is used together with the existing
    insertion-frequency trigger. The frequency timer will catch read
    requests which are not part of a readahead window but are still part
    of a sequential stream (and are therefore dispatched in small time
    intervals).
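
    A sketch of the dispatch-side check (the helper name and the "first
    page only" simplification are illustrative; PageReadahead() is the
    standard test for PG_readahead):

        #include <linux/blkdev.h>
        #include <linux/pagemap.h>

        /* Is this request part of a readahead window? */
        static bool row_rq_is_readahead(struct request *rq)
        {
                if (!rq->bio || !rq->bio->bi_io_vec ||
                    !rq->bio->bi_io_vec->bv_page)
                        return false;
                /* PG_readahead on the first page => idle on dispatch */
                return PageReadahead(rq->bio->bi_io_vec->bv_page);
        }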

    Change-Id: Icb7145199c007408de3f267645ccb842e051fd00
    Signed-off-by: Lee Susman <lsusman@codeaurora.org>

commit e70e4e8e1d1f111023dd2b2d0fc9237240cab9ab
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Wed May 1 14:35:20 2013 +0300

    block: urgent: Fix dispatching of URGENT mechanism

    There are cases in which blk_peek_request is called from a path other
    than blk_fetch_request; in those cases the URGENT request may be
    started without the flag q->dispatched_urgent being updated.

    Change-Id: I4fb588823f1b2949160cbd3907f4729767932e12
    CRs-fixed: 471736
    CRs-fixed: 473036
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit 0e36870f6a436840eed1782d0e85b4adb300b59f
Author: Maya Erez <merez@codeaurora.org>
Date:   Sun Apr 14 15:19:52 2013 +0300

    block: row: Fix starvation tolerance values

    The current starvation tolerance values increase boot time, since high
    priority SW requests are delayed by forced dispatches of regular
    priority requests. To overcome this, increase the starvation tolerance
    values so that regular priority requests tolerate more starvation
    before a forced dispatch kicks in.

    Change-Id: I9947fca9927cbd39a1d41d4bd87069df679d3103
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
    Signed-off-by: Maya Erez <merez@codeaurora.org>

commit 3cab8d28e735fdad300eda3bed703129ba05d70a
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Thu Apr 11 14:57:15 2013 +0300

    block: urgent request: Update dispatch_urgent in case of requeue/reinsert

    The block layer implements a mechanism for verifying that the device
    driver won't be notified of an URGENT request if there is already an
    URGENT request in flight. This is due to the fact that interrupting an
    URGENT request isn't efficient.
    This patch fixes the above mechanism for the case in which the URGENT
    request was returned to the block layer for some reason, by requeue or
    reinsert.

    CRs-fixed: 473376, 473036, 471736
    Change-Id: Ie8b8208230a302d4526068531616984825f1050d
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit e052e4574bb928b44e660b9679d23e14011b0b9d
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Thu Mar 21 11:04:02 2013 +0200

    block: row: Update sysfs functions

    All ROW (time related) configurable parameters are stored in ms, so
    there is no need to convert the values when reading or updating them
    via sysfs.

    Change-Id: Ib6a1de54140b5d25696743da944c076dd6fc02ae
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

    Conflicts:
    	block/row-iosched.c

commit 2c3203650c2109c18abb3b17a5114d54bb22e683
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Thu Mar 21 13:02:07 2013 +0200

    block: row: Prevent starvation of regular priority by high priority

    At the moment, all REGULAR and LOW priority requests are starved as
    long as there are HIGH priority requests to dispatch.
    This patch prevents that starvation by setting a limit on the
    starvation that REGULAR/LOW priority requests can tolerate (see the
    sketch below).
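
    A sketch of the mechanism, with illustrative field names:

        /* every dispatch from a HIGH priority queue bumps the counters */
        if (dispatched_from_high_prio) {
                rd->reg_prio_starvation_counter++;
                rd->low_prio_starvation_counter++;
        }
        /* once a limit is crossed, force a dispatch from the starved class */
        if (rd->reg_prio_starvation_counter >=
            rd->reg_prio_starvation_limit)
                return row_dispatch_from_reg_prio(rd);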

    Change-Id: Ibe24207982c2c55d75c0b0230f67e013d1106017
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit a5434f618d395a03fe19ef430a8c5747bad069f9
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Tue Mar 12 21:02:33 2013 +0200

    block: urgent request: remove unnecessary urgent marking

    An urgent request is marked by the scheduler in rq->cmd_flags with the
    REQ_URGENT flag. There is no need for an additional marking by the
    block layer.

    Change-Id: I05d5e9539d2f6c1bfa80240b0671db197a5d3b3f
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit 3928fb74c2f78578c57913938644acb704b77586
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Tue Mar 12 21:17:18 2013 +0200

    block: row: Re-design urgent request notification mechanism

    When the ROW scheduler reports to the block layer that there is an
    urgent request pending, the device driver may decide to stop the
    transmission of the current request in order to handle the urgent one.
    This is done to reduce the latency of an urgent request. For example:
    a long WRITE may be stopped to handle an urgent READ.

    This patch updates the ROW URGENT notification policy to comply with
    the following (sketched below):

    - Don't notify of an URGENT request if there is an un-completed URGENT
      request in the driver.
    - After notifying that an URGENT request is present, the next request
      dispatched is the URGENT one.
    - At any given moment only 1 request can be marked as URGENT,
      regardless of its location (driver or scheduler).

    Other changes to the URGENT policy:
    - Only READ queues are allowed to notify of a pending URGENT request.

    CR fix:
    If a pending urgent request (A) gets merged with another request (B),
    A is removed from the scheduler queue but is not removed from
    rd->pending_urgent_rq.
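
    A sketch of the resulting elevator_is_urgent_fn policy (field names
    follow the description above and are illustrative):

        /* Report an urgent request only if none is already in the driver. */
        static bool row_urgent_pending(struct request_queue *q)
        {
                struct row_data *rd = q->elevator->elevator_data;

                if (rd->urgent_in_flight)   /* un-completed URGENT in driver */
                        return false;
                return rd->pending_urgent_rq != NULL;
        }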

    CRs-Fixed: 453712
    Change-Id: I321e8cf58e12a05b82edd2a03f52fcce7bc9a900
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit 8912aa92e3d919ceabc72b2eddc829fc5e4bd7eb
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Thu Jan 24 16:17:27 2013 +0200

    block: row: Update initial values of ROW data structures

    This patch sets the initial values of internal ROW
    parameters.

    Change-Id: I38132062a7fcbe2e58b9cc757e55caac64d013dc
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
    [smuckle@codeaurora.org: ported from msm-3.7]
    Signed-off-by: Steve Muckle <smuckle@codeaurora.org>

commit b709e1a8a56784cb83c2c31a4e7df574a6b29802
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Thu Jan 24 15:08:40 2013 +0200

    block: row: Don't notify URGENT if there are un-completed urgent req

    When the ROW scheduler reports to the block layer that there is an
    urgent request pending, the device driver may decide to stop the
    transmission of the current request in order to handle the urgent one.
    If the currently transmitted request is itself urgent, we don't want
    it to be stopped; hence, the ROW scheduler won't notify of an urgent
    request if there are urgent requests in flight.

    Change-Id: I2fa186d911b908ec7611682b378b9cdc48637ac7
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit eba966603cc8e6f8fb418bf702f5a6eca5f56f34
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Thu Jan 24 04:01:59 2013 +0200

    block: add REQ_URGENT to request flags

    This patch adds a new flag, to be used in the cmd_flags field of
    struct request, for marking a request as urgent.
    An urgent request is one that the device driver should prioritize over
    the currently handled (regular) request. The decision on a request's
    urgency is taken by the scheduler.

    Change-Id: Ic20470987ef23410f1d0324f96f00578f7df8717
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

    Conflicts:
    	include/linux/blk_types.h

commit 7c865ab1a9ae626d023d0b03ed7fbe5c57bcbe7c
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Thu Jan 17 20:56:07 2013 +0200

    block: row: Idling mechanism re-factoring

    At the moment, idling in ROW is implemented by delayed work that uses
    jiffies granularity, which is not very accurate. This patch replaces
    the current idling mechanism implementation with the hrtimer API,
    which gives nanosecond resolution (instead of jiffies).

    Change-Id: I86c7b1776d035e1d81571894b300228c8b8f2d92
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit 72ea1d39c04734bf5eb52117968704148d2da42f
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Wed Jan 23 17:15:49 2013 +0200

    block: row: Dispatch requests according to their io-priority

    This patch implements "application hints": a way for the issuing
    application to notify the scheduler of the priority of its request.
    This is done by setting the io-priority of the request.
    The patch reuses the existing io-priority mechanism developed for CFQ.
    Please refer to Documentation/block/ioprio.txt for usage examples and
    explanations.
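
    A sketch of the resulting queue selection (rq_ioprio_class() is a
    hypothetical accessor; the IOPRIO_CLASS_* constants come from
    linux/ioprio.h and the ROWQ_PRIO_* indices from the scheduler):

        /* Map a request's io-priority class to a ROW queue index. */
        static enum row_queue_prio row_get_queue_prio(struct request *rq)
        {
                bool read = (rq_data_dir(rq) == READ);

                switch (rq_ioprio_class(rq)) {  /* hypothetical accessor */
                case IOPRIO_CLASS_RT:   /* real-time -> high prio queues */
                        return read ? ROWQ_PRIO_HIGH_READ :
                                      ROWQ_PRIO_HIGH_SWRITE;
                case IOPRIO_CLASS_IDLE: /* idle -> low prio queues */
                        return read ? ROWQ_PRIO_LOW_READ :
                                      ROWQ_PRIO_LOW_SWRITE;
                default:                /* best-effort/none -> regular */
                        return read ? ROWQ_PRIO_REG_READ :
                                      ROWQ_PRIO_REG_SWRITE;
                }
        }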

    Change-Id: I228ec8e52161b424242bb7bb133418dc8b73925a
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit 9f8f3d2757788477656b1d25a3055ae11d97cee4
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Sat Jan 12 16:23:18 2013 +0200

    block: row: Aggregate row_queue parameters to one structure

    Each ROW queue has several parameters whose default values are defined
    in separate arrays. This patch aggregates all the default values into
    one array (see the sketch below).
    The values in question are:
     - is idling enabled for the queue
     - queue quantum
     - can the queue notify on urgent request
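
    A sketch of the aggregated array (the values shown are illustrative
    defaults, not necessarily the shipped ones):

        struct row_queue_params {
                bool idling_enabled;
                int quantum;
                bool is_urgent;
        };

        static const struct row_queue_params row_queues_def[] = {
        /*      idling_enabled, quantum, is_urgent */
                {true,  100, true},     /* ROWQ_PRIO_HIGH_READ */
                {true,  100, true},     /* ROWQ_PRIO_REG_READ */
                {false, 2,   false},    /* ROWQ_PRIO_HIGH_SWRITE */
                {false, 1,   false},    /* ROWQ_PRIO_REG_SWRITE */
                {false, 1,   false},    /* ROWQ_PRIO_REG_WRITE */
                {false, 1,   false},    /* ROWQ_PRIO_LOW_READ */
                {false, 1,   false},    /* ROWQ_PRIO_LOW_SWRITE */
        };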

    Change-Id: I3821b0a042542295069b340406a16b1000873ec6
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit d84ad45f3077661cab5984cd2fb7d5ef2ff06e39
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Sat Jan 12 16:21:47 2013 +0200

    block: row: fix sysfs functions - idle_time conversion

    idle_time was updated to be stored in msec instead of jiffies, so
    there is no need to convert the value when reading it from the user or
    displaying it.

    Change-Id: I58e074b204e90a90536d32199ac668112966e9cf
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit 202b21e9daf7b8a097f97f764bb4ad4712c75fa7
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Sat Jan 12 16:21:12 2013 +0200

    block: row: Insert dispatch_quantum into struct row_queue

    There is really no point in keeping the dispatch quantum of a queue
    outside of it. By moving it into the row_queue structure we spare an
    extra level of indirection when accessing it.

    Change-Id: Ic77571818b643e71f9aafbb2ca93d0a92158b199
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit 58ca84f091faa6ff8c4f567b158be5d38f9a5c58
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Sun Jan 13 22:04:59 2013 +0200

    block: row: Add some debug information on ROW queues

    1. Add a counter for the number of requests on the queue.
    2. Add a function to print queue status (the number of requests
       currently on the queue and the number of already dispatched
       requests in the current dispatch cycle).

    Change-Id: I1e98b9ca33853e6e6a8ddc53240f6cd6981e6024
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit 1bbb2c7ada5a647cab1f2306458d6cf9b821ddf7
Author: Subhash Jadavani <subhashj@codeaurora.org>
Date:   Thu Jan 10 02:15:13 2013 +0530

    block: blk-merge: don't merge the pages with non-contiguous descriptors

    blk_rq_map_sg() merges physically contiguous pages into the same
    scatter-gather node without checking whether their page descriptors
    are contiguous as well.

    When dma_map_sg() is later called on the scatter-gather list, it takes
    the base page pointer from each node (one by one) and iterates through
    all of the pages in the same sg node by incrementing the base page
    pointer, on the assumption that physically contiguous pages also have
    contiguous page descriptor addresses. That assumption may not hold if
    the SPARSEMEM config is enabled, in which case we may end up referring
    to an invalid page descriptor.

    The following table shows an example of physically contiguous pages
    whose page descriptor addresses are non-contiguous:

    -------------------------------------------
    | Page Descriptor    |   Physical Address |
    -------------------------------------------
    | 0xc1e43fdc         |   0xdffff000       |
    | 0xc2052000         |   0xe0000000       |
    -------------------------------------------

    With this patch, the relevant blk-merge functions will also check
    whether the page descriptor addresses of physically contiguous pages
    are contiguous; if they are not, the pages are placed in separate
    scatter-gather nodes.
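
    The added check boils down to comparing struct page addresses rather
    than physical addresses; a self-contained sketch:

        #include <linux/mm.h>

        /*
         * Physically contiguous pages may only share a scatter-gather
         * node if their page descriptors are adjacent too; under
         * SPARSEMEM the mem_map is split, so this can be false.
         */
        static bool pages_sg_mergeable(struct page *prev, struct page *cur)
        {
                return prev + 1 == cur;
        }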

    CRs-Fixed: 392141
    Change-Id: I3601565e5569a69f06fb3af99061c4d4c23af241
    Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>

    Conflicts:
    	block/blk-merge.c

commit 9a9b428480c932ef8434d8b9bd3b7bafdcac3f84
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Thu Dec 20 19:23:58 2012 +0200

    row: Add support for urgent request handling

    This patch adds support for handling urgent requests.
    A ROW queue can be marked as "urgent": if it was un-served in the last
    dispatch cycle and a request is added to it, it will trigger an
    urgent-request notification to the block device driver.
    The block device driver may choose to stop the transmission of the
    current ongoing request to handle the urgent one. For example: a long
    WRITE may be stopped to handle an urgent READ, which decreases READ
    latency.

    Change-Id: I84954c13f5e3b1b5caeadc9fe1f9aa21208cb35e
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit 8d5ec526b7e70307d3c4ce587b714349f44c0be8
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Thu Dec 6 13:17:19 2012 +0200

    block: row: fix idling mechanism in ROW

    This patch addresses the following issues found in the ROW idling
    mechanism:
    1. Fix the delay passed to queue_delayed_work (pass the actual delay
       and not the absolute time at which to start the work).
    2. Change the idle time and the idling-trigger frequency to be
       HZ dependent (instead of using msecs_to_jiffies()).
    3. Destroy idle_workqueue in queue_exit.

    Change-Id: If86513ad6b4be44fb7a860f29bd2127197d8d5bf
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

    Conflicts:
    	block/row-iosched.c

commit c26a95811462b9ba8eca23b4ba2150e7b660ca40
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Tue Oct 30 08:33:06 2012 +0200

    row: Adding support for reinsert already dispatched req

    Add support for reinserting an already dispatched request back into
    the scheduler's internal data structures.
    The request will be reinserted at the head of the queue it was
    dispatched from, as if it had never been dispatched.

    Change-Id: I70954df300774409c25b5821465fb3aa33d8feb5
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit a1a6f09cae0149d935bcea3f20d4acb6556d68f9
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Tue Dec 4 16:04:15 2012 +0200

    block: Add API for urgent request handling

    This patch adds support in the block & elevator layers for handling
    urgent requests. The decision whether a request is urgent or not is
    taken by the scheduler. Urgent request notification is passed to the
    underlying block device driver (eMMC for example). The block device
    driver may decide to interrupt the currently running low priority
    request to serve the new urgent request. By doing so, READ latency is
    greatly reduced in read&write collision scenarios.

    Note that if the current scheduler doesn't implement the urgent
    request mechanism, this code path is never activated.

    Change-Id: I8aa74b9b45c0d3a2221bd4e82ea76eb4103e7cfa
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

    Conflicts:
    	block/blk-core.c

commit 4e907d9d6079629d6ce61fbdfb1a629d3587e176
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Tue Dec 4 15:54:43 2012 +0200

    block: Add support for reinsert a dispatched req

    Add support for reinserting a dispatched request back into the
    scheduler's internal data structures.
    This capability is used by the device driver when it chooses to
    interrupt the current request transmission and execute another (more
    urgent) pending request: for example, interrupting a long write in
    order to handle a pending read. The device driver re-inserts the
    remaining write request back into the scheduler, to be rescheduled
    for transmission later on.

    Also add an API for verifying whether the current scheduler supports
    the reinsert mechanism. If reinsertion isn't supported by the
    scheduler, this code path is never activated.
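
    A sketch of the driver-side usage (the handler is illustrative;
    blk_reinsert_req_sup(), blk_reinsert_request() and
    blk_requeue_request() are the block layer APIs used):

        /* Preempt the ongoing request in order to serve an urgent one. */
        static void my_driver_preempt_current(struct request_queue *q,
                                              struct request *ongoing)
        {
                /* first stop the transfer in hardware (driver specific) */

                spin_lock_irq(q->queue_lock);
                if (blk_reinsert_req_sup(q))
                        /* scheduler takes it back as if never dispatched */
                        blk_reinsert_request(q, ongoing);
                else
                        /* fall back to a plain head-of-queue requeue */
                        blk_requeue_request(q, ongoing);
                spin_unlock_irq(q->queue_lock);
        }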

    Change-Id: I5c982a66b651ebf544aae60063ac8a340d79e67f
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit 0675c27faab797f7149893b84cc357aadb37c697
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Mon Oct 15 20:56:02 2012 +0200

    block: ROW: Fix forced dispatch

    This patch fixes forced dispatch in the ROW scheduling algorithm.
    When the dispatch function is called with the forced flag on, we
    can't delay the dispatch of the requests that are in the scheduler's
    queues. Thus, when dispatch is called with forced turned on, we need
    to cancel idling, or not idle at all.

    Change-Id: I3aa0da33ad7b59c0731c696f1392b48525b52ddc
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

commit ce6acf59662d1bbe5663a64aef9fe1695b8bbe1b
Author: Tatyana Brokhman <tlinder@codeaurora.org>
Date:   Thu Sep 20 10:46:10 2012 +0300

    block: Adding ROW scheduling algorithm

    This patch adds the implementation of a new scheduling algorithm - ROW.
    The policy of this algorithm is to prioritize READ requests over WRITE
    as much as possible without starving the WRITE requests.

    Change-Id: I4ed52ea21d43b0e7c0769b2599779a3d3869c519
    Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>

Signed-off-by: Tkkg1994 <luca.grifo@outlook.com>
Signed-off-by: djb77 <dwayne.bakewell@gmail.com>
10 changed files with 1403 additions and 4 deletions


@@ -0,0 +1,117 @@
Introduction
============
The ROW scheduling algorithm is intended to be used on mobile devices as
the default block layer IO scheduling algorithm. ROW stands for "READ
Over WRITE", which is the main request dispatch policy of this algorithm.
The ROW IO scheduler was developed with the needs of mobile devices in
mind. On mobile devices we favor user experience above everything else,
thus we want to give READ IO requests as much priority as possible.
The main idea of the ROW scheduling policy is:
if there are READ requests in the pipe, dispatch them, but don't starve
the WRITE requests too much.
Software description
====================
The requests are kept in queues according to their priority. The
dispatching of requests is done in a Round Robin manner with a
different slice for each queue. The dispatch quantum for a specific
queue is defined according to the queue's priority. READ queues are
given a bigger dispatch quantum than the WRITE queues within a dispatch
cycle.
At the moment there are 6 types of queues the requests are
distributed to:
- High priority READ queue
- High priority Synchronous WRITE queue
- Regular priority READ queue
- Regular priority Synchronous WRITE queue
- Regular priority WRITE queue
- Low priority READ queue
If in a certain dispatch cycle one of the queues was empty and didn't
use its quantum, that queue will be marked as "un-served". If we're in
the middle of a dispatch cycle dispatching from queue Y and a request
arrives for queue X that was un-served in the previous cycle, then if
X's priority is higher than Y's, queue Y will be preempted in favor of
queue X. This doesn't mean that the cycle is restarted: the
"dispatched" counter of queue Y remains unchanged. Once queue X uses up
its quantum (or has no more requests left) we switch back to queue Y
and allow it to finish its quantum.
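
A condensed sketch of this dispatch loop (illustrative, not the exact
implementation):

    for (i = 0; i < ROWQ_MAX_PRIO; i++) {       /* highest priority first */
            struct row_queue *rqueue = &rd->row_queues[i];

            if (list_empty(&rqueue->fifo)) {
                    rqueue->unserved = true;    /* may preempt a later cycle */
                    continue;
            }
            if (rqueue->dispatched < rqueue->quantum)
                    return row_dispatch_from(rd, rqueue);
    }
    /* all quanta used up: start a new dispatch cycle */
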
For READ request queues we allow idling within a dispatch quantum in
order to give the application a chance to insert more requests. Idling
means adding some extra time for serving a certain queue even if the
queue is empty. Idling is enabled if we identify that the application
is inserting requests at a high frequency.
For idling on READ queues we use a timer mechanism. When the timer
expires, if there are requests in the scheduler we signal the
underlying driver (for example the MMC driver) to fetch another request
for dispatch.
The ROW algorithm takes the scheduling policy one step further, making
it a bit more "user-needs oriented", by allowing the application to
hint at the urgency of its requests. For example: even among the READ
requests, several may be more urgent to complete than others. The
former will go to the High priority READ queue, which is given a bigger
dispatch quantum than any other queue.
The ROW scheduler supports special services for block devices that
support High Priority Requests; that is, the scheduler may inform the
device of urgent requests via a new callback, make_urgent_request.
In addition, it supports rescheduling of requests that were
interrupted. For example, if the device issues a long write request and
a sudden high priority read interrupt pops in, the scheduler will
inform the device about the urgent request, so the device can stop the
current write request and serve the high priority read request. In such
a case the device may also send back to the scheduler the remainder of
the interrupted write request, such that the scheduler may continue
sending high priority requests without the need to interrupt the
ongoing write again and again. The write remainder will be sent later on
according to the scheduler policy.
Design
======
Existing algorithms (cfq, deadline) sort the IO requests according to
LBA. When deciding on the next request to dispatch they choose the
closest request to the current disk head position (based on the last
dispatched request). This is done in order to reduce disk head movement
to a minimum.
We feel that this functionality isn't really needed on mobile devices.
Usually applications that write/read large chunks of data insert the
requests in already sorted LBA order. Thus dealing with sort trees adds
unnecessary complexity.
We're planning to try this enhancement in the future, to check whether
performance is influenced by it.
SMP/multi-core
==============
At the moment the code is accessed from 2 contexts:
- Application context (from the block/elevator layer): adding requests.
- Underlying driver context (for example the mmc driver thread):
  dispatching the requests and notifying on completion.
One lock is used to synchronize between the two. This lock is provided
by the underlying driver along with the dispatch queue.
Config options
==============
1. hp_read_quantum: dispatch quantum for the high priority READ queue
2. rp_read_quantum: dispatch quantum for the regular priority READ queue
3. hp_swrite_quantum: dispatch quantum for the high priority Synchronous
WRITE queue
4. rp_swrite_quantum: dispatch quantum for the regular priority
Synchronous WRITE queue
5. rp_write_quantum: dispatch quantum for the regular priority WRITE
queue
6. lp_read_quantum: dispatch quantum for the low priority READ queue
7. lp_swrite_quantum: dispatch quantum for the low priority Synchronous
WRITE queue
8. read_idle: how long to idle on a read queue, in msec (in case idling
is enabled on that queue).
9. read_idle_freq: frequency of READ request insertion that will
trigger idling. This is the time in msec between the insertion of two
READ requests.


@@ -94,6 +94,17 @@ config IOSCHED_ZEN
          FCFS, dispatches are back-inserted, deadlines ensure fairness.
          Should work best with devices where there is no travel delay.

config IOSCHED_ROW
        tristate "ROW I/O scheduler"
        default n
        ---help---
          The ROW I/O scheduler gives priority to READ requests over the
          WRITE requests when dispatching, without starving WRITE requests.
          Requests are kept in priority queues. Dispatching is done in a RR
          manner when the dispatch quantum for each queue is calculated
          according to queue priority.
          Most suitable for mobile devices.

choice
        prompt "Default I/O scheduler"
        default DEFAULT_CFQ
@@ -131,6 +142,16 @@ choice
config DEFAULT_ZEN
        bool "ZEN" if IOSCHED_ZEN=y

config DEFAULT_ROW
        bool "ROW" if IOSCHED_ROW=y
        help
          The ROW I/O scheduler gives priority to READ requests
          over the WRITE requests when dispatching, without starving
          WRITE requests. Requests are kept in priority queues.
          Dispatching is done in a RR manner when the dispatch quantum
          for each queue is defined according to queue priority.
          Most suitable for mobile devices.

endchoice

config DEFAULT_IOSCHED
@@ -145,6 +166,7 @@ config DEFAULT_IOSCHED
        default "tripndroid" if DEFAULT_TRIPNDROID
        default "vr" if DEFAULT_VR
        default "zen" if DEFAULT_ZEN
        default "row" if DEFAULT_ROW

endmenu


@@ -320,9 +320,20 @@ inline void __blk_run_queue_uncond(struct request_queue *q)
         * number of active request_fn invocations such that blk_drain_queue()
         * can wait until all these request_fn calls have finished.
         */
-       q->request_fn_active++;
-       q->request_fn(q);
-       q->request_fn_active--;
        if (!q->notified_urgent &&
            q->elevator->type->ops.elevator_is_urgent_fn &&
            q->urgent_request_fn &&
            q->elevator->type->ops.elevator_is_urgent_fn(q)) {
                q->notified_urgent = true;
                q->request_fn_active++;
                q->urgent_request_fn(q);
                q->request_fn_active--;
        } else {
                q->request_fn_active++;
                q->request_fn(q);
                q->request_fn_active--;
        }
}
EXPORT_SYMBOL_GPL(__blk_run_queue_uncond);
@@ -333,6 +344,12 @@ EXPORT_SYMBOL_GPL(__blk_run_queue_uncond);
 * Description:
 *    See @blk_run_queue. This variant must be called with the queue lock
 *    held and interrupts disabled.
 *    The device driver will be notified of an urgent request
 *    pending under the following conditions:
 *    1. The driver and the current scheduler support urgent request handling
 *    2. There is an urgent request pending in the scheduler
 *    3. There isn't already an urgent request in flight, meaning the
 *       previously notified urgent request completed (!q->notified_urgent)
 */
void __blk_run_queue(struct request_queue *q)
{
@@ -1391,10 +1408,74 @@ void blk_requeue_request(struct request_queue *q, struct request *rq)
        BUG_ON(blk_queued_rq(rq));

        if (rq->cmd_flags & REQ_URGENT) {
                /*
                 * It's not compliant with the design to re-insert
                 * urgent requests. We want to be able to track this
                 * down.
                 */
                pr_err("%s(): requeueing an URGENT request", __func__);
                WARN_ON(!q->dispatched_urgent);
                q->dispatched_urgent = false;
        }

        elv_requeue_request(q, rq);
}
EXPORT_SYMBOL(blk_requeue_request);
/**
 * blk_reinsert_request() - Insert a request back to the scheduler
 * @q:  request queue
 * @rq: request to be inserted
 *
 * This function inserts the request back to the scheduler as if
 * it was never dispatched.
 *
 * Return: 0 on success, error code on fail
 */
int blk_reinsert_request(struct request_queue *q, struct request *rq)
{
        if (unlikely(!rq) || unlikely(!q))
                return -EIO;

        blk_delete_timer(rq);
        blk_clear_rq_complete(rq);
        trace_block_rq_requeue(q, rq);

        if (rq->cmd_flags & REQ_QUEUED)
                blk_queue_end_tag(q, rq);

        BUG_ON(blk_queued_rq(rq));
        if (rq->cmd_flags & REQ_URGENT) {
                /*
                 * It's not compliant with the design to re-insert
                 * urgent requests. We want to be able to track this
                 * down.
                 */
                pr_err("%s(): reinserting an URGENT request", __func__);
                WARN_ON(!q->dispatched_urgent);
                q->dispatched_urgent = false;
        }

        return elv_reinsert_request(q, rq);
}
EXPORT_SYMBOL(blk_reinsert_request);

/**
 * blk_reinsert_req_sup() - check whether the scheduler supports
 *                          reinsertion of requests
 * @q: request queue
 *
 * Returns true if the current scheduler supports reinserting
 * requests. False otherwise
 */
bool blk_reinsert_req_sup(struct request_queue *q)
{
        if (unlikely(!q))
                return false;
        return q->elevator->type->ops.elevator_reinsert_req_fn ? true : false;
}
EXPORT_SYMBOL(blk_reinsert_req_sup);
static void add_acct_request(struct request_queue *q, struct request *rq,
                             int where)
{
@@ -2415,6 +2496,10 @@ struct request *blk_peek_request(struct request_queue *q)
                         * not be passed by new incoming requests
                         */
                        rq->cmd_flags |= REQ_STARTED;

                        if (rq->cmd_flags & REQ_URGENT) {
                                WARN_ON(q->dispatched_urgent);
                                q->dispatched_urgent = true;
                        }
                        trace_block_rq_issue(q, rq);
                }


@@ -237,6 +237,9 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
                                goto new_segment;
                        if (!BIOVEC_SEG_BOUNDARY(q, &bvprv, &bv))
                                goto new_segment;
                        if ((bvprv.bv_page != bv.bv_page) &&
                            (bvprv.bv_page + 1) != bv.bv_page)
                                goto new_segment;

                        seg_size += bv.bv_len;
                        bvprv = bv;
@@ -348,6 +351,9 @@ __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
                        goto new_segment;
                if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
                        goto new_segment;
                if (((bvprv)->bv_page != bvec->bv_page) &&
                    ((bvprv->bv_page + 1) != bvec->bv_page))
                        goto new_segment;

                (*sg)->length += nbytes;
        } else {


@@ -77,6 +77,18 @@ void blk_queue_lld_busy(struct request_queue *q, lld_busy_fn *fn)
}
EXPORT_SYMBOL_GPL(blk_queue_lld_busy);

/**
 * blk_urgent_request() - Set an urgent_request handler function for queue
 * @q:  queue
 * @fn: handler for urgent requests
 */
void blk_urgent_request(struct request_queue *q, request_fn_proc *fn)
{
        q->urgent_request_fn = fn;
}
EXPORT_SYMBOL(blk_urgent_request);

/**
 * blk_set_default_limits - reset limits to default values
 * @lim: the queue_limits structure to reset


@@ -575,6 +575,41 @@ void elv_requeue_request(struct request_queue *q, struct request *rq)
        __elv_add_request(q, rq, ELEVATOR_INSERT_REQUEUE);
}

/**
 * elv_reinsert_request() - Insert a request back to the scheduler
 * @q:  request queue where the request should be inserted
 * @rq: request to be inserted
 *
 * This function returns the request back to the scheduler to be
 * inserted as if it was never dispatched.
 *
 * Return: 0 on success, error code on failure
 */
int elv_reinsert_request(struct request_queue *q, struct request *rq)
{
        int res;

        if (!q->elevator->type->ops.elevator_reinsert_req_fn)
                return -EPERM;

        res = q->elevator->type->ops.elevator_reinsert_req_fn(q, rq);
        if (!res) {
                /*
                 * it already went through dequeue, we need to decrement the
                 * in_flight count again
                 */
                if (blk_account_rq(rq)) {
                        q->in_flight[rq_is_sync(rq)]--;
                        if (rq->cmd_flags & REQ_SORTED)
                                elv_deactivate_rq(q, rq);
                }
                rq->cmd_flags &= ~REQ_STARTED;
                q->nr_sorted++;
        }
        return res;
}

void elv_drain_elevator(struct request_queue *q)
{
        static int printed;
@@ -731,6 +766,11 @@ void elv_completed_request(struct request_queue *q, struct request *rq)
{
        struct elevator_queue *e = q->elevator;

        if (rq->cmd_flags & REQ_URGENT) {
                q->notified_urgent = false;
                WARN_ON(!q->dispatched_urgent);
                q->dispatched_urgent = false;
        }

        /*
         * request is released from the driver, io must be done
         */

block/row-iosched.c: new file, 1102 lines (diff suppressed because it is too large)


@@ -189,7 +189,7 @@ enum rq_flag_bits {
         * throttling rules. Don't do it again. */

        /* request only flags */
-       __REQ_SORTED,                /* elevator knows about this request */
        __REQ_SORTED = __REQ_RAHEAD, /* elevator knows about this request */
        __REQ_SOFTBARRIER,      /* may not be passed by ioscheduler */
        __REQ_NOMERGE,          /* don't touch this for merging */
        __REQ_STARTED,          /* drive already may have started this one */
@@ -206,6 +206,7 @@ enum rq_flag_bits {
        __REQ_FLUSH_SEQ,        /* request for flush sequence */
        __REQ_IO_STAT,          /* account I/O stat */
        __REQ_MIXED_MERGE,      /* merge of different types, fail separately */
        __REQ_URGENT,           /* urgent request */
        __REQ_PM,               /* runtime pm request */
        __REQ_HASHED,           /* on IO scheduler merge hash */
        __REQ_MQ_INFLIGHT,      /* track inflight for MQ */
@@ -221,6 +222,7 @@ enum rq_flag_bits {
#define REQ_META                (1ULL << __REQ_META)
#define REQ_PRIO                (1ULL << __REQ_PRIO)
#define REQ_DISCARD             (1ULL << __REQ_DISCARD)
#define REQ_URGENT              (1 << __REQ_URGENT)
#define REQ_WRITE_SAME          (1ULL << __REQ_WRITE_SAME)
#define REQ_NOIDLE              (1ULL << __REQ_NOIDLE)
#define REQ_INTEGRITY           (1ULL << __REQ_INTEGRITY)


@@ -302,6 +302,7 @@ struct request_queue {
        struct request_list     root_rl;

        request_fn_proc         *request_fn;
        request_fn_proc         *urgent_request_fn;
        make_request_fn         *make_request_fn;
        prep_rq_fn              *prep_rq_fn;
        unprep_rq_fn            *unprep_rq_fn;
@@ -424,6 +425,8 @@ struct request_queue {
#endif
        struct queue_limits     limits;

        bool                    notified_urgent;
        bool                    dispatched_urgent;

        /*
         * sg stuff
@@ -779,6 +782,8 @@ extern struct request *blk_make_request(struct request_queue *, struct bio *,
                                        gfp_t);
extern void blk_rq_set_block_pc(struct request *);
extern void blk_requeue_request(struct request_queue *, struct request *);
extern int blk_reinsert_request(struct request_queue *q, struct request *rq);
extern bool blk_reinsert_req_sup(struct request_queue *q);
extern void blk_add_request_payload(struct request *rq, struct page *page,
                                    unsigned int len);
extern int blk_lld_busy(struct request_queue *q);
@@ -965,6 +970,7 @@ extern struct request_queue *blk_init_queue_node(request_fn_proc *rfn,
extern struct request_queue *blk_init_queue(request_fn_proc *, spinlock_t *);
extern struct request_queue *blk_init_allocated_queue(struct request_queue *,
                                                      request_fn_proc *, spinlock_t *);
extern void blk_urgent_request(struct request_queue *q, request_fn_proc *fn);
extern void blk_cleanup_queue(struct request_queue *);
extern void blk_queue_make_request(struct request_queue *, make_request_fn *);
extern void blk_queue_bounce_limit(struct request_queue *, u64);


@@ -24,6 +24,9 @@ typedef void (elevator_bio_merged_fn) (struct request_queue *,
typedef int (elevator_dispatch_fn) (struct request_queue *, int);
typedef void (elevator_add_req_fn) (struct request_queue *, struct request *);
typedef int (elevator_reinsert_req_fn) (struct request_queue *,
                                        struct request *);
typedef bool (elevator_is_urgent_fn) (struct request_queue *);
typedef struct request *(elevator_request_list_fn) (struct request_queue *, struct request *);
typedef void (elevator_completed_req_fn) (struct request_queue *, struct request *);
typedef int (elevator_may_queue_fn) (struct request_queue *, int);
@@ -51,6 +54,9 @@ struct elevator_ops
        elevator_dispatch_fn *elevator_dispatch_fn;
        elevator_add_req_fn *elevator_add_req_fn;

        elevator_reinsert_req_fn *elevator_reinsert_req_fn;
        elevator_is_urgent_fn *elevator_is_urgent_fn;

        elevator_activate_req_fn *elevator_activate_req_fn;
        elevator_deactivate_req_fn *elevator_deactivate_req_fn;
@@ -130,6 +136,7 @@ extern void elv_merged_request(struct request_queue *, struct request *, int);
extern void elv_bio_merged(struct request_queue *q, struct request *,
                           struct bio *);
extern void elv_requeue_request(struct request_queue *, struct request *);
extern int elv_reinsert_request(struct request_queue *, struct request *);
extern struct request *elv_former_request(struct request_queue *, struct request *);
extern struct request *elv_latter_request(struct request_queue *, struct request *);
extern int elv_register_queue(struct request_queue *q);