android_kernel_samsung_univ.../mm
Mel Gorman 7a9e41c7fe mm: pin address_space before dereferencing it while isolating an LRU page
[ Upstream commit 69d763fc6d3aee787a3e8c8c35092b4f4960fa5d ]

Minchan Kim asked the following question -- what locks protects
address_space destroying when race happens between inode trauncation and
__isolate_lru_page? Jan Kara clarified by describing the race as follows

CPU1                                            CPU2

truncate(inode)                                 __isolate_lru_page()
  ...
  truncate_inode_page(mapping, page);
    delete_from_page_cache(page)
      spin_lock_irqsave(&mapping->tree_lock, flags);
        __delete_from_page_cache(page, NULL)
          page_cache_tree_delete(..)
            ...                                   mapping = page_mapping(page);
            page->mapping = NULL;
            ...
      spin_unlock_irqrestore(&mapping->tree_lock, flags);
      page_cache_free_page(mapping, page)
        put_page(page)
          if (put_page_testzero(page)) -> false
- inode now has no pages and can be freed including embedded address_space

                                                  if (mapping && !mapping->a_ops->migratepage)
- we've dereferenced mapping which is potentially already free.

The race is theoretically possible but unlikely.  Before the
delete_from_page_cache, truncate_cleanup_page is called so the page is
likely to be !PageDirty or PageWriteback which gets skipped by the only
caller that checks the mappping in __isolate_lru_page.  Even if the race
occurs, a substantial amount of work has to happen during a tiny window
with no preemption but it could potentially be done using a virtual
machine to artifically slow one CPU or halt it during the critical
window.

This patch should eliminate the race with truncation by try-locking the
page before derefencing mapping and aborting if the lock was not
acquired.  There was a suggestion from Huang Ying to use RCU as a
side-effect to prevent mapping being freed.  However, I do not like the
solution as it's an unconventional means of preserving a mapping and
it's not a context where rcu_read_lock is obviously protecting rcu data.

Link: http://lkml.kernel.org/r/20180104102512.2qos3h5vqzeisrek@techsingularity.net
Fixes: c824493528 ("mm: compaction: make isolate_lru_page() filter-aware again")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:48:55 +02:00
..
kasan kasan: fix memory hotplug during boot 2018-05-30 07:48:51 +02:00
backing-dev.c writeback: fix the wrong congested state variable definition 2018-04-08 11:51:56 +02:00
balloon_compaction.c
bootmem.c
cleancache.c
cma_debug.c
cma.c cma: fix calculation of aligned offset 2018-01-31 12:06:09 +01:00
cma.h
compaction.c mm/compaction: pass only pageblock aligned range to pageblock_pfn_to_page 2018-01-17 09:35:26 +01:00
debug-pagealloc.c mm, hwpoison: fixup "mm: check the return value of lookup_page_ext for all call sites" 2017-11-24 11:26:29 +01:00
debug.c
dmapool.c
early_ioremap.c mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep 2018-02-25 11:03:41 +01:00
fadvise.c
failslab.c
filemap.c mm: filemap: avoid unnecessary calls to lock_page when waiting for IO to complete during a read 2018-05-26 08:48:54 +02:00
frame_vector.c
frontswap.c
gup.c mm: larger stack guard gap, between vmas 2017-06-26 07:13:11 +02:00
highmem.c
huge_memory.c thp: fix MADV_DONTNEED vs. numa balancing race 2017-12-16 10:33:50 +01:00
hugetlb_cgroup.c
hugetlb.c mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd() 2017-04-08 09:53:32 +02:00
hwpoison-inject.c
init-mm.c mm: Add a user_ns owner to mm_struct and fix ptrace permission checks 2017-01-06 11:16:11 +01:00
internal.h mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-11 09:08:50 -07:00
interval_tree.c
Kconfig mm: don't allow deferred pages with NEED_PER_CPU_KM 2018-05-26 08:48:55 +02:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: add scheduling point to kmemleak_scan() 2018-02-03 17:04:29 +01:00
ksm.c mm,ksm: fix endless looping in allocating memory when ksm enable 2016-10-07 15:23:40 +02:00
list_lru.c mm/list_lru.c: fix list_lru_count_node() to be race free 2017-07-21 07:44:56 +02:00
maccess.c
madvise.c mm/madvise.c: fix madvise() infinite loop under special circumstances 2017-12-05 11:22:50 +01:00
Makefile
memblock.c mm: consider memblock reservations for deferred memory initialization sizing 2017-06-14 13:16:26 +02:00
memcontrol.c hwpoison, memcg: forcibly uncharge LRU pages 2018-01-31 12:06:09 +01:00
memory_hotplug.c base/memory, hotplug: fix a kernel oops in show_valid_zones() 2017-02-09 08:02:47 +01:00
memory-failure.c hwpoison, memcg: forcibly uncharge LRU pages 2018-01-31 12:06:09 +01:00
memory.c mm: allow GFP_{FS,IO} for page_cache_read page cache allocation 2018-04-24 09:32:11 +02:00
mempolicy.c mm/mempolicy: add nodes_empty check in SYSC_migrate_pages 2018-05-30 07:48:55 +02:00
mempool.c mm/mempool: avoid KASAN marking mempool poison checks as use-after-free 2017-08-12 19:29:09 -07:00
memtest.c
migrate.c Sanitize 'move_pages()' permission checks 2017-08-24 17:02:36 -07:00
mincore.c
mlock.c mlock: fix mlock count can not decrease in race condition 2017-06-07 12:06:01 +02:00
mm_init.c
mmap.c mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack 2018-01-31 12:06:09 +01:00
mmu_context.c mm/mmu_context, sched/core: Fix mmu_context.h assumption 2017-12-25 14:22:09 +01:00
mmu_notifier.c
mmzone.c
mprotect.c mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-11 09:08:50 -07:00
mremap.c mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-11 09:08:50 -07:00
msync.c
nobootmem.c
nommu.c
oom_kill.c
page_alloc.c mm, page_alloc: fix potential false positive in __zone_watermark_ok 2018-01-31 12:06:09 +01:00
page_counter.c
page_ext.c mm/page_ext.c: check if page_ext is not prepared 2017-11-24 08:32:25 +01:00
page_idle.c
page_io.c
page_isolation.c
page_owner.c mm: check the return value of lookup_page_ext for all call sites 2017-11-24 08:32:25 +01:00
page-writeback.c writeback: safer lock nesting 2018-04-24 09:32:12 +02:00
pagewalk.c mm/pagewalk.c: report holes in hugetlb ranges 2017-11-24 08:32:25 +01:00
percpu-km.c
percpu-vm.c
percpu.c percpu: include linux/sched.h for cond_resched() 2018-05-16 10:06:46 +02:00
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c mm/rmap: batched invalidations should use existing api 2017-12-25 14:22:09 +01:00
shmem.c
slab_common.c slub: do not merge cache if slub_debug contains a never-merge flag 2017-10-21 17:09:05 +02:00
slab.c mm, slab: reschedule cache_reap() on the same CPU 2018-04-24 09:32:05 +02:00
slab.h
slob.c
slub.c slub/memcg: cure the brainless abuse of sysfs attributes 2017-06-07 12:06:01 +02:00
sparse-vmemmap.c
sparse.c
swap_cgroup.c mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff() 2017-07-05 14:37:15 +02:00
swap_state.c
swap.c
swapfile.c swapfile: fix memory corruption via malformed swapfile 2016-11-18 10:48:34 +01:00
truncate.c fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
userfaultfd.c
util.c proc read mm's {arg,env}_{start,end} with mmap semaphore taken. 2018-05-26 08:48:55 +02:00
vmacache.c
vmalloc.c
vmpressure.c mm: vmpressure: fix sending wrong events on underflow 2017-03-12 06:37:25 +01:00
vmscan.c mm: pin address_space before dereferencing it while isolating an LRU page 2018-05-30 07:48:55 +02:00
vmstat.c proc: much faster /proc/vmstat 2018-01-10 09:27:14 +01:00
workingset.c mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page() 2016-10-28 03:01:34 -04:00
zbud.c
zpool.c
zsmalloc.c
zswap.c zswap: don't param_set_charp while holding spinlock 2018-01-17 09:35:27 +01:00