android_kernel_samsung_univ.../mm
Pavel Tatashin ec8d6e953a mm: don't allow deferred pages with NEED_PER_CPU_KM
commit ab1e8d8960b68f54af42b6484b5950bd13a4054b upstream.

It is unsafe to do virtual to physical translations before mm_init() is
called if struct page is needed in order to determine the memory section
number (see SECTION_IN_PAGE_FLAGS).  This is because only in mm_init()
we initialize struct pages for all the allocated memory when deferred
struct pages are used.

My recent fix in commit c9e97a1997 ("mm: initialize pages on demand
during boot") exposed this problem, because it greatly reduced number of
pages that are initialized before mm_init(), but the problem existed
even before my fix, as Fengguang Wu found.

Below is a more detailed explanation of the problem.

We initialize struct pages in four places:

1. Early in boot a small set of struct pages is initialized to fill the
   first section, and lower zones.

2. During mm_init() we initialize "struct pages" for all the memory that
   is allocated, i.e reserved in memblock.

3. Using on-demand logic when pages are allocated after mm_init call
   (when memblock is finished)

4. After smp_init() when the rest free deferred pages are initialized.

The problem occurs if we try to do va to phys translation of a memory
between steps 1 and 2.  Because we have not yet initialized struct pages
for all the reserved pages, it is inherently unsafe to do va to phys if
the translation itself requires access of "struct page" as in case of
this combination: CONFIG_SPARSE && !CONFIG_SPARSE_VMEMMAP

The following path exposes the problem:

  start_kernel()
   trap_init()
    setup_cpu_entry_areas()
     setup_cpu_entry_area(cpu)
      get_cpu_gdt_paddr(cpu)
       per_cpu_ptr_to_phys(addr)
        pcpu_addr_to_page(addr)
         virt_to_page(addr)
          pfn_to_page(__pa(addr) >> PAGE_SHIFT)

We disable this path by not allowing NEED_PER_CPU_KM with deferred
struct pages feature.

The problems are discussed in these threads:
  http://lkml.kernel.org/r/20180418135300.inazvpxjxowogyge@wfg-t540p.sh.intel.com
  http://lkml.kernel.org/r/20180419013128.iurzouiqxvcnpbvz@wfg-t540p.sh.intel.com
  http://lkml.kernel.org/r/20180426202619.2768-1-pasha.tatashin@oracle.com

Link: http://lkml.kernel.org/r/20180515175124.1770-1-pasha.tatashin@oracle.com
Fixes: 3a80a7fa79 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Dennis Zhou <dennisszhou@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26 08:48:55 +02:00
..
kasan kasan: respect /proc/sys/kernel/traceoff_on_warning 2017-06-17 06:39:36 +02:00
backing-dev.c writeback: fix the wrong congested state variable definition 2018-04-08 11:51:56 +02:00
balloon_compaction.c virtio_balloon: fix race between migration and ballooning 2016-03-03 15:07:18 -08:00
bootmem.c
cleancache.c
cma_debug.c
cma.c cma: fix calculation of aligned offset 2018-01-31 12:06:09 +01:00
cma.h
compaction.c mm/compaction: pass only pageblock aligned range to pageblock_pfn_to_page 2018-01-17 09:35:26 +01:00
debug-pagealloc.c mm, hwpoison: fixup "mm: check the return value of lookup_page_ext for all call sites" 2017-11-24 11:26:29 +01:00
debug.c
dmapool.c
early_ioremap.c mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep 2018-02-25 11:03:41 +01:00
fadvise.c
failslab.c
filemap.c mm: filemap: avoid unnecessary calls to lock_page when waiting for IO to complete during a read 2018-05-26 08:48:54 +02:00
frame_vector.c
frontswap.c
gup.c mm: larger stack guard gap, between vmas 2017-06-26 07:13:11 +02:00
highmem.c
huge_memory.c thp: fix MADV_DONTNEED vs. numa balancing race 2017-12-16 10:33:50 +01:00
hugetlb_cgroup.c
hugetlb.c mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd() 2017-04-08 09:53:32 +02:00
hwpoison-inject.c
init-mm.c mm: Add a user_ns owner to mm_struct and fix ptrace permission checks 2017-01-06 11:16:11 +01:00
internal.h mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-11 09:08:50 -07:00
interval_tree.c
Kconfig mm: don't allow deferred pages with NEED_PER_CPU_KM 2018-05-26 08:48:55 +02:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: add scheduling point to kmemleak_scan() 2018-02-03 17:04:29 +01:00
ksm.c mm,ksm: fix endless looping in allocating memory when ksm enable 2016-10-07 15:23:40 +02:00
list_lru.c mm/list_lru.c: fix list_lru_count_node() to be race free 2017-07-21 07:44:56 +02:00
maccess.c
madvise.c mm/madvise.c: fix madvise() infinite loop under special circumstances 2017-12-05 11:22:50 +01:00
Makefile
memblock.c mm: consider memblock reservations for deferred memory initialization sizing 2017-06-14 13:16:26 +02:00
memcontrol.c hwpoison, memcg: forcibly uncharge LRU pages 2018-01-31 12:06:09 +01:00
memory_hotplug.c base/memory, hotplug: fix a kernel oops in show_valid_zones() 2017-02-09 08:02:47 +01:00
memory-failure.c hwpoison, memcg: forcibly uncharge LRU pages 2018-01-31 12:06:09 +01:00
memory.c mm: allow GFP_{FS,IO} for page_cache_read page cache allocation 2018-04-24 09:32:11 +02:00
mempolicy.c mm/mempolicy: fix use after free when calling get_mempolicy 2017-08-24 17:02:35 -07:00
mempool.c mm/mempool: avoid KASAN marking mempool poison checks as use-after-free 2017-08-12 19:29:09 -07:00
memtest.c
migrate.c Sanitize 'move_pages()' permission checks 2017-08-24 17:02:36 -07:00
mincore.c
mlock.c mlock: fix mlock count can not decrease in race condition 2017-06-07 12:06:01 +02:00
mm_init.c
mmap.c mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack 2018-01-31 12:06:09 +01:00
mmu_context.c mm/mmu_context, sched/core: Fix mmu_context.h assumption 2017-12-25 14:22:09 +01:00
mmu_notifier.c
mmzone.c
mprotect.c mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-11 09:08:50 -07:00
mremap.c mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-11 09:08:50 -07:00
msync.c
nobootmem.c
nommu.c
oom_kill.c mm/oom_kill.c: avoid attempting to kill init sharing same memory 2015-12-12 10:15:34 -08:00
page_alloc.c mm, page_alloc: fix potential false positive in __zone_watermark_ok 2018-01-31 12:06:09 +01:00
page_counter.c
page_ext.c mm/page_ext.c: check if page_ext is not prepared 2017-11-24 08:32:25 +01:00
page_idle.c
page_io.c
page_isolation.c mm: fix invalid node in alloc_migrate_target() 2016-04-20 15:41:53 +09:00
page_owner.c mm: check the return value of lookup_page_ext for all call sites 2017-11-24 08:32:25 +01:00
page-writeback.c writeback: safer lock nesting 2018-04-24 09:32:12 +02:00
pagewalk.c mm/pagewalk.c: report holes in hugetlb ranges 2017-11-24 08:32:25 +01:00
percpu-km.c
percpu-vm.c
percpu.c percpu: include linux/sched.h for cond_resched() 2018-05-16 10:06:46 +02:00
pgtable-generic.c mm,thp: khugepaged: call pte flush at the time of collapse 2016-02-25 12:01:23 -08:00
process_vm_access.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-02-25 12:01:16 -08:00
quicklist.c
readahead.c
rmap.c mm/rmap: batched invalidations should use existing api 2017-12-25 14:22:09 +01:00
shmem.c tmpfs: fix regression hang in fallocate undo 2016-07-27 09:47:40 -07:00
slab_common.c slub: do not merge cache if slub_debug contains a never-merge flag 2017-10-21 17:09:05 +02:00
slab.c mm, slab: reschedule cache_reap() on the same CPU 2018-04-24 09:32:05 +02:00
slab.h slab/slub: adjust kmem_cache_alloc_bulk API 2015-11-22 11:58:44 -08:00
slob.c slab/slub: adjust kmem_cache_alloc_bulk API 2015-11-22 11:58:44 -08:00
slub.c slub/memcg: cure the brainless abuse of sysfs attributes 2017-06-07 12:06:01 +02:00
sparse-vmemmap.c
sparse.c
swap_cgroup.c mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff() 2017-07-05 14:37:15 +02:00
swap_state.c
swap.c
swapfile.c swapfile: fix memory corruption via malformed swapfile 2016-11-18 10:48:34 +01:00
truncate.c fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
userfaultfd.c
util.c proc read mm's {arg,env}_{start,end} with mmap semaphore taken. 2018-05-26 08:48:55 +02:00
vmacache.c
vmalloc.c mm: vmalloc: don't remove inexistent guard hole in remove_vm_area() 2015-11-20 16:17:32 -08:00
vmpressure.c mm: vmpressure: fix sending wrong events on underflow 2017-03-12 06:37:25 +01:00
vmscan.c vmscan: do not force-scan file lru if its absolute size is small 2018-05-26 08:48:53 +02:00
vmstat.c proc: much faster /proc/vmstat 2018-01-10 09:27:14 +01:00
workingset.c mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page() 2016-10-28 03:01:34 -04:00
zbud.c
zpool.c
zsmalloc.c zsmalloc: fix zs_can_compact() integer overflow 2016-05-18 17:06:44 -07:00
zswap.c zswap: don't param_set_charp while holding spinlock 2018-01-17 09:35:27 +01:00