android_kernel_samsung_univ.../arch/tile
Chris Metcalf 4df31626fc tile: avoid using clocksource_cyc2ns with absolute cycle count
commit e658a6f14d7c0243205f035979d0ecf6c12a036f upstream.

For large values of "mult" and long uptimes, the intermediate
result of "cycles * mult" can overflow 64 bits.  For example,
the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock;
we have mult = 853, and after 208.5 days, we overflow 64 bits.
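
The 208.5-day figure can be checked with a little arithmetic. The following is a userspace sketch, not kernel code; the helper names max_safe_cycles() and days_until_overflow() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest cycle count for which
 * cycles * mult still fits in an unsigned 64-bit value. */
static uint64_t max_safe_cycles(uint32_t mult)
{
    assert(mult != 0);
    return UINT64_MAX / mult;
}

/* Hypothetical helper: uptime in days at which an absolute cycle
 * counter ticking at freq_hz passes the limit above. */
static double days_until_overflow(uint64_t freq_hz, uint32_t mult)
{
    assert(freq_hz != 0);
    return (double)max_safe_cycles(mult) / (double)freq_hz / 86400.0;
}
```

Calling days_until_overflow(1200000000ULL, 853) gives a value between 208 and 209 days, consistent with the number quoted above.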

Since clocksource_cyc2ns() is intended to be used for relative
cycle counts, not absolute cycle counts, performance is more
important than accepting a wider range of cycle values.  So,
just use mult_frac() directly in tile's sched_clock().
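
As a rough illustration of why mult_frac() avoids the overflow, here is a userspace sketch of the same split-into-quotient-and-remainder idea. The function names are hypothetical, and the shift of 10 used in the note below is an assumption (it is the value that pairs with mult = 853 at 1.2 GHz, but the commit message does not state it).

```c
#include <assert.h>
#include <stdint.h>

/* Sketch modeled on the kernel's mult_frac() macro: compute
 * x * numer / denom without forming the full product x * numer. */
static inline uint64_t mult_frac_u64(uint64_t x, uint64_t numer,
                                     uint64_t denom)
{
    assert(denom != 0);
    uint64_t quot = x / denom;   /* whole multiples of denom */
    uint64_t rem  = x % denom;   /* remainder, always < denom */

    /* rem * numer stays small because rem < denom. */
    return quot * numer + (rem * numer) / denom;
}

/* Hypothetical wrapper: convert an absolute cycle count to
 * nanoseconds using a clocksource-style mult/shift pair. */
static inline uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult,
                                    unsigned int shift)
{
    assert(shift < 64);
    return mult_frac_u64(cycles, mult, 1ULL << shift);
}
```

With mult = 853 and the assumed shift of 10, one second of cycles (1200000000) converts to 999609375 ns, the same as the naive (cycles * mult) >> shift.  For a count of 2^60 cycles the naive product no longer fits in 64 bits, while the split form still returns 853 * 2^50.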

Commit 4cecf6d401 ("sched, x86: Avoid unnecessary overflow
in sched_clock") by Salman Qazi results in essentially the same
generated code for x86 as this change does for tile.  In fact,
a follow-on change by Salman introduced mult_frac() and switched
to using it, so the C code was largely identical at that point too.

Peter Zijlstra then added mul_u64_u32_shr() and switched x86
to use it.  This is, in principle, better; by optimizing the
64x64->64 multiplies to be 32x32->64 multiplies we can potentially
save some time.  However, the compiler pipelines the 64x64->64
multiplies pretty well, and the conditional branch in the generic
mul_u64_u32_shr() causes some bubbles in execution, with the
result that it's pretty much a wash.  If tilegx provided its own
implementation of mul_u64_u32_shr() without the conditional branch,
we could potentially save 3 cycles, but that seems like a small gain
for a fair amount of additional build scaffolding; no other platform
currently provides a mul_u64_u32_shr() override, and tile doesn't
currently have an <asm/div64.h> header to put the override in.
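
To make the trade-off concrete, here is a userspace sketch of the two shapes being compared.  The first function is modeled on the generic mul_u64_u32_shr() with its conditional branch; the second is only a guess at what a branch-free override could look like and is not taken from any tile code.  Both names are hypothetical, and both assume shift <= 32.

```c
#include <assert.h>
#include <stdint.h>

/* Modeled on the generic helper: two 32x32->64 multiplies, with the
 * high-half multiply skipped when the upper 32 bits of a are zero. */
static inline uint64_t mul_u64_u32_shr_generic(uint64_t a, uint32_t mul,
                                               unsigned int shift)
{
    assert(shift <= 32);
    uint32_t al = (uint32_t)a;
    uint32_t ah = (uint32_t)(a >> 32);
    uint64_t ret = ((uint64_t)al * mul) >> shift;

    if (ah)  /* the conditional branch discussed above */
        ret += ((uint64_t)ah * mul) << (32 - shift);
    return ret;
}

/* Hypothetical branch-free variant: always do both multiplies. */
static inline uint64_t mul_u64_u32_shr_branchless(uint64_t a, uint32_t mul,
                                                  unsigned int shift)
{
    assert(shift <= 32);
    uint64_t lo = (uint64_t)(uint32_t)a * mul;
    uint64_t hi = (uint64_t)(uint32_t)(a >> 32) * mul;

    return (lo >> shift) + (hi << (32 - shift));
}
```

Both variants return the same result for any input with shift <= 32; the only difference is whether the high-half multiply is guarded by a test.  Whether the unguarded form actually saves cycles on tilegx depends on the generated code, which this sketch does not measure.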

Additionally, gcc currently has an optimization bug that prevents
it from recognizing the opportunity to use a 32x32->64 multiply,
and so the result would be no better than the existing mult_frac()
until such time as the compiler is fixed.

For now, just using mult_frac() seems like the right answer.

Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-02 09:09:01 +01:00
..
configs netpoll: delete defconfig references to obsolete NETPOLL_TRAP 2014-11-29 21:13:48 -08:00
gxio tile: use global strscpy() rather than private copy 2015-09-10 15:37:02 -04:00
include tile: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO 2016-10-07 15:23:44 +02:00
kernel tile: avoid using clocksource_cyc2ns with absolute cycle count 2016-12-02 09:09:01 +01:00
kvm rcu: Make SRCU optional by using CONFIG_SRCU 2015-01-06 11:04:29 -08:00
lib tile: Provide atomic_{or,xor,and} 2015-07-27 14:06:24 +02:00
mm kmap_atomic_to_page() has no users, remove it 2015-11-09 15:11:24 -08:00
Kbuild
Kconfig tile: provide CONFIG_PAGE_SIZE_64KB etc for tilepro 2016-01-05 08:16:09 -05:00
Kconfig.debug tile: remove DEBUG_EXTRA_FLAGS kernel config option 2013-09-03 14:52:17 -04:00
Makefile tile: remove DEBUG_EXTRA_FLAGS kernel config option 2013-09-03 14:52:17 -04:00