Matthias Kaehlcke
adb2e9f17e
thermal: cpu_cooling: Actually trace CPU load in thermal_power_cpu_get_power
...
[ Upstream commit bf45ac18b78038e43af3c1a273cae4ab5704d2ce ]
The CPU load values passed to the thermal_power_cpu_get_power
tracepoint are zero for all CPUs, unless, unless the
thermal_power_cpu_limit tracepoint is enabled too:
irq/41-rockchip-98 [000] .... 290.972410: thermal_power_cpu_get_power:
cpus=0000000f freq=1800000 load={{0x0,0x0,0x0,0x0}} dynamic_power=4815
vs
irq/41-rockchip-96 [000] .... 95.773585: thermal_power_cpu_get_power:
cpus=0000000f freq=1800000 load={{0x56,0x64,0x64,0x5e}} dynamic_power=4959
irq/41-rockchip-96 [000] .... 95.773596: thermal_power_cpu_limit:
cpus=0000000f freq=408000 cdev_state=10 power=416
There seems to be no good reason for omitting the CPU load information
depending on another tracepoint. My guess is that the intention was to
check whether thermal_power_cpu_get_power is (still) enabled, however
'load_cpu != NULL' already indicates that it was at least enabled when
cpufreq_get_requested_power() was entered, there seems little gain
from omitting the assignment if the tracepoint was just disabled, so
just remove the check.
Fixes: 6828a4711f99 ("thermal: add trace events to the power allocator governor")
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Javi Merino <javi.merino@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-07 13:41:59 +02:00
Wei Wang
9336c76f28
thermal: Fix deadlock in thermal thermal_zone_device_check
...
commit 163b00cde7cf2206e248789d2780121ad5e6a70b upstream.
1851799e1d29 ("thermal: Fix use-after-free when unregistering thermal zone
device") changed cancel_delayed_work to cancel_delayed_work_sync to avoid
a use-after-free issue. However, cancel_delayed_work_sync could be called
insides the WQ causing deadlock.
[54109.642398] c0 1162 kworker/u17:1 D 0 11030 2 0x00000000
[54109.642437] c0 1162 Workqueue: thermal_passive_wq thermal_zone_device_check
[54109.642447] c0 1162 Call trace:
[54109.642456] c0 1162 __switch_to+0x138/0x158
[54109.642467] c0 1162 __schedule+0xba4/0x1434
[54109.642480] c0 1162 schedule_timeout+0xa0/0xb28
[54109.642492] c0 1162 wait_for_common+0x138/0x2e8
[54109.642511] c0 1162 flush_work+0x348/0x40c
[54109.642522] c0 1162 __cancel_work_timer+0x180/0x218
[54109.642544] c0 1162 handle_thermal_trip+0x2c4/0x5a4
[54109.642553] c0 1162 thermal_zone_device_update+0x1b4/0x25c
[54109.642563] c0 1162 thermal_zone_device_check+0x18/0x24
[54109.642574] c0 1162 process_one_work+0x3cc/0x69c
[54109.642583] c0 1162 worker_thread+0x49c/0x7c0
[54109.642593] c0 1162 kthread+0x17c/0x1b0
[54109.642602] c0 1162 ret_from_fork+0x10/0x18
[54109.643051] c0 1162 kworker/u17:2 D 0 16245 2 0x00000000
[54109.643067] c0 1162 Workqueue: thermal_passive_wq thermal_zone_device_check
[54109.643077] c0 1162 Call trace:
[54109.643085] c0 1162 __switch_to+0x138/0x158
[54109.643095] c0 1162 __schedule+0xba4/0x1434
[54109.643104] c0 1162 schedule_timeout+0xa0/0xb28
[54109.643114] c0 1162 wait_for_common+0x138/0x2e8
[54109.643122] c0 1162 flush_work+0x348/0x40c
[54109.643131] c0 1162 __cancel_work_timer+0x180/0x218
[54109.643141] c0 1162 handle_thermal_trip+0x2c4/0x5a4
[54109.643150] c0 1162 thermal_zone_device_update+0x1b4/0x25c
[54109.643159] c0 1162 thermal_zone_device_check+0x18/0x24
[54109.643167] c0 1162 process_one_work+0x3cc/0x69c
[54109.643177] c0 1162 worker_thread+0x49c/0x7c0
[54109.643186] c0 1162 kthread+0x17c/0x1b0
[54109.643195] c0 1162 ret_from_fork+0x10/0x18
[54109.644500] c0 1162 cat D 0 7766 1 0x00000001
[54109.644515] c0 1162 Call trace:
[54109.644524] c0 1162 __switch_to+0x138/0x158
[54109.644536] c0 1162 __schedule+0xba4/0x1434
[54109.644546] c0 1162 schedule_preempt_disabled+0x80/0xb0
[54109.644555] c0 1162 __mutex_lock+0x3a8/0x7f0
[54109.644563] c0 1162 __mutex_lock_slowpath+0x14/0x20
[54109.644575] c0 1162 thermal_zone_get_temp+0x84/0x360
[54109.644586] c0 1162 temp_show+0x30/0x78
[54109.644609] c0 1162 dev_attr_show+0x5c/0xf0
[54109.644628] c0 1162 sysfs_kf_seq_show+0xcc/0x1a4
[54109.644636] c0 1162 kernfs_seq_show+0x48/0x88
[54109.644656] c0 1162 seq_read+0x1f4/0x73c
[54109.644664] c0 1162 kernfs_fop_read+0x84/0x318
[54109.644683] c0 1162 __vfs_read+0x50/0x1bc
[54109.644692] c0 1162 vfs_read+0xa4/0x140
[54109.644701] c0 1162 SyS_read+0xbc/0x144
[54109.644708] c0 1162 el0_svc_naked+0x34/0x38
[54109.845800] c0 1162 D 720.000s 1->7766->7766 cat [panic]
Fixes: 1851799e1d29 ("thermal: Fix use-after-free when unregistering thermal zone device")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Wang <wvw@google.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-07 12:46:09 +02:00
Geert Uytterhoeven
6fc97bae42
thermal: rcar_thermal: Prevent hardware access during system suspend
...
[ Upstream commit 3a31386217628ffe2491695be2db933c25dde785 ]
On r8a7791/koelsch, sometimes the following message is printed during
system suspend:
rcar_thermal e61f0000.thermal: thermal sensor was broken
This happens if the workqueue runs while the device is already
suspended. Fix this by using the freezable system workqueue instead,
cfr. commit 51e20d0e3a60cf46 ("thermal: Prevent polling from happening
during system suspend").
Fixes: e0a5172e9eec7f0d ("thermal: rcar: add interrupt support")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-07 12:35:31 +02:00
Ido Schimmel
280c621ffb
thermal: Fix use-after-free when unregistering thermal zone device
...
[ Upstream commit 1851799e1d2978f68eea5d9dff322e121dcf59c1 ]
thermal_zone_device_unregister() cancels the delayed work that polls the
thermal zone, but it does not wait for it to finish. This is racy with
respect to the freeing of the thermal zone device, which can result in a
use-after-free [1].
Fix this by waiting for the delayed work to finish before freeing the
thermal zone device. Note that thermal_zone_device_set_polling() is
never invoked from an atomic context, so it is safe to call
cancel_delayed_work_sync() that can block.
[1]
[ +0.002221] ==================================================================
[ +0.000064] BUG: KASAN: use-after-free in __mutex_lock+0x1076/0x11c0
[ +0.000016] Read of size 8 at addr ffff8881e48e0450 by task kworker/1:0/17
[ +0.000023] CPU: 1 PID: 17 Comm: kworker/1:0 Not tainted 5.2.0-rc6-custom-02495-g8e73ca3be4af #1701
[ +0.000010] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[ +0.000016] Workqueue: events_freezable_power_ thermal_zone_device_check
[ +0.000012] Call Trace:
[ +0.000021] dump_stack+0xa9/0x10e
[ +0.000020] print_address_description.cold.2+0x9/0x25e
[ +0.000018] __kasan_report.cold.3+0x78/0x9d
[ +0.000016] kasan_report+0xe/0x20
[ +0.000016] __mutex_lock+0x1076/0x11c0
[ +0.000014] step_wise_throttle+0x72/0x150
[ +0.000018] handle_thermal_trip+0x167/0x760
[ +0.000019] thermal_zone_device_update+0x19e/0x5f0
[ +0.000019] process_one_work+0x969/0x16f0
[ +0.000017] worker_thread+0x91/0xc40
[ +0.000014] kthread+0x33d/0x400
[ +0.000015] ret_from_fork+0x3a/0x50
[ +0.000020] Allocated by task 1:
[ +0.000015] save_stack+0x19/0x80
[ +0.000015] __kasan_kmalloc.constprop.4+0xc1/0xd0
[ +0.000014] kmem_cache_alloc_trace+0x152/0x320
[ +0.000015] thermal_zone_device_register+0x1b4/0x13a0
[ +0.000015] mlxsw_thermal_init+0xc92/0x23d0
[ +0.000014] __mlxsw_core_bus_device_register+0x659/0x11b0
[ +0.000013] mlxsw_core_bus_device_register+0x3d/0x90
[ +0.000013] mlxsw_pci_probe+0x355/0x4b0
[ +0.000014] local_pci_probe+0xc3/0x150
[ +0.000013] pci_device_probe+0x280/0x410
[ +0.000013] really_probe+0x26a/0xbb0
[ +0.000013] driver_probe_device+0x208/0x2e0
[ +0.000013] device_driver_attach+0xfe/0x140
[ +0.000013] __driver_attach+0x110/0x310
[ +0.000013] bus_for_each_dev+0x14b/0x1d0
[ +0.000013] driver_register+0x1c0/0x400
[ +0.000015] mlxsw_sp_module_init+0x5d/0xd3
[ +0.000014] do_one_initcall+0x239/0x4dd
[ +0.000013] kernel_init_freeable+0x42b/0x4e8
[ +0.000012] kernel_init+0x11/0x18b
[ +0.000013] ret_from_fork+0x3a/0x50
[ +0.000015] Freed by task 581:
[ +0.000013] save_stack+0x19/0x80
[ +0.000014] __kasan_slab_free+0x125/0x170
[ +0.000013] kfree+0xf3/0x310
[ +0.000013] thermal_release+0xc7/0xf0
[ +0.000014] device_release+0x77/0x200
[ +0.000014] kobject_put+0x1a8/0x4c0
[ +0.000014] device_unregister+0x38/0xc0
[ +0.000014] thermal_zone_device_unregister+0x54e/0x6a0
[ +0.000014] mlxsw_thermal_fini+0x184/0x35a
[ +0.000014] mlxsw_core_bus_device_unregister+0x10a/0x640
[ +0.000013] mlxsw_devlink_core_bus_device_reload+0x92/0x210
[ +0.000015] devlink_nl_cmd_reload+0x113/0x1f0
[ +0.000014] genl_family_rcv_msg+0x700/0xee0
[ +0.000013] genl_rcv_msg+0xca/0x170
[ +0.000013] netlink_rcv_skb+0x137/0x3a0
[ +0.000012] genl_rcv+0x29/0x40
[ +0.000013] netlink_unicast+0x49b/0x660
[ +0.000013] netlink_sendmsg+0x755/0xc90
[ +0.000013] __sys_sendto+0x3de/0x430
[ +0.000013] __x64_sys_sendto+0xe2/0x1b0
[ +0.000013] do_syscall_64+0xa4/0x4d0
[ +0.000013] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ +0.000017] The buggy address belongs to the object at ffff8881e48e0008
which belongs to the cache kmalloc-2k of size 2048
[ +0.000012] The buggy address is located 1096 bytes inside of
2048-byte region [ffff8881e48e0008, ffff8881e48e0808)
[ +0.000007] The buggy address belongs to the page:
[ +0.000012] page:ffffea0007923800 refcount:1 mapcount:0 mapping:ffff88823680d0c0 index:0x0 compound_mapcount: 0
[ +0.000020] flags: 0x200000000010200(slab|head)
[ +0.000019] raw: 0200000000010200 ffffea0007682008 ffffea00076ab808 ffff88823680d0c0
[ +0.000016] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
[ +0.000007] page dumped because: kasan: bad access detected
[ +0.000012] Memory state around the buggy address:
[ +0.000012] ffff8881e48e0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000012] ffff8881e48e0380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000012] >ffff8881e48e0400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000008] ^
[ +0.000012] ffff8881e48e0480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000012] ffff8881e48e0500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================
Fixes: b1569e99c795 ("ACPI: move thermal trip handling to generic thermal layer")
Reported-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-07 08:08:06 +02:00
Matthew Garrett
8e5f529e18
thermal/int340x_thermal: fix mode setting
...
[ Upstream commit 396ee4d0cd52c13b3f6421b8d324d65da5e7e409 ]
int3400 only pushes the UUID into the firmware when the mode is flipped
to "enable". The current code only exposes the mode flag if the firmware
supports the PASSIVE_1 UUID, which not all machines do. Remove the
restriction.
Signed-off-by: Matthew Garrett <mjg59@google.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-06 15:17:09 +02:00
Matthew Garrett
53b0e9fffb
thermal/int340x_thermal: Add additional UUIDs
...
[ Upstream commit 16fc8eca1975358111dbd7ce65e4ce42d1a848fb ]
Add more supported DPTF policies than the driver currently exposes.
Signed-off-by: Matthew Garrett <mjg59@google.com>
Cc: Nisha Aram <nisha.aram@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-06 15:17:07 +02:00
prashantpaddune
3bca37f224
A750FXXU4CTBC
2020-03-27 21:51:54 +05:30