android_kernel_samsung_univ.../net/ipv4
Lance Richardson 0bb225a04d vti: flush x-netns xfrm cache when vti interface is removed
[ Upstream commit a5d0dc810abf3d6b241777467ee1d6efb02575fc ]

When executing the script included below, the netns delete operation
hangs with the following message (repeated at 10 second intervals):

  kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

This occurs because a reference to the lo interface in the "secure" netns
is still held by a dst entry in the xfrm bundle cache in the init netns.

Address this problem by garbage collecting the tunnel netns flow cache
when a cross-namespace vti interface receives a NETDEV_DOWN notification.

A more detailed description of the problem scenario (referencing commands
in the script below):

(1) ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1

  The vti_test interface is created in the init namespace. vti_tunnel_init()
  attaches a struct ip_tunnel to the vti interface's netdev_priv(dev),
  setting the tunnel net to &init_net.

(2) ip link set vti_test netns secure

  The vti_test interface is moved to the "secure" netns. Note that
  the associated struct ip_tunnel still has tunnel->net set to &init_net.

(3) ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1

  The first packet sent using the vti device causes xfrm_lookup() to be
  called as follows:

      dst = xfrm_lookup(tunnel->net, skb_dst(skb), fl, NULL, 0);

  Note that tunnel->net is the init namespace, while skb_dst(skb) references
  the vti_test interface in the "secure" namespace. The returned dst
  references an interface in the init namespace.

  Also note that the first parameter to xfrm_lookup() determines which flow
  cache is used to store the computed xfrm bundle, so after xfrm_lookup()
  returns there will be a cached bundle in the init namespace flow cache
  with a dst referencing a device in the "secure" namespace.

(4) ip netns del secure

  Kernel begins to delete the "secure" namespace.  At some point the
  vti_test interface is deleted, at which point dst_ifdown() changes
  the dst->dev in the cached xfrm bundle flow from vti_test to lo (still
  in the "secure" namespace however).
  Since nothing has happened to cause the init namespace's flow cache
  to be garbage collected, this dst remains attached to the flow cache,
  so the kernel loops waiting for the last reference to lo to go away.

<Begin script>
ip link add br1 type bridge
ip link set dev br1 up
ip addr add dev br1 1.1.1.1/8

ip netns add secure
ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1
ip link set vti_test netns secure
ip netns exec secure ip link set vti_test up
ip netns exec secure ip link s lo up
ip netns exec secure ip addr add dev lo 192.168.100.1/24
ip netns exec secure ip route add 192.168.200.0/24 dev vti_test
ip xfrm policy flush
ip xfrm state flush
ip xfrm policy add dir out tmpl src 1.1.1.1 dst 1.1.1.2 \
   proto esp mode tunnel mark 1
ip xfrm policy add dir in tmpl src 1.1.1.2 dst 1.1.1.1 \
   proto esp mode tunnel mark 1
ip xfrm state add src 1.1.1.1 dst 1.1.1.2 proto esp spi 1 \
   mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
ip xfrm state add src 1.1.1.2 dst 1.1.1.1 proto esp spi 1 \
   mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788

ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1

ip netns del secure
<End script>

Reported-by: Hangbin Liu <haliu@redhat.com>
Reported-by: Jan Tluka <jtluka@redhat.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30 10:18:36 +02:00
..
netfilter netfilter: x_tables: introduce and use xt_copy_counters_from_user 2016-06-24 10:18:24 -07:00
af_inet.c net: add validation for the socket syscall protocol argument 2015-12-14 16:09:30 -05:00
ah4.c
arp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-20 06:08:27 -07:00
cipso_ipv4.c
datagram.c
devinet.c ipv4: Don't do expensive useless work during inetdev destroy. 2016-04-20 15:42:03 +09:00
esp4.c esp: Fix ESN generation under UDP encapsulation 2016-07-11 09:31:11 -07:00
fib_frontend.c ipv4/fib: don't warn when primary address is missing if in_dev is dead 2016-05-18 17:06:36 -07:00
fib_lookup.h
fib_rules.c
fib_semantics.c ipv4: reject RTNH_F_DEAD and RTNH_F_LINKDOWN from user space 2016-08-16 09:30:47 +02:00
fib_trie.c ipv4: panic in leaf_walk_rcu due to stale node pointer 2016-09-30 10:18:34 +02:00
fou.c fou: clean up socket with kfree_rcu 2015-12-16 19:03:02 -05:00
gre_demux.c
gre_offload.c ipv6: gre: support SIT encapsulation 2015-10-26 22:01:18 -07:00
icmp.c
igmp.c mld, igmp: Fix reserved tailroom calculation 2016-04-20 15:41:58 +09:00
inet_connection_sock.c tcp/dccp: fix another race at listener dismantle 2016-03-03 15:07:07 -08:00
inet_diag.c
inet_fragment.c net: fix percpu memory leaks 2015-11-02 22:47:14 -05:00
inet_hashtables.c tcp/dccp: fix hashdance race for passive sessions 2015-10-23 05:42:21 -07:00
inet_lro.c
inet_timewait_sock.c
inetpeer.c
ip_forward.c
ip_fragment.c inet: frag: Always orphan skbs inside ip_defrag() 2016-03-03 15:07:04 -08:00
ip_gre.c vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices 2016-06-24 10:18:18 -07:00
ip_input.c
ip_options.c
ip_output.c ipv4: only create late gso-skb if skb is already set up with CHECKSUM_PARTIAL 2016-04-20 15:41:57 +09:00
ip_sockglue.c ipv4: fix memory leaks in ip_cmsg_send() callers 2016-03-03 15:07:06 -08:00
ip_tunnel_core.c
ip_tunnel.c vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices 2016-06-24 10:18:18 -07:00
ip_vti.c vti: flush x-netns xfrm cache when vti interface is removed 2016-09-30 10:18:36 +02:00
ipcomp.c
ipconfig.c ipconfig: send Client-identifier in DHCP requests 2015-10-18 19:23:52 -07:00
ipip.c ipip: ioctl: Remove superfluous IP-TTL handling. 2015-12-18 16:07:59 -05:00
ipmr.c ipmr/ip6mr: Initialize the last assert time of mfc entries. 2016-07-11 09:31:11 -07:00
Kconfig
Makefile tcp: track the packet timings in RACK 2015-10-21 07:00:48 -07:00
netfilter.c
ping.c ipv4: fix memory leaks in ip_cmsg_send() callers 2016-03-03 15:07:06 -08:00
proc.c
protocol.c
raw.c ipv4: fix memory leaks in ip_cmsg_send() callers 2016-03-03 15:07:06 -08:00
route.c route: do not cache fib route info on local routes with oif 2016-05-18 17:06:35 -07:00
syncookies.c tcp/dccp: fix hashdance race for passive sessions 2015-10-23 05:42:21 -07:00
sysctl_net_ipv4.c ipv4: disable BH when changing ip local port range 2015-11-04 21:29:06 -05:00
tcp_bic.c
tcp_cdg.c
tcp_cong.c
tcp_cubic.c
tcp_dctcp.c tcp: allow dctcp alpha to drop to zero 2015-10-23 02:46:52 -07:00
tcp_diag.c tcp: ensure proper barriers in lockless contexts 2015-11-15 18:36:38 -05:00
tcp_fastopen.c tcp/dccp: fix hashdance race for passive sessions 2015-10-23 05:42:21 -07:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: enable per-socket rate limiting of all 'challenge acks' 2016-08-16 09:30:47 +02:00
tcp_ipv4.c tcp: properly scale window in tcp_v[46]_reqsk_send_ack() 2016-09-30 10:18:34 +02:00
tcp_lp.c
tcp_memcontrol.c
tcp_metrics.c tcp: convert cached rtt from usec to jiffies when feeding initial rto 2016-04-20 15:41:56 +09:00
tcp_minisocks.c tcp: fix tcpi_segs_in after connection establishment 2016-04-20 15:42:00 +09:00
tcp_offload.c
tcp_output.c tcp: consider recv buf for the initial window scale 2016-08-16 09:30:48 +02:00
tcp_probe.c
tcp_recovery.c tcp: use RACK to detect losses 2015-10-21 07:00:53 -07:00
tcp_scalable.c
tcp_timer.c tcp: fix Fast Open snmp over-counting bug 2015-11-20 10:51:12 -05:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c tcp: cwnd does not increase in TCP YeAH 2016-09-30 10:18:34 +02:00
tcp.c net:Add sysctl_max_skb_frags 2016-03-03 15:07:05 -08:00
tunnel4.c
udp_diag.c
udp_impl.h
udp_offload.c
udp_tunnel.c tunnel: Clear IPCB(skb)->opt before dst_link_failure called 2016-04-20 15:41:56 +09:00
udp.c udp: properly support MSG_PEEK with truncated buffers 2016-09-15 08:27:49 +02:00
udplite.c
xfrm4_input.c
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c
xfrm4_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-24 06:54:12 -07:00
xfrm4_policy.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2015-12-22 16:26:31 -05:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c