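I ran into a problem: after setting outbound in libvirt (traffic leaving the VM, which shows up as ingress traffic on the host-side tap device), the corresponding tc ingress rules are nowhere to be seen.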

Linux TC

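Linux TC is used to control QoS. It has three parts: the queueing discipline (qdisc), the class, and the filter. Packets flow filter -> class -> queue.
In a handle, the minor number of a qdisc is always 0, as in 1: or 10:, while the minor number of a class can never be 0, as in 1:1. For example: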

tc qdisc del dev eth0 root
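# Two commonly used classful qdiscs are CBQ and HTB. CBQ is complex; HTB is the improved successor and is usually preferred. 'default 2' sends traffic that matches no configured filter to class 1:2 (class 1:2 itself is not created in this short example).
# HTB guarantees each class its configured rate, and also lets a class borrow bandwidth that other classes are not using, up to its own ceil.
tc qdisc add dev eth0 root handle 1: htb default 2
# Set the total upload bandwidth. More classes can be added here for different kinds of traffic.
# Large queues reduce packet loss and improve throughput, so ISPs usually use them, but they hurt interactive traffic. The queue in the optical modem cannot be changed, so the queue is moved into this Linux router instead.
# rate is the bandwidth a class is guaranteed, ceil is the maximum bandwidth a class can reach, and prio is its priority when borrowing bandwidth.
tc class add dev eth0 parent 1: classid 1:1 htb rate 220kbit burst 6k
# An SFQ (stochastic fairness queueing) qdisc is usually attached under an HTB class so that one heavy flow cannot starve the other flows in that class.
tc qdisc add dev eth0 parent 1:1 handle 10: sfq perturb 10
# The two common tc filters are u32 and fw. fw matches the mark that iptables puts on a packet, which avoids making u32 parse complex packet layouts.
tc filter add dev eth0 parent 1: protocol ip prio 1 handle 1 fw classid 1:1
tc -s class show dev eth0
# Mark packets with iptables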
iptables -t mangle -A PREROUTING -p icmp -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -p icmp -j RETURN


tc qdisc del dev $DEV ingress
tc qdisc add dev $DEV handle ffff: ingress
# Send everything to the filter and drop whatever arrives too fast
tc filter add dev $DEV parent ffff: protocol all u32 match u32 0 0 police rate 122kbit burst 10k mtu 64kb drop flowid :1
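
The ingress policing steps above can be wrapped in a small helper. The following is only a sketch: the function name ingress_police_cmds and its argument order are hypothetical, and it prints the tc commands (a dry run) rather than running them, so they can be reviewed first.

```shell
# Hypothetical helper: print (not run) the tc commands that police ingress
# traffic on a device. Arguments: device, rate in kbit/s, burst in kilobytes.
ingress_police_cmds() {
    dev=$1 rate_kbit=$2 burst_k=$3
    # Remove any existing ingress qdisc; tolerate "not found" on a clean device.
    echo "tc qdisc del dev $dev ingress 2>/dev/null || true"
    echo "tc qdisc add dev $dev handle ffff: ingress"
    # Match every packet and drop whatever exceeds the policed rate.
    echo "tc filter add dev $dev parent ffff: protocol all u32 match u32 0 0 police rate ${rate_kbit}kbit burst ${burst_k}k mtu 64kb drop flowid :1"
}

# Dry run with the values used in the example above.
ingress_police_cmds vnet1 122 10
```

Once the printed commands look right, they could be piped to a root shell, for example "ingress_police_cmds vnet1 122 10 | sudo sh".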

OpenStack QoS based on InstanceResourceQuota

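In OpenStack you can run "nova flavor-key m1.small set quota:vif_outbound_average=20" directly. This sets libvirt's outbound limit, which covers traffic leaving the VM (upload) and is enforced on the host as ingress policing on the tap device. Because OpenStack plugs the VM into a Linux bridge port here, libvirt produces the following configuration: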

 <interface type='bridge'>
      <mac address='fa:16:3e:db:83:fe'/>
      <source bridge='qbr3869c552-8b'/>
      <bandwidth>
        <outbound average='20'/>
      </bandwidth>
      <target dev='tap3869c552-8b'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
 </interface>

This results in the following tc rules:

qdisc pfifo_fast 0: dev tap3869c552-8b root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc ingress ffff: dev tap3869c552-8b parent ffff:fff1 ---------------- 

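For the opposite direction, traffic entering the VM (download), set libvirt's inbound limit (nova flavor-key m1.small set quota:vif_inbound_average=20). It is enforced as egress shaping on the tap device and produces the following tc rules: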

qdisc htb 1: dev tapb1afdddf-ba root refcnt 2 r2q 10 default 1 direct_packets_stat 0 direct_qlen 500
qdisc sfq 2: dev tapb1afdddf-ba parent 1:1 limit 127p quantum 1514b depth 127 divisor 1024 perturb 10sec
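
The two outputs above suggest a simple lookup from the libvirt bandwidth direction to the tc mechanism that appears on the tap device. The sketch below only records what those outputs show; the function name tap_qdisc_for is hypothetical.

```shell
# Hypothetical lookup: which qdisc shows up on the tap device for each
# libvirt bandwidth direction, as observed in the outputs above.
tap_qdisc_for() {
    case $1 in
        outbound) echo "ingress qdisc (ffff:)" ;;
        inbound)  echo "htb root qdisc (1:) with an sfq leaf" ;;
        *)        echo "unknown direction: $1" >&2; return 1 ;;
    esac
}

tap_qdisc_for outbound
tap_qdisc_for inbound
```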

Libvirt’s OVS QoS

1, Set up qemu so that it supports OVS ports

sudo apt-get install qemu-system qemu-kvm virtinst libvirt-bin openvswitch-datapath-source openvswitch-controller openvswitch-switch virt-top virt-manager python-libvirt
sudo ovs-vsctl add-br br-phy
sudo virsh net-destroy default
sudo virsh net-edit default
  <forward mode='bridge'/>
  <bridge name='br-phy'/>
  <virtualport type='openvswitch'/>
sudo virsh net-undefine default
sudo virsh net-autostart br-phy

2, Use "virsh edit" to add the following to the VM's interface definition:

<source network='br-phy'/>
<virtualport type='openvswitch'/>


    <interface type='network'>
      <mac address='52:54:00:ae:17:05'/>
      <source network='br-phy'/>
      <virtualport type='openvswitch'>
        <parameters interfaceid='86814ca6-615b-41cd-8d85-4873638d1b66'/>
      </virtualport>
      <bandwidth>
        <outbound average='2048'/>
      </bandwidth>
      <model type='rtl8139'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>

3, Check the result with "tc qdisc show":

qdisc pfifo_fast 0: dev vnet1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

Why is there no 'qdisc ingress ffff:' entry? The rest of this post answers that question.



root@node1:~# tc qdisc list |grep vnet1
root@node1:~# ovs-vsctl set interface vnet1 ingress_policing_rate=8 ingress_policing_burst=2
root@node1:~# tc qdisc show |grep vnet1
qdisc pfifo_fast 0: dev vnet1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc ingress ffff: dev vnet1 parent ffff:fff1 ---------------- 
root@node1:~# sudo ovs-vsctl set interface vnet1 ingress_policing_rate=0 ingress_policing_burst=0
root@node1:~# sudo ovs-vsctl list interface vnet1




tc filter add dev <devname> parent ffff: protocol all prio 49 basic police rate <kbits_rate>kbit burst <kbits_burst>k mtu 65535 drop

Libvirt QoS


  • In libvirt, virNetDevBandwidthSet() sets the ingress rules by calling tc directly (https://github.com/libvirt/libvirt/blob/v1.2.2-maint/src/util/virnetdevbandwidth.c#L224)
  • In OVS, netdev_set_policing() sets the ingress rules the OVS way (ovs-vsctl set interface vnet1 ingress_policing_rate=8 ingress_policing_burst=2) (https://github.com/openvswitch/ovs/blob/v2.6.1/vswitchd/bridge.c#L4512). Before doing so, it first deletes the QoS settings that libvirt created directly with tc (see the tc_add_del_ingress_qdisc call at https://github.com/openvswitch/ovs/blob/master/lib/netdev-linux.c#L2132), and only then recreates QoS from its own configuration. libvirt's virnetdevopenvswitch.c, however, never calls any OVS QoS command, so the OVS policing rate stays at 0 and nothing is recreated. That is how the problem arises.
  • Neutron QoSaaS can call the OVS QoS commands to set ingress policing on an OVS port ($neutron/agent/common/ovs_lib.py#_set_egress_bw_limit_for_port()), which can serve as a substitute for the OVS port QoS support that libvirt lacks.
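
Until something sets the limit the OVS way, the libvirt value can be carried over by hand. The sketch below prints the ovs-vsctl command for a given libvirt average; the function name ovs_policing_cmd is hypothetical, and the unit conversion is an assumption based on my reading of the documentation (libvirt average and burst in kilobytes per second and kilobytes, OVS ingress_policing_rate and ingress_policing_burst in kilobits per second and kilobits), so verify it against your versions.

```shell
# Hypothetical helper: turn a libvirt <outbound average='N'/> value into the
# matching ovs-vsctl command. Assumes libvirt counts kilobytes (per second)
# while OVS counts kilobits (per second), hence the factor of 8.
ovs_policing_cmd() {
    iface=$1 average_kbytes=$2 burst_kbytes=${3:-}
    cmd="ovs-vsctl set interface $iface ingress_policing_rate=$((average_kbytes * 8))"
    # Only set the burst when one was given; otherwise leave the OVS default.
    if [ -n "$burst_kbytes" ]; then
        cmd="$cmd ingress_policing_burst=$((burst_kbytes * 8))"
    fi
    echo "$cmd"
}

# Dry run for the <outbound average='2048'/> interface used earlier.
ovs_policing_cmd vnet1 2048
```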


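Note: when debugging with gdb, it is better to use the dbg packages than to build the packages from source. If you do build from source, avoid options such as '--prefix=/usr' that change the default install path.
sudo rm -rf /usr/local/lib/libvirt* && sudo rm -rf /usr/local/lib/systemd/system/libvirt*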

1, First, run 'sudo virsh start xenial' to trigger the code path being debugged.

hua@node1:~$ sudo virsh start xenial

2, libvirtd then creates the ingress tc rules at #L375 and #L385 of its virnetdevbandwidth.c (code: https://github.com/libvirt/libvirt/blob/v1.2.2-maint/src/util/virnetdevbandwidth.c#L224)

hua@node1:~$ sudo gdb -p `pidof libvirtd`
(gdb) c
[Switching to Thread 0x7f96d2c9d700 (LWP 7196)]

Thread 4 "libvirtd" hit Breakpoint 1, virNetDevBandwidthSet (ifname=0x7f96b8006b30 "vnet0", bandwidth=bandwidth@entry=0x7f96b8003f20, 
    hierarchical_class=hierarchical_class@entry=false, swapped=true) at util/virnetdevbandwidth.c:200
200 {
(gdb) info b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00007f96d936ac60 in virNetDevBandwidthSet at util/virnetdevbandwidth.c:200
    breakpoint already hit 4 times
(gdb) c

After running #L375 and #L385, we can see ingress tc rules have been created by using the command 'tc qdisc show |grep vnet0'.

hua@node1:~$ tc qdisc show |grep vnet0
qdisc pfifo_fast 0: dev vnet0 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc ingress ffff: dev vnet0 parent ffff:fff1 ---------------- 

3, ovs-vswitchd then stops at the breakpoint we set at lib/netdev-linux.c:2132.

hua@node1:~$ sudo gdb -p `pidof ovs-vswitchd`
(gdb) c
[Thread 0x7fbf6ca34940 (LWP 27610) exited]

Thread 1 "ovs-vswitchd" hit Breakpoint 4, netdev_linux_set_policing (netdev_=0x1517930, kbits_rate=0, kbits_burst=0)
    at lib/netdev-linux.c:2132
2132        error = tc_add_del_ingress_qdisc(ifindex, false);
(gdb) info b
Num     Type           Disp Enb Address            What
4       breakpoint     keep y   0x00000000005aced2 in netdev_linux_set_policing at lib/netdev-linux.c:2132
    breakpoint already hit 5 times

After running lib/netdev-linux.c:2132 (https://github.com/openvswitch/ovs/blob/master/lib/netdev-linux.c#L2132), we can see that the ingress tc rules have been deleted.

(gdb) p kbits_rate
$5 = 0
(gdb) n
2133        if (error) {
(gdb) n
[New Thread 0x7fbf6ca34940 (LWP 27739)]
2139        if (kbits_rate) {
(gdb) n
[Thread 0x7fbf6ca34940 (LWP 27739) exited]
2155        netdev->kbits_rate = kbits_rate;

hua@node1:~$ tc qdisc show |grep vnet0
qdisc pfifo_fast 0: dev vnet0 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

4, Because the ingress setting was not configured the OVS way, kbits_rate is 0, so tc_add_policer(netdev_, kbits_rate, kbits_burst) at #L2147 never runs (https://github.com/openvswitch/ovs/blob/master/lib/netdev-linux.c#L2147) and the ingress rules are not recreated. That is why the problem occurs. This is a limitation of OVS.
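
The control flow observed in the gdb session can be summarized with a small model. This is a simplified sketch, not the real OVS code: the function name model_set_policing is hypothetical, and it prints the steps instead of touching the kernel.

```shell
# Simplified model of the control flow seen in the gdb session; it prints the
# steps instead of touching the kernel. Not the real OVS code.
model_set_policing() {
    kbits_rate=$1 kbits_burst=$2
    # Step 1 (line 2132 in the trace): the ingress qdisc is always removed.
    echo "del ingress qdisc"
    # Step 2 (line 2139 in the trace): policing is recreated only for a non-zero rate.
    if [ "$kbits_rate" -ne 0 ]; then
        echo "add ingress qdisc"
        echo "add policer rate=${kbits_rate}kbit burst=${kbits_burst}kb"
    fi
}

model_set_policing 0 0   # libvirt-only configuration: the qdisc is just removed
model_set_policing 8 2   # rate set through OVS: the qdisc is removed, then recreated
```

With a rate of 0, which is what OVS sees when only libvirt configured the limit, the model stops after the delete step, matching the missing 'qdisc ingress ffff:' line.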

Appendix - OVS QoS command walkthrough

root@node1:~# ovs-appctl -t ovs-vswitchd qos/show-types vnet0
QoS type: linux-fq_codel
QoS type: linux-codel
QoS type: linux-hfsc
QoS type: linux-noop
QoS type: linux-sfq
QoS type: linux-htb

root@node1:~# ovs-vsctl set interface vnet0 ingress_policing_rate=8 ingress_policing_burst=2

root@node1:~# ovs-vsctl list interface vnet0 |grep ingress
ingress_policing_burst: 2
ingress_policing_rate: 8

root@node1:~# ovs-vsctl set port vnet0 qos=@newqos -- --id=@newqos create qos type=linux-noop
root@node1:~# ovs-vsctl list qos
_uuid               : 2a2ed08d-7b3e-4b03-bb7e-a297b5bdf2a7
external_ids        : {}
other_config        : {}
queues              : {}
type                : linux-noop
root@node1:~# ovs-vsctl list port vnet0 |grep qos
qos                 : 2a2ed08d-7b3e-4b03-bb7e-a297b5bdf2a7
root@node1:~# ovs-vsctl --all destroy qos
root@node1:~# ovs-vsctl destroy qos 5adaccad-fbe8-49e3-908a-742cab85ce95
root@node1:~# ovs-vsctl list queue

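Testing Neutron QoSaaS

As seen above, Neutron QoSaaS implements an ingress policing driver for OVS. Let's test it:
1, Install QoSaaS. Running "juju config neutron-api enable-qos=True" automatically generates the following configuration: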

ubuntu@zhhuabj-bastion:~$ juju ssh neutron-api/0 -- sudo grep -r 'qos' /etc/neutron/neutron.conf
service_plugins = router,firewall,vpnaas,metering,neutron_lbaas.services.loadbalancer.plugin.LoadBalancerPluginv2,qos
ubuntu@zhhuabj-bastion:~$ juju ssh neutron-api/0 -- sudo grep -r 'qos' /etc/neutron/plugins/ml2/ml2_conf.ini -B 1

ubuntu@zhhuabj-bastion:~$ juju ssh neutron-gateway/0 -- sudo grep -r 'qos' /etc/neutron/plugins/ml2/openvswitch_agent.ini -B 1
extensions = qos
ubuntu@zhhuabj-bastion:~$ juju ssh nova-compute/0 -- sudo grep -r 'qos' /etc/neutron/plugins/ml2/openvswitch_agent.ini -B 1
extensions = qos

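2, Neutron has provided this driver since Liberty. The output below shows that it supports two rule types: bandwidth_limit caps the bandwidth of a port, and dscp_marking sets the DSCP field of IP packets so that the network can prioritize them.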

ubuntu@zhhuabj-bastion:~$ neutron qos-available-rule-types
| type            |
| dscp_marking    |
| bandwidth_limit |


neutron qos-policy-create dscp-marking
neutron qos-dscp-marking-rule-create dscp-marking --dscp-mark 26
neutron port-update 750901a3-70b3-4907-a52a-0025fac9d6c1 --qos-policy dscp-marking

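3, This post is mainly about testing OVS ingress QoS (an egress rule in Neutron terms, because Neutron names directions from the VM's point of view), so configure the following: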

neutron qos-policy-create egress-qos-policy
neutron qos-bandwidth-limit-rule-create egress-qos-policy --max-kbps 300 --max-burst-kbps 30 --egress
neutron qos-policy-list
neutron port-update --qos-policy egress-qos-policy 8c8b9944-c9e1-4343-89e5-03f77c2e058d
#neutron port-update --no-qos-policy 8c8b9944-c9e1-4343-89e5-03f77c2e058d
#neutron port-create <port-name> --qos-policy-id egress-qos-policy

Note: a policy can also be applied at the network level, but the tc rules are still set on the qvo interfaces.

neutron net-update <net-id> --qos-policy egress-qos-policy

4, Verify. Based on the settings above, the Neutron QoSaaS service generates the following QoS settings:

root@juju-7e3a3f-xenial-mitaka-qos-8:~# sudo ovs-vsctl list interface qvo239cf73e-7e |grep ingress
ingress_policing_burst: 30
ingress_policing_rate: 300

root@juju-7e3a3f-xenial-mitaka-qos-8:~# tc qdisc show |grep qvo239cf73e-7e
qdisc noqueue 0: dev qvo239cf73e-7e root refcnt 2 
qdisc ingress ffff: dev qvo239cf73e-7e parent ffff:fff1 ----------------
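
The check above can be scripted. The following sketch reads "tc qdisc show" output on standard input and reports whether a device has an ingress qdisc; the function name has_ingress_qdisc is hypothetical, and the sample input is copied from the output above so the block runs without root.

```shell
# Hypothetical helper: read `tc qdisc show` output on stdin and succeed only if
# the given device has an ingress qdisc attached.
has_ingress_qdisc() {
    awk -v dev="$1" '
        $1 == "qdisc" && $2 == "ingress" && $4 == "dev" && $5 == dev { found = 1 }
        END { exit !found }'
}

# Sample input copied from the output above.
sample='qdisc noqueue 0: dev qvo239cf73e-7e root refcnt 2
qdisc ingress ffff: dev qvo239cf73e-7e parent ffff:fff1 ----------------'

if printf '%s\n' "$sample" | has_ingress_qdisc qvo239cf73e-7e; then
    echo "ingress policing qdisc present"
else
    echo "ingress policing qdisc missing"
fi
```

On a live host the same function could be fed directly, for example "tc qdisc show | has_ingress_qdisc vnet0".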

5, Misc: ingress QoS (ingress, i.e. download, from the VM's point of view; egress from the switch's point of view)

openstack network qos policy create ingress-qos-policy
openstack network qos rule create --type bandwidth-limit --max-kbps 300 --max-burst-kbits 30 --ingress ingress-qos-policy
neutron port-update --qos-policy ingress-qos-policy 19d440ef-5f27-4c26-8dc6-c994d9394ea8

root@juju-23f84c-queens-dvr-8:~# tc qdisc show |grep qvo
qdisc htb 1: dev qvo19d440ef-5f root refcnt 2 r2q 10 default 1 direct_packets_stat 0 direct_qlen 1000

6, Misc: minimum-bandwidth

# minimum-bandwidth applies to egress QoS, but only the sriovnicswitch driver supports it, NOT the openvswitch driver
openstack network qos policy create bandwidth-control
openstack network qos rule create --type minimum-bandwidth --min-kbps 512 --egress bandwidth-control
#openstack port set --qos-policy bandwidth-control 19d440ef-5f27-4c26-8dc6-c994d9394ea8
neutron port-update --qos-policy bandwidth-control 19d440ef-5f27-4c26-8dc6-c994d9394ea8

7, Misc: FIP QoS
Since the Queens release, QoS also supports egress and ingress limits on floating IPs. The tc rules can be set on:

  • qg device in qr ns for legacy and HA routers
  • rfp device in qr ns for DVR local routers
  • qg device in snat ns for DVR edge routers
openstack floating ip create --qos-policy ingress-qos-policy ext_net
#neutron port-update --qos-policy egress-qos-policy <FIP-port-id>
neutron floatingip-associate $(neutron floatingip-list |grep '<fip-address>' |awk '{print $2}') $(neutron port-list |grep '<vm-fixed-ip>' |awk '{print $2}')

# In practice the commands below showed no output; to be investigated when time permits
sudo ip netns exec qrouter-xxx tc qdisc show dev qg-xxx
sudo ip netns exec qrouter-xxx tc -p -s -d filter show dev qg-xxx
ip netns exec qrouter-909c6b55-9bc6-476f-9d28-c32d031c41d7 tc qdisc show dev rfp-909c6b55-9
ip netns exec snat-909c6b55-9bc6-476f-9d28-c32d031c41d7 tc qdisc show dev qg-d5ed764e-a6


