If the qdisc can be bypassed (its TCQ_F_CAN_BYPASS flag is set, as for the default pfifo_fast), its queue is empty, and it is not currently running, mark the qdisc as running and transmit the packet directly via sch_direct_xmit(). On success, clear the running flag with qdisc_run_end(); on failure, the skb is put back on the qdisc queue by dev_requeue_skb().
Case 2: enqueue and then send

In this case, the skb must be enqueued first. The sender then calls qdisc_run_begin(): if the qdisc was not already running, the sender becomes the owner and calls __qdisc_run(); otherwise the current owner will eventually transmit the queued skb. A condensed sketch of both cases follows.
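The sketch below is paraphrased from __dev_xmit_skb() in net/core/dev.c of this kernel generation; error paths, skb_dst_force() and the busylock handling (covered later) are omitted, so treat it as an illustration rather than the exact source:

static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
				 struct net_device *dev,
				 struct netdev_queue *txq)
{
	spinlock_t *root_lock = qdisc_lock(q);
	int rc;

	spin_lock(root_lock);
	if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
	    qdisc_run_begin(q)) {
		/* case 1: bypass-capable, empty, not running: send directly */
		if (sch_direct_xmit(skb, q, dev, txq, root_lock))
			__qdisc_run(q);		/* queue is no longer empty */
		else
			qdisc_run_end(q);	/* done, clear the running flag */
		rc = NET_XMIT_SUCCESS;
	} else {
		/* case 2: enqueue first, then run the qdisc if it is idle */
		rc = q->enqueue(skb, q) & NET_XMIT_MASK;
		if (qdisc_run_begin(q))
			__qdisc_run(q);
	}
	spin_unlock(root_lock);

	return rc;
}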
/* qdisc ->enqueue() return codes. */
#define NET_XMIT_SUCCESS	0x00
#define NET_XMIT_DROP		0x01	/* skb dropped */
#define NET_XMIT_CN		0x02	/* congestion notification */
#define NET_XMIT_POLICED	0x03	/* skb is shot by police */
#define NET_XMIT_MASK		0x0f	/* qdisc flags in net/sch_generic.h */

/* NET_XMIT_CN is special. It does not guarantee that this packet is lost. It
 * indicates that the device will soon be dropping packets, or already drops
 * some packets of the same priority; prompting us to send less aggressively.
 */
#define net_xmit_eval(e)	((e) == NET_XMIT_CN ? 0 : (e))
#define net_xmit_errno(e)	((e) != NET_XMIT_CN ? -ENOBUFS : 0)

/* Driver transmit return codes */
#define NETDEV_TX_MASK		0xf0

enum netdev_tx {
	__NETDEV_TX_MIN	 = INT_MIN,	/* make sure enum is signed */
	NETDEV_TX_OK	 = 0x00,	/* driver took care of packet */
	NETDEV_TX_BUSY	 = 0x10,	/* driver tx path was busy */
	NETDEV_TX_LOCKED = 0x20,	/* driver tx lock was already taken */
};
typedef enum netdev_tx netdev_tx_t;
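Note the two disjoint code spaces: qdisc enqueue codes fit under NET_XMIT_MASK (0x0f) while driver codes live in NETDEV_TX_MASK (0xf0). dev_xmit_complete(), from include/linux/netdevice.h of the same kernel generation, uses this split to decide whether the driver consumed the skb:

static inline bool dev_xmit_complete(int rc)
{
	/*
	 * Positive cases with an skb consumed by a driver:
	 * - successful transmission (rc == NETDEV_TX_OK)
	 * - error while transmitting (rc < 0)
	 * - error while queueing to a different device (rc & NET_XMIT_MASK)
	 */
	if (likely(rc < NET_XMIT_MASK))
		return true;

	return false;
}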
How __qdisc_run works
__qdisc_run() must be bracketed by qdisc_run_begin() and qdisc_run_end(): qdisc_run_begin() sets the __QDISC___STATE_RUNNING flag before __qdisc_run() executes, and qdisc_run_end() clears it afterwards. The flag and the two helpers ensure a given qdisc runs on only one CPU at a time.
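For reference, the helpers look roughly like this in include/net/sch_generic.h of this era (later kernels replaced the __state bit with a seqcount, so details vary by version):

static inline bool qdisc_is_running(const struct Qdisc *qdisc)
{
	return (qdisc->__state & __QDISC___STATE_RUNNING) ? true : false;
}

static inline bool qdisc_run_begin(struct Qdisc *qdisc)
{
	if (qdisc_is_running(qdisc))
		return false;		/* another CPU owns the qdisc */
	qdisc->__state |= __QDISC___STATE_RUNNING;
	return true;			/* caller becomes the owner */
}

static inline void qdisc_run_end(struct Qdisc *qdisc)
{
	qdisc->__state &= ~__QDISC___STATE_RUNNING;
}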
void __qdisc_run(struct Qdisc *q)
{
	int quota = weight_p;

	while (qdisc_restart(q)) {
		/*
		 * Ordered by possible occurrence: Postpone processing if
		 * 1. we've exceeded packet quota
		 * 2. another process needs the CPU;
		 */
		if (--quota <= 0 || need_resched()) {
			__netif_schedule(q);
			break;
		}
	}

	qdisc_run_end(q);
}
weight_p is the maximum number of packets one __qdisc_run() pass may transmit before yielding, whether invoked directly from the xmit path or from the TX softirq (it is the value behind the net.core.dev_weight sysctl, 64 by default).
If the qdisc holds only a few packets, they are all sent within the while loop. Otherwise (quota exhausted or a reschedule is needed), __netif_schedule() sets the __QDISC_STATE_SCHED state on the qdisc, links it onto the output_queue of the current CPU's softnet_data (__get_cpu_var(softnet_data)), and raises the TX softirq, which later sends the remaining packets in the qdisc.
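Abridged from net/core/dev.c of the same generation, __netif_schedule() is roughly the following: mark the qdisc scheduled, chain it onto the per-CPU output_queue, and raise NET_TX_SOFTIRQ so net_tx_action() resumes it later:

static inline void __netif_reschedule(struct Qdisc *q)
{
	struct softnet_data *sd;
	unsigned long flags;

	local_irq_save(flags);
	sd = &__get_cpu_var(softnet_data);
	q->next_sched = NULL;
	*sd->output_queue_tailp = q;		/* append to the per-CPU list */
	sd->output_queue_tailp = &q->next_sched;
	raise_softirq_irqoff(NET_TX_SOFTIRQ);	/* net_tx_action will run it */
	local_irq_restore(flags);
}

void __netif_schedule(struct Qdisc *q)
{
	if (!test_and_set_bit(__QDISC_STATE_SCHED, &q->state))
		__netif_reschedule(q);
}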
/*
 * NOTE: Called under qdisc_lock(q) with locally disabled BH.
 *
 * __QDISC_STATE_RUNNING guarantees only one CPU can process
 * this qdisc at a time. qdisc_lock(q) serializes queue accesses for
 * this queue.
 *
 * netif_tx_lock serializes accesses to device driver.
 *
 * qdisc_lock(q) and netif_tx_lock are mutually exclusive,
 * if one is grabbed, another must be free.
 *
 * Note, that this procedure can be called by a watchdog timer
 *
 * Returns to the caller:
 *	0  - queue is empty or throttled.
 *	>0 - queue is not empty.
 *
 */
static inline int qdisc_restart(struct Qdisc *q)
{
	struct netdev_queue *txq;
	struct net_device *dev;
	spinlock_t *root_lock;
	struct sk_buff *skb;

	/* Dequeue packet */
	skb = dequeue_skb(q);
	if (unlikely(!skb))
		return 0;
	WARN_ON_ONCE(skb_dst_is_noref(skb));
	root_lock = qdisc_lock(q);
	dev = qdisc_dev(q);
	txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

	return sch_direct_xmit(skb, q, dev, txq, root_lock);
}
/*
 * Transmit one skb, and handle the return status as required. Holding the
 * __QDISC_STATE_RUNNING bit guarantees that only one CPU can execute this
 * function.
 *
 * Returns to the caller:
 *	0  - queue is empty or throttled.
 *	>0 - queue is not empty.
 */
int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
		    struct net_device *dev, struct netdev_queue *txq,
		    spinlock_t *root_lock)
{
	int ret = NETDEV_TX_BUSY;

	/* And release qdisc */
	spin_unlock(root_lock);

	HARD_TX_LOCK(dev, txq, smp_processor_id());
	if (!netif_xmit_frozen_or_stopped(txq))
		ret = dev_hard_start_xmit(skb, dev, txq);

	HARD_TX_UNLOCK(dev, txq);

	spin_lock(root_lock);

	if (dev_xmit_complete(ret)) {
		/* Driver sent out skb successfully or skb was consumed */
		ret = qdisc_qlen(q);
	} else if (ret == NETDEV_TX_LOCKED) {
		/* Driver try lock failed */
		ret = handle_dev_cpu_collision(skb, txq, q);
	} else {
		/* Driver returned NETDEV_TX_BUSY - requeue skb */
		if (unlikely(ret != NETDEV_TX_BUSY))
			net_warn_ratelimited("BUG %s code %d qlen %d\n",
					     dev->name, ret, q->q.qlen);

		ret = dev_requeue_skb(skb, q);
	}

	if (ret && netif_xmit_frozen_or_stopped(txq))
		ret = 0;

	return ret;
}
skb->queue_mapping
In a multi-queue NIC driver, skb->queue_mapping records which TX queue will transmit the packet. It is set via skb_set_queue_mapping(), called from netdev_pick_tx() on the __dev_queue_xmit() path.
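The accessors are trivial wrappers in include/linux/skbuff.h; qdisc_restart() above reads the mapping back with skb_get_queue_mapping() to locate the TX queue:

static inline void skb_set_queue_mapping(struct sk_buff *skb, u16 queue_mapping)
{
	skb->queue_mapping = queue_mapping;
}

static inline u16 skb_get_queue_mapping(const struct sk_buff *skb)
{
	return skb->queue_mapping;
}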
busylock of struct Qdisc
As noted above, a Qdisc uses __QDISC___STATE_RUNNING to ensure that, for a given qdisc, only one CPU transmits packets at a time. What happens to the other CPUs?

The busylock spinlock in struct Qdisc handles them. For the same qdisc: the first CPU sets __QDISC___STATE_RUNNING and becomes the owner; the second CPU grabs the busylock spinlock; the third and subsequent CPUs wait on busylock. The owner therefore contends with at most one other CPU for the qdisc root lock.
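The relevant fragment of __dev_xmit_skb() (heavily abridged, same caveats as the earlier sketch) shows the heuristic: a CPU that sees the qdisc already running takes busylock before the root lock, so the owner can reacquire the root lock quickly and keep dequeuing:

	bool contended = qdisc_is_running(q);

	if (unlikely(contended))
		spin_lock(&q->busylock);	/* 3rd, 4th... CPUs queue up here */

	spin_lock(root_lock);			/* owner races with one waiter only */
	/* ... bypass or enqueue as sketched above ... */
	if (unlikely(contended))
		spin_unlock(&q->busylock);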
struct sk_buff_head {
	/* These two members must be first. */
	struct sk_buff	*next;
	struct sk_buff	*prev;

	__u32		qlen;
	spinlock_t	lock;
};
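The qdisc's own queue (the q member of struct Qdisc) is such an sk_buff_head, which is why sch_direct_xmit() can report the remaining backlog via qdisc_qlen():

static inline int qdisc_qlen(const struct Qdisc *q)
{
	return q->q.qlen;	/* length of the qdisc's sk_buff_head */
}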
NETIF_F_LLTX
HARD_TX_LOCK and HARD_TX_UNLOCK bracket the call to the driver's ndo_start_xmit() (see sch_direct_xmit() above). A driver that sets NETIF_F_LLTX (lock-less TX) does its own internal TX locking, so these macros skip the per-queue _xmit_lock for it.
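In kernels of this vintage the macros live in the core transmit path and look roughly like this; the NETIF_F_LLTX check is the whole mechanism:

#define HARD_TX_LOCK(dev, txq, cpu) {			\
	if ((dev->features & NETIF_F_LLTX) == 0) {	\
		__netif_tx_lock(txq, cpu);		\
	}						\
}

#define HARD_TX_UNLOCK(dev, txq) {			\
	if ((dev->features & NETIF_F_LLTX) == 0) {	\
		__netif_tx_unlock(txq);			\
	}						\
}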