2014-02-10

route

dst ops

Call trace

Forwarding a packet:

> ip_rcv_finish
> > ip_route_input_noref
> > > ip_route_input_slow  
> > > > fib_lookup 
> > > > ip_mkroute_input
> > dst_input(skb)

> > > > ip_mkroute_input
> > > > > __mkroute_input
> > > > > > rth = rt_dst_alloc(...)
> > > > > > skb_dst_set(skb, &rth->dst);
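The end of this trace is an indirect call. As far as I know, for a forwarded packet `__mkroute_input` stores `ip_forward` in the route's `dst.input` pointer, and `dst_input()` simply calls that pointer. Below is a minimal userspace sketch of that dispatch. The structs are heavily simplified, and all `mock_` names are hypothetical stand-ins, not kernel definitions.

```c
/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct sk_buff;

struct dst_entry {
        int (*input)(struct sk_buff *skb);   /* chosen when the route is built */
};

struct rtable {
        struct dst_entry dst;                /* first member, as in the kernel */
};

struct sk_buff {
        struct dst_entry *dst;               /* set by mock_skb_dst_set() */
        int forwarded;                       /* demo flag, not a kernel field */
};

/* Stand-in for ip_forward(); the real one also checks and decrements TTL, etc. */
static int mock_ip_forward(struct sk_buff *skb)
{
        skb->forwarded = 1;
        return 0;
}

static void mock_skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
{
        skb->dst = dst;
}

/* Models __mkroute_input(): pick the input handler for a forwarded packet. */
static void mock_mkroute_input(struct sk_buff *skb, struct rtable *rth)
{
        rth->dst.input = mock_ip_forward;
        mock_skb_dst_set(skb, &rth->dst);
}

/* Models dst_input(): no routing decision here, just an indirect call. */
static int mock_dst_input(struct sk_buff *skb)
{
        return skb->dst->input(skb);
}
```

With this layout the routing decision happens once, when the route is built. The per-packet path only follows a function pointer.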

2014-02-08

netdev

Qdisc running flag

Summary

In struct Qdisc, there are two similarly named fields: state and __state.
The running flag is stored in __state of struct Qdisc, NOT in state.
Every time a packet is sent from a qdisc, the running flag is
set by qdisc_run_begin and then cleared by qdisc_run_end.


        unsigned long           state;
...
        unsigned int            __state;
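The begin/end pair can be sketched as a test-and-set on one bit of `__state`. This is a userspace model. It assumes the caller already holds a lock that serializes these calls, so plain (non-atomic) bit operations are enough. `MOCK_STATE_RUNNING` and `struct mock_qdisc` are hypothetical names, not the kernel's.

```c
#include <stdbool.h>

#define MOCK_STATE_RUNNING 0x1u   /* hypothetical bit in __state */

struct mock_qdisc {
        unsigned long state;      /* other scheduling bits; not used for "running" */
        unsigned int  __state;    /* holds the running flag */
};

static bool mock_qdisc_is_running(const struct mock_qdisc *q)
{
        return (q->__state & MOCK_STATE_RUNNING) != 0;
}

/* Returns true if the caller now owns the qdisc and may dequeue from it. */
static bool mock_qdisc_run_begin(struct mock_qdisc *q)
{
        if (mock_qdisc_is_running(q))
                return false;     /* another CPU is already sending from it */
        q->__state |= MOCK_STATE_RUNNING;
        return true;
}

static void mock_qdisc_run_end(struct mock_qdisc *q)
{
        q->__state &= ~MOCK_STATE_RUNNING;
}
```

A second `mock_qdisc_run_begin` fails until `mock_qdisc_run_end` clears the bit. That is how only one sender drains a given qdisc at a time.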

todo

Why is busylock needed?

2014-02-08

netdev

how to xmit a packet with Qdisc

summary

Consider an ideal, simple case:

Call Trace:

> dev_queue_xmit
> >  __dev_queue_xmit(skb, NULL);
> > > rcu_read_lock_bh();
> > > txq = netdev_pick_tx(dev, skb, accel_priv);
> > > q = rcu_dereference_bh(txq->qdisc);
> > > rc = __dev_xmit_skb(skb, q, dev, txq);
> > > > skb_dst_force(skb);
> > > > q->enqueue(skb, q);
> > > > qdisc_run_begin(q)
> > > >  __qdisc_run(q);
> > > > > while (qdisc_restart(q))
> > > > > > __netif_schedule
> > > > > qdisc_run_end(q)
> > > rcu_read_unlock_bh();
> > > return rc;
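The `while (qdisc_restart(q))` loop can be modeled in userspace as below. The model assumes a fixed per-run quota (the kernel uses `weight_p`) and leaves out `need_resched()`. When the quota runs out, the loop stops and calls a stand-in for `__netif_schedule`, so the remaining packets are sent later from the TX softirq. `struct mock_qdisc`, `MOCK_QUOTA` and the `mock_` functions are hypothetical.

```c
#include <stdbool.h>

#define MOCK_QUOTA 64             /* stand-in for the kernel's weight_p */

struct mock_qdisc {
        int  backlog;             /* packets still queued */
        int  sent;                /* packets handed to the "driver" */
        bool scheduled;           /* set when the rest is deferred */
        bool running;             /* the running flag from qdisc_run_begin() */
};

/* Dequeue one packet and "transmit" it; false when the queue is empty. */
static bool mock_qdisc_restart(struct mock_qdisc *q)
{
        if (q->backlog == 0)
                return false;
        q->backlog--;
        q->sent++;
        return true;
}

/* Stand-in for __netif_schedule(): the kernel raises NET_TX_SOFTIRQ here. */
static void mock_netif_schedule(struct mock_qdisc *q)
{
        q->scheduled = true;
}

/* Models __qdisc_run(): send until empty or until the quota is used up. */
static void mock_qdisc_run(struct mock_qdisc *q)
{
        int quota = MOCK_QUOTA;

        while (mock_qdisc_restart(q)) {
                if (--quota <= 0) {
                        mock_netif_schedule(q);
                        break;
                }
        }
        q->running = false;       /* qdisc_run_end() */
}
```

The quota keeps one caller of `dev_queue_xmit` from spending unbounded time draining a long queue on behalf of everyone else.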

2014-01-28

netdev

how to create dev qdisc

Summary

Part 1: Register a multi-queue net device.

In this part, only the framework is prepared for qdisc,
and the noop_qdisc is set as default.

prepare `netdev_queue`s.

For example, Intel igb hardware has 8 hardware TX queues,
and the NIC driver creates 8 corresponding struct netdev_queue entries
in the _tx array of struct net_device.
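The resulting layout is one `netdev_queue` per hardware TX queue, stored contiguously in `dev->_tx`. In the kernel, a driver typically gets this array through `alloc_etherdev_mq()`. The sketch below only models the shape of the data. `mock_alloc_netdev_mq`, the trimmed structs and `mock_noop_qdisc` are hypothetical. Every queue starts on the noop placeholder, as described above.

```c
#include <stdlib.h>

struct mock_qdisc { const char *id; };

/* Stand-in for noop_qdisc: the placeholder every queue starts with. */
static struct mock_qdisc mock_noop_qdisc = { "noop" };

struct mock_netdev_queue {
        unsigned int index;                 /* hardware TX queue number */
        struct mock_qdisc *qdisc;           /* used on transmit */
        struct mock_qdisc *qdisc_sleeping;  /* used once the device is active */
};

struct mock_net_device {
        struct mock_netdev_queue *_tx;      /* one entry per hardware TX queue */
        unsigned int num_tx_queues;
};

/* Allocate the _tx array and point every queue at the noop placeholder. */
static int mock_alloc_netdev_mq(struct mock_net_device *dev, unsigned int count)
{
        dev->_tx = calloc(count, sizeof(*dev->_tx));
        if (!dev->_tx)
                return -1;
        dev->num_tx_queues = count;
        for (unsigned int i = 0; i < count; i++) {
                dev->_tx[i].index = i;
                dev->_tx[i].qdisc = &mock_noop_qdisc;
                dev->_tx[i].qdisc_sleeping = &mock_noop_qdisc;
        }
        return 0;
}
```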

prepare `mq_qdisc`

The mq_qdisc is attached to the corresponding device.
In the mq_qdisc private data, a default qdisc is
created for each of the NIC's hardware queues.
This is done in mq_init.
The default qdisc ops are pfifo_fast_ops.

attach `mq_qdisc` to `netdev_queue`.

In mq_attach, these qdiscs are attached to the corresponding
struct netdev_queue.
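The two mq steps can be sketched as follows. `mq_init` creates one default child qdisc per TX queue and keeps them in its private array. `mq_attach` then hands each child to its `netdev_queue` and drops the private array. This is a userspace model. It assumes that grafting a child simply means storing its pointer in the queue's `qdisc_sleeping`. All `mock_` names and struct layouts are hypothetical.

```c
#include <stdlib.h>

struct mock_qdisc { const char *id; unsigned int queue; };

struct mock_netdev_queue {
        struct mock_qdisc *qdisc_sleeping;   /* qdisc to use once activated */
};

struct mock_net_device {
        struct mock_netdev_queue *_tx;
        unsigned int num_tx_queues;
};

struct mock_mq_sched {
        struct mock_qdisc **qdiscs;          /* children, kept only until attach */
};

/* Models mq_init(): one default (pfifo_fast) child per hardware queue. */
static int mock_mq_init(struct mock_mq_sched *priv, const struct mock_net_device *dev)
{
        priv->qdiscs = calloc(dev->num_tx_queues, sizeof(*priv->qdiscs));
        if (!priv->qdiscs)
                return -1;
        for (unsigned int ntx = 0; ntx < dev->num_tx_queues; ntx++) {
                struct mock_qdisc *q = malloc(sizeof(*q));
                if (!q)
                        return -1;           /* sketch: no cleanup on failure */
                q->id = "pfifo_fast";
                q->queue = ntx;
                priv->qdiscs[ntx] = q;
        }
        return 0;
}

/* Models mq_attach(): graft each child onto its own netdev_queue. */
static void mock_mq_attach(struct mock_mq_sched *priv, struct mock_net_device *dev)
{
        for (unsigned int ntx = 0; ntx < dev->num_tx_queues; ntx++)
                dev->_tx[ntx].qdisc_sleeping = priv->qdiscs[ntx];
        free(priv->qdiscs);                  /* the queues now own the children */
        priv->qdiscs = NULL;
}
```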

Part 2: Activate a net device with the right qdiscs

Here we trace only the mq_qdisc case.
When the device is brought up, dev_open is called, which in turn calls dev_activate.
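During `dev_activate`, each TX queue switches from the placeholder to the qdisc prepared for it. Below is a minimal sketch. It assumes activation amounts to copying `qdisc_sleeping` into `qdisc` for every queue, which is what I understand `transition_one_qdisc` to do. The kernel publishes the pointer under RCU and also handles default-qdisc creation and the watchdog; all of that is omitted. Names prefixed `mock_` are hypothetical.

```c
struct mock_qdisc { const char *id; };

static struct mock_qdisc mock_noop_qdisc = { "noop" };

struct mock_netdev_queue {
        struct mock_qdisc *qdisc;            /* used by the transmit path */
        struct mock_qdisc *qdisc_sleeping;   /* prepared earlier by mq_attach */
};

struct mock_net_device {
        struct mock_netdev_queue *_tx;
        unsigned int num_tx_queues;
};

/* Models transition_one_qdisc(): make the prepared qdisc the live one. */
static void mock_transition_one_qdisc(struct mock_netdev_queue *txq)
{
        txq->qdisc = txq->qdisc_sleeping;
}

/* Models the per-queue part of dev_activate(). */
static void mock_dev_activate(struct mock_net_device *dev)
{
        for (unsigned int i = 0; i < dev->num_tx_queues; i++)
                mock_transition_one_qdisc(&dev->_tx[i]);
}
```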

2014-01-28

netdev

qdisc study part1: qdisc_base

`Qdisc_ops` is the core of a Qdisc.

All kinds of Qdisc_ops are linked into a list headed by qdisc_base.
The key that distinguishes one Qdisc_ops from another is id[IFNAMSIZ].

Note: the list is a singly linked list, not the kernel's common doubly linked list (struct list_head).

struct Qdisc_ops {
        struct Qdisc_ops        *next;
        const struct Qdisc_class_ops    *cl_ops;
        char                    id[IFNAMSIZ];
        int                     priv_size;

        int                     (*enqueue)(struct sk_buff *, struct Qdisc *);
        struct sk_buff *        (*dequeue)(struct Qdisc *);
        struct sk_buff *        (*peek)(struct Qdisc *);
        unsigned int            (*drop)(struct Qdisc *);

        int                     (*init)(struct Qdisc *, struct nlattr *arg);
        void                    (*reset)(struct Qdisc *);
        void                    (*destroy)(struct Qdisc *);
        int                     (*change)(struct Qdisc *, struct nlattr *arg);
        void                    (*attach)(struct Qdisc *);

        int                     (*dump)(struct Qdisc *, struct sk_buff *);
        int                     (*dump_stats)(struct Qdisc *, struct gnet_dump *);

        struct module           *owner;
};

`qdisc_base`


/* The list of all installed queueing disciplines. */

static struct Qdisc_ops *qdisc_base;
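Registration and lookup on such a singly linked list can be sketched as follows. The sketch is modeled on my reading of the kernel's `register_qdisc()`. It walks the list with a pointer-to-pointer, rejects a duplicate `id`, and otherwise appends at the tail. Lookup compares `id` strings. Locking and module reference counting are left out, and the `mock_` names and trimmed struct are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MOCK_IFNAMSIZ 16                     /* same size as Linux IFNAMSIZ */

struct mock_qdisc_ops {
        struct mock_qdisc_ops *next;         /* singly linked, no list_head */
        char id[MOCK_IFNAMSIZ];              /* key: "pfifo_fast", "mq", ... */
};

static struct mock_qdisc_ops *mock_qdisc_base;   /* head of the list */

/* Append qops unless an entry with the same id is already registered. */
static int mock_register_qdisc(struct mock_qdisc_ops *qops)
{
        struct mock_qdisc_ops *q, **qp;

        for (qp = &mock_qdisc_base; (q = *qp) != NULL; qp = &q->next)
                if (strcmp(qops->id, q->id) == 0)
                        return -EEXIST;
        qops->next = NULL;
        *qp = qops;                          /* qp points at the last next field */
        return 0;
}

/* Find registered ops by id; NULL if none. */
static struct mock_qdisc_ops *mock_qdisc_lookup_ops(const char *id)
{
        for (struct mock_qdisc_ops *q = mock_qdisc_base; q; q = q->next)
                if (strcmp(q->id, id) == 0)
                        return q;
        return NULL;
}
```

The pointer-to-pointer walk lets the same code append to an empty list (updating the head) or to a non-empty one (updating the last `next`) without a special case.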

2014-01-27

socket

how to remember ip/tcp/udp header

IP HEADER

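When I was job hunting, I was often asked which fields the IP header contains.
Now that I really understand it, I find it is not that hard to remember.

When you set out on a trip, you always keep the start and the destination in mind (src/dst IP),
and you roughly know how many times you will change vehicles (TTL).

There are big roads and small roads, highways and dirt roads.
Going from a big road to a small road requires fragmentation; going from a small road to a big road may involve reassembly.
Getting on the highway requires a pass (QoS/TOS).

There are police on the road who check whether you are legal (checksum, header length, IP options)
and what cargo you are carrying (protocol).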
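The fields in the mnemonic map onto the 20-byte fixed IPv4 header defined in RFC 791. The sketch below parses that header from raw network-order bytes. It also verifies the header checksum with the standard one's-complement sum. `struct ipv4_fields`, `parse_ipv4` and `ipv4_csum` are hypothetical helper names. Options are only accounted for through the header length; they are not parsed.

```c
#include <stddef.h>
#include <stdint.h>

struct ipv4_fields {
        uint8_t  version;      /* always 4 */
        uint8_t  ihl;          /* header length in 32-bit words (checked by the "police") */
        uint8_t  tos;          /* the "highway pass" (QoS/TOS) */
        uint16_t tot_len;      /* total packet length in bytes */
        uint16_t id;           /* id, flags, frag_off: fragmentation and reassembly */
        uint8_t  flags;        /* bit 1 = DF, bit 0 = MF */
        uint16_t frag_off;     /* fragment offset in 8-byte units */
        uint8_t  ttl;          /* how many more "vehicle changes" (hops) are allowed */
        uint8_t  protocol;     /* the "cargo": 6 = TCP, 17 = UDP */
        uint16_t check;        /* header checksum (checked by the "police") */
        uint32_t saddr, daddr; /* start and destination of the trip */
};

static uint16_t be16(const uint8_t *p) { return (uint16_t)(p[0] << 8 | p[1]); }

static uint32_t be32(const uint8_t *p)
{
        return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}

/* One's-complement checksum; over a header with a correct check field it yields 0. */
static uint16_t ipv4_csum(const uint8_t *hdr, size_t len)
{
        uint32_t sum = 0;
        for (size_t i = 0; i + 1 < len; i += 2)
                sum += be16(hdr + i);
        while (sum >> 16)
                sum = (sum & 0xffff) + (sum >> 16);   /* fold carries back in */
        return (uint16_t)~sum;
}

static void parse_ipv4(const uint8_t *b, struct ipv4_fields *f)
{
        f->version  = b[0] >> 4;
        f->ihl      = b[0] & 0x0f;
        f->tos      = b[1];
        f->tot_len  = be16(b + 2);
        f->id       = be16(b + 4);
        f->flags    = b[6] >> 5;
        f->frag_off = be16(b + 6) & 0x1fff;
        f->ttl      = b[8];
        f->protocol = b[9];
        f->check    = be16(b + 10);
        f->saddr    = be32(b + 12);
        f->daddr    = be32(b + 16);
}
```

For example, the header `45 00 00 73 00 00 40 00 40 11 b8 61 c0 a8 00 01 c0 a8 00 c7` parses as version 4, IHL 5, TTL 64, protocol 17 (UDP), DF set, 192.168.0.1 to 192.168.0.199.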

2014-01-14

socket

how does tcp server accept a new connection request

summary

Two packets are processed on the TCP server side:

SYN packet: for the first packet (SYN) of the handshake, the server first looks up the listening socket
and creates a request socket as a temporary placeholder, then sends a SYN+ACK packet to the client.
ACK packet: for the third packet (ACK) of the handshake, the server looks up the request socket created
in the previous step.
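The two-step flow can be modeled in userspace as below. The model makes three assumptions:

- The request sock is stored in a table keyed by the connection 4-tuple. The kernel trace in this entry shows it being inserted into the established hash with `inet_ehash_insert`.
- The SYN creates the entry.
- The final ACK finds the entry and turns it into a full connection.

Hash buckets, SYN cookies, timers and the accept queue are left out. Every name here is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum mock_state { MOCK_FREE, MOCK_NEW_SYN_RECV, MOCK_ESTABLISHED };

struct mock_tuple { uint32_t saddr, daddr; uint16_t sport, dport; };

struct mock_entry {
        struct mock_tuple t;
        enum mock_state state;
};

#define MOCK_TABLE_SIZE 16
static struct mock_entry mock_ehash[MOCK_TABLE_SIZE];   /* stand-in for ehash */

static bool tuple_eq(const struct mock_tuple *a, const struct mock_tuple *b)
{
        return a->saddr == b->saddr && a->daddr == b->daddr &&
               a->sport == b->sport && a->dport == b->dport;
}

static struct mock_entry *mock_lookup(const struct mock_tuple *t)
{
        for (int i = 0; i < MOCK_TABLE_SIZE; i++)
                if (mock_ehash[i].state != MOCK_FREE && tuple_eq(&mock_ehash[i].t, t))
                        return &mock_ehash[i];
        return NULL;
}

/* SYN: create a request sock; the real server would now send SYN+ACK. */
static bool mock_on_syn(const struct mock_tuple *t)
{
        if (mock_lookup(t))
                return false;               /* duplicate SYN: keep the old entry */
        for (int i = 0; i < MOCK_TABLE_SIZE; i++)
                if (mock_ehash[i].state == MOCK_FREE) {
                        mock_ehash[i].t = *t;
                        mock_ehash[i].state = MOCK_NEW_SYN_RECV;
                        return true;
                }
        return false;                       /* table full: drop the SYN */
}

/* ACK: find the request sock created by the SYN and complete the handshake. */
static bool mock_on_ack(const struct mock_tuple *t)
{
        struct mock_entry *e = mock_lookup(t);

        if (!e || e->state != MOCK_NEW_SYN_RECV)
                return false;               /* no pending request for this tuple */
        e->state = MOCK_ESTABLISHED;
        return true;
}
```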

SYN packet call trace

Kernel version: v6.14

=> tcp_v4_rcv(struct sk_buff *skb)
=> => sk = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest);
=> => tcp_v4_do_rcv(sk, skb);
=> => => struct sock *nsk = tcp_v4_hnd_req(sk, skb)
=> => => tcp_rcv_state_process
=> => => => if (icsk->icsk_af_ops->conn_request(sk, skb) < 0) i.e. tcp_v4_conn_request
=> => => => tcp_conn_request(&tcp_request_sock_ops, &tcp_request_sock_ipv4_ops, sk, skb); // called from tcp_v4_conn_request
=> => => => => inet_reqsk_alloc
=> => => => => => struct request_sock *req = reqsk_alloc(ops, sk_listener, attach_listener);
=> => => => => => ireq->ireq_state = TCP_NEW_SYN_RECV; //#define ireq_state req.__req_common.skc_state
=> => => => => tcp_openreq_init(req, &tmp_opt, skb);
=> => => => => inet_csk_reqsk_queue_hash_add
=> => => => => => reqsk_queue_hash_req
=> => => => => => => inet_ehash_insert  // put the req sock into the established hash table
=> => => => => => inet_csk_reqsk_queue_added
=> => => => => => => reqsk_queue_added(&inet_csk(sk)->icsk_accept_queue); // the half-open queue, whose actual name is accept queue; it can only count now, there is no queue any more
=> => => => => => => => atomic_inc(&queue->young);
=> => => => => => => => atomic_inc(&queue->qlen);
=> => => => => af_ops->send_synack  // send the SYN+ACK packet
=> => => => return // tcp_v4_conn_request
=> => => return; // tcp_rcv_state_process

Notes

tcp_request_sock_ops: this is both the name of a struct type and the name of a variable.
icsk_accept_queue: in struct inet_connection_sock, this queue no longer stores the half-open request sockets; it only keeps counts.
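Since the half-open "queue" is reduced to counters, the bookkeeping can be sketched with C11 atomics. `young` and `qlen` are incremented as in `reqsk_queue_added`. The decrement paths are modeled on my reading of the kernel's request timer and `reqsk_queue_removed`: the first SYN+ACK timeout makes a request no longer "young", and removal decrements `young` only if that has not happened yet. Treat those details, and all `mock_` names, as assumptions.

```c
#include <stdatomic.h>

struct mock_request_sock_queue {
        atomic_int young;   /* requests that have not hit a SYN+ACK timeout yet */
        atomic_int qlen;    /* all pending half-open requests */
};

struct mock_request_sock {
        int num_timeout;    /* how many times the SYN+ACK timer has fired */
};

/* Models reqsk_queue_added(): a new request is both young and pending. */
static void mock_reqsk_queue_added(struct mock_request_sock_queue *q)
{
        atomic_fetch_add(&q->young, 1);
        atomic_fetch_add(&q->qlen, 1);
}

/* Models the timer path: the first timeout makes the request no longer young. */
static void mock_reqsk_timeout(struct mock_request_sock_queue *q,
                               struct mock_request_sock *req)
{
        if (req->num_timeout++ == 0)
                atomic_fetch_sub(&q->young, 1);
}

/* Models reqsk_queue_removed(): undo whichever counters still include req. */
static void mock_reqsk_queue_removed(struct mock_request_sock_queue *q,
                                     const struct mock_request_sock *req)
{
        if (req->num_timeout == 0)
                atomic_fetch_sub(&q->young, 1);
        atomic_fetch_sub(&q->qlen, 1);
}
```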

2013-11-25

socket

How do IPv4/IPv6 process incoming TCP/UDP packets

For a packet destined to the local host, ip_local_deliver_finish is the
last function called by the network layer.

In ip_local_deliver_finish, the protocol field of the IPv4 header is used
as an index into the inet_protos array, and the handler of that entry is called
(for example, tcp_v4_rcv for TCP and udp_rcv for UDP).

IPv6 is very similar to IPv4, except that the names are slightly different.
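The dispatch is a table lookup indexed by the 8-bit protocol number. Below is a minimal userspace sketch. It uses a plain array of handler pointers; the kernel's `inet_protos` holds `struct net_protocol` pointers accessed under RCU. The `mock_` handlers and `struct mock_skb` are hypothetical.

```c
#include <stddef.h>

#define MOCK_MAX_PROTOS 256             /* the protocol field is 8 bits */

struct mock_skb {
        unsigned char protocol;         /* copied from the IPv4 header */
        int handled_by;                 /* demo only: which handler ran */
};

struct mock_net_protocol {
        int (*handler)(struct mock_skb *skb);
};

static int mock_tcp_v4_rcv(struct mock_skb *skb) { skb->handled_by = 6;  return 0; }
static int mock_udp_rcv(struct mock_skb *skb)    { skb->handled_by = 17; return 0; }

static const struct mock_net_protocol mock_tcp_protocol = { mock_tcp_v4_rcv };
static const struct mock_net_protocol mock_udp_protocol = { mock_udp_rcv };

/* Index = protocol number; an empty slot means no handler is registered. */
static const struct mock_net_protocol *mock_inet_protos[MOCK_MAX_PROTOS] = {
        [6]  = &mock_tcp_protocol,      /* IPPROTO_TCP */
        [17] = &mock_udp_protocol,      /* IPPROTO_UDP */
};

/* Models the core of ip_local_deliver_finish(): pick the handler by protocol. */
static int mock_local_deliver(struct mock_skb *skb)
{
        const struct mock_net_protocol *ipprot = mock_inet_protos[skb->protocol];

        if (ipprot == NULL)
                return -1;              /* the kernel may answer with ICMP protocol unreachable */
        return ipprot->handler(skb);
}
```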

2013-11-22

socket

the inheritance of Linux sock types

Data Structure

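Every xxx_sock has its parent sock as its first field. For example, struct tcp_sock begins with struct inet_connection_sock, which begins with struct inet_sock, which begins with struct sock.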
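Because each level starts with its parent, a pointer to the outer structure is also a valid pointer to every inner one. Accessor casts such as the kernel's `tcp_sk()` and `inet_sk()` rely on this. The sketch below reproduces only that layout. The `mock_` structs and their extra fields are hypothetical.

```c
#include <stddef.h>

struct mock_sock {
        int sk_state;
};

struct mock_inet_sock {
        struct mock_sock sk;                     /* parent first */
        unsigned short inet_sport;
};

struct mock_inet_connection_sock {
        struct mock_inet_sock icsk_inet;         /* parent first */
        int icsk_retransmits;
};

struct mock_tcp_sock {
        struct mock_inet_connection_sock inet_conn;  /* parent first */
        unsigned int snd_cwnd;
};

/* Downcasts are plain pointer casts because each parent sits at offset 0. */
static struct mock_inet_sock *mock_inet_sk(struct mock_sock *sk)
{
        return (struct mock_inet_sock *)sk;
}

static struct mock_tcp_sock *mock_tcp_sk(struct mock_sock *sk)
{
        return (struct mock_tcp_sock *)sk;
}

/* Compile-time checks that each parent really is the first member. */
_Static_assert(offsetof(struct mock_inet_sock, sk) == 0, "sock must be first");
_Static_assert(offsetof(struct mock_inet_connection_sock, icsk_inet) == 0, "inet_sock must be first");
_Static_assert(offsetof(struct mock_tcp_sock, inet_conn) == 0, "icsk must be first");
```

Generic code can therefore pass around a `struct sock *`, and protocol code recovers its own type with a cast, without storing any extra pointers.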

2013-11-22

socket

inetsw table

Data structure

/* The inetsw table contains everything that inet_create needs to
 * build a new socket.
 */
static struct list_head inetsw[SOCK_MAX];
static DEFINE_SPINLOCK(inetsw_lock);
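Roughly, `inet_create()` indexes `inetsw` by the socket type, then walks that list for a matching protocol. A protocol of 0 (`IPPROTO_IP`) from the caller takes the first entry registered for that type. The sketch below models the lookup with fixed arrays instead of `list_head` lists. It omits locking and the wildcard entries on the table side. The struct layout, bounds and `mock_` names are hypothetical.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

#define MOCK_SOCK_MAX 16                 /* hypothetical bound on socket types */
#define MOCK_PER_TYPE 4                  /* hypothetical entries per type */

struct mock_inet_protosw {
        int type;                        /* SOCK_STREAM, SOCK_DGRAM, ... */
        int protocol;                    /* IPPROTO_TCP, IPPROTO_UDP, ... */
        const char *name;                /* stands in for struct proto / proto_ops */
};

/* mock_inetsw[type] holds the entries registered for that type, in order. */
static const struct mock_inet_protosw *mock_inetsw[MOCK_SOCK_MAX][MOCK_PER_TYPE];

static int mock_inet_register_protosw(const struct mock_inet_protosw *p)
{
        if (p->type < 0 || p->type >= MOCK_SOCK_MAX)
                return -1;
        for (int i = 0; i < MOCK_PER_TYPE; i++)
                if (mock_inetsw[p->type][i] == NULL) {
                        mock_inetsw[p->type][i] = p;
                        return 0;
                }
        return -1;                       /* this type's slots are full */
}

/* Models the lookup in inet_create(); protocol 0 takes the first entry. */
static const struct mock_inet_protosw *mock_inetsw_lookup(int type, int protocol)
{
        if (type < 0 || type >= MOCK_SOCK_MAX)
                return NULL;
        for (int i = 0; i < MOCK_PER_TYPE && mock_inetsw[type][i]; i++) {
                const struct mock_inet_protosw *p = mock_inetsw[type][i];
                if (protocol == IPPROTO_IP || protocol == p->protocol)
                        return p;
        }
        return NULL;                     /* the kernel reports -EPROTONOSUPPORT */
}
```

With TCP registered for SOCK_STREAM, `socket(AF_INET, SOCK_STREAM, 0)` resolves to the TCP entry through this lookup.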
