Tuesday, December 14, 2010

ip route

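Notes
What does "route" mean?
I never knew what the word 路由 (route) literally means. It is not really a native Chinese word; Chinese only has "路由单" (an itinerary), which is a noun. 路由 (route) has two meanings:
1. To choose a path.
2. The same as "路由单": the list of places passed through on a journey.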

A path is chosen according to the destination.
In the common case, route selection is based completely on the destination address. Conventional (as opposed to policy-based) IP networking relies on only the destination address to select a route for a packet.
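A minimal sketch of destination-only route selection, assuming the usual longest-prefix-match rule. The table contents are illustrative: the /25 network and the 10.20.129.1 gateway are borrowed from the Ethernet notes further down this page, and the function name `lookup` is made up for this example.

```python
import ipaddress

# Illustrative routing table: (destination prefix, where to send the packet).
# These entries are assumptions for the example, not output from a real host.
ROUTES = [
    ("0.0.0.0/0", "via 10.20.129.1"),   # default route through the gateway
    ("10.20.129.0/25", "dev eth0"),     # directly connected network
]

def lookup(destination, routes=ROUTES):
    """Return the target of the most specific prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, target in routes:
        net = ipaddress.ip_network(prefix)
        # Keep the matching prefix with the longest mask seen so far.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return best[1] if best else None

print(lookup("10.20.129.32"))  # inside the /25, so it is delivered directly
print(lookup("8.8.8.8"))       # only the default route matches
```

Only the destination address enters the decision here, which is exactly the limitation that policy-based routing removes.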

Over time, however, routing by destination alone stopped being enough:
With the prevalence of low cost bandwidth, easily configured VPN tunnels, and increasing reliance on networks, the technique of selecting a route based solely on the destination IP address range no longer suffices for all situations.

How Linux implements an answer to this:
Since kernel 2.2, linux has supported policy based routing through the use of multiple routing tables and the routing policy database (RPDB). Together, they allow a network administrator to configure a machine select different routing tables and routes based on a number of criteria.

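Roughly, this means two things:
1. Linux supports multiple routing tables.
2. The routing policy database (RPDB) holds the rules that decide which table a packet is looked up in. This is policy-based routing.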

The routing we normally deal with uses the destination as its only criterion (for example, what the route command prints). So why does policy-based routing matter?
In fact, advanced routing could more accurately be called policy-based networking.

The following passage describes the selectors that policy-based routing can use when Linux routes a packet:
Selectors available for use in policy-based routing are attributes of a packet passing through the linux routing code. The source address of a packet, the ToS flags, an fwmark (a mark carried through the kernel in the data structure representing the packet), and the interface name on which the packet was received are attributes which can be used as selectors. By selecting a routing table based on packet attributes, an administrator can have granular control over the network path of any packet.
The selectors determine which routing table is used.
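A rough sketch of how selectors could pick a routing table. The `Packet` and `Rule` classes, the table names `vpn` and `wifi`, and the priorities are all hypothetical; only the four selector kinds (source address, ToS, fwmark, incoming interface) come from the quoted passage. The sketch returns the table of the first matching rule and does not model what happens when that table has no route.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass
class Packet:
    src: str
    dst: str
    tos: int = 0
    fwmark: int = 0
    iif: str = "eth0"   # interface the packet was received on

@dataclass
class Rule:
    priority: int
    table: str
    src: Optional[str] = None      # source prefix selector
    tos: Optional[int] = None      # ToS selector
    fwmark: Optional[int] = None   # fwmark selector
    iif: Optional[str] = None      # incoming interface selector

    def matches(self, pkt):
        """A selector left as None is not checked at all."""
        if self.src is not None:
            if ipaddress.ip_address(pkt.src) not in ipaddress.ip_network(self.src):
                return False
        if self.tos is not None and pkt.tos != self.tos:
            return False
        if self.fwmark is not None and pkt.fwmark != self.fwmark:
            return False
        if self.iif is not None and pkt.iif != self.iif:
            return False
        return True

def select_table(pkt, rules):
    """Return the table named by the lowest-priority-number matching rule."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(pkt):
            return rule.table
    return None

# Hypothetical rule set for illustration.
RULES = [
    Rule(100, "vpn", fwmark=1),
    Rule(200, "wifi", src="192.168.0.0/24"),
    Rule(32766, "main"),               # no selectors: matches every packet
]

print(select_table(Packet("192.168.0.106", "8.8.8.8"), RULES))
```

Two packets with the same destination can end up in different tables, which is the whole point of the selectors.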

Describing how Linux selects a route in plain words is hard to follow; the following pseudocode does a better job:
if packet.routeCacheLookupKey in routeCache:
    route = routeCache[packet.routeCacheLookupKey]
else:
    for rule in rpdb:
        if packet.rpdbLookupKey in rule:    # rule is an RPDB object from the table below
            routeTable = rule[lookupTable]  # routeTable is a route table object from the table below
            if packet.routeLookupKey in routeTable:
                route = routeTable[packet.routeLookupKey]
                break                       # stop at the first rule that yields a route

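Think of the RPDB as a database that sits in front of the routing tables: all the rules live in it, and each rule has its own attributes (including the packet attributes mentioned above).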

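Each LookupKey in the pseudocode stands for one concrete attribute from the table below, so the pseudocode above really expands into a great many if statements.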

* Attributes in italics are optional: they are checked if present and skipped otherwise.

From the above, a route table serves two purposes:
1. It groups rules.
2. It gives rules of the same kind a shared set of attributes.

The table above also shows that every packet's destination and source are always used for routing, but they are not the only criteria.
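The lookup order from the pseudocode (cache first, then the rules in order, then the table each rule points to) can be sketched in runnable form. Everything here is a simplification under stated assumptions: packets and selectors are plain dicts, the cache key is a tuple chosen for this example, and the `vpn` table, `tun0` device and 10.8.0.0/16 prefix are invented.

```python
import ipaddress

def longest_match(table, dst):
    """Longest-prefix match of dst in a list of (prefix, target) pairs."""
    addr = ipaddress.ip_address(dst)
    hits = [(ipaddress.ip_network(p), t) for p, t in table]
    hits = [(n, t) for n, t in hits if addr in n]
    return max(hits, key=lambda h: h[0].prefixlen)[1] if hits else None

def rule_matches(selector, packet):
    """Check only the selectors a rule actually carries."""
    if "src" in selector:
        if ipaddress.ip_address(packet["src"]) not in ipaddress.ip_network(selector["src"]):
            return False
    return all(packet.get(k) == v for k, v in selector.items() if k != "src")

def route_packet(packet, rpdb, tables, cache):
    # Assumed cache key: the attributes that can influence the decision.
    key = (packet["dst"], packet["src"], packet.get("tos", 0),
           packet.get("fwmark", 0), packet.get("iif"))
    if key in cache:
        return cache[key]
    for selector, table_name in rpdb:          # rules are tried in order
        if rule_matches(selector, packet):
            route = longest_match(tables[table_name], packet["dst"])
            if route is not None:              # otherwise fall through to the next rule
                cache[key] = route
                return route
    return None

# Hypothetical tables and rules for illustration.
tables = {
    "vpn": [("10.8.0.0/16", "dev tun0")],
    "main": [("10.20.129.0/25", "dev eth0"), ("0.0.0.0/0", "via 10.20.129.1")],
}
rpdb = [({"fwmark": 1}, "vpn"), ({}, "main")]
cache = {}

marked = {"src": "10.20.129.19", "dst": "10.8.1.1", "fwmark": 1}
print(route_packet(marked, rpdb, tables, cache))
```

A marked packet whose destination is missing from the `vpn` table falls through to `main`, which is the behaviour the nested `if` in the pseudocode implies.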

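How a Linux system administrator can inspect the three kinds of data above:
1. The route cache: ip route show cache
2. The contents of one routing table: ip route show table TABLE_NAME
3. The RPDB rules, which name the routing tables in use: ip rule show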
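As a small aid for reading the rule list, here is a sketch that parses rule lines into (priority, selector, table) tuples. The sample text is an assumption: it is the rule set a stock system typically shows, not output captured on the machine used in these notes, and the regular expression only covers rules of the simple "... lookup TABLE" shape.

```python
import re

# Assumed sample: the three default rules usually present on a stock system.
SAMPLE_OUTPUT = (
    "0:\tfrom all lookup local\n"
    "32766:\tfrom all lookup main\n"
    "32767:\tfrom all lookup default\n"
)

# priority, then the selector text, then the table named after "lookup".
RULE_LINE = re.compile(r"^(\d+):\s+(.*?)\s+lookup\s+(\S+)")

def parse_rules(text):
    """Return a list of (priority, selector, table) for each parsable line."""
    rules = []
    for line in text.splitlines():
        m = RULE_LINE.match(line)
        if m:
            rules.append((int(m.group(1)), m.group(2), m.group(3)))
    return rules

for priority, selector, table in parse_rules(SAMPLE_OUTPUT):
    print(priority, selector, table)
```

The table names collected this way are the ones worth inspecting one by one.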

Sunday, December 12, 2010

ethernet

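Start

I rarely pay attention to the Ethernet layer. The last time I did was while trying to understand LVS. I have recently been reading "Guide to IP Layer Network Administration with Linux", so here are some notes and a few hands-on experiments to help it stick.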

The ARP table on the test machine contains only the gateway's hardware address:
$ arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
10.20.129.1              ether   00:0F:E2:D3:BE:B8   C                     eth0

Run the following:
$ ping 10.20.129.32

Capture the packets that the ping produces:

$ sudo tcpdump -ent -i eth0 arp or icmp

....(truncated).....
00:23:ae:93:d9:26 > Broadcast, ethertype ARP (0x0806), length 42: arp who-has 10.20.129.32 tell 10.20.129.19
00:1e:4f:ad:41:58 > 00:23:ae:93:d9:26, ethertype ARP (0x0806), length 60: arp reply 10.20.129.32 is-at 00:1e:4f:ad:41:58
00:23:ae:93:d9:26 > 00:1e:4f:ad:41:58, ethertype IPv4 (0x0800), length 98: 10.20.129.19 > 10.20.129.32: ICMP echo request, id 26119, seq 1, length 64
00:1e:4f:ad:41:58 > 00:23:ae:93:d9:26, ethertype IPv4 (0x0800), length 98: 10.20.129.32 > 10.20.129.19: ICMP echo reply, id 26119, seq 1, length 64
....(truncated).....

ICMP packets sit above the Ethernet layer. Sending them over Ethernet requires the destination's hardware address, and the ARP protocol is used to obtain it.
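To make the "length 42" of the who-has frame in the capture concrete, here is a sketch that assembles the same ARP request byte by byte: a 14-byte Ethernet header followed by a 28-byte ARP payload. The helper names are made up for this example, and the frame is only built, not sent (sending would need a raw socket and root privileges).

```python
import socket
import struct

def mac_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))

def build_arp_request(src_mac, src_ip, target_ip):
    # Ethernet header: destination (broadcast), source, ethertype 0x0806 (ARP).
    eth = struct.pack("!6s6sH", b"\xff" * 6, mac_bytes(src_mac), 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # opcode 1: request (who-has)
        mac_bytes(src_mac),           # sender hardware address
        socket.inet_aton(src_ip),     # sender IP ("tell ...")
        b"\x00" * 6,                  # target hardware address: unknown
        socket.inet_aton(target_ip),  # target IP ("who-has ...")
    )
    return eth + arp

frame = build_arp_request("00:23:ae:93:d9:26", "10.20.129.19", "10.20.129.32")
print(len(frame))
```

The reply in the capture is reported as 60 bytes, presumably because the replying side padded the frame to the Ethernet minimum.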

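The ARP exchange is equivalent to the following command, which asks the network segment for the MAC address that belongs to an IP:
$ sudo arping -I eth0 10.20.129.32
Check the ARP table: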
$ arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
10.20.129.1              ether   00:0F:E2:D3:BE:B8   C                     eth0
10.20.129.32             ether   00:1E:4F:AD:41:58   C                     eth0
One entry has been added.

The arping -A option: ARP announcement, also known as gratuitous ARP

$ sudo arping -A -c 3 -I eth0 10.20.129.19
The tcpdump capture:
00:23:ae:93:d9:26 > Broadcast, ethertype ARP (0x0806), length 42: arp reply 10.20.129.19 is-at 00:23:ae:93:d9:26
00:23:ae:93:d9:26 > Broadcast, ethertype ARP (0x0806), length 42: arp reply 10.20.129.19 is-at 00:23:ae:93:d9:26
00:23:ae:93:d9:26 > Broadcast, ethertype ARP (0x0806), length 42: arp reply 10.20.129.19 is-at 00:23:ae:93:d9:26

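The capture shows that -A announces the sender's own IP-to-MAC mapping to the whole segment. By default, Linux does not create a new ARP table entry from such a packet.
This is controlled by the arp_accept option, documented as follows: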

arp_accept - BOOLEAN
    Define behavior for gratuitous ARP frames who's IP is not
    already present in the ARP table:
    0 - don't create new entries in the ARP table
    1 - create new entries in the ARP table
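A toy model of that setting, assuming the ARP table is a plain dict from IP to MAC. The function name and return values are invented for illustration, and the model ignores everything else the kernel considers (timers, entry states). Updating an entry that already exists regardless of the setting follows the kernel documentation for arp_accept.

```python
def handle_gratuitous_arp(arp_table, ip, mac, arp_accept=0):
    """Apply one gratuitous ARP frame to arp_table and report what happened."""
    if ip in arp_table:
        # Known IP: the entry is refreshed whatever arp_accept says.
        arp_table[ip] = mac
        return "updated"
    if arp_accept:
        # Unknown IP and arp_accept=1: a new entry is created.
        arp_table[ip] = mac
        return "created"
    # Unknown IP and arp_accept=0: the frame leaves the table untouched.
    return "ignored"

table = {"10.20.129.1": "00:0F:E2:D3:BE:B8"}
print(handle_gratuitous_arp(table, "10.20.129.19", "00:23:ae:93:d9:26"))
print(handle_gratuitous_arp(table, "10.20.129.19", "00:23:ae:93:d9:26", arp_accept=1))
```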

For the concrete uses of gratuitous ARP packets, see: http://wiki.wireshark.org/Gratuitous_ARP

The arping -D option: Duplicate address detection mode (DAD)

This option is quite useful: it checks whether there is an IP address conflict on the segment. Here is an example:

root@jessinio-laptop:~# ifconfig wlan0 |head -n 2
wlan0     Link encap:Ethernet  HWaddr 00:16:cf:68:5b:a7  
          inet addr:192.168.0.106  Bcast:192.168.0.255  Mask:255.255.255.0

root@jessinio-laptop:~# arping -D -I wlan0 192.168.0.106
ARPING 192.168.0.106 from 0.0.0.0 wlan0
Unicast reply from 192.168.0.106 [00:18:41:FE:26:5F]  90.390ms
Sent 1 probes (1 broadcast(s))
Received 1 response(s)

As shown, 192.168.0.106 is used by two machines: the local one, 00:16:cf:68:5b:a7, and another one, 00:18:41:FE:26:5F.

The capture:

00:16:cf:68:5b:a7 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.106 (ff:ff:ff:ff:ff:ff) tell 0.0.0.0, length 28
00:18:41:fe:26:5f > 00:16:cf:68:5b:a7, ethertype ARP (0x0806), length 42: Reply 192.168.0.106 is-at 00:18:41:fe:26:5f, length 28
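The conflict check itself is simple: a probe goes out with sender address 0.0.0.0, and any reply for the probed IP from a MAC other than our own means someone else holds the address. A minimal sketch that applies this to the two captured lines; the function name and the regular expression are assumptions tailored to this tcpdump text format.

```python
import re

# Matches the "Reply <ip> is-at <mac>" part of a tcpdump ARP reply line.
REPLY = re.compile(r"Reply (\S+) is-at ([0-9a-f:]{17})", re.IGNORECASE)

def find_conflicts(capture_lines, probed_ip, own_mac):
    """Return the MACs (other than own_mac) that answered for probed_ip."""
    conflicts = set()
    for line in capture_lines:
        m = REPLY.search(line)
        if m and m.group(1) == probed_ip and m.group(2).lower() != own_mac.lower():
            conflicts.add(m.group(2).lower())
    return sorted(conflicts)

CAPTURE = [
    "00:16:cf:68:5b:a7 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: "
    "Request who-has 192.168.0.106 (ff:ff:ff:ff:ff:ff) tell 0.0.0.0, length 28",
    "00:18:41:fe:26:5f > 00:16:cf:68:5b:a7, ethertype ARP (0x0806), length 42: "
    "Reply 192.168.0.106 is-at 00:18:41:fe:26:5f, length 28",
]

print(find_conflicts(CAPTURE, "192.168.0.106", "00:16:cf:68:5b:a7"))
```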

End

To close with a question: can ICMP tell you whether another machine on the segment is using your IP? For example, by pinging your own IP.


The answer is no, because the ICMP packets never actually leave the machine; they are looped back locally. For example:


$ ping 10.20.129.19
The resulting packets do not pass through the Ethernet card, as the local routing table shows:

$ ip route list table local
broadcast 127.255.255.255 dev lo  proto kernel  scope link  src 127.0.0.1
broadcast 10.20.129.0 dev eth0  proto kernel  scope link  src 10.20.129.19
local 10.20.129.19 dev eth0  proto kernel  scope host  src 10.20.129.19
broadcast 10.20.129.127 dev eth0  proto kernel  scope link  src 10.20.129.19
broadcast 127.0.0.0 dev lo  proto kernel  scope link  src 127.0.0.1
local 127.0.0.1 dev lo  proto kernel  scope host  src 127.0.0.1
local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
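The same conclusion can be read off mechanically: any destination covered by a `local` entry in this table is delivered on the host itself. A small sketch that checks this against the table text above; the function name is made up, and the parsing assumes the entry type and prefix are the first two fields of each line, as in this output.

```python
import ipaddress

# The local table exactly as printed above.
LOCAL_TABLE = """\
broadcast 127.255.255.255 dev lo  proto kernel  scope link  src 127.0.0.1
broadcast 10.20.129.0 dev eth0  proto kernel  scope link  src 10.20.129.19
local 10.20.129.19 dev eth0  proto kernel  scope host  src 10.20.129.19
broadcast 10.20.129.127 dev eth0  proto kernel  scope link  src 10.20.129.19
broadcast 127.0.0.0 dev lo  proto kernel  scope link  src 127.0.0.1
local 127.0.0.1 dev lo  proto kernel  scope host  src 127.0.0.1
local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
"""

def is_local_destination(dst, table_text=LOCAL_TABLE):
    """True if dst falls under an entry of type 'local'."""
    addr = ipaddress.ip_address(dst)
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "local":
            # A bare address such as 10.20.129.19 is treated as a /32.
            if addr in ipaddress.ip_network(fields[1]):
                return True
    return False

print(is_local_destination("10.20.129.19"))  # our own address
print(is_local_destination("10.20.129.32"))  # a neighbour on the segment
```

Since 10.20.129.19 is a local destination, the ping never reaches the wire, so no other holder of that address can answer it.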