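k8s Calico error: Calico node 'binary-k8s-master1' is already using the IPv4 address 172.18.0.1
The fix below may also be worth checking whenever the Calico pods stay stuck at Running 0/1.
1. Problem description
I deployed the Pinpoint microservice tracing and monitoring stack with Docker on one of the worker nodes of a k8s cluster. It worked at first, but a day later every Calico component had failed to start and none of the microservices in the cluster could serve traffic.
The Calico error is as follows: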
# kubectl logs -f calico-node-p6hsk -n kube-system
2021-11-26 07:28:04.920 [INFO][8] startup/startup.go 299: Early log level set to info
2021-11-26 07:28:04.920 [INFO][8] startup/startup.go 315: Using NODENAME environment for node name
2021-11-26 07:28:04.920 [INFO][8] startup/startup.go 327: Determined node name: binary-k8s-node2
2021-11-26 07:28:04.925 [INFO][8] startup/startup.go 359: Checking datastore connection
2021-11-26 07:28:04.976 [INFO][8] startup/startup.go 383: Datastore connection verified
2021-11-26 07:28:04.977 [INFO][8] startup/startup.go 104: Datastore is ready
2021-11-26 07:28:05.086 [INFO][8] startup/startup.go 425: Initialize BGP data
2021-11-26 07:28:05.087 [INFO][8] startup/startup.go 664: Using autodetected IPv4 address on interface br-a32444aa3aae: 172.18.0.1/16
2021-11-26 07:28:05.087 [INFO][8] startup/startup.go 495: Node IPv4 changed, will check for conflicts
2021-11-26 07:28:05.105 [WARNING][8] startup/startup.go 1010: Calico node 'binary-k8s-master1' is already using the IPv4 address 172.18.0.1.
2021-11-26 07:28:05.106 [INFO][8] startup/startup.go 263: Clearing out-of-date IPv4 address from this node IP="172.18.0.1/16"
2021-11-26 07:28:05.139 [WARNING][8] startup/startup.go 1214: Terminating
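Nothing special had been done to the K8S cluster, yet Calico broke and would not start. Restarting the cluster did not help either.
2. Solution
Looking closely at the log, one line stands out: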
2021-11-26 07:28:05.087 [INFO][8] startup/startup.go 664: Using autodetected IPv4 address on interface br-a32444aa3aae: 172.18.0.1/16
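What is br-a32444aa3aae? According to the log, Calico autodetected the address 172.18.0.1 from the interface br-a32444aa3aae on this node, and that address is already registered to another Calico node (binary-k8s-master1), so the startup check reports a conflict and terminates.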
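When the startup log is long, the autodetected interface and address can be filtered out instead of read by eye. The following is a minimal sketch; the function name calico_autodetected is hypothetical, and it assumes the log line has the same wording as the one shown above.

```shell
# Read Calico startup logs on stdin and print "<interface> <address>"
# for every "Using autodetected IPv4 address" line.
# Assumption: the message format matches the log excerpt in this post.
calico_autodetected() {
  sed -n 's#.*Using autodetected IPv4 address on interface \([^:]*\): \([0-9./]*\).*#\1 \2#p'
}
```

It could be used as: kubectl logs calico-node-p6hsk -n kube-system | calico_autodetected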
Checking on the node2 host confirms that this interface exists and carries that IP:
# ifconfig
br-a32444aa3aae: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 172.18.0.1 netmask 255.255.0.0 broadcast 172.19.255.255
        inet6 fe80::42:8ff:feb9:2492 prefixlen 64 scopeid 0x20<link>
        ether 02:42:08:b9:24:92 txqueuelen 0 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
This interface looks like one created by Docker, so let's list the Docker networks:
# docker network ls
NETWORK ID     NAME                           DRIVER    SCOPE
55efa42d705d   bridge                         bridge    local
aba466ac1f9c   host                           host      local
89494ad04935   none                           null      local
a32444aa3aae   pinpoint-docker-185_pinpoint   bridge    local
Indeed, the interface br-a32444aa3aae corresponds to the network a32444aa3aae, which is the one created for pinpoint. Delete this network:
# docker network rm a32444aa3aae
a32444aa3aae
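The match between the interface and the network rests on Docker's naming of user-defined bridges: br- followed by the first 12 characters of the network ID. A small sketch of that mapping follows; the helper name bridge_to_network_id is hypothetical.

```shell
# Derive the Docker network ID prefix from a bridge interface name
# such as br-a32444aa3aae. Returns non-zero for any other name.
bridge_to_network_id() {
  case "$1" in
    br-?*) printf '%s\n' "${1#br-}" ;;  # strip the "br-" prefix
    *) return 1 ;;                      # not a Docker-style user bridge name
  esac
}
```

The result can be passed to Docker to confirm which network it is before removing anything, for example: docker network inspect --format '{{.Name}}' "$(bridge_to_network_id br-a32444aa3aae)"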
After deleting the network, checking again shows that the Calico pods have started successfully:
# kubectl get pod -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
calico-kube-controllers-97769f7c7-dsdrk   1/1     Running   1          38m
calico-node-jkl6q                         1/1     Running   0          8m38s
calico-node-pgstp                         1/1     Running   0          8m39s
calico-node-vssbk                         1/1     Running   0          8m39s
coredns-6cc56c94bd-m7pzr                  1/1     Running   1          30m
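On a cluster with many nodes it can be easier to list only the calico-node pods that are not fully ready. A minimal sketch follows, assuming the default column layout of kubectl get pod shown above; the function name calico_not_ready is hypothetical.

```shell
# Read `kubectl get pod` output on stdin and print the names of
# calico-node pods whose READY column (e.g. 0/1) is incomplete.
calico_not_ready() {
  awk '$1 ~ /^calico-node-/ { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}
```

It could be used as: kubectl get pod -n kube-system | calico_not_ready
An empty result means every calico-node pod reports all of its containers ready.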
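3. Summary
Programs deployed with docker or docker-compose are best kept off the nodes of a K8S cluster. Services deployed with docker-compose get their own Docker network, and Calico may autodetect that network's bridge address as the node IP, which produces the conflict seen here.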
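If such workloads have to share a node with Kubernetes, another option that may help is to tell Calico which interface to use through the IP_AUTODETECTION_METHOD environment variable of calico-node, so Docker bridges are never considered. This was not tried in this post. The sketch below only prints the command for review; the function name is hypothetical, the interface pattern eth.* is an assumption that must be adapted to the real NIC name, and the DaemonSet name calico-node in kube-system is assumed from the pod names above.

```shell
# Print (do not run) a kubectl command that pins Calico's IPv4
# autodetection to interfaces matching the given regex.
# Assumptions: DaemonSet "calico-node" in namespace "kube-system";
# the regex passed in (for example eth.*) matches the node's real NIC.
calico_pin_interface_cmd() {
  printf "kubectl set env daemonset/calico-node -n kube-system 'IP_AUTODETECTION_METHOD=interface=%s'\n" "$1"
}
```

For example, calico_pin_interface_cmd 'eth.*' prints the command to review and then run by hand. Changing the DaemonSet environment restarts the calico-node pods.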