Ideas to work on during the day
1. We could try to have p4c and P4Runtime use the latest possible protobuf version. Users may have apps on their machines that use a higher version of gRPC/protobuf, which breaks p4c/P4Runtime.
2. Flexsai code has issues after porting from Python 2.7 to 3. I have filed an Issue, but no one has worked on it. See https://github.com/opencomputeproject/SAI/issues/896
3. Complete the bmv2 PSA implementation. Before March 1st, we should file an Issue documenting what is incomplete in bmv2 PSA.
4. We should document the bmv2 backend code as well.
5. By March 1st, nested struct support in p4c P4Runtime generation and the PI server should be complete. We could develop more tests in p4c and PI.
6. Support emit for structs in the bmv2 backend. See https://github.com/p4lang/p4c/issues/1659
7. A few days before March 1st, we should scan the p4c and PI Issues lists and collect some that can be fixed in a day.
8. Implement the different IPv6 switching types in the data plane as defined in this IETF draft: https://tools.ietf.org/…/draft-baker-openstack-ipv6-model-02. Use P4Runtime to configure the data plane to use one of the switching types.
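For idea 1, a quick way to see whether a machine carries a different gRPC/protobuf version than the one a build was tested with is to compare installed package versions against a known-good set. The following is a minimal sketch, not part of p4c or P4Runtime: it looks only at the Python packages `protobuf` and `grpcio` (not the C++ libraries that p4c links against), and the "tested" versions in it are hypothetical values chosen for illustration.

```python
from importlib import metadata


def parse_version(text):
    """Turn '3.6.1' into (3, 6, 1), ignoring any suffix such as 'rc1'."""
    numbers = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
        if digits != piece:
            break  # stop at the first component that carries a suffix
    return tuple(numbers)


def compare(found, tested):
    """Return -1, 0, or 1 if 'found' is older than, equal to, or newer than 'tested'."""
    a, b = parse_version(found), parse_version(tested)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '3.6' equals '3.6.0'
    b += (0,) * (width - len(b))
    return (a > b) - (a < b)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_to_tested(tested, lookup=installed_version):
    """Map each package to (installed version, 'older'/'same'/'newer'/'missing')."""
    labels = {-1: "older", 0: "same", 1: "newer"}
    report = {}
    for package, tested_version in tested.items():
        found = lookup(package)
        if found is None:
            report[package] = (None, "missing")
        else:
            report[package] = (found, labels[compare(found, tested_version)])
    return report


if __name__ == "__main__":
    # Hypothetical "tested" versions, for illustration only.
    tested = {"protobuf": "3.6.1", "grpcio": "1.17.2"}
    for package, (found, status) in compare_to_tested(tested).items():
        print(f"{package}: installed={found} tested={tested[package]} -> {status}")
```

A "newer" result is the case described in idea 1: another app has pulled in a higher version than the one the tools were built against.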
The goal of automation is to use open-source software.
1. The network switches use ZTP (Zero Touch Provisioning). The management port of the switch uses DHCP to get an IP address, and a DHCP option provides the address of the TFTP server from which the switch fetches its configuration.
2. Cisco, Juniper, and Arista switches support gRPC. Use gRPC to get telemetry from switches to debug switch issues. SONiC switches also support gRPC and gNMI to get telemetry data. Note that no telemetry can catch microbursts.
3. Use ONOS, which has a web UI to get the network topology (from LLDP), learns devices and BGP, and supports clustering, HA, target configuration updates, etc. For the northbound interface, ONOS supports a REST API, gRPC, and a CLI. ONOS uses OpenConfig and YANG models. gRPC with ONOS allows gNMI and gNOI for network management and network operations. Routers cannot be learnt from LLDP. LLDP-MED has a capabilities knob that discovers routers.
4. gNMI uses the Capabilities gRPC, which can be extended to support any node (e.g., server, storage). The Capabilities gRPC is very extensive (https://github.com/openconfig/public/tree/master/release/models). YANG allows new capabilities to be defined. gNOI can be extended for new network operations. An IETF draft is an excellent document to learn gNMI from: https://tools.ietf.org/html/draft-openconfig-rtgwg-gnmi-spec-01. No IETF draft exists for gNOI.
5. gNOI supports upgrading the firmware of a node. See the SetPackage() API in gNOI.
6. If any of Puppet, Chef, CFEngine, or Ansible is planned, use Ansible, because only Ansible uses an agentless architecture. Ansible supports managing servers and can be extended to manage networking nodes. Ansible takes care of firmware versions and upgrading the firmware of devices. Any network automation and management also requires integration with Ansible.
7. It is desirable to manage servers and network switches using the same tools. Why not use gRPC? Strive to manage storage with gRPC as well.
8. Most often, data center issues occur during a network management operation (70% of Google's failures; see slide 21 at https://tinyurl.com/y8fssfdx). For the long term, consider using Intent to manage the network.
9. Monitoring the WAN link is important - monitor link bandwidth and routing flaps.
10. Simplify the network - use IPv6 internally with NAT64 at the network edge. Public-domain NAT64 (stateless and stateful) software exists.
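To make item 10 more concrete, here is a small sketch of the address mapping that a NAT64 translator relies on: an IPv4 address is embedded in the low 32 bits of an IPv6 address under a /96 prefix. It assumes the well-known prefix 64:ff9b::/96 from RFC 6052 and handles only /96 prefixes (RFC 6052 also defines other prefix lengths with different layouts). The function names are illustrative and not taken from any NAT64 package.

```python
import ipaddress

# Well-known NAT64 prefix from RFC 6052; a deployment may use its own /96 instead.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def embed_ipv4(ipv4, prefix=WELL_KNOWN_PREFIX):
    """Return the IPv6 address that represents 'ipv4' behind a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    # With a /96 prefix, the IPv4 address occupies the last 32 bits.
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def extract_ipv4(ipv6, prefix=WELL_KNOWN_PREFIX):
    """Recover the IPv4 address embedded in an address under a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    addr = ipaddress.IPv6Address(ipv6)
    if addr not in prefix:
        raise ValueError(f"{addr} is not inside {prefix}")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


if __name__ == "__main__":
    mapped = embed_ipv4("192.0.2.33")
    print(mapped)                # 64:ff9b::c000:221
    print(extract_ipv4(mapped))  # 192.0.2.33
```

An IPv6-only host inside the network sends traffic to the mapped address, and the NAT64 device at the edge recovers the IPv4 destination from the last 32 bits.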
Please let us know if you have an issue not covered in this blog. I'd be happy to get back to you. Thanks.
Copyright © 2017-2019 MNK Consulting. All rights reserved.