Troubleshooting Switches

If a switch shows as disconnected in the UI but is online locally and reachable, work through the steps below. You need console or SSH access to the switch to run the checks.

1. Make sure the EX switch is running a supported version

Minimum Junos OS versions supported for ZTP:

    • EX2300, EX3400: 18.2R3-S2
    • EX4300: 18.4R2-S2
    • EX4600, EX4650: 20.4R3
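
The version check above can be scripted. The following is a minimal sketch in Python, assuming release strings of the form 18.2R3-S2 (major.minor, R-release, optional S-release). The MIN_ZTP_VERSION table and both function names are hypothetical and simply mirror the list above; other release string formats are not handled.

```python
import re

# Minimum versions copied from the list above; the dict name is illustrative.
MIN_ZTP_VERSION = {
    "EX2300": "18.2R3-S2",
    "EX3400": "18.2R3-S2",
    "EX4300": "18.4R2-S2",
    "EX4600": "20.4R3",
    "EX4650": "20.4R3",
}

# Assumed layout: <major>.<minor>R<release>[-S<service release>]
_RELEASE = re.compile(r"^(\d+)\.(\d+)R(\d+)(?:-S(\d+))?")


def parse_release(version):
    """Turn '18.2R3-S2' into a comparable tuple (18, 2, 3, 2)."""
    match = _RELEASE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {version!r}")
    major, minor, release, service = match.groups()
    # A missing service release counts as 0, so 20.4R3 < 20.4R3-S1.
    return int(major), int(minor), int(release), int(service or 0)


def meets_ztp_minimum(model, version):
    """Return True if version is at or above the listed minimum for model."""
    return parse_release(version) >= parse_release(MIN_ZTP_VERSION[model])
```

For example, meets_ztp_minimum("EX2300", "18.2R3-S1") is False, because the service release is below S2.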

2. Ensure the switch has a valid IP address

Run “show interfaces terse”. The irb.0 interface should have an IP address. You might see multiple irb interfaces depending on the switch model (or in the case of VC). At least one irb interface needs a valid IP address.

The switch can also connect through its management IP address, which appears on the me0 interface. Ensure that either the irb.0 or the me0 interface has a valid IP address and that both its admin and link status are up.

3. Ensure that the device can reach the gateway

Ping the IP address of the default gateway from the switch and confirm that replies are received.

4. Ensure the switch can reach the internet

mist@OFFICE_GF_SWITCH> ping 8.8.8.8 
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=117 time=22.996 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=24.747 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=16.528 ms

--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 16.528/21.424/24.747/3.535 ms

5. Check if the switch can resolve oc-term.mistsys.net

njohnny@NJ_EX_2300_C_Test> ping oc-term.mistsys.net 
PING ab847c3d0fcd311e9b3ae02d80612151-659eb20beaaa3ea3.elb.us-west-1.amazonaws.com (13.56.90.212): 56 data bytes

If the name does not resolve, check which DNS servers are configured on the switch:

mist@OFFICE_GF_SWITCH> show configuration | display set | grep name-server 
set system name-server 202.56.230.2
set system name-server 202.56.230.7
set system name-server 8.8.8.8

If no reachable name server is configured, add one in configuration mode (for example, “set system name-server 8.8.8.8”) and commit.

6. Ensure firewall ports are open (TCP port 2200 for oc-term.mistsys.net)

Check your cloud environment to see which ports and hosts to allow. When the port is open, the switch shows an established session to port 2200:

njohnny@NJ_EX_2300_C_Test> show system connections | grep 2200

tcp4 0 0 192.168.3.24.64647 13.56.90.212.2200 ESTABLISHED
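
If you collect this output from many switches, the state of the cloud session can be extracted with a few lines of Python. This is only a sketch: it assumes the six-column layout seen in the sample line above (protocol, receive queue, send queue, local address, foreign address, state), and the function name is hypothetical.

```python
def mist_session_states(output, port=2200):
    """Return the TCP states of connections whose foreign port equals port.

    output is the text printed by the connections command; the column
    layout is assumed from the sample line in this article.
    """
    states = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, headers, and anything that is not a TCP entry.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign = fields[4]
        # Addresses are printed as <ip>.<port>, so the port follows the last dot.
        if foreign.rsplit(".", 1)[-1] == str(port):
            states.append(fields[5])
    return states
```

An empty list means no session to port 2200 exists. A list containing only states other than ESTABLISHED points at a session that is not completing.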

7. Check if the switch system time is correct

njohnny@NJ_EX_2300_C_Test> show system uptime 
fpc0:
--------------------------------------------------------------------------
Current time: 2020-09-01 21:49:05 UTC
Time Source: LOCAL CLOCK 
System booted: 2020-08-27 06:57:04 UTC (5d 14:52 ago)
Protocols started: 2020-08-27 07:01:35 UTC (5d 14:47 ago)
Last configured: 2020-09-01 17:21:59 UTC (04:27:06 ago) by mist
9:49PM up 5 days, 14:52, 2 users, load averages: 0.79, 0.65, 0.58
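
To compare the reported time against a trusted clock, the “Current time” line can be parsed as sketched below. This Python snippet assumes the timestamp is printed in UTC in the format shown above. The function name is hypothetical, and how much skew is acceptable is not specified here, so the caller decides on a threshold.

```python
import re
from datetime import datetime, timezone

# Matches a line such as: Current time: 2020-09-01 21:49:05 UTC
_CURRENT_TIME = re.compile(
    r"Current time:\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC"
)


def clock_skew_seconds(uptime_output, reference):
    """Return switch time minus reference time, in seconds.

    reference must be a timezone-aware datetime from a trusted clock.
    A positive result means the switch clock is ahead.
    """
    match = _CURRENT_TIME.search(uptime_output)
    if match is None:
        raise ValueError("no 'Current time' line with a UTC timestamp found")
    switch_time = datetime.strptime(
        match.group(1), "%Y-%m-%d %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    return (switch_time - reference).total_seconds()
```

In practice, pass datetime.now(timezone.utc) from a host whose clock you trust as the reference.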

8. Check that ‘device-id’ has the format ‘<org_id>.<mac_addr>’ in the output of the CLI command below

njohnny@NJ_EX_2300_C_Test# show system services outbound-ssh 
traceoptions {
file outbound-ssh.log size 64k files 5;
flag all;
}
client mist {
device-id ca01ea19-afde-49a4-ad33-2d9902f14a7e.e8a2453e672e;
secret "$9$L7i7-wgoJUDkg49Ap0IRrevW-VYgoDHqWLGDkqQzRhcreWLX-Vs2XxGDHkPfn/Cp0IcSeMLxn/LxN-ws5Qz6tuRhSv8Xrl87dVY2TzF/uOEcyKWLleUjikPfIEhSrvxNdbYgRhK8x7Vbk.mf5F9CuOBEtp0IcSMWoJZjmfFn/CA05TIEhSeK4aJUjqP5Q9tu4an/CtOB7-dboJZUjHmfaJn/ApREevW8X-YgoiqmxNb2gaUD69Cp1RSyKMLxCtORSrvM7-VboJDjqPTzNdmfzF/9vW8LdbY2aZGisY4ZDif5z3690BylKWX7KvZUHkTQlKvW-VJGDiqmGU/CtuEhKM87wYaJDkqfoaQFn6At1RhrM8xNd"; ## SECRET-DATA
keep-alive {
retry 3;
timeout 5;
}
services netconf;
oc-term.mistsys.net {
port 2200;
retry 1000;
timeout 60;
}
}
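
The device-id format can also be checked programmatically. The sketch below is an illustration in Python, assuming (from the sample above) that org_id is a lowercase UUID and mac_addr is 12 hexadecimal digits without separators. The function name and its optional arguments are hypothetical.

```python
import re

# Assumed shape, based on the sample: <uuid>.<12 hex digits>
_DEVICE_ID = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    r"\.[0-9a-f]{12}$"
)


def is_valid_device_id(device_id, org_id=None, mac=None):
    """Check the <org_id>.<mac_addr> shape and, optionally, expected values."""
    candidate = device_id.strip().lower()
    if _DEVICE_ID.match(candidate) is None:
        return False
    found_org, found_mac = candidate.split(".")
    if org_id is not None and found_org != org_id.lower():
        return False
    if mac is not None:
        # Accept MACs written with ':' or '-' separators by stripping them.
        normalized = re.sub(r"[^0-9a-f]", "", mac.lower())
        if found_mac != normalized:
            return False
    return True
```

Passing the organization ID and the switch MAC address as org_id and mac additionally confirms that the switch is registered against the organization and device you expect.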

You can also check log messages on the switch.

njohnny@NJ_EX_2300_C_Test> show log messages | last 20 
Sep 1 21:54:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/6 status 27
Sep 1 21:54:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/7 status 27
Sep 1 21:54:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/8 status 27
Sep 1 21:54:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/9 status 27
Sep 1 21:54:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/10 status 27
Sep 1 21:54:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/11 status 27
Sep 1 21:55:24 NJ_EX_2300_C_Test mgd[93246]: UI_DBASE_LOGIN_EVENT: User 'njohnny' entering configuration mode
Sep 1 21:57:18 NJ_EX_2300_C_Test mgd[93246]: UI_DBASE_LOGOUT_EVENT: User 'njohnny' exiting configuration mode
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/0 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/1 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/2 status 1
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/3 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/4 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/5 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/6 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/7 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/8 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/9 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/10 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/11 status 27

9. If you are adding the switch for the first time

Release the switch from the UI (delete the switch entry), then claim or re-adopt it:

  1. Delete the existing Mist configuration from the switch.

  2. Claim the switch again with its claim code, or adopt it with the CLI command.

  3. Verify the outbound-ssh service with “show system services outbound-ssh” and “show system connections | grep 2200”.

If the switch is still stuck in the disconnected state and all of the following are true:

  • Sessions are stuck in FIN_WAIT

  • The switch is able to resolve DNS

  • The internet is reachable

check for MTU issues on the nodes in the path. The easiest way to validate this is to ping any public server (for example, 8.8.8.8) with different packet sizes.

Alternatively, if you have packet captures from the switch uplink, look for retries: in a transaction that fails because of an MTU issue, the packets of size 1514 are retried.

Run a ping test from the affected switch as follows:

mist@ACC2-A6-IDF1-IAD10> ping size 1450 8.8.8.8   
PING 8.8.8.8 (8.8.8.8): 1450 data bytes
76 bytes from 8.8.8.8: icmp_seq=0 ttl=59 time=12.444 ms
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 12.318/12.381/12.444/0.063 ms

Failed case:

mist@ACC2-A6-IDF1-IAD10> ping size 1480 8.8.8.8   
PING 8.8.8.8 (8.8.8.8): 1480 data bytes

--- 8.8.8.8 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss

Based on the packet size at which pings start timing out, adjust the MTU on the uplink accordingly.
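
Rather than trying sizes by hand, the threshold can be found with a binary search. The following Python sketch assumes a caller-supplied function, here called probe (hypothetical), that sends a ping with the given payload size and returns True when a reply arrives. It also assumes that the probes are not fragmented along the path and that every size below the limit succeeds. The header sizes are those of an ICMP echo over IPv4 without options.

```python
ICMP_HEADER = 8   # bytes, ICMP echo header
IPV4_HEADER = 20  # bytes, IPv4 header without options


def largest_passing_payload(probe, low=0, high=1472):
    """Binary search for the largest payload size where probe(size) is True.

    high defaults to 1472, the largest payload that fits a 1500-byte
    IP packet. Returns None if even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        # Round up so the loop always makes progress when probe(mid) is True.
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def ip_packet_size(payload):
    """Size of the IPv4 packet that carries an ICMP echo with this payload."""
    return payload + ICMP_HEADER + IPV4_HEADER
```

For instance, if the largest payload that gets through is 1452 bytes, ip_packet_size(1452) gives 1480, which is the value to compare against the MTU configured on the uplink.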

10. Deactivate the outbound SSH and reactivate it

Deactivate the outbound-ssh service with the following command:

deactivate system services outbound-ssh client mist

Commit the change, then reactivate outbound-ssh with the following command and commit again:

activate system services outbound-ssh client mist