Common Issues
This page covers frequently encountered problems and their solutions.
SSH Connection Issues
Can't SSH to Node
Symptom: ssh root@10.11.12.1 times out or refuses connection.
Solutions:
- Check network connectivity:
- Verify you're on the right network:
- Check SSH key is loaded:
- Try with password (if key auth not yet configured):
- Check firewall on your machine:
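The checks above can be walked through from your workstation with a small helper. This is a minimal sketch, not the project's own tooling: `check_ssh_path` is a hypothetical name, the default address is the one from the symptom, and the password-only `ssh` options are printed as a hint rather than executed.

```shell
# Hypothetical helper: run the basic reachability checks in order.
check_ssh_path() {
    node="${1:-10.11.12.1}"   # address from the symptom above

    # Reachability: no ping reply usually means cabling or the wrong network.
    if ! ping -c 3 "$node" >/dev/null 2>&1; then
        echo "ping $node: FAILED (check cabling and your own IP with 'ip addr')"
        return 1
    fi
    echo "ping $node: ok"

    # Is a key loaded in the agent? ssh-add -l exits non-zero when none is.
    if ssh-add -l >/dev/null 2>&1; then
        echo "ssh-agent: key loaded"
    else
        echo "ssh-agent: no key loaded (run ssh-add)"
    fi

    # Password login is only useful before key auth has been configured.
    echo "try: ssh -o PubkeyAuthentication=no -o PreferredAuthentications=password root@$node"
}
```

Call it as `check_ssh_path` or `check_ssh_path <node-ip>`. Local firewall checks are left out because they depend on your operating system.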
SSH Key Rejected
Symptom: Permission denied (publickey).
Solutions:
- Verify key is deployed (over a password or console session, since key login is failing):
ssh root@10.11.12.1 "cat /etc/dropbear/authorized_keys"
# Or for OpenSSH:
ssh root@10.11.12.1 "cat ~/.ssh/authorized_keys"
- Redeploy key:
- Check key permissions (on your machine):
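For the last two steps, a minimal sketch run on your own machine. The key path `~/.ssh/id_ed25519` is an assumption and `fix_key_perms` is a hypothetical helper name. The redeploy line targets the Dropbear path shown above and needs a working password login.

```shell
# Hypothetical helper: tighten permissions on a private key and its directory.
# OpenSSH refuses private keys that are readable by other users.
fix_key_perms() {
    key="${1:-$HOME/.ssh/id_ed25519}"   # assumed key path; adjust to yours
    chmod 700 "$(dirname "$key")" && chmod 600 "$key"
}

# Redeploy the public key to a Dropbear node (run manually):
#   ssh root@10.11.12.1 "cat >> /etc/dropbear/authorized_keys" < ~/.ssh/id_ed25519.pub
```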
Connection Drops During Deployment
Symptom: SSH disconnects mid-playbook, deployment incomplete.
Solutions:
- Run playbook with SKIP_REBOOT:
- After network config changes, reconnect to new IP:
- Use console access for initial setup:
- Connect serial cable
- 115200 baud, 8N1
- Configure basic networking first
Mesh Not Forming
Nodes Don't See Each Other
Symptom: batctl n shows no neighbors.
Solutions:
- Check physical connections:
- Verify VLAN interfaces exist:
- Check batman interfaces:
- Verify batman is running:
- Check for MTU issues:
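A minimal sketch of these checks, to be run on each node. The function names are hypothetical. The 1532-byte minimum is the value commonly recommended for interfaces carrying recent batman-adv versions; treat it as an assumption and pass your own value if your setup differs.

```shell
# Basic state of the mesh on this node.
mesh_basics() {
    batctl if                 # interfaces attached to batman-adv and their state
    batctl n                  # current neighbors
    lsmod | grep batman_adv   # is the kernel module loaded?
    ip link show              # VLAN interfaces appear as name.VID@parent
}

# Hypothetical check: is an interface MTU large enough for batman-adv overhead?
# SYS_NET exists only so the sysfs path can be overridden when testing.
mtu_ok() {
    iface="$1"
    min="${2:-1532}"
    mtu=$(cat "${SYS_NET:-/sys/class/net}/$iface/mtu") || return 2
    [ "$mtu" -ge "$min" ]
}
```

For example, `mtu_ok eth0 || echo "MTU too small"`, where `eth0` stands in for whichever interface carries the mesh.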
Wireless Mesh Not Working
Symptom: Wired mesh works but wireless backup doesn't.
Solutions:
- Check 802.11s interface:
- Verify mesh is in batman:
- Check wireless is on correct channel:
- Verify mesh ID matches:
- Reload wireless:
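The wireless checks can be sketched with standard OpenWrt and batctl commands. The function names are hypothetical, and the `sed` expression assumes the usual `uci show` output form `wireless.<section>.mesh_id='<value>'`.

```shell
# Run on the node.
wireless_mesh_checks() {
    iw dev        # the 802.11s interface shows up as "type mesh point"
    batctl if     # the same interface should be listed here
    uci show wireless | grep -E "\.(channel|mesh_id)="
}

# Print every configured mesh_id; the output must be identical on all nodes.
mesh_ids() {
    uci show wireless | sed -n "s/.*\.mesh_id='\(.*\)'\$/\1/p"
}

# After changing anything: wifi reload
```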
Poor Mesh Quality (Low TQ)
Symptom: batctl o shows TQ values below 200.
Solutions:
- Check for interference (wireless):
- Check cable quality (wired):
- Verify VLAN tagging:
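To spot weak links quickly, the originator table can be filtered by TQ. This is a minimal sketch that assumes `batctl o` prints the TQ in parentheses on each originator line, as in `(255)`; the layout can differ between batctl versions. `low_tq` is a hypothetical helper name, and 200 is the threshold from the symptom above.

```shell
# Hypothetical filter: print originator lines whose TQ is below a threshold.
low_tq() {
    threshold="${1:-200}"
    awk -v min="$threshold" '
        # Match a parenthesised number such as "(255)" or "( 87)".
        match($0, /\( *[0-9]+\)/) {
            tq = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (tq < min) print
        }'
}

# Usage on a node:
#   batctl o | low_tq 200
```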
WiFi Issues
5GHz AP Not Visible
Symptom: Can't see the client SSID.
Solutions:
- Check radio is enabled:
- Check AP interface:
- Verify channel is valid for your region:
- Restart wireless:
Clients Can't Connect
Symptom: SSID visible but authentication fails.
Solutions:
- Verify password (on node):
- Check encryption matches:
- Check hostapd is running:
- Review hostapd logs:
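These four checks can be combined into one pass on the node. A minimal sketch using standard OpenWrt tools; `ap_auth_checks` is a hypothetical name, and the first command prints the configured Wi-Fi password in clear text.

```shell
# Run on the node.
ap_auth_checks() {
    # Configured password and encryption mode for each wireless interface.
    uci show wireless | grep -E "\.(key|encryption)="
    # Is hostapd running? The [h] keeps grep from matching itself.
    ps | grep '[h]ostapd'
    # Most recent hostapd messages, including authentication failures.
    logread | grep hostapd | tail -n 20
}
```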
Clients Not Getting DHCP
Symptom: Connected but no IP address.
Solutions:
- Check dnsmasq is running:
- Verify DHCP pool:
- Check bridge configuration:
- Restart DHCP server:
- Check DHCP leases:
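A minimal sketch for the DHCP checks. `/tmp/dhcp.leases` is the default dnsmasq lease file on OpenWrt; if your configuration moves it, pass the path explicitly. `show_leases` is a hypothetical helper name.

```shell
# Run on the node:
#   ps | grep '[d]nsmasq'            # is dnsmasq running?
#   uci show dhcp                    # pool start/limit per interface
#   /etc/init.d/dnsmasq restart      # restart the DHCP server

# List active leases as "IP MAC HOSTNAME".
# Each dnsmasq lease line is: expiry, MAC, IP, hostname, client ID.
show_leases() {
    awk '{ print $3, $2, $4 }' "${1:-/tmp/dhcp.leases}"
}
```

An empty result from `show_leases` while clients are associated suggests the requests never reach dnsmasq, which points back at the bridge configuration.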
VLAN Issues
VLAN Interfaces Missing
Symptom: ip link show doesn't show VLAN interfaces.
Solutions:
- Check 8021q module:
- Verify network config:
- Recreate VLAN interfaces:
VLAN Tagging Mismatch
Symptom: Traffic does not reach its destination, but works when VLANs are not used.
Solutions:
- Verify switch VLAN config matches node config
- Check PVID settings on switch
- Use tcpdump to verify tagging:
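A minimal sketch of the tcpdump step. The interface name `eth0` and VLAN ID 10 are placeholder defaults, and `vlan_capture` is a hypothetical wrapper. Capture on the parent (physical) interface; on a VLAN sub-interface the tag has already been stripped.

```shell
# Hypothetical wrapper: capture 20 frames of one VLAN and print link-level
# headers (-e), which include the 802.1Q tag, without name resolution (-n).
vlan_capture() {
    iface="${1:-eth0}"
    vid="${2:-10}"
    tcpdump -i "$iface" -e -n -c 20 vlan "$vid"
}
```

If `vlan_capture eth0 10` shows nothing while traffic is flowing, frames are arriving untagged or with a different VLAN ID, which usually points at the switch port configuration.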
IoT Devices Can Reach Main Network
Symptom: VLAN isolation not working.
Solutions:
- Check firewall zones:
- Verify forward policy:
- Check inter-zone rules:
Gateway Issues
All Traffic Goes Through One Node
Symptom: Gateway list shows only one gateway selected.
Solutions:
- Check gateway mode on all nodes:
- Verify gateway bandwidth configured:
- Check if WAN is up on all nodes:
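Gateway state can be compared across nodes with batctl. This is a minimal sketch: the function names are hypothetical, and `is_gateway_server` assumes `batctl gw_mode` starts its output with the mode name (`server`, `client` or `off`).

```shell
# Run on every node and compare the results.
gateway_status() {
    batctl gw_mode   # this node's mode, with announced bandwidth or selection class
    batctl gwl       # gateways this node currently sees
}

# Succeeds only if this node announces itself as a gateway.
is_gateway_server() {
    batctl gw_mode | grep -q '^server'
}
```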
Internet Not Working
Symptom: Can ping mesh IPs but not internet.
Solutions:
- Check default route:
- Verify NAT rules:
- Check WAN interface:
- Test DNS:
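A minimal sketch of these checks for a gateway node. The function names are hypothetical and `wan` is assumed to be the name of the WAN interface. Which NAT command applies depends on the OpenWrt release: nftables (fw4) on recent versions, iptables (fw3) on older ones.

```shell
# Succeeds if the routing table contains a default route.
has_default_route() {
    ip route | grep -q '^default'
}

internet_checks() {
    if has_default_route; then
        echo "default route: present"
    else
        echo "default route: MISSING (check the WAN interface with 'ifstatus wan')"
        return 1
    fi
    # Remaining checks are printed as hints to run by hand.
    echo "NAT:  nft list ruleset | grep masquerade   (or: iptables -t nat -L -n -v)"
    echo "DNS:  nslookup openwrt.org"
}
```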
Management Network Issues
Intermittent Connectivity to Nodes
Symptom: Pings to node management IPs (10.11.10.x) sometimes fail, then work again. SSH sessions drop randomly.
Cause: In multi-switch topologies, short ARP cache times (default 30-60 seconds) can cause race conditions during MAC/ARP relearning, leading to brief connectivity outages.
Solution: Increase ARP cache times on all mesh nodes:
# Check current settings
cat /proc/sys/net/ipv4/neigh/br-mgmt/gc_stale_time
cat /proc/sys/net/ipv4/neigh/br-mgmt/base_reachable_time_ms
# Apply fix (if not deployed via Ansible)
sysctl -w net.ipv4.neigh.br-mgmt.gc_stale_time=300
sysctl -w net.ipv4.neigh.br-mgmt.base_reachable_time_ms=120000
# Make persistent
cat >> /etc/sysctl.conf << 'EOF'
# ARP cache settings for management network (br-mgmt)
net.ipv4.neigh.br-mgmt.gc_stale_time = 300
net.ipv4.neigh.br-mgmt.base_reachable_time_ms = 120000
EOF
Note: This fix is automatically applied by Ansible during deployment (see group_vars/all.yml for configuration).
Verification:
# Test all nodes from management network
for ip in 10.11.10.1 10.11.10.2 10.11.10.3; do
    ping -c 10 "$ip"
done
# All should show 0% packet loss
Can't Reach Node from Different Switch
Symptom: Devices on Switch B can't reach Node 1 (connected to Switch A), but can reach other nodes.
Solutions:
- Check switch VLAN 10 configuration:
- VLAN 10 must be properly trunked between switches
- Management traffic uses VLAN 10
- Verify BLA (Bridge Loop Avoidance) is working:
- Check ARP cache settings (see above)
- Verify the path:
Performance Issues
Slow Network Speeds
Solutions:
- Check mesh TQ values:
- Test direct link speed:
# On one node:
nc -l -p 5001 > /dev/null
# On another:
dd if=/dev/zero bs=1M count=100 | nc 10.11.12.1 5001
- Check for CPU overload:
- Verify Gigabit negotiation:
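Link negotiation can be read from sysfs without extra tools. A minimal sketch: `link_speed` and `is_gigabit` are hypothetical helpers, and `eth0` in the usage line stands in for your wired mesh interface. The kernel reports the speed in Mbit/s.

```shell
# Hypothetical helper: print the negotiated link speed in Mbit/s.
# SYS_NET exists only so the sysfs path can be overridden when testing.
link_speed() {
    cat "${SYS_NET:-/sys/class/net}/$1/speed"
}

# Succeeds if the interface negotiated 1000 Mbit/s or more.
is_gigabit() {
    speed=$(link_speed "$1" 2>/dev/null) || return 2
    [ "$speed" -ge 1000 ]
}

# Other checks from the list, run on the node:
#   batctl o        # TQ per originator
#   top -b -n 1     # one-shot CPU snapshot
```

For example, `is_gigabit eth0 || echo "eth0 negotiated below 1 Gbit/s"`. A link stuck at 100 Mbit/s is often a cable with a damaged pair.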
High Latency
Solutions:
- Check hop count:
- Look for routing loops:
- Check for interference (wireless):
Getting More Help
If these solutions don't resolve your issue:
- Gather diagnostic info:
- Check logs:
- Open a GitHub issue with:
- OpenWrt version
- Exact error messages
- Output of diagnostic commands
- Steps to reproduce
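To collect the information for an issue report in one file, something like the following can be run on the affected node. This is a sketch, not an official project script: `collect_diag`, the default output path and the exact command list are assumptions, so add or remove commands to suit your problem.

```shell
# Hypothetical helper: write the output of common diagnostic commands to a file.
collect_diag() {
    out="${1:-/tmp/mesh-diag.txt}"   # assumed default path
    : > "$out"                       # start with an empty file
    for cmd in "uname -a" "cat /etc/openwrt_release" "ip addr" "ip route" \
               "batctl if" "batctl n" "batctl o" "batctl gwl" "logread" "dmesg"; do
        echo "===== $cmd =====" >> "$out"
        # $cmd is left unquoted on purpose so it splits into command and arguments.
        $cmd >> "$out" 2>&1 || echo "(command failed or not available)" >> "$out"
    done
    echo "$out"
}
```

Review the file before attaching it to a public issue, since logs can contain MAC addresses and other details of your network.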
See also: Debugging Guide for advanced troubleshooting techniques.