Troubleshooting
Resilience and Data Integrity
Erigon is highly resilient and uses a fully-transactional database. This design makes it safe against hard termination (kill -9) and power outages. The database ensures users never see "partial writes," meaning all data changes are atomic (all-or-nothing), and all RPC methods operate within Read-Only Transactions, guaranteeing a consistent data view.
Protect Against Hardware Failure: True data corruption is typically only caused by hardware failures (like disk or RAM failure). We strongly recommend using ECC memory, disk RAID, and performing regular backups to mitigate these risks.
When an issue arises, follow these steps to methodically diagnose and resolve the problem.
- Check Hardware Requirements: The most common cause of issues is insufficient disk space or RAM. Ensure your system meets the recommended Hardware Requirements. Note that Erigon is very adaptive: adding more RAM to the server makes Erigon faster without requiring any setting changes.
- Inspect Erigon Logs: The logs are your best friend. Use `tail -f erigon.log` or `journalctl` to see real-time output and identify error messages or warnings. See Logs for log configuration options.
- Verify Sync Status: Use `curl localhost:8545 -X POST -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'` to check if the node is actively syncing.
- Monitor System Resources: Use `htop`, `top`, or `iostat` to monitor CPU, RAM, and disk I/O. This can help you identify a performance bottleneck. For a full monitoring dashboard, see Creating a Dashboard.
- Look for OOM-kill events: After an unexpected crash, always check your system logs for an "Out of Memory" killer event. This confirms if a memory issue caused the crash.
- Perform a Simple Restart: If the node is stalled, simply restart the service using `systemctl restart erigon`. Erigon's transactional database is designed to handle interruption gracefully.
- Disable RPC/CL during Initial Sync: If you are stalling during snapshot sync, try restarting without the RPC daemon or consensus client to reduce concurrent disk access, which is a bottleneck during the slowest stage (Blocks Execution).
- Check for Disk Space: Regularly check your disk usage. A full disk will cause performance degradation and can lead to a node crash. See Optimizing Storage for tips on reducing disk usage.
- Verify Network Time: Ensure your system's clock is synchronized. Incorrect time can cause issues with block propagation.
- Check P2P Peer Connections: Use `net_peerCount` or similar RPC methods to check if you have a healthy number of peers. A low count may indicate a network problem. See Default Ports to verify firewall rules allow P2P traffic.
- Review Firewall Rules: Confirm that your firewall is not blocking inbound or outbound traffic on the required P2P and RPC ports.
- Double-Check Configuration Flags: Review all your command-line flags for typos or incorrect values. A single misplaced character can cause a cryptic error. See the CLI Reference for the full flag list.
- Check for Snapshot File Issues: After a version upgrade, a known issue with snapshot filenames can cause problems. If this happens, use the snapshot upgrade and repair options.
- Correct File Ownership: If using a dedicated user or Docker, confirm that the user has full read/write access to the datadir. See the Docker Compose guide for container-specific permission tips.
- Adjust RPC Timeouts: If specific RPC requests are timing out, try increasing the timeout values to allow more time for heavy requests to complete. See the RPC & API flags in the CLI Reference.
- Check for `db.read.concurrency` issues: If you have high RPC traffic and low TPS, try reducing the value of the `--db.read.concurrency` flag. See Configuring Erigon for all database-related flags.
- Report a Bug: If all else fails, open a detailed bug report on GitHub with logs, version info, and a clear description of the problem.
- Engage with the Community: The Erigon Discord server is an invaluable resource for seeking help from core developers and experienced users.
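Several of the checks in the list above can be scripted. The following is a minimal sketch, not an official Erigon tool: the function names and the `ERIGON_RPC_URL` variable are hypothetical, the RPC endpoint is assumed to be the `localhost:8545` address used in the curl example, and the JSON handling uses simple pattern matching instead of a real JSON parser.

```shell
#!/usr/bin/env bash
# Triage helpers for the checklist above. Illustrative sketch only:
# the function names and ERIGON_RPC_URL are hypothetical, not part of Erigon.

RPC_URL="${ERIGON_RPC_URL:-http://localhost:8545}"

# Send a parameterless JSON-RPC request and print the raw response.
rpc_call() {
  curl -s -X POST -H "Content-Type: application/json" \
    --data "{\"jsonrpc\":\"2.0\",\"method\":\"$1\",\"params\":[],\"id\":1}" \
    "$RPC_URL"
}

# Classify an eth_syncing response: "result":false means the node reports
# that it is not syncing; an object means a sync is in progress.
sync_state() {
  local compact
  compact=$(printf '%s' "$1" | tr -d '[:space:]')
  case "$compact" in
    *'"result":false'*) echo "not syncing" ;;
    *'"result":{'*)     echo "syncing" ;;
    *)                  echo "unknown" ;;
  esac
}

# Extract the hex "result" of a net_peerCount response and print it in decimal.
peer_count() {
  local hex
  hex=$(printf '%s' "$1" | tr -d '[:space:]' \
        | sed -n 's/.*"result":"\(0x[0-9a-fA-F]*\)".*/\1/p')
  [ -n "$hex" ] && printf '%d\n' "$hex"
}

# Print the used-space percentage (digits only) of the filesystem holding a path.
disk_used_pct() {
  df -P "$1" | awk 'NR==2 { sub("%", "", $5); print $5 }'
}

# Search kernel messages for OOM-killer activity (may require root;
# journalctl -k covers the current boot only).
oom_events() {
  { journalctl -k --no-pager 2>/dev/null || dmesg 2>/dev/null; } \
    | grep -i -E 'out of memory|oom-kill|killed process'
}
```

With a node running, `sync_state "$(rpc_call eth_syncing)"` and `peer_count "$(rpc_call net_peerCount)"` summarize the two RPC checks, `disk_used_pct /path/to/datadir` reports disk usage for the datadir's filesystem, and `oom_events` lists any OOM-killer messages. Keeping the parsing separate from the network call makes each piece easy to try against a saved response.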
Collecting Diagnostics for Bug Reports
Before opening a GitHub issue, gather the following information to help the team reproduce and fix your problem faster.
Dump goroutine stacks (sends SIGUSR1 to the running process — safe, non-destructive):
kill -SIGUSR1 $(pidof erigon)
# Stack traces are printed to the erigon log / stdout
Capture a CPU or heap profile via pprof. This requires the `--pprof` flag at startup; the default address is localhost:6060, which you can override with `--pprof.addr` and `--pprof.port`:
# CPU profile — 30-second sample
curl -o cpu.pprof "http://localhost:6060/debug/pprof/profile?seconds=30"
# Heap profile
curl -o heap.pprof http://localhost:6060/debug/pprof/heap
# Inspect locally
go tool pprof -http=:8080 cpu.pprof
Attach the .pprof files and the goroutine dump to your GitHub issue.
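The collection steps above could be wrapped in a small helper so that every report contains the same files. This is a sketch under stated assumptions: `PPROF_ADDR`, `pprof_url`, and `collect_profiles` are hypothetical names, and the `goroutine?debug=2` endpoint is the standard Go `net/http/pprof` text dump, which is assumed (not confirmed here) to be exposed alongside the profile and heap endpoints when `--pprof` is enabled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bundle pprof output into one archive for a bug report.
# PPROF_ADDR is an illustrative variable, defaulting to the address in the text.

PPROF_ADDR="${PPROF_ADDR:-localhost:6060}"

# Build the URL of a pprof endpoint, e.g. "heap" or "profile?seconds=30".
pprof_url() {
  printf 'http://%s/debug/pprof/%s\n' "$PPROF_ADDR" "$1"
}

# Download CPU, heap, and goroutine data into the given directory,
# then pack the directory into <dir>.tar.gz.
collect_profiles() {
  local outdir="$1"
  mkdir -p "$outdir" || return 1
  curl -sf -o "$outdir/cpu.pprof"      "$(pprof_url 'profile?seconds=30')" || return 1
  curl -sf -o "$outdir/heap.pprof"     "$(pprof_url heap)"                 || return 1
  # Plain-text goroutine stacks (assumed to be available, see lead-in).
  curl -sf -o "$outdir/goroutines.txt" "$(pprof_url 'goroutine?debug=2')"  || return 1
  tar -czf "$outdir.tar.gz" -C "$(dirname "$outdir")" "$(basename "$outdir")"
}
```

Running `collect_profiles ./erigon-diag` would take roughly 30 seconds because of the CPU sample and, if every download succeeds, leave `erigon-diag.tar.gz` next to the directory. The `-f` flag makes curl fail on HTTP errors, so a disabled pprof endpoint stops the helper instead of archiving error pages.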
Hetzner Cloud / Dedicated Server Firewall Note
Hetzner applies a stateless firewall at the network edge. Ensure the following ports are open for both TCP and UDP, inbound and outbound:
| Purpose | Port | Protocol |
|---|---|---|
| P2P (Ethereum) | 30303 | TCP+UDP |
| P2P (Caplin) | 9000 | TCP+UDP |
Without these, the node may appear to have peers (via the cloud dashboard) but will suffer poor block propagation. Configure the firewall in the Hetzner Cloud Console under Firewalls or via hcloud firewall.
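As a starting point for the `hcloud` route, the sketch below prints (but does not run) one inbound rule command per port and protocol from the table. The firewall name `erigon-fw` and the function name are hypothetical, the rules allow traffic from any IPv4 and IPv6 source, and the flag spelling should be checked against `hcloud firewall add-rule --help` for your CLI version before anything is applied.

```shell
#!/usr/bin/env bash
# Print inbound rule commands for the P2P ports in the table above.
# Nothing is executed; review the output before running it yourself.

print_firewall_rules() {
  local fw="$1" port proto
  for port in 30303 9000; do
    for proto in tcp udp; do
      echo "hcloud firewall add-rule $fw --direction in --protocol $proto --port $port --source-ips 0.0.0.0/0 --source-ips ::/0"
    done
  done
}
```

Calling `print_firewall_rules erigon-fw` emits four commands (two ports, two protocols). If your setup also restricts outbound traffic, matching outbound rules would be needed as described above; they are left out of this sketch because adding outbound rules can change what other traffic the firewall permits.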