How to Make Apache Ignite Production Ready

Actionable guidelines to make Apache Ignite production-ready.

4 min readFeb 14, 2020

This article was originally published at https://www.bugdbug.com/post/how-to-make-apache-ignite-production-ready

In this article, we will discuss the guidelines that should be followed while deploying Apache Ignite in the production environment. It will help you to get the most out of it and will make sure it performs better without any problem. I assume you are familiar with Apache Ignite and its capabilities. If not, it is best to stop here and first go through this official documentation of Apache Ignite and get yourself familiar with it.

Apache Ignite is primarily a in-memory distributed caching and computing framework. It also provides different capabilities like service-grid, messaging etc.

1. Storage

As we know, Ignite uses off-heap to store the cache data. By default, Ignite acquires 20% of the available memory as Off-Heap for cache data storage and all the caches use this storage. By default, DataEvictionModeof DataStorageRegion is disabled. This will make a system to throw an exception when DataRegion is filled.

Following guidelines should be followed while configuring data storage

FULLY_REPLICATED caches perform better for read-operations. Thus, read-intensive caches with moderate size data should use FULLY_REPLICATED mode.
PARTITIONED caches give better performance for write operations. Thus, caches with write-intensive operations should use PARTITIONED caches.
Avoid using Ignite’s Native Persistence for FULLY_REPLICATED caches.
To avoid data loss for PARTITIONED caches, one should use Ignite’s native persistence or third party persistence or use proper replication factor.
Define separate DataRegions with appropriate DataEvictionMode (DISABLED, LRU) and sizes for caches with different needs. Avoid using the same DataRegion for all the caches.
For partitioned cache, one can also use swap space instead of native persistence or third party persistence.

2. Cluster Discovery

For any distributed system, cluster formation is an essential process. By default, Ignite uses TCP discovery with a multicast IP finder. In most of the production deployments, multicasting is restricted. For elastic clusters, static IpFinder based discovery is not recommended.

Thus, one could use TcpDiscoveryJdbcIpFinder or others apart from static and multicast.

For clusters with more than 100 nodes, it is advised to use Zookeeper based discovery mechanism.

3. Security

Cluster security is of utmost importance for data storage systems. Apache Ignite does not provide authentication and authorization mechanism out of the box. Authentication during cluster joining and data access authentication and authorization can be implemented by writing a custom security plugin by following this article.

4. Network partitioning

Apache Ignite, like any other distributed system, follows the CAP theorem. So, in the event of Network partitioning, it can either have Availability or Consistency. Since Apache Ignite is a system with the ACID guarantees, it needs to make sure that data consistency is maintained in the event of partitioning.

Network partitioning could occur because of a network glitch, node failure or long GC pauses. Because of network partitioning, the cluster could be divided into one or more segments with one or more nodes. These segments could act as independent clusters and may cause data inconsistency if kept working and writing data.

Thus, handling network partitioning becomes an issue of the highest priority. Segmentation can be handled by writing a custom network segmentation plugin by following this article.

5. Cache node filter

Every cache created using Apache Ignite other than Local cache is deployed on all the nodes whether it being partitioned or fully replicated.

When you have a large cluster, you do not want all the caches to be deployed on all the cluster nodes. This should specifically be avoided for FULLY_REPLICATED caches.

We can achieve this by setting node predicate to the caches as follows

Add node attribute through IgniteConfiguration

Use this property in a cache node filter

6. Miscellaneous

Timeouts like failureDetectionTimeout, networkTimeout andsegmentationCheckFrequency should be configured properly through IgniteConfiguratio.
The default failureHandler stops the Ignite node upon failure. For embedded Ignite, this property should be changed.
The default segmentation policy is SegmentationPolicy.STOP. This should be changed to an appropriate one.
IPv4: According to several discussions on Ignite user forum, different IP configuration causes network segmentation. Thus, it is advised to IPv4 over IPv6 which can be configured through java option as follows http://apache-ignite-users.70518.x6.nabble.com/Ignite-Node-failure-Node-out-of-topology-SEGMENTED-td21360.html -Djava.net.preferIPv4Stack=true
Ports: Apache Ignite uses multiple network ports for different things like node discovery, node communication, REST client, SQL client. Out of these, node discovery (47500) and node communication (47100) are mandatory for it to function properly.
Predicate classes: Classes used in all the predicates should be present in the classpath of all the nodes. These classes are not made available through Peerclassloading. One has to add these classes to the classpath.
Client Mode: If you do not want any cache partitions on some cluster nodes, add those nodes are client nodes.
EventListeners: If you have added event listeners for different events like cache events, node events, etc, make sure that user logic is not getting executed in the event thread.

You can refer to this link for more guidelines on production readiness and performance.

If you have more tips and guidelines feel free to share it in comments.

Thanks for reading. If you find this article helpful, share it with friends!

Resources