Apache Solr – Architecture

In this chapter, we will discuss the architecture of Apache Solr. The following illustration shows a block diagram of the architecture of Apache Solr.

[Illustration: block diagram of the Apache Solr architecture]

Solr Architecture ─ Building Blocks

Following are the major building blocks (components) of Apache Solr −

Request Handler − The requests we send to Apache Solr are processed by request handlers. A request may be a query request or an index update request, and we select the request handler according to the kind of request. To pass a request to Solr, we generally map the handler to a URI end-point, and requests sent to that end-point are served by it.

Search Component − A search component provides one type (feature) of search in Apache Solr, such as query, spell checking, faceting, or hit highlighting. Search components are registered with search handlers, and multiple components can be registered with a single search handler.

Query Parser − The Apache Solr query parser parses the queries that we pass to Solr and checks them for syntax errors. After parsing the queries, it translates them to a format which Lucene understands.

Response Writer − A response writer in Apache Solr is the component which generates the formatted output for user queries. Solr supports response formats such as XML, JSON, and CSV, with a separate response writer for each format.

Analyzer/tokenizer − Lucene recognizes data in the form of tokens. Apache Solr analyzes the content, divides it into tokens, and passes these tokens to Lucene. An analyzer in Apache Solr examines the text of fields and generates a token stream. Within an analyzer, a tokenizer breaks the field text into individual tokens, which filters may then modify.

Update Request Processor − Whenever we send an update request to Apache Solr, the request is run through a set of plugins (signature, logging, indexing), collectively known as the update request processor. This processor is responsible for modifications such as dropping a field or adding a field.
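
As a rough illustration of how a request handler's URI end-point, the query parser, and the response writer meet in a single request, the following Python sketch composes a query URL. The host and port, the core name mycore, and the field name title are assumptions made for this example. The handler path /select and the parameters q, defType, and wt are the ones commonly used for Solr queries, but they should be checked against your own configuration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_solr_url(base_url, core, handler, params):
    """Compose a request URL of the form <base>/<core><handler>?<params>."""
    query = urlencode(params)  # percent-encodes keys and values
    return f"{base_url.rstrip('/')}/{core}{handler}?{query}"


url = build_solr_url(
    "http://localhost:8983/solr",  # assumed local Solr instance
    "mycore",                      # hypothetical core name
    "/select",                     # URI end-point mapped to a search handler
    {
        "q": "title:solr",         # query text (hypothetical field "title")
        "defType": "lucene",       # selects the query parser
        "wt": "json",              # selects the response writer
    },
)

# Split the URL again to show which part addresses which building block.
parts = urlsplit(url)
sent_params = parse_qs(parts.query)
print(parts.path)
print(sent_params)
```

The sketch only builds and inspects the URL; it does not contact a server. Sending the URL with any HTTP client would hand the request to the handler mapped to that end-point.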

Apache Solr – Terminology

In this chapter, we will look at the meaning of some of the terms that are frequently used while working with Solr.

General Terminology

The following is a list of general terms that are used across all types of Solr setups −

Instance − Just like a Tomcat instance or a Jetty instance, this term refers to the application server, which runs inside a JVM. Each Solr instance refers to a Solr home directory, and one or more cores can be configured to run in each instance.

Core − A core is a single index together with its configuration. When your application needs multiple indexes, you can run multiple cores in one instance instead of multiple instances each having one core.

Home − The term $SOLR_HOME refers to the home directory, which holds all the information regarding the cores and their indexes, configurations, and dependencies.

Shard − In distributed environments, the data is partitioned between multiple Solr instances, and each chunk of data is called a shard. A shard contains a subset of the whole index.

SolrCloud Terminology

In an earlier chapter, we discussed how to install Apache Solr in standalone mode. Note that we can also run Solr in distributed mode (a cloud environment), where the index is spread over several Solr instances. In this mode, documents are indexed through the leader of a shard and replicated to one or more other replicas of that shard.

The key terms associated with SolrCloud are as follows −

Node − In SolrCloud, each single instance of Solr is regarded as a node.

Cluster − All the nodes of the environment combined together make a cluster.

Collection − A collection is a logical index that is held by the cluster.

Shard − A shard is a portion of the collection, which has one or more replicas of its part of the index.

Replica − A copy of a shard that runs as a Solr core on a node is known as a replica.

Leader − It is also a replica of a shard, and it distributes update requests to the remaining replicas of that shard.

ZooKeeper − It is an Apache project that SolrCloud uses for centralized configuration and coordination, to manage the cluster and to elect a leader.

Configuration Files

The main configuration files in Apache Solr are as follows −

solr.xml − It is the file in the $SOLR_HOME directory that contains SolrCloud related information. To load the cores, Solr refers to this file, which helps in identifying them.

solrconfig.xml − This file contains the definitions and core-specific configurations related to request handling and response formatting, along with indexing, memory management, and commits.

schema.xml − This file contains the whole schema along with the fields and field types.

core.properties − This file contains the configurations specific to the core. It is used for core discovery, as it contains the name of the core and the path of the data directory. It can be placed in any directory, which will then be treated as the core directory.
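
The role of core.properties in core discovery can be pictured with a small Python imitation. This is a simplified sketch, not Solr's own implementation: it walks a directory standing in for $SOLR_HOME, treats every directory that holds a core.properties file as a core directory, and reads the name and dataDir keys from it. The core names products and articles, the helper names read_properties and discover_cores, and the minimal key=value parsing are all assumptions made for the example.

```python
import os
import tempfile


def read_properties(path):
    """Parse simple key=value lines; '#' starts a comment (simplified)."""
    props = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def discover_cores(solr_home):
    """Return {core name: core directory} for each core.properties found."""
    cores = {}
    for dirpath, dirnames, filenames in os.walk(solr_home):
        if "core.properties" in filenames:
            props = read_properties(os.path.join(dirpath, "core.properties"))
            # Assumption: fall back to the directory name if no name is set.
            name = props.get("name", os.path.basename(dirpath))
            cores[name] = dirpath
            dirnames.clear()  # do not search below a core directory
    return cores


# Demo: a throwaway directory plays the part of $SOLR_HOME.
with tempfile.TemporaryDirectory() as home:
    for core in ("products", "articles"):  # hypothetical core names
        core_dir = os.path.join(home, core)
        os.makedirs(core_dir)
        with open(os.path.join(core_dir, "core.properties"), "w",
                  encoding="utf-8") as fh:
            fh.write(f"name={core}\ndataDir=data\n")
    found = discover_cores(home)
    print(sorted(found))
```

Because discovery in this sketch depends only on the presence of the file, moving a directory together with its core.properties to another place under the home directory would still let it be found as a core.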