Advanced Excel – Financial Functions

Excel Financial functions perform many of the common financial calculations, such as the calculation of yield, interest rates, duration, valuation, and asset depreciation.

S.No. Function − Description
1. ACCRINT − Returns the accrued interest for a security that pays periodic interest
2. ACCRINTM − Returns the accrued interest for a security that pays interest at maturity
3. AMORDEGRC − Returns the depreciation for each accounting period (the depreciation coefficient depends on the life of the assets)
4. AMORLINC − Returns the depreciation for each accounting period
5. COUPDAYBS − Returns the number of days from the beginning of the coupon period to the settlement date
6. COUPDAYS − Returns the number of days in the coupon period that contains the settlement date
7. COUPDAYSNC − Returns the number of days from the settlement date to the next coupon date
8. COUPNCD − Returns the next coupon date after the settlement date
9. COUPNUM − Returns the number of coupons payable between the settlement date and maturity date
10. COUPPCD − Returns the previous coupon date before the settlement date
11. CUMIPMT − Returns the cumulative interest paid between two periods
12. CUMPRINC − Returns the cumulative principal paid on a loan between two periods
13. DB − Returns the depreciation of an asset for a specified period, using the fixed-declining-balance method
14. DDB − Returns the depreciation of an asset for a specified period, using the double-declining-balance method or some other method that you specify
15. DISC − Returns the discount rate for a security
16. DOLLARDE − Converts a dollar price, expressed as a fraction, into a dollar price, expressed as a decimal number
17. DOLLARFR − Converts a dollar price, expressed as a decimal number, into a dollar price, expressed as a fraction
18. DURATION − Returns the annual duration of a security with periodic interest payments
19. EFFECT − Returns the effective annual interest rate
20. FV − Returns the future value of an investment
21. FVSCHEDULE − Returns the future value of an initial principal after applying a series of compound interest rates
22. INTRATE − Returns the interest rate for a fully invested security
23. IPMT − Returns the interest payment for an investment for a given period
24. IRR − Returns the internal rate of return for a series of cash flows
25. ISPMT − Calculates the interest paid during a specific period of an investment
26. MDURATION − Returns the Macaulay modified duration for a security with an assumed par value of $100
27. MIRR − Returns the internal rate of return where positive and negative cash flows are financed at different rates
28. NOMINAL − Returns the annual nominal interest rate
29. NPER − Returns the number of periods for an investment
30. NPV − Returns the net present value of an investment based on a series of periodic cash flows and a discount rate
31. ODDFPRICE − Returns the price per $100 face value of a security with an odd first period
32. ODDFYIELD − Returns the yield of a security with an odd first period
33. ODDLPRICE − Returns the price per $100 face value of a security with an odd last period
34. ODDLYIELD − Returns the yield of a security with an odd last period
35. PDURATION − Returns the number of periods required by an investment to reach a specified value
36. PMT − Returns the periodic payment for an annuity
37. PPMT − Returns the payment on the principal for an investment for a given period
38. PRICE − Returns the price per $100 face value of a security that pays periodic interest
39. PRICEDISC − Returns the price per $100 face value of a discounted security
40. PRICEMAT − Returns the price per $100 face value of a security that pays interest at maturity
41. PV − Returns the present value of an investment
42. RATE − Returns the interest rate per period of an annuity
43. RECEIVED − Returns the amount received at maturity for a fully invested security
44. RRI − Returns an equivalent interest rate for the growth of an investment
45. SLN − Returns the straight-line depreciation of an asset for one period
46. SYD − Returns the sum-of-years’ digits depreciation of an asset for a specified period
47. TBILLEQ − Returns the bond-equivalent yield for a Treasury bill
48. TBILLPRICE − Returns the price per $100 face value for a Treasury bill
49. TBILLYIELD − Returns the yield for a Treasury bill
50. VDB − Returns the depreciation of an asset for a specified or partial period using a declining-balance method
51. XIRR − Returns the internal rate of return for a schedule of cash flows that is not necessarily periodic
52. XNPV − Returns the net present value for a schedule of cash flows that is not necessarily periodic
53. YIELD − Returns the yield on a security that pays periodic interest
54. YIELDDISC − Returns the annual yield for a discounted security, for example, a Treasury bill
55. YIELDMAT − Returns the annual yield of a security that pays interest at maturity
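The annuity functions listed above (PV, FV, PMT) are linked by the standard time-value-of-money equation, and NPV discounts a series of periodic cash flows. The following is a minimal Python sketch of that arithmetic, not Excel's own implementation. It assumes end-of-period payments; the names pmt, fv, and npv and their parameters are illustrative choices made here.

```python
def pmt(rate, nper, present, future=0.0):
    """Periodic payment for an annuity with end-of-period payments.

    Spreadsheet sign convention: money received is positive,
    money paid out is negative.
    """
    if rate == 0:
        return -(present + future) / nper
    growth = (1 + rate) ** nper
    return -(present * growth + future) * rate / (growth - 1)


def fv(rate, nper, payment, present=0.0):
    """Future value after nper periods of a constant payment."""
    if rate == 0:
        return -(present + payment * nper)
    growth = (1 + rate) ** nper
    return -(present * growth + payment * (growth - 1) / rate)


def npv(rate, cashflows):
    """Net present value; the first cash flow is discounted by one
    full period, which is how the worksheet NPV function treats it."""
    return sum(cf / (1 + rate) ** (i + 1) for i, cf in enumerate(cashflows))


# Hypothetical loan: 10,000 borrowed at 1% per period over 12 periods.
payment = pmt(0.01, 12, 10_000)
print(round(payment, 2))
# Paying that amount every period should leave a balance of about zero.
print(round(fv(0.01, 12, payment, 10_000), 6))
```

Under these assumptions the payment comes to roughly -888.49 per period, and feeding it back into fv returns a balance that is zero up to rounding, which is a quick consistency check on both formulas.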
Database Functions
The Excel Database functions work with an Excel Database. This typically takes the form of a large table of data, where each row in the table stores an individual record. Each column in the Worksheet table stores a different field for each record.

The Database functions perform basic operations, such as Sum, Average, and Count, and additionally use criteria arguments that allow you to perform the calculation only for a specified subset of the records in your Database. Other records in the Database are ignored.

The following table lists all the Database functions −

S.No. Function − Description
1. DAVERAGE − Averages the values in a column of a list or database that match conditions you specify.
2. DCOUNT − Counts the cells that contain numbers in a column of a list or database that match conditions you specify.
3. DCOUNTA − Counts the nonblank cells in a column of a list or database that match conditions you specify.
4. DGET − Returns a single value from a column of a list or database that matches conditions you specify.
5. DMAX − Returns the largest number in a column of a list or database that matches conditions you specify.
6. DMIN − Returns the smallest number in a column of a list or database that matches conditions you specify.
7. DPRODUCT − Multiplies the values in a column of a list or database that match conditions you specify.
8. DSTDEV − Estimates the standard deviation of a population based on a sample by using the numbers in a column of a list or database that match conditions you specify.
9. DSTDEVP − Calculates the standard deviation of a population based on the entire population, using the numbers in a column of a list or database that match conditions you specify.
10. DSUM − Adds the numbers in a column of a list or database that match conditions you specify.
11. DVAR − Estimates the variance of a population based on a sample by using the numbers in a column of a list or database that match conditions you specify.
12. DVARP − Calculates the variance of a population based on the entire population by using the numbers in a column of a list or database that match conditions you specify.
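All of the Database functions share one pattern: select the records that satisfy the criteria, then aggregate a single field of those records. The Python sketch below illustrates that pattern only. It assumes criteria are given as Python predicates per field (Excel uses a criteria range on the worksheet instead), and the helper names and the sample data are hypothetical.

```python
def is_number(value):
    """True for int/float values; booleans and blanks do not count."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def dselect(database, field, criteria):
    """Values of `field` from records that satisfy every criterion."""
    return [
        row[field]
        for row in database
        if all(test(row[name]) for name, test in criteria.items())
    ]


def dsum(database, field, criteria):
    return sum(v for v in dselect(database, field, criteria) if is_number(v))


def dcount(database, field, criteria):
    return sum(1 for v in dselect(database, field, criteria) if is_number(v))


def daverage(database, field, criteria):
    values = [v for v in dselect(database, field, criteria) if is_number(v)]
    if not values:
        raise ValueError("no numeric values match the criteria")
    return sum(values) / len(values)


def dget(database, field, criteria):
    """Return the single matching value; zero or several matches is an error."""
    values = dselect(database, field, criteria)
    if len(values) != 1:
        raise ValueError(f"expected exactly one match, found {len(values)}")
    return values[0]


# Hypothetical sample database: one dict per record, one key per field.
orchard = [
    {"Tree": "Apple", "Height": 18, "Profit": 105.0},
    {"Tree": "Pear", "Height": 12, "Profit": 96.0},
    {"Tree": "Apple", "Height": 14, "Profit": 75.0},
    {"Tree": "Cherry", "Height": 13, "Profit": None},
]
apples = {"Tree": lambda tree: tree == "Apple"}
print(dsum(orchard, "Profit", apples), daverage(orchard, "Profit", apples))
```

Records that fail the criteria never reach the aggregation step, which mirrors the statement above that other records in the Database are ignored.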
Advanced Excel – Cube Functions

The Excel Cube functions bring data from OLAP cubes into Excel so that you can perform calculations on it. These functions require a connection to a Microsoft SQL Server 2005 Analysis Services (or later) data source. Because PowerPivot creates a data source that is compatible with OLAP cubes, it can also be used with these functions.

The following table lists all the Cube functions −

S.No. Function − Description
1. CUBEKPIMEMBER − Returns a key performance indicator name, property, and measure, and displays the name and property in the cell.
2. CUBEMEMBER − Returns a member or tuple in a cube hierarchy.
3. CUBEMEMBERPROPERTY − Returns the value of a member property in the cube.
4. CUBERANKEDMEMBER − Returns the nth, or ranked, member in a set.
5. CUBESET − Defines a calculated set of members or tuples by sending a set expression to the cube on the server.
6. CUBESETCOUNT − Returns the number of items in a set.
7. CUBEVALUE − Returns an aggregated value from a cube.
Information Functions
Information functions provide information about the content, formatting, and location of cells in an Excel Worksheet.

The following table lists all the Information functions −

S.No. Function − Description
1. CELL − Returns information about the formatting, location, or contents of a cell
2. ERROR.TYPE − Returns a number corresponding to an error type
3. INFO − Returns information about the current operating environment
4. ISBLANK − Returns TRUE if the value is blank
5. ISERR − Returns TRUE if the value is any error value except #N/A
6. ISERROR − Returns TRUE if the value is any error value
7. ISEVEN − Returns TRUE if the number is even
8. ISFORMULA − Returns TRUE if there is a reference to a cell that contains a formula
9. ISLOGICAL − Returns TRUE if the value is a logical value
10. ISNA − Returns TRUE if the value is the #N/A error value
11. ISNONTEXT − Returns TRUE if the value is not text
12. ISNUMBER − Returns TRUE if the value is a number
13. ISODD − Returns TRUE if the number is odd
14. ISREF − Returns TRUE if the value is a reference
15. ISTEXT − Returns TRUE if the value is text
16. N − Returns a value converted to a number
17. NA − Returns the error value #N/A
18. SHEET − Returns the sheet number of the referenced sheet
19. SHEETS − Returns the number of sheets in a reference
20. TYPE − Returns a number indicating the data type of a value
Apache Flume – Quick Guide
Apache Flume – Introduction

What is Flume?

Apache Flume is a tool/service/data ingestion mechanism for collecting, aggregating, and transporting large amounts of streaming data, such as log files and events, from various sources to a centralized data store.

Flume is a highly reliable, distributed, and configurable tool. It is principally designed to copy streaming data (log data) from various web servers to HDFS.

Applications of Flume

Assume an e-commerce web application wants to analyze customer behavior from a particular region. To do so, it needs to move the available log data into Hadoop for analysis. This is where Apache Flume helps.

Flume is used to move the log data generated by application servers into HDFS at a higher speed.

Advantages of Flume

Here are the advantages of using Flume −

- Using Apache Flume, we can store the data in any of the centralized stores (HBase, HDFS).
- When the rate of incoming data exceeds the rate at which data can be written to the destination, Flume acts as a mediator between data producers and the centralized stores and provides a steady flow of data between them.
- Flume provides the feature of contextual routing.
- The transactions in Flume are channel-based, where two transactions (one sender and one receiver) are maintained for each message. This guarantees reliable message delivery.
- Flume is reliable, fault tolerant, scalable, manageable, and customizable.

Features of Flume

Some of the notable features of Flume are as follows −

- Flume efficiently ingests log data from multiple web servers into a centralized store (HDFS, HBase).
- Using Flume, we can get the data from multiple servers into Hadoop immediately.
- Along with log files, Flume is also used to import huge volumes of event data produced by social networking sites like Facebook and Twitter, and e-commerce websites like Amazon and Flipkart.
- Flume supports a large set of source and destination types.
- Flume supports multi-hop flows, fan-in and fan-out flows, contextual routing, etc.
- Flume can be scaled horizontally.

Apache Flume – Data Transfer In Hadoop

Big Data, as we know, is a collection of large datasets that cannot be processed using traditional computing techniques. Big Data, when analyzed, gives valuable results. Hadoop is an open-source framework that allows us to store and process Big Data in a distributed environment across clusters of computers using simple programming models.

Streaming / Log Data

Generally, most of the data to be analyzed is produced by various data sources such as application servers, social networking sites, cloud servers, and enterprise servers. This data is in the form of log files and events.

Log file − In general, a log file is a file that lists events/actions that occur in an operating system. For example, web servers list every request made to the server in their log files.

By harvesting such log data, we can get information about −

- the application performance, and locate various software and hardware failures.
- the user behavior, and derive better business insights.

The traditional method of transferring data into HDFS is to use the put command. Let us see how to use the put command.

HDFS put Command

The main challenge in handling log data is moving the logs produced by multiple servers to the Hadoop environment.

The Hadoop File System Shell provides commands to insert data into Hadoop and read from it. You can insert data into Hadoop using the put command as shown below.

$ hadoop fs -put /path of the required file /path in HDFS where to save the file

Problem with put Command

We can use the put command of Hadoop to transfer data from these sources to HDFS. But it suffers from the following drawbacks −

- The put command copies files that already exist, one invocation at a time, while the data generators produce data continuously and at a much higher rate. Since analysis made on older data is less accurate, we need a solution that transfers data in real time.
- If we use the put command, the data needs to be packaged and ready for upload. Since the web servers generate data continuously, this is a very difficult task.

What we need here is a solution that can overcome the drawbacks of the put command and transfer the "streaming data" from data generators to centralized stores (especially HDFS) with less delay.

Problem with HDFS

In HDFS, a file exists as a directory entry, and the length of the file is considered to be zero until it is closed. For example, if a source is writing data into HDFS and the network is interrupted in the middle of the operation (without closing the file), then the data written to the file is lost.

Therefore, we need a reliable, configurable, and maintainable system to transfer the log data into HDFS.

Note − In a POSIX file system, whenever we are accessing a file (say, performing a write operation), other programs can still read this file (at least the saved portion of the file). This is because the file exists on the disk before it is closed.

Available Solutions

To send streaming data (log files, events, etc.) from various sources to HDFS, we have the following tools at our disposal −

Facebook’s Scribe

Scribe is an immensely popular tool that is used to aggregate and stream log data. It is designed to scale to a very large number of nodes and be robust to network and node failures.

Apache Kafka

Kafka is developed under the Apache Software Foundation. It is an open-source message broker. Using Kafka, we can handle feeds with high throughput and low latency.

Apache Flume

Apache Flume is a tool/service/data ingestion mechanism for collecting, aggregating, and transporting large amounts of streaming data, such as log data and events, from various web servers to a centralized data store.
It is a highly reliable, distributed, and configurable tool that is principally designed to transfer streaming data from various sources to
Math and Trigonometric Functions

The Excel Math & Trig functions perform many of the common mathematical calculations, including basic arithmetic, conditional sums and products, exponents and logarithms, and the trigonometric ratios.

Some more math-related functions are also discussed in the Statistical functions and Engineering functions categories.

The following table lists all the Math & Trigonometric functions −

S.No. Function − Description
1. ABS − Returns the absolute value of a number
2. ACOS − Returns the arccosine of a number
3. ACOSH − Returns the inverse hyperbolic cosine of a number
4. ACOT − Returns the arccotangent of a number
5. ACOTH − Returns the hyperbolic arccotangent of a number
6. AGGREGATE − Returns an aggregate in a list or database
7. ARABIC − Converts a Roman numeral to Arabic, as a number
8. ASIN − Returns the arcsine of a number
9. ASINH − Returns the inverse hyperbolic sine of a number
10. ATAN − Returns the arctangent of a number
11. ATAN2 − Returns the arctangent from x and y coordinates
12. ATANH − Returns the inverse hyperbolic tangent of a number
13. BASE − Converts a number into a text representation with the given radix (base)
14. CEILING.MATH − Rounds a number up, to the nearest integer or to the nearest multiple of significance
15. COMBIN − Returns the number of combinations for a given number of objects
16. COMBINA − Returns the number of combinations with repetitions for a given number of items
17. COS − Returns the cosine of a number
18. COSH − Returns the hyperbolic cosine of a number
19. COT − Returns the cotangent of an angle
20. COTH − Returns the hyperbolic cotangent of a number
21. CSC − Returns the cosecant of an angle
22. CSCH − Returns the hyperbolic cosecant of an angle
23. DECIMAL − Converts a text representation of a number in a given base into a decimal number
24. DEGREES − Converts radians to degrees
25. EVEN − Rounds a number up to the nearest even integer
26. EXP − Returns e raised to the power of a given number
27. FACT − Returns the factorial of a number
28. FACTDOUBLE − Returns the double factorial of a number
29. FLOOR.MATH − Rounds a number down, to the nearest integer or to the nearest multiple of significance
30. GCD − Returns the greatest common divisor
31. INT − Rounds a number down to the nearest integer
32. LCM − Returns the least common multiple
33. LN − Returns the natural logarithm of a number
34. LOG − Returns the logarithm of a number to a specified base
35. LOG10 − Returns the base-10 logarithm of a number
36. MDETERM − Returns the matrix determinant of an array
37. MINVERSE − Returns the matrix inverse of an array
38. MMULT − Returns the matrix product of two arrays
39. MOD − Returns the remainder from division
40. MROUND − Returns a number rounded to the desired multiple
41. MULTINOMIAL − Returns the multinomial of a set of numbers
42. MUNIT − Returns the unit matrix for the specified dimension
43. ODD − Rounds a number up to the nearest odd integer
44. PI − Returns the value of pi
45. POWER − Returns the result of a number raised to a power
46. PRODUCT − Multiplies its arguments
47. QUOTIENT − Returns the integer portion of a division
48. RADIANS − Converts degrees to radians
49. RAND − Returns a random number between 0 and 1
50. RANDBETWEEN − Returns a random number between the numbers that you specify
51. ROMAN − Converts an Arabic numeral to Roman, as text
52. ROUND − Rounds a number to a specified number of digits
53. ROUNDDOWN − Rounds a number down, toward 0
54. ROUNDUP − Rounds a number up, away from 0
55. SEC − Returns the secant of an angle
56. SECH − Returns the hyperbolic secant of an angle
57. SERIESSUM − Returns the sum of a power series based on the formula
58. SIGN − Returns the sign of a number
59. SIN − Returns the sine of the given angle
60. SINH − Returns the hyperbolic sine of a number
61. SQRT − Returns a positive square root
62. SQRTPI − Returns the square root of (number * pi)
63. SUBTOTAL − Returns a subtotal in a list or database
64. SUM − Adds its arguments
65. SUMIF − Adds the cells specified by a given criterion
66. SUMIFS − Adds the cells specified by multiple criteria
67. SUMPRODUCT − Returns the sum of the products of corresponding array components
68. SUMSQ − Returns the sum of the squares of the arguments
69. SUMX2MY2 − Returns the sum of the difference of squares of corresponding values in two arrays
70. SUMX2PY2 − Returns the sum of the sum of squares of corresponding values in two arrays
71. SUMXMY2 − Returns the sum of squares of differences of corresponding values in two arrays
72. TAN − Returns the tangent of a number
73. TANH − Returns the hyperbolic tangent of a number
74. TRUNC − Truncates a number (you specify the precision of the truncation)
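The descriptions of SUMPRODUCT, SUMX2MY2, SUMX2PY2, and SUMXMY2 are easy to confuse when read as prose. The Python sketch below spells out the arithmetic each one performs on two equal-length arrays. It is an illustration of the formulas only; the lowercase function names and the helper _paired are assumptions made here, and Excel's handling of text, blanks, and errors is not modeled.

```python
def _paired(xs, ys):
    """Pair up corresponding values; the arrays must have equal length."""
    if len(xs) != len(ys):
        raise ValueError("arrays must have the same number of values")
    return list(zip(xs, ys))


def sumproduct(xs, ys):
    # sum of x * y
    return sum(x * y for x, y in _paired(xs, ys))


def sumx2my2(xs, ys):
    # sum of (x^2 - y^2): "difference of squares"
    return sum(x * x - y * y for x, y in _paired(xs, ys))


def sumx2py2(xs, ys):
    # sum of (x^2 + y^2): "sum of squares"
    return sum(x * x + y * y for x, y in _paired(xs, ys))


def sumxmy2(xs, ys):
    # sum of (x - y)^2: "squares of differences"
    return sum((x - y) ** 2 for x, y in _paired(xs, ys))


xs, ys = [2, 3], [1, 4]
print(sumproduct(xs, ys), sumx2my2(xs, ys), sumx2py2(xs, ys), sumxmy2(xs, ys))
# prints: 14 -4 30 2
```

The key distinction is where the squaring happens: SUMX2MY2 and SUMX2PY2 square each value before combining, while SUMXMY2 subtracts first and squares the difference.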
Advanced Excel – Logical Functions

Logical functions include the boolean operators and conditional tests, which are an essential part of many working spreadsheets.

The following table lists all the Logical functions −

S.No. Function − Description
1. AND − Returns TRUE if all its arguments are TRUE.
2. FALSE − Returns the logical value FALSE.
3. IF − Specifies a logical test to perform.
4. IFERROR − Returns a value you specify if the first argument evaluates to an error, otherwise returns the result of the first argument.
5. IFNA − Returns the value you specify if the expression resolves to #N/A, otherwise returns the result of the expression.
6. IFS − Checks whether one or more conditions are met and returns a value that corresponds to the first TRUE condition.
7. NOT − Reverses the logic of its argument.
8. OR − Returns TRUE if any argument is TRUE.
9. SWITCH − Evaluates an expression against a list of values and returns the result corresponding to the first matching value. If there is no match, an optional default value may be returned.
10. TRUE − Returns the logical value TRUE.
11. XOR − Returns a logical exclusive OR of all arguments.
Statistical Functions
Statistical functions perform calculations ranging from basic mean, median, and mode to the more complex statistical distribution and probability tests.

The following table lists all the Statistical functions −

S.No. Function − Description
1. AVEDEV − Returns the average of the absolute deviations of data points from their mean
2. AVERAGE − Returns the average of its arguments
3. AVERAGEA − Returns the average of its arguments and includes evaluation of text and logical values
4. AVERAGEIF − Returns the average for the cells specified by a given criterion
5. AVERAGEIFS − Returns the average for the cells specified by multiple criteria
6. BETA.DIST − Returns the beta cumulative distribution function
7. BETA.INV − Returns the inverse of the cumulative distribution function for a specified beta distribution
8. BINOM.DIST − Returns the individual term binomial distribution probability
9. BINOM.DIST.RANGE − Returns the probability of a trial result using a binomial distribution
10. BINOM.INV − Returns the smallest value for which the cumulative binomial distribution is greater than or equal to a criterion value
11. CHISQ.DIST − Returns the chi-squared distribution (left-tailed)
12. CHISQ.DIST.RT − Returns the right-tailed probability of the chi-squared distribution
13. CHISQ.INV − Returns the inverse of the left-tailed probability of the chi-squared distribution
14. CHISQ.INV.RT − Returns the inverse of the right-tailed probability of the chi-squared distribution
15. CHISQ.TEST − Returns the test for independence
16. CONFIDENCE.NORM − Returns the confidence interval for a population mean
17. CONFIDENCE.T − Returns the confidence interval for a population mean, using a Student's t distribution
18. CORREL − Returns the correlation coefficient between two data sets
19. COUNT − Counts how many numbers are in the list of arguments
20. COUNTA − Counts how many values are in the list of arguments
21. COUNTBLANK − Counts the number of blank cells in the argument range
22. COUNTIF − Counts the number of cells that meet the criteria you specify in the argument
23. COUNTIFS − Counts the number of cells that meet multiple criteria
24. COVARIANCE.P − Returns the population covariance, the average of the products of paired deviations
25. COVARIANCE.S − Returns the sample covariance, the average of the products of deviations for each data point pair in two data sets
26. DEVSQ − Returns the sum of squares of deviations
27. EXPON.DIST − Returns the exponential distribution
28. F.DIST − Returns the (left-tailed) F probability distribution
29. F.DIST.RT − Returns the right-tailed F probability distribution
30. F.INV − Returns the inverse of the (left-tailed) F probability distribution
31. F.INV.RT − Returns the inverse of the right-tailed F probability distribution
32. F.TEST − Returns the result of an F-test
33. FISHER − Returns the Fisher transformation
34. FISHERINV − Returns the inverse of the Fisher transformation
35. FORECAST − Returns a value along a linear trend
36. FORECAST.ETS − Calculates a future value based on existing values using the Exponential Triple Smoothing (ETS) algorithm
37. FORECAST.ETS.CONFINT − Returns a confidence interval for the forecast value at the specified target date
38. FORECAST.ETS.SEASONALITY − Returns the length of the repetitive pattern detected for the specified time series
39. FORECAST.ETS.STAT − Returns a statistical value as a result of time series forecasting
40. FORECAST.LINEAR − Calculates a future value by using existing values, using linear regression.
41. FREQUENCY − Returns a frequency distribution as a vertical array
42. GAMMA − Returns the Gamma function value
43. GAMMA.DIST − Returns the gamma distribution
44. GAMMA.INV − Returns the inverse of the gamma cumulative distribution
45. GAMMALN − Returns the natural logarithm of the gamma function, Γ(x)
46. GAMMALN.PRECISE − Returns the natural logarithm of the gamma function, Γ(x)
47. GAUSS − Returns 0.5 less than the standard normal cumulative distribution
48. GEOMEAN − Returns the geometric mean
49. GROWTH − Returns values along an exponential trend
50. HARMEAN − Returns the harmonic mean
51. HYPGEOM.DIST − Returns the hypergeometric distribution
52. INTERCEPT − Returns the intercept of the linear regression line
53. KURT − Returns the kurtosis of a data set
54. LARGE − Returns the kth largest value in a data set
55. LINEST − Returns the parameters of a linear trend
56. LOGEST − Returns the parameters of an exponential trend
57. LOGNORM.DIST − Returns the cumulative lognormal distribution
58. LOGNORM.INV − Returns the inverse of the lognormal cumulative distribution
59. MAX − Returns the maximum value in a list of arguments, ignoring logical values and text
60. MAXA − Returns the maximum value in a list of arguments, including logical values and text
61. MAXIFS − Returns the maximum value among cells specified by a given set of conditions or criteria.
62. MEDIAN − Returns the median of the given numbers
63. MIN − Returns the minimum value in a list of arguments, ignoring logical values and text
64. MINA − Returns the minimum value in a list of arguments, including logical values and text
65. MINIFS − Returns the minimum value among cells specified by a given set of conditions or criteria.
66. MODE.MULT − Returns a vertical array of the most frequently occurring, or repetitive, values in an array or range of data
67. MODE.SNGL − Returns the most common value in a data set
68. NEGBINOM.DIST − Returns the negative binomial distribution
69. NORM.DIST − Returns the normal cumulative distribution
70. NORM.INV − Returns the inverse of the normal cumulative distribution
71. NORM.S.DIST − Returns the standard normal cumulative distribution
72. NORM.S.INV − Returns the inverse of the standard normal cumulative distribution
73. PEARSON − Returns the Pearson product moment correlation coefficient
74. PERCENTILE.EXC − Returns the k-th percentile of values in a range, where k is in the range 0..1, exclusive
75. PERCENTILE.INC − Returns the k-th percentile of values in a range, where k is in the range 0..1, inclusive
76. PERCENTRANK.EXC − Returns the rank of a value in a data set as a percentage (0..1, exclusive) of the data set
77. PERCENTRANK.INC − Returns the percentage rank of a value in a data set
78. PERMUT − Returns the number of permutations for a given number of objects
79. PERMUTATIONA − Returns the number of permutations for a given number of objects (with repetitions) that can be selected from the total objects
80. PHI − Returns the value of the density function for a standard normal distribution
81. POISSON.DIST − Returns the Poisson distribution
82. PROB − Returns the probability that values in a range are between two limits
83. QUARTILE.EXC − Returns the quartile of the data
Apache Flume – Architecture
The following illustration depicts the basic architecture of Flume. As shown in the illustration, data generators (such as Facebook, Twitter) generate data which gets collected by individual Flume agents running on them. Thereafter, a data collector (which is also an agent) collects the data from the agents, aggregates it, and pushes it into a centralized store such as HDFS or HBase.

Flume Event

An event is the basic unit of the data transported inside Flume. It contains a byte-array payload that is to be transported from the source to the destination, accompanied by optional headers. A typical Flume event would have the following structure −

Flume Agent

An agent is an independent daemon process (JVM) in Flume. It receives the data (events) from clients or other agents and forwards it to its next destination (sink or agent). Flume may have more than one agent. The following diagram represents a Flume Agent.

As shown in the diagram, a Flume Agent contains three main components, namely source, channel, and sink.

Source

A source is the component of an Agent which receives data from the data generators and transfers it to one or more channels in the form of Flume events.

Apache Flume supports several types of sources, and each source receives events from a specified data generator.

Example − Avro source, Thrift source, Twitter 1% source, etc.

Channel

A channel is a transient store which receives the events from the source and buffers them until they are consumed by sinks. It acts as a bridge between the sources and the sinks.

These channels are fully transactional, and they can work with any number of sources and sinks.

Example − JDBC channel, File system channel, Memory channel, etc.

Sink

A sink stores the data into centralized stores like HBase and HDFS. It consumes the data (events) from the channels and delivers it to the destination. The destination of the sink might be another agent or the central stores.

Example − HDFS sink

Note − A Flume agent can have multiple sources, sinks, and channels. We have listed all the supported sources, sinks, and channels in the Flume configuration chapter of this tutorial.

Additional Components of Flume Agent

What we have discussed above are the primitive components of the agent. In addition to these, a few more components play a vital role in transferring the events from the data generator to the centralized stores.

Interceptors

Interceptors are used to alter/inspect Flume events which are transferred between source and channel.

Channel Selectors

These are used to determine which channel should receive the data when there are multiple channels. There are two types of channel selectors −

- Default channel selectors − These are also known as replicating channel selectors. They replicate every event into each channel.
- Multiplexing channel selectors − These decide which channel to send an event to based on the address in the header of that event.

Sink Processors

These are used to invoke a particular sink from a selected group of sinks. They are used to create failover paths for your sinks or to load-balance events across multiple sinks from a channel.
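The flow described above (a source hands events to a channel selector, channels buffer them, and a sink drains a channel) can be modeled in a few lines. The Python sketch below is a toy model for illustration only: Flume itself is a Java system, and none of these class or function names belong to its API. The header name "region" and the channel names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """Toy stand-in for a Flume event: a byte payload plus optional headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


def replicating_selector(event, channels):
    """Default behavior: every channel receives a copy of the event."""
    return list(channels.values())


def make_multiplexing_selector(header, mapping, default):
    """Route each event to one channel based on a header value."""
    def select(event, channels):
        name = mapping.get(event.headers.get(header), default)
        return [channels[name]]
    return select


def source_put(event, channels, selector):
    """Source side: hand the event to whichever channels the selector picks."""
    for channel in selector(event, channels):
        channel.append(event)


def sink_drain(channel):
    """Sink side: consume buffered events in arrival order."""
    delivered = []
    while channel:
        delivered.append(channel.popleft())
    return delivered


# Two in-memory channels acting as buffers between source and sinks.
channels = {"c1": deque(), "c2": deque()}
by_region = make_multiplexing_selector("region", {"eu": "c1", "us": "c2"}, "c1")

source_put(Event(b"login", {"region": "us"}), channels, by_region)
source_put(Event(b"heartbeat"), channels, replicating_selector)
print(len(channels["c1"]), len(channels["c2"]))
```

In this model the multiplexed event lands only in c2, while the replicated event lands in both channels, so c1 holds one event and c2 holds two. Real channels additionally wrap puts and takes in transactions, which this sketch leaves out.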
Apache Flume – Environment
We already discussed the architecture of Flume in the previous chapter. In this chapter, let us see how to download and set up Apache Flume.

Before proceeding further, you need to have a Java environment on your system. So first of all, make sure you have Java installed.

For some examples in this tutorial, we have used Hadoop HDFS (as sink). Therefore, we recommend that you install Hadoop along with Java. For more information, follow the link − https://www.tutorialspoint.com/hadoop/hadoop_enviornment_setup.htm

Installing Flume

First of all, download the latest version of Apache Flume software from the website https://flume.apache.org/.

Step 1

Open the website. Click on the download link on the left-hand side of the home page. It will take you to the download page of Apache Flume.

Step 2

On the Download page, you can see the links for binary and source files of Apache Flume. Click on the link apache-flume-1.6.0-bin.tar.gz

You will be redirected to a list of mirrors where you can start your download by clicking any of these mirrors. In the same way, you can download the source code of Apache Flume by clicking on apache-flume-1.6.0-src.tar.gz.

Step 3

Create a directory with the name Flume in the same directory where Hadoop, HBase, and other software were installed (if you have already installed any) as shown below.

$ mkdir Flume

Step 4

Extract the downloaded tar files as shown below.

$ cd Downloads/
$ tar zxvf apache-flume-1.6.0-bin.tar.gz
$ tar zxvf apache-flume-1.6.0-src.tar.gz

Step 5

Move the contents of the extracted apache-flume-1.6.0-bin directory to the Flume directory created earlier as shown below. (Assume we have created the Flume directory in the home directory of the local user named Hadoop.)

$ mv apache-flume-1.6.0-bin/* /home/Hadoop/Flume/

Configuring Flume

To configure Flume, we have to modify three files, namely flume-env.sh, flume-conf.properties, and .bashrc.

Setting the Path / Classpath

In the .bashrc file, set the home folder, the path, and the classpath for Flume as shown below.

conf Folder

If you open the conf folder of Apache Flume, you will find the following four files −

- flume-conf.properties.template,
- flume-env.sh.template,
- flume-env.ps1.template, and
- log4j.properties.

Now rename −

- flume-conf.properties.template file as flume-conf.properties and
- flume-env.sh.template as flume-env.sh

flume-env.sh

Open the flume-env.sh file and set JAVA_HOME to the folder where Java is installed on your system.

Verifying the Installation

Verify the installation of Apache Flume by browsing to the bin folder and typing the following command.

$ ./flume-ng

If you have successfully installed Flume, you will get a help prompt of Flume as shown below.