
Storm framework terminology

Storm is a free and open source big-data processing system that differs from other systems in that it is intended for distributed real-time processing. Storm works on streaming as well as static data, unlike Hadoop, which only works on batch processing.

Storm has several use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over 1,000,000 tuples processed per second per node.




Storm Cluster

A Storm cluster contains two kinds of nodes: a master node and worker nodes. They use ZooKeeper nodes to coordinate with each other.

The master node runs a daemon called "Nimbus" and each worker node runs a daemon called "Supervisor". The master (Nimbus) assigns work to the worker nodes, distributes code around the cluster, and monitors for failures.

The worker (Supervisor) listens for work assigned to it by Nimbus and runs worker processes to execute a subset of a topology.

Each supervisor can run one or more worker processes, according to the configuration defined in storm.yaml (the ports are listed under supervisor.slots.ports). Each worker process on a supervisor listens on a specific port, and every worker process uses exactly one port.

ZooKeeper nodes act as a communication bridge between Nimbus and the supervisor nodes. All communication between Nimbus and the supervisors is done through the ZooKeeper cluster. Nimbus and the supervisors are stateless and fail-fast; all state is kept in ZooKeeper or on local disk. This means you can stop Nimbus or the supervisors and they will start back up like nothing happened. This design leads to Storm clusters being incredibly stable.

Topology: 

 


A topology is like a connected graph of computation. Each node in a topology contains processing logic, and the links between nodes define how processed data is passed to the next node in the topology.

There are two components of a Storm topology: one is the 'spout' and the other is the 'bolt'.

Spout:

It is the source of a stream: it receives data from the outside world and passes it as a stream to bolts for some processing. For example, a spout can receive data from a URL that contains files to translate from one Indian language to another Indian language.

Bolt:

It is the other component of a topology. It receives tuples (data/stream) from a spout, does some processing on each tuple, and may emit new tuples for further processing. For example, it invokes a system on the data or tuples coming from the spout that translates each tuple from one Indian language to another Indian language.

In a Storm topology there may be multiple spouts and bolts, and one bolt can pass tuples on to another bolt for more processing.
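
The spout-to-bolt-to-bolt flow described above can be sketched in plain Java. This is only a toy model to show how tuples move along the links of a topology: the class ToyTopology and its methods are hypothetical names made up for this illustration, it does not use Storm's API, and it runs in a single thread rather than on a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Plain-Java toy model of a topology. Hypothetical types, NOT Storm's API.
class ToyTopology {
    // The "spout" is modelled as the list of values it would emit.
    private final List<String> spoutOutput;
    // "Bolts" are applied in order; each one processes the value it receives.
    private final List<Function<String, String>> bolts = new ArrayList<>();

    ToyTopology(List<String> spoutOutput) {
        this.spoutOutput = spoutOutput;
    }

    // Link another bolt after the previous one (or directly after the spout).
    ToyTopology addBolt(Function<String, String> bolt) {
        bolts.add(bolt);
        return this;
    }

    // Push every value emitted by the spout through the chain of bolts.
    List<String> run() {
        List<String> results = new ArrayList<>();
        for (String value : spoutOutput) {
            String current = value;
            for (Function<String, String> bolt : bolts) {
                current = bolt.apply(current);
            }
            results.add(current);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> out = new ToyTopology(Arrays.asList("hello", "storm"))
                .addBolt(String::toUpperCase) // first bolt
                .addBolt(s -> s + "!")        // second bolt receives the first bolt's output
                .run();
        System.out.println(out);
    }
}
```

In a real topology the second bolt could stand for the translation step from the example above, and each component would run in parallel on the worker processes.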

Tuples:

It is the data structure provided by the Storm framework to pass a stream from one component of a Storm topology to another. A tuple is an ordered list of objects. Storm supports all the primitive types, strings, and byte arrays as tuple field values. To use an object of another type, you just need to implement a serializer for the type. A tuple can store multiple fields.

A sample tuple:
Object1(filename)  Object2(text)

The illustration above shows a tuple that has two fields. The first field stores a file name as a string object and the other field stores the text of the file as a string object.
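
To make the "ordered list of objects" idea concrete, here is a small plain-Java sketch of the two-field tuple above. ToyTuple and its method names are hypothetical and exist only for this illustration; Storm ships its own tuple type, which is not shown here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a tuple: an ordered list of values with field names.
class ToyTuple {
    private final List<String> fieldNames;
    private final List<Object> values;

    ToyTuple(List<String> fieldNames, Object... values) {
        if (fieldNames.size() != values.length) {
            throw new IllegalArgumentException("one value per field is required");
        }
        this.fieldNames = fieldNames;
        this.values = Arrays.asList(values);
    }

    // Look a value up by position, since the list is ordered.
    Object getValue(int index) {
        return values.get(index);
    }

    // Look a value up by its field name.
    Object getValueByField(String name) {
        int index = fieldNames.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        return values.get(index);
    }

    int size() {
        return values.size();
    }

    public static void main(String[] args) {
        // "a.txt" and its text are made-up sample values.
        ToyTuple t = new ToyTuple(Arrays.asList("filename", "text"), "a.txt", "text of the file");
        System.out.println(t.getValue(0) + " -> " + t.getValueByField("text"));
    }
}
```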

Since Storm is a distributed framework, it needs to know how to serialize and deserialize objects.

By default, Storm can serialize primitive types, strings, byte arrays, ArrayList, HashMap, HashSet, and the Clojure collection types. If you want to use another type in your tuples, you'll need to register a custom serializer.

Stream:
A stream is an infinite (unbounded) sequence of tuples (data).
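
As a rough analogy only, an unbounded sequence can be modelled with java.util.stream.Stream, which produces its elements lazily. The class ToyStream is a hypothetical name for this sketch, and the "tuples" are reduced to plain sequence numbers; this is not how Storm represents streams internally.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of an unbounded sequence; NOT Storm's API.
class ToyStream {
    // An endless, lazily generated sequence of "tuples" (here just sequence numbers).
    static Stream<Long> tuples() {
        return Stream.iterate(0L, n -> n + 1);
    }

    // A consumer can only ever look at a finite prefix of the sequence.
    static List<Long> firstN(int n) {
        return tuples().limit(n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstN(3));
    }
}
```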


System properties in Java

This post explains how to access system properties in Java.

Sometimes it is necessary to read system (OS) dependent information in order to write system (OS) independent code.

Say we need to access the home directory of a user. It is possible that some users use Linux and others use Windows, so we need to read the current system settings. We also know that Windows and Linux use different characters to mark the end of a line in a file (Windows uses "\r\n" and Linux uses "\n"). So we need to know these special characters to make our program run on Windows, Linux, or anything else. For this we use

System.getProperty("line.separator");


instead of using "\r\n" or "\n" explicitly.

Example
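 // requires java.io.BufferedWriter, java.io.File, java.io.FileWriter
 String lineSeparator = System.getProperty("line.separator");
 // all other code goes here (reverseStr is assumed to be a String built there)

 File out = new File("/tmp/my.txt");
 // try-with-resources closes the writer even if a write throws
 try (BufferedWriter bw = new BufferedWriter(new FileWriter(out))) {
     bw.write(reverseStr);
     bw.write(lineSeparator); // system independent new line
 }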

 

\n = LF (Line Feed) // Used as a new line character in Unix
\r = CR (Carriage Return) // Used as a new line character in classic Mac OS
\r\n = CR + LF // Used as a new line character in Windows
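
To check which of these sequences the current platform uses, a small helper can print the separator with its control characters spelled out. This is a minimal sketch: LineSeparatorDemo and describe are names invented for this example. System.lineSeparator() is a standard method (Java 7 and later) that returns the same value as the line.separator property had at JVM startup.

```java
// Hypothetical helper class for inspecting the platform line separator.
class LineSeparatorDemo {
    // Make control characters visible, e.g. "\r\n" becomes "CR LF".
    static String describe(String separator) {
        StringBuilder sb = new StringBuilder();
        for (char c : separator.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            if (c == '\r') {
                sb.append("CR");
            } else if (c == '\n') {
                sb.append("LF");
            } else {
                sb.append(String.format("0x%02X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The result depends on the operating system the JVM runs on.
        System.out.println("line.separator = " + describe(System.lineSeparator()));
    }
}
```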

This table defines some of the most commonly used keys for accessing system properties. The table is provided by http://docs.oracle.com

Key                  Meaning
"file.separator"     Character that separates components of a file path. This is "/" on UNIX and "\" on Windows.
"java.class.path"    Path used to find directories and JAR archives containing class files. Elements of the class path are separated by a platform-specific character specified in the path.separator property.
"java.home"          Installation directory for Java Runtime Environment (JRE)
"java.vendor"        JRE vendor name
"java.vendor.url"    JRE vendor URL
"java.version"       JRE version number
"line.separator"     Sequence used by operating system to separate lines in text files
"os.arch"            Operating system architecture
"os.name"            Operating system name
"os.version"         Operating system version
"path.separator"     Path separator character used in java.class.path
"user.dir"           User working directory
"user.home"          User home directory
"user.name"          User account name
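
The keys in the table can be read with System.getProperty, and user.home can be combined with java.nio.file.Paths to reach a file in the user's home directory without hard-coding "/" or "\". The following is a minimal sketch: the class name SystemPropertiesDemo, the helper inHome, and the file name my.txt are made up for illustration, and the printed values differ from machine to machine.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class; only System.getProperty and Paths.get are standard API.
class SystemPropertiesDemo {
    static final String[] KEYS = {
        "file.separator", "java.home", "java.vendor", "java.version",
        "os.arch", "os.name", "os.version", "path.separator",
        "user.dir", "user.home", "user.name"
    };

    // Build a path inside the user's home directory in a platform-independent way.
    static Path inHome(String fileName) {
        return Paths.get(System.getProperty("user.home"), fileName);
    }

    public static void main(String[] args) {
        for (String key : KEYS) {
            System.out.println(key + " = " + System.getProperty(key));
        }
        System.out.println(inHome("my.txt")); // my.txt is a made-up file name
    }
}
```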