Setup Storm Cluster In Local Mode 

Introduction

Storm is a free and open source distributed realtime computation system. 

1. INSTALLING STORM CLUSTER IN LOCAL MODE:

In this tutorial, I will show you how to set up a Storm cluster in local mode (on a single machine). To set up and run a topology in local mode, we are going to use Maven. There are other ways to run Storm in local mode, but I preferred Maven.

Local mode simulates a Storm cluster in process and is useful for developing and testing topologies. Running topologies in local mode is similar to running topologies on a cluster.

2. SOFTWARE ENVIRONMENT:

Here is the software platform we used-

  •    Operating System: Ubuntu 12.04
  •    Java: Oracle/Sun JDK, version 1.7.0_25
  •    Maven: Maven 2
  •    Storm: 0.8.2


3. Prerequisites:

  • Install Java (I used version 1.7.0_25)
  • Set the PATH environment variable for Java.
  • Set JAVA_HOME
Note: follow this link to see how to install Java on Ubuntu/Linux

http://www.techie-knowledge.co.in/2013/08/installing-java-on-linuxubuntu.html

4. How to Install Maven:

To install Maven on Ubuntu, see the post "Maven Installation and configuration in Ubuntu".


5. Check whether Maven is installed using the command-

                  $ mvn -version

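6. Install Storm 0.8.2:

1. Download the Storm release (in my case storm-0.8.2) from the following link-

    https://github.com/nathanmarz/storm

2. Extract the downloaded release archive using the following command. The archive file name below is assumed; if your download is a .zip file, use unzip instead. Writing to /usr/local usually needs sudo.

$ sudo tar -xvzf ~/Downloads/storm-0.8.2.tar.gz -C /usr/local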

 

3. Add the bin directory of the Storm release to your PATH, so that you can run the storm command from anywhere
without typing the full path
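
As a rough sketch, the PATH change could look like the lines below. The directory /usr/local/storm-0.8.2 is an assumption based on the extract step, and STORM_HOME is only an illustrative variable name, not something Storm requires. Adding the same lines to ~/.bashrc makes the change permanent.

```shell
# Assumed location of the extracted release; adjust if you extracted elsewhere.
export STORM_HOME=/usr/local/storm-0.8.2
# Append the release's bin directory so the shell can find the storm command.
export PATH="$PATH:$STORM_HOME/bin"
```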

Now your Storm installation is ready for local mode. You can now run a sample program to check whether it is working.



7. Now download the Storm example from the given link-

    https://codeload.github.com/storm-book/examples-ch02-getting_started/legacy.zip/master
 
(On the project page, this is the "Download ZIP" button.)


8. Now go to the directory where you downloaded the example, extract it, and change to the project directory.

                $ cd ~/path-to-downloaded-directory/project-name


9. Now compile the project using Maven. To compile the project, type the following command at the terminal prompt-

                $ mvn -e compile

After Maven reports a successful build, go to the next step.

10. Now type the following command in the terminal to run the project-

 mvn -e exec:java -Dexec.mainClass="TopologyMain" -Dexec.args="src/main/resources/words.txt"

Maven Installation and configuration in Ubuntu

Hello Friends,

In this tutorial I will show you how to install and configure Maven on Ubuntu.

Maven is a Java project management tool that makes the build process easier. I used Maven to compile and package my Storm topology so that it can run on a Storm cluster, both in production mode and in local mode.

Storm is an Apache framework for real-time computation.


Prerequisites:

Java 1.5 or above. Follow the link below to see how to install Java and set up its environment.

http://ashuuni123.blogspot.com/2013/08/installing-java-on-linuxubuntu.html

Installing Maven on Ubuntu is pretty straightforward-


$ sudo apt-get install maven2
The files are installed in /usr/share/maven2

Verification

Type “mvn -version” to verify the installation.

$ mvn -version
The output should look something like this-


Apache Maven 2.2.1 (rdebian-8)
Java version: 1.7.0_25
Java home: /usr/local/jdk1.7.0_25/jre
Default locale: en_IN, platform encoding: UTF-8
OS name: "linux" version: "3.2.0-52-generic-pae" arch: "i386" Family: "unix"


Where is Maven installed?

The apt-get installation puts the required files in the following locations-
  1. /usr/bin/mvn
  2. /usr/share/maven2/
  3. /etc/maven2 

POM File: 

POM stands for Project Object Model. It is an XML file that contains information about the project you want to build and the configuration details Maven uses to build it.
For example, it defines the directory structure for your source files and class files, and the location where the dependencies of your project are stored.





Maven uses Convention over Configuration, which means developers are not required to define the build process themselves or spell out every configuration detail.



Maven provides developers with ways to manage the following-
  • Builds
  • Documentation
  • Reporting
  • Dependencies
  • SCMs
  • Releases
  • Distribution
  • Mailing lists

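SNAPSHOT is a special version that indicates a current development copy. Unlike regular versions, which Maven downloads once and then reuses from the local repository, Maven keeps checking the remote repository for a newer SNAPSHOT build (by default at most once a day; running mvn with -U forces the check on every build).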


Transitive dependencies save you from having to discover and specify the libraries that your own dependencies require; Maven includes them automatically.



The dependency scopes used in Maven are:

Compile: The default scope. The dependency is available in all classpaths of the project.

Provided: The dependency is expected to be provided at runtime by the JDK, a web server, or a container.

Runtime: The dependency is not needed for compilation but is required during execution.

Test: The dependency is available only for the test compilation and execution phases.

System: Like provided, but you have to supply the path to the jar on the local system yourself.

Import: The specified POM is replaced with the dependencies in that POM's <dependencyManagement> section.


For more detail follow the given link-

Default structure created by Maven for a project
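
The conventional Maven layout keeps sources under src/main/java, resources under src/main/resources, tests under src/test/java, and build output under target, with pom.xml at the project root. The sketch below creates that skeleton by hand; the project name my-topology and the use of a temporary directory are assumptions for illustration, and the empty pom.xml is only a placeholder.

```shell
# Create the conventional Maven layout for a hypothetical project "my-topology".
project_dir="${TMPDIR:-/tmp}/my-topology"
mkdir -p "$project_dir/src/main/java" \
         "$project_dir/src/main/resources" \
         "$project_dir/src/test/java"
# pom.xml lives at the project root (empty placeholder here);
# Maven creates the target directory itself when it builds.
touch "$project_dir/pom.xml"
# List what was created.
find "$project_dir" | sort
```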



 

Sample of pom.xml file 

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>esc</groupId>
    <artifactId>storm-experiment-2</artifactId>
    <version>0.0.1-SNAPSHOT</version>


    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                    <compilerVersion>1.7</compilerVersion>
                </configuration>
            </plugin>

            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass>com.path.to.main.Class</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <repositories>

        <!-- Repository where we can find the Storm dependencies -->
        <repository>
            <id>clojars.org</id>
            <url>http://clojars.org/repo</url>
        </repository>

    </repositories>

    <dependencies>

        <!-- Storm Dependency -->
        <dependency>
            <groupId>storm</groupId>
            <artifactId>storm</artifactId>
            <version>0.8.2</version>
        </dependency>

    </dependencies>



</project>

 

* project – root element


* modelVersion –

  The modelVersion element sets what version of the POM model you are using 

* groupId –
  The groupId element is a unique ID for an organization, or a project 

* artifactId –
  The artifactId element contains the name of the project you are building 

* version – The version element contains the version number of the project.

  The above groupId, artifactId and version elements identify the JAR file that is built and put into the local Maven repository 

Maven life cycle

Every build follows a specified life cycle. Maven comes with a default life cycle that includes the most common build phases like compiling, testing and packaging.
The following list gives an overview of the important Maven life cycle phases.
  • validate - checks if the project is correct and all information is available
  • compile - compiles source code into binary artifacts
  • test - executes the tests
  • package - takes the compiled code and packages it, for example into a JAR file.
  • integration-test - takes the packaged result and executes additional tests, which require the packaging
  • verify - performs checks if the package is valid
  • install - installs the result of the package phase into the local Maven repository
  • deploy - deploys the package to a target, e.g. a remote repository
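
Asking Maven for a phase runs every earlier phase first, so mvn package also validates, compiles and tests. The small sketch below only illustrates that ordering for the phases listed above; it does not call Maven, the names PHASES and phases_up_to are made up for this example, and the full default life cycle contains more phases than the ones shown.

```shell
# Phases in the order listed above (the full default life cycle has more).
PHASES="validate compile test package integration-test verify install deploy"

# Print every listed phase that runs when the given phase is requested.
phases_up_to() {
    for phase in $PHASES; do
        printf '%s\n' "$phase"
        [ "$phase" = "$1" ] && return 0
    done
    return 1   # the requested name is not one of the listed phases
}

# Phases executed by "mvn package", one per line.
phases_up_to package
```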

 

Maven Repository

There are 3 types of repository in Maven-
  1. Local Repository 
  2. Remote Repository
  3. Central Repository

Local Repository: When you install and run Maven for the first time, it creates a .m2 directory in your home directory, which contains another directory named repository- 

$HOME/.m2/repository
This is the default location where Maven looks for jars. If a particular jar is not in the local repository, it is downloaded from a remote repository; the default remote repository is configured by Maven itself.

Remote Repository: A developer's own custom repository containing required libraries or other project jars.

Central Repository: The Maven central repository is provided by the Maven community. It contains a large number of commonly used libraries.
When Maven does not find a dependency in the local repository, it searches the central repository using the following URL: http://repo1.maven.org/maven2/
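
Inside the local repository, an artifact is stored under its groupId (with dots turned into directories), then its artifactId and version. The helper below is a sketch that prints the expected jar location; the function name local_repo_jar is made up for this example, and it assumes the default repository location under $HOME/.m2.

```shell
# Print where Maven stores an artifact's jar in the local repository.
# Arguments: groupId artifactId version
local_repo_jar() {
    # Dots in the groupId become directory separators.
    group_path=$(printf '%s' "$1" | tr '.' '/')
    printf '%s/.m2/repository/%s/%s/%s/%s-%s.jar\n' \
        "$HOME" "$group_path" "$2" "$3" "$2" "$3"
}

# The Storm dependency from the sample pom.xml in this post.
local_repo_jar storm storm 0.8.2
```

Checking whether that file exists (for example with ls) tells you if the dependency has already been downloaded.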

  Using Maven behind a proxy

To use Maven behind a proxy, define the proxy settings in the settings.xml file, inside the proxies element.
The settings file is found at /etc/maven2/settings.xml
Add a proxy element like the following under the proxies element

<settings>
    <proxies>
        <proxy>
            <active>true</active>
            <protocol>http</protocol>
            <host>192.168.1.100</host>
            <port>3128</port>
            <username>your-username</username>
            <password>your-password</password>
        </proxy>
    </proxies>
</settings>

Submitting a topology to a Storm cluster in production mode on Linux (Ubuntu)

Installing the Storm client

To submit a topology in production mode we need a Storm client.
To install the Storm client on a machine, follow these steps -

1. Download storm release (in my case storm-0.8.2) from the following link-
    https://github.com/nathanmarz/storm

 Note: Skip steps 2, 3 and 4 if you have already installed the Storm client.
           

2. Extract the downloaded release archive using the following command. The archive file name below is assumed; if your download is a .zip file, use unzip instead. Writing to /usr/local usually needs sudo.

         $ sudo tar -xvzf ~/Downloads/storm-0.8.2.tar.gz -C /usr/local

3. Add the bin directory of the Storm release to your PATH, so that you can run the storm command from anywhere without typing the full path.

4. Create a local Storm configuration that tells the client where your nimbus host is. To do this, run the following commands (replace the IP with that of your nimbus host)-

       $ mkdir -p ~/.storm
       $ echo 'nimbus.host: "192.168.1.99"' > ~/.storm/storm.yaml

Creating a jar file containing your code and all of its dependencies

1. First change your current directory to your project base directory, then compile your code using Maven.

        $ mvn -f pom.xml compile

2. Create a jar containing your code 

       $ mvn -e package

 // this command creates the jar file of your project under the target directory of your project

Submitting the jar file to your Storm cluster

To submit the jar of your code (topology) to the cluster, type the following command -

$ storm jar full-path-of-your-jar name-of-your-main-class arguments
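
A small wrapper can catch the most common mistake, a wrong jar path, before anything is sent to the cluster. This is only a sketch: the function name submit_topology is made up, and the jar name and class in the commented example are hypothetical placeholders.

```shell
# Submit a topology jar, refusing to call storm if the jar is missing.
# Usage: submit_topology <jar> <main-class> [topology arguments...]
submit_topology() {
    jar=$1
    main_class=$2
    if [ ! -f "$jar" ]; then
        echo "jar not found: $jar" >&2
        return 1
    fi
    # Drop the first two arguments; the rest are passed to the topology.
    shift 2
    storm jar "$jar" "$main_class" "$@"
}

# Hypothetical jar name and class; replace them with your own.
# submit_topology target/my-topology.jar com.example.TopologyMain words.txt
```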


Environment variables in Unix (Ubuntu)

An environment variable is a dynamic "object" on a computer that stores a value, which in turn can be referenced by one or more software programs.

 The value of an environment variable can be, for example, the location of all executable files in the file system, the default editor that should be used, or the system locale settings.

 An environment variable defines some aspect of a user's or a program's environment that can vary.

 Environment variables are dynamic because they can change. The values they store can be changed to match the current computer system's setup and design (environment). They can also differ between computer systems because each computer can have a different setup and design (environment).

List of some well known environment variables on Ubuntu

  • PATH Contains a colon-separated list of directories in which the system looks for executable files and commands. When we execute a command, the shell searches through each of these directories, one by one, until it finds a directory where the executable exists. A command whose directory has been added to PATH can be run directly without typing its full path. Here is a sample PATH value.
$ echo $PATH
                /usr/local/bin:/bin:/usr/bin:/sbin:/usr/sbi

  • HOME Contains the path to the home directory of the current user. 
  • PWD Contains the path to your current working directory. 
               $ echo $PWD
                 /home/techie/Desktop/storm-jar
  • TERM Contains the name of the running terminal type, e.g. xterm
  • EDITOR Contains the path to the lightweight program used for editing files, i.e. /usr/bin/nano, or an interactive switch (between gedit under X or nano in this example):
 export EDITOR="$(if [[ -n $DISPLAY ]]; then echo 'gedit'; else echo 'nano'; fi)"
  • VISUAL Contains the path to full-fledged editor that is used for more demanding tasks, such as editing mail; e.g., vi, vim, emacs, etc.
  • MAIL Contains the location of incoming email. The traditional setting is /var/spool/mail/$LOGNAME.
  • ftp_proxy Contains the FTP proxy server.
  • http_proxy Contains the HTTP proxy server.
  • HISTFILE Contains the location and name of the file in which commands typed at the terminal prompt are saved.
  • HISTFILESIZE Contains the maximum number of lines kept in the history file.
  • HOSTNAME Contains the system's host name.
  • USER Contains the login user name.
  • OSTYPE Contains a description of the OS.
  • LD_LIBRARY_PATH Contains a colon-separated list of directories where libraries should be searched for.
  • LOGNAME Contains the login name.
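
To see how the PATH lookup described above plays out on your own machine, the sketch below prints the PATH directories in search order and then asks the shell which file it would run for a command. The function name split_path is made up for this example.

```shell
# Print each directory of a PATH-style list on its own line, in search order.
# With no argument, the current PATH is used.
split_path() {
    printf '%s\n' "${1:-$PATH}" | tr ':' '\n'
}

split_path

# Show which file the shell would run for a command (ls is only an example).
command -v ls
```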

NOTE: Use the printenv command to see the current environment variables and their values

$ printenv
SSH_AGENT_PID=1702
GPG_AGENT_INFO=/tmp/keyring-nnUEcy/gpg:0:1
TERM=xterm
SHELL=/bin/bash
XDG_SESSION_COOKIE=a84bfeb2ec960f26863997bc0000000d-1379048055.358900-1791728848
WINDOWID=54525958

Find the IP Address of a Website

In this post I will show you how to find the IP address of a website. Every machine on a network (such as the Internet) has an IP address that identifies it on that network; IP stands for Internet Protocol. Since every website is hosted on a server, we can find the IP address of a website too.

To find the IP address of a website on an Ubuntu machine, type the following command at the terminal prompt:

$ ping www.facebook.com



ping resolves the host name to an IP address and sends echo requests to it. The first line of the output, and each reply, shows the IP address of the website.

The command is the same on Windows.

To find the IP address of a website on Windows, just open the command prompt and type the command
 ping www.facebook.com
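
If you only want the address and not the ping statistics, one alternative on Linux is to ask the system resolver directly with getent. This is a different technique from the ping approach above, shown only as a sketch; the function name resolve_ip is made up for this example.

```shell
# Print the first address the system resolver returns for a host name.
resolve_ip() {
    getent hosts "$1" | awk '{ print $1; exit }'
}

# localhost resolves without network access.
resolve_ip localhost
# resolve_ip www.facebook.com   # needs network access
```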