Installing And Using GIZA++ in Ubuntu for Word Alignment

What is GIZA++?

GIZA++ is an extension of the program GIZA (part of the SMT toolkit EGYPT), which was developed by the Statistical Machine Translation team during the summer workshop in 1999 at the Center for Language and Speech Processing at Johns Hopkins University (CLSP/JHU). GIZA++ includes a lot of additional features. The extensions of GIZA++ were designed and written by Franz Josef Och.

What is a parallel corpus?

A parallel corpus is a collection of texts, each of which is translated into one or more languages other than the original.

The simplest case is where only two languages are involved: one of the corpora is an exact translation of the other. Some parallel corpora, however, exist in several languages.

Installing GIZA++

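Step 1- Download GIZA++ using the following command:

    $ wget

Step 2- Make a folder for your GIZA++ installation:

    $ mkdir giza-practice

Step 3- Move the downloaded archive into that folder:

    $ mv giza-practice/

Step 4- Change into the folder:

    $ cd giza-practice/

Step 5- Unzip the archive:

    $ unzip

Step 6- Change into the extracted directory:

    $ cd giza-pp-master/

Step 7- Clean any previous build:

    $ make clean

Step 8- Build GIZA++:

    $ make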

Creating a Parallel Corpus to Use in GIZA++

GIZA++ is a tool for word alignment; it uses a parallel corpus to create a dictionary.

In this example we use two languages: English as the source language and Hindi as the target language.

Step 1. First, create a file called hindi.txt and copy the Hindi text below into it.

मैंने उसे किताब दी .
मैंने किताब को पढ़ा .
वह किताब को प्यार करता था .
उसने किताब दी .

Step 2. Now create a file called english.txt and copy the English text below into it.

I gave him the book .
I read the book .
He loved the book .
He gave the book .

Now our parallel corpus is created.

Running GIZA++

Step 1. Copy the hindi.txt and english.txt files to giza-pp-master/GIZA++-v2/

Step 2. cd giza-pp-master/GIZA++-v2/

Step 3. Use the following command to convert your corpus into GIZA++ format:

    ./plain2snt.out [source_language_corpus] [target_language_corpus]

    $ ./plain2snt.out english.txt hindi.txt

Step 4. Type the following commands to make the word classes:

    $ ./../mkcls-v2/mkcls -p[source_language_corpus] -V[source_language_corpus].vcb.classes

    $ ./../mkcls-v2/mkcls -p[target_language_corpus] -V[target_language_corpus].vcb.classes

Example:

    $ ./../mkcls-v2/mkcls -penglish.txt -Venglish.txt.vcb.classes
    $ ./../mkcls-v2/mkcls -phindi.txt -Vhindi.txt.vcb.classes

Step 5. Create an output directory:

    $ mkdir myout

Step 6. Now use GIZA++ to build your dictionary:

    ./GIZA++ -S [target_language_corpus].vcb -T [source_language_corpus].vcb -C [target_language_corpus]_[source_language_corpus].snt -o [prefix] -outputpath [output_folder]

Example:

    $ ./GIZA++ -S hindi.vcb -T english.vcb -C hindi_english.snt -outputpath myout -o test

Note: if you get an error, please update the Makefile inside GIZA++-v2.

This generates the output files in the myout/ directory.
Of the various files, the one named after [prefix] (test in our case) is the final file.

It contains the alignment of source and target words according to their probability value:

book NULL 1
. को 0.333333
gave दी 1
He था 0.333333
him उसे 1
loved प्यार 0.5
read पढ़ा 1
the . 1
He उसने 0.333333
. किताब 0.666667
loved करता 0.5
I मैंने 1
He वह 0.333333
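Once you have such a file, a common next step is to keep only the most probable target word for each source word. Below is a minimal Java sketch of that idea. It assumes each line has exactly three whitespace-separated columns (source word, target word, probability) as in the listing above. The class name AlignmentTable and the method bestTargets are hypothetical helpers, not part of GIZA++, and the target words in main are ASCII placeholders (t1, t2, ...) standing in for the Hindi words.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of GIZA++.
class AlignmentTable {

    // Each line is assumed to look like "<source> <target> <probability>".
    // Returns, for every source word, the target word with the highest probability.
    static Map<String, String> bestTargets(List<String> lines) {
        Map<String, String> best = new LinkedHashMap<>();
        Map<String, Double> bestProbability = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 3) {
                continue; // skip blank or malformed lines
            }
            double probability = Double.parseDouble(parts[2]);
            Double current = bestProbability.get(parts[0]);
            if (current == null || probability > current) {
                bestProbability.put(parts[0], probability);
                best.put(parts[0], parts[1]);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Placeholder targets t1..t5 stand in for the Hindi words of the real output.
        List<String> lines = Arrays.asList(
                "gave t1 1",
                "He t2 0.333333",
                "He t3 0.333333",
                ". t4 0.333333",
                ". t5 0.666667");
        System.out.println(bestTargets(lines)); // prints {gave=t1, He=t2, .=t5}
    }
}
```

When two candidates have the same probability (as for "He" above), this sketch keeps the first one it sees; a real pipeline may want a different tie-breaking rule.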


hashCode() and equals() Methods in Java

As Java programmers we know that java.lang.Object is the base class of every class in the Java language.

The Object class provides some methods with default implementations.

Since Object is the base class, the methods it defines are available to every class defined in Java, but sometimes the default implementation of these methods is not appropriate for new user-defined classes.

Here we discuss two of the most important methods of the Object class.

The methods defined in the Object class:

      public boolean equals(Object obj)

      Indicates whether some other object is "equal to" this one.

      The default implementation of equals() returns true only if both references point to the same object (x == y). It compares references, not values.

      public int hashCode()

      Returns a hash code value for the object.

      The default implementation returns an int that is typically derived from the object's identity. It is often described as the memory address, but the JVM is not required to implement it that way. The hexadecimal form of this value is what the default toString() prints after the @ sign.

Contract between equals() and hashCode()

 1. If two objects are equal according to equals(), their hash codes must also be equal.
 2. If you override the equals() method, you must also override the hashCode() method.

Sometimes we do not want to use the default implementation of equals() in our own class, so we must override it.

Ex. Suppose we have a class Student and we want to check whether two students are equal based on the instance variable studentId.

Then we have to override the equals() method to meet our requirement.
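The Student case can be sketched as follows. This is only an illustration under stated assumptions: the field name and the choice to ignore name in the comparison are hypothetical, picked to match the studentId requirement described above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical Student class: two students are equal when their studentId matches.
class Student {
    private final int studentId;
    private final String name;

    Student(int studentId, String name) {
        this.studentId = studentId;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        return studentId == ((Student) o).studentId;
    }

    // Built from the same field as equals(), so equal students share a hash code.
    @Override
    public int hashCode() {
        return Integer.hashCode(studentId);
    }

    public static void main(String[] args) {
        Set<Student> students = new HashSet<>();
        students.add(new Student(1, "Asha"));
        students.add(new Student(1, "Asha K."));
        System.out.println(students.size()); // prints 1
    }
}
```

Because hashCode() is overridden together with equals(), the HashSet treats the second student as a duplicate. With only equals() overridden, both objects would usually land in different buckets and the set would hold two entries.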

The equals method implements an equivalence relation. It is:

Reflexive: For any non-null reference value x, x.equals(x) must return true.

Symmetric: For any non-null reference values x and y, x.equals(y) must return true if and only if y.equals(x) returns true.

Transitive: For any non-null reference values x, y, z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) must return true.

Consistent: For any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.

Non-null: For any non-null reference value x, x.equals(null) must return false.

Here is an example of how to override equals() and hashCode():


public class Movie {

    String movieName;
    int price;

    public Movie(String movieName, int price) {
        this.movieName = movieName;
        this.price = price;
    }

    public String toString() {
        return "Movie name is " + movieName + " And price is " + price;
    }

    /*
     * Two Movie objects are equal if their movieName is the same;
     * price is ignored.
     */
    public boolean equals(Object o) {
        if (o == this)
            return true;
        if (o == null)
            return false;
        if (!(this.getClass().equals(o.getClass())))
            return false;
        Movie movie = (Movie) o;
        return this.movieName.equals(movie.movieName);
    }

    // Uses the same field as equals(), so equal movies get equal hash codes.
    public int hashCode() {
        return 31 * movieName.hashCode();
    }
}

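Test the above class:


public class Test {

    public static void main(String[] args) {
        Movie movie1 = new Movie("The Ghazi Attack", 200);
        Movie movie2 = new Movie("The Ghazi Attack", 300);

        // equals() compares movieName only, so the different prices do not matter
        if (movie1.equals(movie2))
            System.out.println("object are equal");
        else
            System.out.println("object not equal");

        // == compares references; movie1 and movie2 are two distinct objects
        if (movie1 == movie2)
            System.out.println("object are equal");
        else
            System.out.println("object not equal");
    }
}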


Some Basic Points about Map, Set and List from Java Collections

A Set is a Collection that cannot contain duplicate elements.

There are three general-purpose Set implementations:

1. HashSet :

    Uses a hash table (a HashMap internally) to store its elements.
    Uses a hash function for storing and retrieving its elements.
    Order is not maintained in a HashSet.

2. TreeSet :

   Uses a Red-Black tree to store its elements.
   Elements are kept sorted according to their values.

3. LinkedHashSet (LinkedList + HashSet)

     Implemented as a hash table with a linked list running through it.
     Orders its elements based on the order in which they were inserted into the set (insertion order).
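The three ordering behaviours can be seen side by side in a small Java sketch. The class name SetOrderDemo and the helper fill are made up for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class comparing the three Set implementations.
class SetOrderDemo {

    // Adds every input element to the given set; duplicates are silently ignored.
    static Set<String> fill(Set<String> target, List<String> input) {
        target.addAll(input);
        return target;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "apple", "pear", "fig");

        // TreeSet keeps elements sorted by value
        System.out.println(fill(new TreeSet<String>(), input));        // prints [apple, fig, pear]
        // LinkedHashSet keeps insertion order
        System.out.println(fill(new LinkedHashSet<String>(), input));  // prints [pear, apple, fig]
        // HashSet gives no ordering guarantee, so only the size is printed
        System.out.println(fill(new HashSet<String>(), input).size()); // prints 3
    }
}
```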

A List is an ordered Collection (sometimes called a sequence). Lists may contain duplicate elements.

The Java platform contains two general-purpose List implementations:

1. ArrayList :

     Uses a variable-size array to store its elements.
     Elements can be accessed randomly using an index.
     Maintains the insertion order of the elements.

2. LinkedList :

   Doubly-linked list implementation of the List interface.
   Sequential access of elements.
   Maintains the insertion order of the elements.

Note : Removing an element is faster in a LinkedList than in an ArrayList once its position has been reached (for example through an iterator), because no elements need to be shifted. Reaching an element by index is still sequential in a LinkedList.

A Map is an object that maps keys to values.
A map cannot contain duplicate keys: each key can map to at most one value.

The Java platform contains three general-purpose Map implementations:

1. HashMap :

   Hash table based implementation of the Map interface.
   Makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

2. TreeMap :

   A Red-Black tree based NavigableMap implementation.
   The map is sorted according to the natural ordering of its keys (or by a Comparator supplied when the map is created).

3. LinkedHashMap :

   Hash table and linked list implementation of the Map interface.
   Maintains the insertion order.
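A short Java sketch of the same points for maps. The class name MapOrderDemo and the helper fill are hypothetical; the sketch also shows that putting an existing key replaces its value instead of adding a second entry.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class comparing the three Map implementations.
class MapOrderDemo {

    // Inserts the same three entries into whichever map is passed in.
    static Map<String, Integer> fill(Map<String, Integer> target) {
        target.put("pear", 1);
        target.put("apple", 2);
        target.put("pear", 3); // same key again: the old value 1 is replaced
        return target;
    }

    public static void main(String[] args) {
        // TreeMap sorts by key
        System.out.println(fill(new TreeMap<String, Integer>()));        // prints {apple=2, pear=3}
        // LinkedHashMap keeps the order in which keys were first inserted
        System.out.println(fill(new LinkedHashMap<String, Integer>()));  // prints {pear=3, apple=2}
        // HashMap gives no ordering guarantee, so only the size is printed
        System.out.println(fill(new HashMap<String, Integer>()).size()); // prints 2
    }
}
```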

Threads Versus Processes

Threads are a mechanism that permits an application to perform
multiple tasks concurrently. A single process can contain multiple threads.

All of these threads independently execute the same program, and they all share the same global memory, including the initialized data, uninitialized data, and heap segments.

Below are some of the factors that might influence our choice of whether to implement an application as a group of threads or as a group of processes.

We begin by considering the advantages of a multithreaded approach:

Sharing data between threads is easy. By contrast, sharing data between processes requires more work (e.g., creating a shared memory segment or using a pipe).

Thread creation is faster than process creation; context-switch time may be
lower for threads than for processes.
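How easy sharing is can be sketched in Java, whose threads also live inside one process and share its heap. This is an illustration only: the class name SharedCounterDemo and the method countWithThreads are hypothetical. The counter is an AtomicInteger because a plain int incremented from several threads at once can lose updates.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: several threads update one object on the shared heap.
class SharedCounterDemo {

    // Starts `threads` threads that each increment the same counter `perThread` times.
    static int countWithThreads(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger(); // one object, visible to all threads
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet(); // atomic, so concurrent updates are not lost
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait until this worker has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10000)); // prints 40000
    }
}
```

No pipe or shared memory segment is needed here: every thread simply holds a reference to the same counter object.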

Using threads can have some disadvantages compared to using processes:

When programming with threads, we need to ensure that the functions we call are thread-safe or are called in a thread-safe manner. Multiprocess applications don’t need to be concerned with this.

A bug in one thread (e.g., modifying memory via an incorrect pointer) can damage all of the threads in the process, since they share the same address space and other attributes. By contrast, processes are more isolated from one another.

Each thread is competing for use of the finite virtual address space of the host process. In particular, each thread’s stack and thread-specific data (or thread-local storage) consumes a part of the process virtual address space, which is consequently unavailable for other threads.

Basics of Relational Data Model

Edgar Codd proposed the Relational Data Model in 1970.

It is a representational (or implementation) data model.

Using this model we represent a database as a collection of relations.

The notion of relation here is different from the notion of relationship used in ER modeling.

The relation is the main construct for representing data in the relational model.
Every relation consists of a relation schema and a relation instance.

A relation schema is denoted by R (A1, A2, A3, ..., An), where

R --> Relation name
Ai --> Attribute names

For example:

Customer (Customer ID, Tax ID, Name, Address, City, State, Zip, Phone, Email, Sex)

The number of columns in a relation is known as its degree or arity.

A relation instance or relation state (r) of R can be thought of as a table.
Each row in the table represents a collection of related data.
Each row contains facts about some entity of the same entity set.

        R = (A1, A2, A3, ..., An)
        r(R) is a set of m tuples in R
        r = {t1, t2, t3, ..., tm}

r is an instance of R; each t is a tuple, that is, an ordered list of n values.
t = (v1, v2, ..., vn), where vi is an element of the domain of Ai

Characteristics of a Relation:

Ordering of tuples is not significant.

Ordering of values in a tuple is important.

Values in a tuple under each column must be atomic (simple & single).
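These characteristics can be mirrored with Java collections. This is a rough sketch, not how a database stores relations: a relation instance is modelled as a Set (no tuple order) of Lists (value order matters). The class name RelationDemo and the three-column sample tuples are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo: a relation instance as a set of ordered tuples.
class RelationDemo {

    // Builds a relation instance from tuples; each tuple is an ordered list of atomic values.
    @SafeVarargs
    static Set<List<String>> relation(List<String>... tuples) {
        return new HashSet<>(Arrays.asList(tuples));
    }

    public static void main(String[] args) {
        // Sample tuples for an assumed schema (Customer ID, Name, City)
        List<String> t1 = Arrays.asList("C1", "Asha", "Pune");
        List<String> t2 = Arrays.asList("C2", "Ravi", "Delhi");

        // Ordering of tuples is not significant: both sets are the same relation instance.
        System.out.println(relation(t1, t2).equals(relation(t2, t1))); // prints true

        // Ordering of values in a tuple is important: swapping two values gives a different tuple.
        System.out.println(t1.equals(Arrays.asList("C1", "Pune", "Asha"))); // prints false
    }
}
```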

Using SSH2 in Java with JSch

The program SSH (Secure Shell) provides an encrypted channel for logging into another computer over a network, executing commands on a remote computer, and moving files from one computer to another. SSH provides strong host-to-host and user authentication as well as secure encrypted communications over the Internet.

SSH2 is a more secure, efficient, and portable version of SSH that includes SFTP, which is functionally similar to FTP, but is SSH2 encrypted. At Indiana University, UITS has upgraded its central systems to SSH2 (usually the OpenSSH version), and encourages those concerned with secure communications to connect using an SSH2 client.

For more information on SSH2 visit.

JSch is a pure Java implementation of SSH2.

JSch allows you to connect to an sshd server and use port forwarding, X11 forwarding, file transfer, etc., and you can integrate its functionality into your own Java programs.

For more information on JSch visit

To use SFTP in Java you have to include the JSch jar file in your classpath.
Below is a code snippet for SFTP in Java using JSch:


import com.jcraft.jsch.Channel;
import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class SFTP {

    private String hostName;
    private String userName;
    private String password;

    public SFTP(String hostName, String userName, String password) {
        if (hostName != null && !hostName.isEmpty() && userName != null
                && !userName.isEmpty() && password != null
                && !password.isEmpty()) {
            this.hostName = hostName;
            this.userName = userName;
            this.password = password;
        } else {
            throw new IllegalArgumentException("Argument can not be null or empty...");
        }
    }

    public void copy(String source, String destination) {

        Session session = null;
        try {

            java.util.Properties config = new java.util.Properties();
            // disables host key verification: convenient for testing,
            // but unsafe on untrusted networks
            config.put("StrictHostKeyChecking", "no");

            JSch jsch = new JSch();

            session = jsch.getSession(this.userName, this.hostName, 22);
            session.setPassword(this.password);
            session.setConfig(config);
            session.connect();

            String copyFrom = source;
            String copyTo = destination;

            Channel channel = session.openChannel("sftp");
            channel.connect();

            // upload the local file copyFrom to the remote path copyTo
            ChannelSftp sftpChannel = (ChannelSftp) channel;
            sftpChannel.put(copyFrom, copyTo);
            sftpChannel.exit();

        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (session != null) {
                session.disconnect();
            }
        }
    }
}

Serialize and De-serialize java objects

In this post I will tell you about Java serialization and de-serialization.

Serialization is the process of converting an object's state (the values of its fields) into a byte stream, for example to store it in a file, so that it can be used later.

De-serialization is the reverse process: rebuilding the object from its saved state.

Points to remember-

  1. Java provides the java.io.Serializable interface, a marker interface that does not contain any method declarations.
  2. If you want to serialize an object of a class, the class must implement the Serializable interface.
  3. Use the transient keyword to exclude part of the object from serialization. If you do not want some properties to be serialized, mark those properties as transient.
  4. Static variables belong to the class, not to the object, so they do not take part in serialization.
Here is an example of how to serialize an object in Java -

package com.esc.test;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TestSerilizable implements Serializable {

    private static final long serialVersionUID = 1L;
    private int age;
    private String name;

    public TestSerilizable(int age, String name) {
        this.age = age;
        this.name = name;
    }

    public String toString() {
        return age + " " + name;
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {

        TestSerilizable test = new TestSerilizable(11, "abhi");

        TestSerilizable test2 = new TestSerilizable(12, "rani");

        // serializing the objects (the resources/ directory must already exist)
        FileOutputStream fileStream = new FileOutputStream("resources/test.ser");
        ObjectOutputStream objectStream = new ObjectOutputStream(fileStream);
        objectStream.writeObject(test);
        objectStream.writeObject(test2);
        objectStream.close();

        // de-serializing the first serialized object
        FileInputStream inputStream = new FileInputStream("resources/test.ser");
        ObjectInputStream objInputStream = new ObjectInputStream(inputStream);
        TestSerilizable deSerializeValue = (TestSerilizable) objInputStream.readObject();
        objInputStream.close();
        System.out.println("Deserialize value  age  = " + deSerializeValue.age);
        System.out.println("Deserialize value  name  = " + deSerializeValue.name);
    }
}

Running the above program prints the values of the de-serialized object:

Deserialize value  age  = 11
Deserialize value  name  = abhi

Mark as transient the properties that you do not want to take part in serialization:

public class TestSerilizable implements Serializable {

    private static final long serialVersionUID = 1L;
    private int age;
    private transient String name; // not saved during serialization

    public TestSerilizable(int age, String name) {
        this.age = age;
        this.name = name;
    }

    public String toString() {
        return age + " " + name;
    }
}

Now run the de-serialization program again:

        FileInputStream inputStream = new FileInputStream("resources/test.ser");
        ObjectInputStream objInputStream = new ObjectInputStream(inputStream);
        TestSerilizable deSerializeValue = (TestSerilizable) objInputStream.readObject();
        System.out.println("Deserialize value  age  = " + deSerializeValue.age);
        System.out.println("Deserialize value  name  = " + deSerializeValue.name);

Output of the program: the value of the name property is null because we marked it as transient, so its value is not saved during serialization. null is the default value of name because it is a String.

Deserialize value  age  = 11
Deserialize value  name  = null

Removing Stop Words from Text Using Java

What are stop words?

Stop words are words that occur very frequently in a language and do not help determine the relevance of a document to a user query. Removing them also helps keep the dictionary size small.

Examples of stop words: is, am, the, where, your, you

Here I am using a standard stop list to remove these words from our text.

import java.util.Arrays;
import java.util.HashSet;

/**
 * @author xyz version 1.0.0
 */
public class StopWordRemoval {

    String[] stopWords = { "a", "about", "above", "across", "after", "again",
            "against", "all", "almost", "alone", "along", "already", "also",
            "although", "always", "among", "an", "and", "another", "any",
            "anybody", "anyone", "anything", "anywhere", "are", "area",
            "areas", "around", "as", "ask", "asked", "asking", "asks", "at",
            "away", "b", "back", "backed", "backing", "backs", "be", "became",
            "because", "become", "becomes", "been", "before", "began",
            "behind", "being", "beings", "best", "better", "between", "big",
            "both", "but", "by", "c", "came", "can", "cannot", "case", "cases",
            "certain", "certainly", "clear", "clearly", "come", "could", "d",
            "did", "differ", "different", "differently", "do", "does", "done",
            "down", "down", "downed", "downing", "downs", "during", "e",
            "each", "early", "either", "end", "ended", "ending", "ends",
            "enough", "even", "evenly", "ever", "every", "everybody",
            "everyone", "everything", "everywhere", "f", "face", "faces",
            "fact", "facts", "far", "felt", "few", "find", "finds", "first",
            "for", "four", "from", "full", "fully", "further", "furthered",
            "furthering", "furthers", "g", "gave", "general", "generally",
            "get", "gets", "give", "given", "gives", "go", "going", "good",
            "goods", "got", "great", "greater", "greatest", "group", "grouped",
            "grouping", "groups", "h", "had", "has", "have", "having", "he",
            "her", "here", "herself", "high", "high", "high", "higher",
            "highest", "him", "himself", "his", "how", "however", "i", "if",
            "important", "in", "interest", "interested", "interesting",
            "interests", "into", "is", "it", "its", "itself", "j", "just", "k",
            "keep", "keeps", "kind", "knew", "know", "known", "knows", "l",
            "large", "largely", "last", "later", "latest", "least", "less",
            "let", "lets", "like", "likely", "long", "longer", "longest", "m",
            "made", "make", "making", "man", "many", "may", "me", "member",
            "members", "men", "might", "more", "most", "mostly", "mr", "mrs",
            "much", "must", "my", "myself", "n", "necessary", "need", "needed",
            "needing", "needs", "never", "new", "new", "newer", "newest",
            "next", "no", "nobody", "non", "noone", "not", "nothing", "now",
            "nowhere", "number", "numbers", "o", "of", "off", "often", "old",
            "older", "oldest", "on", "once", "one", "only", "open", "opened",
            "opening", "opens", "or", "order", "ordered", "ordering", "orders",
            "other", "others", "our", "out", "over", "p", "part", "parted",
            "parting", "parts", "per", "perhaps", "place", "places", "point",
            "pointed", "pointing", "points", "possible", "present",
            "presented", "presenting", "presents", "problem", "problems",
            "put", "puts", "q", "quite", "r", "rather", "really", "right",
            "right", "room", "rooms", "s", "said", "same", "saw", "say",
            "says", "second", "seconds", "see", "seem", "seemed", "seeming",
            "seems", "sees", "several", "shall", "she", "should", "show",
            "showed", "showing", "shows", "side", "sides", "since", "small",
            "smaller", "smallest", "so", "some", "somebody", "someone",
            "something", "somewhere", "state", "states", "still", "still",
            "such", "sure", "t", "take", "taken", "than", "that", "the",
            "their", "them", "then", "there", "therefore", "these", "they",
            "thing", "things", "think", "thinks", "this", "those", "though",
            "thought", "thoughts", "three", "through", "thus", "to", "today",
            "together", "too", "took", "toward", "turn", "turned", "turning",
            "turns", "two", "u", "under", "until", "up", "upon", "us", "use",
            "used", "uses", "v", "very", "w", "want", "wanted", "wanting",
            "wants", "was", "way", "ways", "we", "well", "wells", "went",
            "were", "what", "when", "where", "whether", "which", "while",
            "who", "whole", "whose", "why", "will", "with", "within",
            "without", "work", "worked", "working", "works", "would", "x", "y",
            "year", "years", "yet", "you", "young", "younger", "youngest",
            "your", "yours", "z" };

    String[] words = { "Ram", "is", "a", "good", "boy", "boy", "the", "where",
            "your", "yours", "girl", "girl", "ram", "the", "at", "at", "on" };

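    public void removeStopWord() {
        HashSet<String> wordWithStopWord = new HashSet<String>(Arrays.asList(words));
        HashSet<String> stopWordsSet = new HashSet<>(Arrays.asList(stopWords));

        // remove every stop word from the set of input words
        wordWithStopWord.removeAll(stopWordsSet);
        System.out.println("without stop words = " + wordWithStopWord);
    }

    public static void main(String[] args) {
        StopWordRemoval stpRemove = new StopWordRemoval();
        stpRemove.removeStopWord();
    }
}

Output (the order of elements in a HashSet is not guaranteed):

without stop words = [Ram, girl, ram, boy]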

Getting unique words and word frequencies for a given array in Java


import java.util.Set;
import java.util.HashSet;
import java.util.Map;
import java.util.HashMap;

/**
 * This class provides basic utility functions for words.
 * @author xyz
 * @version 1.0.0
 */
public class WordUtil {

    /**
     * Returns the size of the array.
     * @param words
     * @return int
     */
    public int getSize(String[] words) {
        int size = words.length;

        return size;
    }

    /**
     * Provides the unique words in an array of words (case-sensitive).
     * @param words
     * @return Set
     */
    public Set<String> getUniqueWords(String[] words) {

        Set<String> uniqueWords = new HashSet<String>();
        for (String word : words) {
            // a Set silently ignores duplicates
            uniqueWords.add(word);
        }

        return uniqueWords;
    }

    /**
     * Provides the frequency of each word in the array (case-insensitive).
     * @param words
     * @return Map
     */
    public Map<String, Integer> getFrequency(String[] words) {

        Map<String, Integer> wordsFrequencies = new HashMap<>();

        for (String word : words) {
            Integer index = wordsFrequencies.get(word.toLowerCase());

            // first occurrence starts at 1, later ones add 1
            wordsFrequencies.put(word.toLowerCase(), (index == null) ? 1 : index + 1);
        }

        return wordsFrequencies;
    }

    public static void main(String[] args) {

        String[] words = { "Ram", "is", "a", "Ram", "good", "boy", "boy",
                "girl", "girl","ram","the","at","at","on"};
        WordUtil wordFrequency = new WordUtil();
        System.out.println("size = " + wordFrequency.getSize(words));
        System.out.println("unique words = " + wordFrequency.getUniqueWords(words));
        System.out.println("frequencies = " + wordFrequency.getFrequency(words));
    }
}

Changing the Default Document Root Directory in Apache Web Server

When we install the Apache Web Server on a system, it sets the default document root directory to "/var/www/html" or "/var/www", depending on the version, to serve user requests.

You can check the default document root by opening the file "000-default.conf" under the directory "/etc/apache2/sites-available" and searching for "DocumentRoot" inside it.
# This is the default document root directory
DocumentRoot /var/www/html

Sometimes we want to change the default document root to some other directory.

Suppose we want to share some music files with our friends over HTTP, and the music files reside inside the directory "/home/techieknowledge/Music".

To make this possible we want the Apache web server to serve files from "/home/techieknowledge/Music" instead of the default document root, which in this case is "/var/www/html".

To change the default directory, follow the steps below.
Step 1. Go to the directory "/etc/apache2/sites-available"

$ cd /etc/apache2/sites-available

Step 2. Open the file "000-default.conf" and replace /var/www/html with "/home/techieknowledge/Music"

Step 3. Now go to /etc/apache2 and open the file "apache2.conf"

and add the following lines to "apache2.conf"

<Directory /home/techieknowledge/Music>
Options Indexes FollowSymLinks
AllowOverride None
Require all granted
</Directory>

Step 4. Now restart Apache

$ sudo /etc/init.d/apache2 restart

Now type http://localhost/ in your browser and you will see a listing of your music files and directories.

Now you can share your files with your friends.

Apache HTTP web server

Apache is an HTTP server. An HTTP server is a piece of software that understands URLs (web page addresses) and the HTTP protocol.

In this tutorial I will show you how to install and configure the Apache server.

Apache is also known as httpd (HTTP daemon).

A daemon is a process that runs continuously in the background.

Installing Apache in Ubuntu

You must have administrative privileges.

Step 1. Update the system using the following commands

$ sudo apt-get update
$ sudo apt-get install
Step 2. Now type the command below to install Apache

$ sudo apt-get install apache2

Now your Apache web server is installed.

By default, Apache runs on port 80.
To check, you can use the nmap command.

Type nmap localhost; port 80 should be listed as open.

This shows that Apache httpd is installed and running.
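If nmap is not available, the same check can be sketched in a few lines of Java by simply trying to open a TCP connection to the port. The class name PortCheck and the method isOpen are hypothetical; the sketch only tells you that something is listening on the port, not that it is Apache.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // connection refused, host unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        System.out.println("port 80 open: " + isOpen("localhost", 80, 1000));
    }
}
```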

Now type localhost in your browser's address bar. It will show you the Apache default page.

Now you are ready to run your application using Apache.

Apache configuration files

Apache2 is configured by placing directives in plain text configuration files.

Here are some of the main configuration files. They are located in /etc/apache2:

$ cd /etc/apache2

apache2.conf : This is the main Apache2 configuration file. It contains settings that are global to Apache2. It puts the pieces together by including all remaining configuration files when starting up the web server.

ports.conf : This file is always included from the main configuration file. It determines the listening ports for incoming connections and can be customized at any time.

Document root directory : /var/www/html or /var/www

Where are the Apache log files?

By default, the error log of the web server is: /var/log/apache2/error.log

The access log of the web server is: /var/log/apache2/access.log

Start, Stop and Restart Apache

After you have installed Apache, it is added to the init.d list and starts automatically whenever you boot your computer. The following commands allow you to start, stop and restart Apache.

$ sudo /etc/init.d/apache2 start   #start apache
$ sudo /etc/init.d/apache2 stop   #stop apache
$ sudo /etc/init.d/apache2 restart   #restart apache

Changing the default port of Apache

By default Apache listens for requests on port 80. Sometimes you may need to change the default port number.

Steps to change the default port:

Step 1. Open /etc/apache2/ports.conf

$ sudo vim /etc/apache2/ports.conf

Search for 'Listen' and change 80 to the port number of your choice. Say we want to run Apache on port 8010;

then we just edit Listen 80 to Listen 8010

Step 2. Restart Apache

$ sudo /etc/init.d/apache2 restart

Now type http://localhost:8010/ in your browser to check your changes.