Java-for-beginners


How to Write a Java Program for Bioinformatics Applications: - A Manual

By Jitesh B Dundas

This manual is a self-help guide to programming, especially in Java.

 

 

I learnt this way of looking at programming from my MCA Java guest faculty Prof. Lele, so my thanks to him.

It is recommended that you read the following sources for the Java language before or while reading this manual.

Java: Complete Reference by Herbert Schildt

The Sun Java Tutorial (online at www.java.sun.com)

JavaScript / HTML tutorial (online at www.w3schools.com )

Inside Servlets by Justin Callaway

Java Server Faces Technology at www.java.sun.com

To write a Java program, we need to take care of the following:-

1) A blueprint of the actual requirements with the expected input and the expected output

2) A series of steps that explain the logic to be used to implement the requirements of Step-1.

3) A set of function calls to the Java language API that implement these steps

A Java Program is nothing but a set of function calls to the Java language API. It is all about calling the right functions of Java that will satisfy our logic for getting our results.

So let us assume that you need to write a Java Program to fetch data from EMBL String database. How do we go about getting this desired result?

The first thing to do is to get the requirements very clearly. Here, our requirement is to get data from EMBL String database. Thus, we need to understand:-

1) What are the methods or ways available in the database source to fulfill the requirements?

Well, we know that the EMBL String database (referred to as the db from here on) has a programmable API that allows us to fetch data from the database in real time. Thus, if we make the correct function calls to the correct URL (with the correct parameters and input values), we should be able to get the desired results from the database.

On visiting the String db website, we find the API described in the Help/Info section. The documentation there explains how to use the database for fetching data from your Java program. Thus, we first get the URL and the parameters needed to send a request to the database.

So we read through the documentation until we reach the paragraph that lists the URL parameters.

Try to build the URL using the parameters that the API provides. This can be done in our program by adding parameters as needed. You could enter the same input values that you would use on the database website and check that you get the same results programmatically.

For example, suppose we need the list of protein interactors for the p53 protein. Following the URL format given in the documentation, the request URL is:

http://string-db.org/api/psi-mi-tab/interactionsList?identifiers=p53&required_score=900

In the above example, the search term is p53 with a required score of 900. Look up the API documentation of the String db to see which other parameters you can add.
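Building this URL in Java can be sketched as follows. This is only a minimal illustration: the class and method names are hypothetical (they do not appear in the ProteomDb project), and the base URL is simply the one from the example above. The search term is percent-encoded so that input containing spaces still gives a valid URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, used only for this illustration.
class StringDbUrlBuilder {

    // Base URL copied from the example in the text.
    static final String BASE_URL =
            "http://string-db.org/api/psi-mi-tab/interactionsList";

    // Builds the request URL; the search term is percent-encoded.
    static String buildUrl(String identifiers, int requiredScore) {
        try {
            return BASE_URL
                    + "?identifiers=" + URLEncoder.encode(identifiers, "UTF-8")
                    + "&required_score=" + requiredScore;
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available, so this should never happen.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("p53", 900));
    }
}
```

Keeping the URL construction in one small method makes it easy to compare the generated URL with the one you would type into the browser by hand.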

2) How do we actually do this using the methods available?

OK, now we have the parameters and the URL to be used. So how do we use all this in our program? Well, first we need:-

2.1) The logic for the program

2.2) The function calls and statements in Java for implementing each line of the logic

2.3) The correct and validated set of inputs from the user as well as the correct format of the total input for the String db.

Right, now we have the steps in place. Let us proceed towards writing the program.



So what is the logic of the program? We need to work out the steps by which we can send the data from the user in the format requested by the String db. Thus, we need to understand how Java communicates with an external URL to send and receive data. Once you understand the flow of the logic, you replace each step with Java statements that actually execute it.

Thus, we have the following steps of logic for sending and receiving data from EMBL.

1) Get the input from the user

2) Create the URL for sending the data to the db

3) Get the proper authentication for sending data via the internet to the db

4) Send the input data to the String db API via the internet

5) Read the response that is received.

6) Print the output into the format that the user understands.

7) End the execution

Writing the logic helps in implementing the java program faster.

So now, we write the steps for implementing the logic mentioned above.

The following is the Java program that will actually send the request and get the response from the String EMBL.

1) Get the input from the user

This will be a simple html form in which we ask the user to input the parameters like search terms and required score. This will be just like the String EMBL system form that we see on the homepage of the latter. Please refer to the file ImportFromEMBLDb.jsp (ProteomDb project folder) for further details.

Steps 2) to 7) are implemented in the program below.

ImportFromEMBLPI.jsp

The comments, marked with ‘//’, explain the logic that has been used here. They are not a part of the actual code. To get the working code, go to the ProteomDb project folder and find the file by this name.

Let us build the code step by step.

1) Open any HTML editor or Java IDE (for beginners, simple Notepad or an HTML editor is preferred; later on, you can use the Eclipse or NetBeans IDE).

2) Save the file as ImportFromEMBLPI.jsp (you can give any other name as you want). Next, we add the java libraries that are to be needed for our code. You need to read the API to specifically understand which APIs will be needed for your proposed program.

3) Next, we mention the try-catch block and define the session. The session object is needed to hold variables across pages and limit the duration of any Java activity (check Java tutorial and API for further details).

4) Define the URL string by taking user input from the input form.

5) Define the System Properties. These are settings, such as the proxy host, the proxy port and the timeouts, that Java needs in order to reach the internet via your home network.

6) Define the URL object to hold the URL string. Next create a file name using the Random class object integer value.

7) Now define the handle to read from:-

BufferedReader in: this reads the output from the String db server into the Java program.

8) Declare the variables to hold the values.

9) Write the while loop to actually print the content to the browser. You could store it in an array and display it in a tabular form.

The while loop reads the output from the String db line by line. Then we use string handling functions to print the different parts of each line. The output is plain text with tab delimiters between the columns. Using string functions, we can parse the output and get the actual column values.

10) Now we keep a counter for managing the number of records. Again, we close the loops and print the values. Lastly, we store the URL as a session attribute. You could redirect the page to another welcome page, if wanted, using the response.sendRedirect command.
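The tab-delimited parsing described in the steps above can be tried out on its own, without a server. The sketch below is hypothetical: the class name, the method names and the sample line are made up for illustration, and real String db output may have different columns. It splits a line at each tab, then splits a composite column at each '|' character, and checks the array length before using a fixed index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, used only for this illustration.
class TabLineParser {

    // Splits one response line into its tab-separated columns.
    static String[] columns(String line) {
        return line.split("\t", -1); // -1 keeps trailing empty columns
    }

    // Splits a composite column such as "score:0.999|escore:0.9" at each '|'.
    static List<String> subValues(String column) {
        List<String> parts = new ArrayList<String>();
        for (String part : column.split("\\|")) {
            parts.add(part.trim());
        }
        return parts;
    }

    public static void main(String[] args) {
        // Made-up sample line with 15 columns; the score string is the last one.
        StringBuilder sample = new StringBuilder("string:sample-id");
        for (int i = 1; i < 14; i++) {
            sample.append('\t').append("col").append(i);
        }
        sample.append('\t').append("score:0.999|escore:0.9");

        String[] cols = columns(sample.toString());
        System.out.println("Number of columns: " + cols.length);
        if (cols.length > 14) { // guard before using a fixed index
            for (String value : subValues(cols[14])) {
                System.out.println(value);
            }
        }
    }
}
```

The length check matters because a line with fewer columns than expected would otherwise raise an ArrayIndexOutOfBoundsException.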

 


 

Code Explanation:-

The first step is to declare that this is a JSP file and that we are going to import and use java API functions for our program below.

//declare that this is a java program

<%@ page language = "java"%>

//import all the java api libraries

<%@ page import = "java.util.*"%>

<%@ page import = "java.io.*"%>

<%@ page import="java.lang.*"%>

<%@ page import="java.net.*"%>

<%@ page import="java.nio.*"%>

<%@ page import= "javax.xml.parsers.*" %>

<%@ page import= "org.w3c.dom.*" %>

<%

//java program begins.

System.out.println("ImportFromEMBL3.jsp");

//create an http session

javax.servlet.http.HttpSession hs = request.getSession();

try //the try catch block starts here

{

//this is the base URL of the String EMBL db. Store it in a variable

String URLString = "http://string-db.org/api/psi-mi-tab/interactionsList?identifiers=" + URLEncoder.encode(request.getParameter("txtSearchWords"), "UTF-8") + "&required_score=" + URLEncoder.encode(request.getParameter("txtReqScore"), "UTF-8");

//store the authentication settings for connecting to the internet

Properties systemSettings = System.getProperties();

systemSettings.put("http.proxyHost", "proxy.it.iitb.ac.in"); //proxy host

systemSettings.put("http.proxyPort", "80"); //proxy port

//time out settings for connecting to the internet.

systemSettings.put("sun.net.client.defaultConnectTimeout", "10000");

systemSettings.put("sun.net.client.defaultReadTimeout", "10000");

//the user id and password to connect to the internet via the home network.

Authenticator.setDefault(new Authenticator()

{

protected PasswordAuthentication getPasswordAuthentication()

{

return new PasswordAuthentication("aaqua", "aaqua123".toCharArray()); // specify your own user name and password for the IITB proxy login

}

});

System.setProperties(systemSettings); //set the user id and settings

System.out.println(" System Properties Set");

System.out.println(" URL variables set="+URLString);

URL yahoo = new URL(URLString); // store the URL in a URL class object

//create the file name for storing the output from the String db.

Random rn = new Random();

int rnval = rn.nextInt() ;

String fname = "file_embl_pi_"+ String.valueOf(rnval) + ".xml" ;

File file = new File(fname);

String tempstr = "";

//create the handle to the stream that sends the input URL to the db using the stream created and then starts reading the output data from the db.

BufferedReader in = new BufferedReader( new InputStreamReader(yahoo.openStream())); //read the file

//define the variables for reading and writing the data

String inputLine;

String temp = "";

String taxonId = "";

String arrVal[] ;

String arrVal2[] ;

String arrVal3[] ;

int reccnt = 0;

String tscore = "";

String score = "";

String escore = "";

String dscore = "";

String temp2 = "";

//Print to the browser that the output display starts.

out.println("<b> List of Protein Interactors for your query </b><br>");

out.println("<b>--------------------------------------------------------</b> <br>");

//this loop reads the content of the output line by line

while ((inputLine = in.readLine()) != null)

{

//Here we just parse the data received and present it in a readable format.

out.println("<b>Record No:-"+ ( reccnt+ 1) + " </b> <br>");

tempstr = tempstr + inputLine ; //store the line data in another variable.

temp = inputLine ;

arrVal = temp.split("\t"); //split the line content at each tab space in the line. Here we assume that we know the format of the output coming from the db, which is in a continuous set of words, each separated by tab space.

for( int i=0;i< arrVal.length; i++) //loop to find all the column values.

{

if (i == 0) //if this is the first column of the line i.e. the start of the output line

{

//the array has stored the data in each variable. Thus, we now just present the data to the user on the browser.

out.println("String ID:"+ arrVal[i] +"<br>"); //data at array position 0

out.println("NCBI ID:"+ arrVal[i+1] +"<br>");//data at array position 1

out.println("Preferred Name:"+ arrVal[i+2] +"<br>");//data at array position 2

//data at array position 3

out.println("String DB Name:"+ arrVal[i+3] +"<br>");

//data at array position 9

out.println("String Db Taxonomy ID:"+ arrVal[i+9] +"<br>");

//data at array position 10

out.println("NCBI taxonomy ID:"+ arrVal[i+10] +"<br>");

//data at array position 14

out.println("Score String:"+ arrVal[i+14] +"<br>");

//sometimes the value in an array position is itself a string of several values. Thus, we split it in the same manner and print each part.

arrVal3 = arrVal[i+14].split("\\|"); //split the value at each '|' character

for( int j=0; j < arrVal3.length; j++) //loop through all the elements

{

out.println( "" + arrVal3[j] +" <br>"); //print data at the index position

}

}

}

//loop ends here

reccnt++; // variable to count no. Of records.

out.println("<b>--------------------------------------------------------</b> <br>");

}

in.close(); //close the stream

//store the URL String

hs.setAttribute("urlstring",URLString);

System.out.println( "urlstring="+URLString); //print the URL to tomcat log

} //try block ends

catch(Exception ex) //catch block to handle any error or exception that may occur

{

out.println("Exception->"+ex);

//get the handle to the write the error to the browser.

PrintWriter pw = response.getWriter();

ex.printStackTrace(pw); //print the details of the error to the browser

} //code ends

%> //this means that the java program ends

So the complete code is shown to you with the logic. Notice how easy it becomes to actually write the code if the logic is clear from the start.



Executing the Program

Now we know that the program is working fine (we are trying to be confident here, but this hardly ever happens on the first attempt; you will need to debug and improve your code). Place the file in the webapps folder of your Tomcat engine (assuming that you are using the Apache Tomcat 5.5 engine with Java SE 5 and MySQL 5.0 on the Windows XP OS). Next, start the Tomcat engine service and open any browser window.

Type the path of your input HTML file (this file gets the input from the client and sends the data to the Java program, which gets the output from the db). Here, the path is http://localhost:8080/ProteomDb/ImportFromEMBLDb.jsp . Type this in the browser and enter the parameters in the resulting client page. Press the button and you will get a list of protein interactors in the resulting window.

Debugging or handling errors

Again, it is easy to run into an error, especially in programming. The try/catch block helps a lot in such cases. There is no standard way of solving errors in your code. However, you can save a lot of time by checking the following:-

1) Know the flow of the program. Are you sure that your program is following the correct logic?

2) Are you sure that the parameters are correctly defined?

3) It is good to print a statement at each logical sub-unit of the code so that you can track whether the program is executing correctly. If there is an error, you will know at which point it occurred.

4) Check the stack trace that is printed by Java. It usually gives you the name of the exception. Search for that name on Google or look it up in the Java API documentation to find out what the error means. A little background research on the internet should lead you to the solution.

5) Check that the internet is connected and that everything the program depends on is running properly. For example, if the internet is not connected, the program may fail with a connect timeout exception (for example java.net.SocketTimeoutException: connect timed out). Search for the exception name and you will get a list of possible answers. One of the most common reasons is that the internet is not working properly, or that the settings for connecting to the internet are wrong or missing.

6) The stack trace also shows the line number at which the error occurred. Check the syntax of that statement in the API documentation or in the books to correct the code.

7) You must read and understand the Java language properly to be able to write a good program. Reading the above resources once is not enough. Practice is the key, for beginners as well as experts. Everybody faces problems in coding, and the better you know the language, the better your resulting program will be.
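Points 3) to 5) can be illustrated with a small sketch. It is hypothetical: the class name, the method name and the hint texts are made up for this example, and the timeout is simulated rather than caused by a real request. In the standard library, a connection or read timeout typically surfaces as java.net.SocketTimeoutException, and a host name that cannot be resolved as java.net.UnknownHostException.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

// Hypothetical helper class, used only for this illustration.
class ErrorHints {

    // Maps an exception to a short hint. The most specific types are
    // checked first, because both network exceptions extend IOException.
    static String hintFor(Exception ex) {
        if (ex instanceof SocketTimeoutException) {
            return "Timeout: check the internet connection and the proxy settings.";
        }
        if (ex instanceof UnknownHostException) {
            return "Unknown host: check the URL and the DNS or proxy settings.";
        }
        if (ex instanceof IOException) {
            return "I/O error: check that the server is reachable.";
        }
        return "Other error: look up " + ex.getClass().getName() + " in the API documentation.";
    }

    public static void main(String[] args) {
        System.out.println("Checkpoint 1: URL built");
        try {
            System.out.println("Checkpoint 2: sending the request");
            // Simulated failure; a real program would open the connection here.
            throw new SocketTimeoutException("connect timed out");
        } catch (Exception ex) {
            System.out.println(hintFor(ex));
        }
    }
}
```

The last checkpoint printed before the hint tells you which step failed, and the hint tells you where to start looking.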



Fetch Data from NCBI

I am assuming that you are aware of the term URL (Uniform Resource Locator).

The following are the four steps in which data is fetched from NCBI:

1) Take input from user.

Fetching data from NCBI uses the Entrez programming utility.

In this step, we present an HTML page that takes the necessary information from the user. Each parameter of the request URL (in this case the NCBI Entrez URL) gets its own input field. For example, if one of the parameters to be added is the search words, then we add a field called “Search Term” to the input form. This field takes in the search terms that the user enters.

A snapshot of the form is shown below.

In other words, we first create an interface (a user page) that allows the user to enter the input parameters. For example, to query NCBI we need to ask the user for the search terms (if you have seen the URL values in the Entrez Programming Utility, you will find that there is a field called “term”). We then create a URL in Entrez format based on the input parameters provided.

On looking at the Entrez utility to fetch a list of records for a particular search term, we use the eSearch method. This eSearch method is the base URL that you will use to fetch a list of records from NCBI database. The URL is http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi? Please check the Entrez utility for further details. Now, the next task is to find the set of parameters that are to be sent along with this URL.

In the input page, there are two fields defined.

1) Database Name: - This is to tell the NCBI server which database is to be searched.

2) Search Term: - This is to tell the NCBI server the search terms for which the records are to be returned.

Look up the Entrez utility page on NCBI. There in the eSearch page, you will find the key parameters. For the database name, the keyword will be “db” and for search term, the keyword is “term”.

Again, we need to tell the server how many records are to be returned. NCBI returns only a limited number of records per request (the Entrez documentation gives the eSearch default as 20; in this manual we ask for 50). Also, you have to tell NCBI which page is to be returned. Say, for example, that 12000 records are found for your search term. With 50 records per page, the results span 12000 / 50 = 240 pages. Thus, you will need to define which page is to be returned, right?

These two parameters are called “retmax” and “retstart”. They are pre-decided by us and the client does not control their values. Here, we have kept “retmax” at 50 and use “retstart” to select the page. Note that the Entrez documentation defines retstart as the zero-based index of the first record to return, so page n (counting from 0) starts at retstart = n × 50.
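The page arithmetic can be written down as a short sketch. The class and method names are hypothetical, and the sketch assumes that retstart is the zero-based index of the first record to return, as the Entrez documentation describes it, rather than a page number.

```java
// Hypothetical helper class, used only for this illustration.
class Paging {

    // Number of pages needed to show totalRecords with retmax records per page.
    static int pageCount(int totalRecords, int retmax) {
        return (totalRecords + retmax - 1) / retmax; // integer ceiling
    }

    // Value of retstart for a page index that counts from 0.
    static int retstartForPage(int pageIndex, int retmax) {
        return pageIndex * retmax;
    }

    public static void main(String[] args) {
        int retmax = 50;
        System.out.println("pages = " + pageCount(12000, retmax));           // 240
        System.out.println("third page starts at retstart = "
                + retstartForPage(2, retmax));                                // 100
    }
}
```

The ceiling division matters when the total is not an exact multiple of retmax: 12001 records would need 241 pages, not 240.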

OK, now you have all the parameters along with the base URL. Thus, you need to send this URL to the second program (that will actually send the request to NCBI and get the output).

To make this page, you just need a simple HTML editor and some basic HTML/JavaScript programming. I have attached a simple code file for your reference. You might want to look at www.w3schools.com for further details on learning them.

In the form presented above, the user enters the keyword values and then presses the “Submit” button. This sends the details of the page to the second program (explained in Step-2)

2) Send request and response from NCBI

Now we have received the user input. Assume that the user has entered “cancer” as the search term and “genome” as the database. Now we need to put the input into the specific Entrez format and then send the request to the NCBI server.

OK, the URL that will be created is something like:-

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=genome&term=cancer

We need to pass this URL to the Java program, which will send it to the NCBI server. Please note that we use the same URL pattern to get the different pages of the results. For example, once the first 50 records are shown to the user (Step-4), the user may want to browse the next or previous pages too. For this, we send a request to this same Java program. But how do we know whether the request comes from the client page or from the output page? For this, we use the variable “loopflag”. For requests from the client page, loopflag = no.
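The decision based on “loopflag” can be sketched as a single method. This is a simplified, hypothetical version: the class and method names are made up, and the next and previous URLs are passed in as plain arguments, whereas the JSP file keeps them in the session. Writing "no".equals(loopflag) instead of loopflag.equals("no") avoids a NullPointerException when the parameter is missing.

```java
// Hypothetical helper class, used only for this illustration.
class LoopFlagDispatch {

    // Chooses the URL to request, depending on where the request came from.
    static String resolveUrl(String loopflag, String clientUrl,
                             String nextUrl, String previousUrl,
                             int retstart, int retmax) {
        if ("no".equals(loopflag)) {
            // New search from the client page: append the paging parameters.
            return clientUrl + "&retstart=" + retstart + "&retmax=" + retmax;
        }
        if ("next".equals(loopflag)) {
            return nextUrl;      // user clicked "next" on the output page
        }
        if ("previous".equals(loopflag)) {
            return previousUrl;  // user clicked "previous" on the output page
        }
        throw new IllegalArgumentException("Unknown loopflag: " + loopflag);
    }

    public static void main(String[] args) {
        String base = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
                + "?db=genome&term=cancer";
        System.out.println(resolveUrl("no", base, null, null, 0, 50));
    }
}
```

Rejecting an unknown flag explicitly makes a wrong form value show up as a clear error instead of an empty URL further down.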

Snapshots of the program.

1) First include the libraries.

2) Declare the session variable and start the try-catch block.

3) Define the System properties for connecting to the internet from the home network.

4) Now check if this file has been called from the client or from the output file.

We have placed a flag and check its value. If the request is from the client side, its value will be no. Otherwise, it will be next or previous, as per the value sent from the output file.

5) Set the connection to the URL of the NCBI server for the database. Note that the methods have been set for the connection. The connection object will connect to the NCBI URL and is set to accept input, send output with no cache storage and no user interaction. This connection will also accept text/xml output.

6) Declare the variables. Create an object of BufferedReader to read the input from the NCBI server.

7) Now we use the Random class object to create the file name. We create the BufferedWriter class object to write the contents to a file. Next, we create a while loop to read the content line by line and write it to the XML file present in the tomcat home directory.

8) Now we try to close the loop and the objects that we created. Next, we store the file name and the variables to the session. Next, we print the values and then redirect the page to parse the XML file.

Check the xml file that was generated to view the output.

Hope this explains the code. The whole file is given below.
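Before reading the whole file, it may help to see the connection setup of step 5 in isolation. The sketch below is a hypothetical variation, not the author's code: it uses a GET request with an Accept header and explicit timeouts, whereas the JSP file uses POST with a Content-Type header and system-wide timeout properties. The class and method names are made up. Note that openConnection() only creates the connection object; nothing is sent until the response is read.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper class, used only for this illustration.
class ConnectionSetup {

    // Creates and configures the connection object without contacting the server.
    static HttpURLConnection prepare(String urlString) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) new URL(urlString).openConnection();
            connection.setRequestMethod("GET");      // parameters travel in the URL
            connection.setConnectTimeout(10000);     // 10 seconds to connect
            connection.setReadTimeout(10000);        // 10 seconds to read
            connection.setUseCaches(false);          // always ask the server
            connection.setRequestProperty("Accept", "text/xml"); // we expect XML back
            return connection;
        } catch (IOException ex) {
            throw new IllegalStateException("Could not prepare the connection", ex);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection connection = prepare(
                "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
                + "?db=genome&term=cancer");
        System.out.println("Method: " + connection.getRequestMethod());
        System.out.println("Connect timeout (ms): " + connection.getConnectTimeout());
    }
}
```

Setting the timeouts on the connection itself keeps them local to this request instead of changing them for every connection made by the server.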

 


 

Here is the Java program for our reference:-

File Name:- ImportFromNCBI3.jsp

// comments are entered in java code in this format.

//define the language for this page as Java

<%@ page language = "java"%>

//import the libraries for java

<%@ page import = "java.util.*"%>

<%@ page import = "java.io.*"%>

<%@ page import="java.lang.*"%>

<%@ page import="java.net.*"%>

<%@ page import="java.nio.*"%>

<%@ page import= "javax.xml.parsers.*" %>

<%@ page import= "org.w3c.dom.*" %>

<%

//print the message to the log file of Tomcat engine.

System.out.println("ImportFromNCBI3.jsp");

//create the session variable

javax.servlet.http.HttpSession hs = request.getSession();

try //start the try-catch block.

{

String URLString = ""; //variable to hold the URL

//store the authentication settings for connecting to the internet

Properties systemSettings = System.getProperties();

systemSettings.put("http.proxyHost", "proxy.it.iitb.ac.in"); //proxy host

systemSettings.put("http.proxyPort", "80"); //proxy port

//time out settings for connecting to the internet.

systemSettings.put("sun.net.client.defaultConnectTimeout", "10000");

systemSettings.put("sun.net.client.defaultReadTimeout", "10000");

//the user id and password to connect to the internet via the home network.

Authenticator.setDefault(new Authenticator()

{

protected PasswordAuthentication getPasswordAuthentication()

{

return new PasswordAuthentication("aaqua", "aaqua123".toCharArray()); // specify your own user name and password for the IITB proxy login

}

});

System.setProperties(systemSettings); //set the user id and settings

System.out.println(" System Properties Set");

System.out.println(" URL variables set="+URLString);//print the URL (still empty at this point)

String retstart = "";//variable to set the page number

String retmax = "";//variable to set the maximum records per page

/*Is the request coming from the client or from the output file? Is it a different page being requested for the same search request, or is it a new request coming from the client? If the latter is true, then the variable loopflag = no; otherwise it will be next or previous. The value no is set in the client input form and sent from there to this program.

*/

if ( "no".equals(request.getParameter("loopflag")) )

{//this is a page from the client input.

//the URL data is sent from the previous input. The javascript converts the

//client input into NCBI format URL and then sends the details via a text

//field named “txtURLString”. Only the parameters retmax and retstart are

//added here

URLString = request.getParameter("txtURLString").toString();

System.out.println("loopflag=no");//print message:- the loopflag value is no

retstart = request.getParameter("retstart") ; //get the value of retstart from client

retmax = request.getParameter("retmax");//get the value of retmax from client

//print the values of the parameters to the apache tomcat log file.

System.out.println("loopflag=no part");

System.out.println("retstart="+retstart);

System.out.println("retmax="+retmax);

System.out.println("url_IMPORTFROM="+request.getParameter("txtURLString"));

//construct the final URL by appending retstart and retmax, and store it in the variable

URLString = request.getParameter("txtURLString") + "&retstart=" + retstart + "&retmax=" + retmax;

//store the values of retstart (page position) and retmax (no. of records per page) in session variables

hs.setAttribute("retstart",retstart);

hs.setAttribute("retmax",retmax);

}

else if ( "next".equals(request.getParameter("loopflag")) )//the user has clicked on the “next” page of the output page

{

System.out.println("next loopflag="+hs.getAttribute("next"));

URLString = (String)hs.getAttribute("next");

//database name stored in the session variable.

hs.setAttribute("dbname", hs.getAttribute("dbname") ); //database

System.out.println("loopflag=next"); //next page is being clicked.

}

else if ( "previous".equals(request.getParameter("loopflag")) )

//loopflag comes from the output page and the previous page has been

//requested, i.e. the user has clicked on the “previous” page of the

//output page.

{

//previous page is being clicked.

System.out.println("previous loopflag="+hs.getAttribute("previous"));

//database name is being set in to the session variable.

hs.setAttribute("dbname", hs.getAttribute("dbname") );

//previous page value being set into the session variable.

URLString = (String)hs.getAttribute("previous");

System.out.println("loopflag=previous");

}

else

{ //none of the above cases were executed.

retstart = request.getParameter("retstart") ;

retmax = request.getParameter("retmax");

System.out.println("loopflag=else part");

System.out.println("retstart="+retstart);

System.out.println("retmax="+retmax);

System.out.println("url_IMPORTFROM="+request.getParameter("txtURLString")); //print the URL string

//store the variables in session..

hs.setAttribute("retstart",retstart);

hs.setAttribute("retmax",retmax);

hs.setAttribute("dbname", hs.getAttribute("dbname") );//database name

}

System.out.println(" URL variables set="+URLString);

URL url = new URL(URLString); //url string taken from user input.

//define the HTTPURLConnection object

HttpURLConnection connection = null;

//open the connection to the following URL.

connection = (HttpURLConnection) url.openConnection();

//set the request method to POST.

connection.setRequestMethod("POST");

connection.setDoInput(true);//we intend to read the response from this connection

connection.setDoOutput(true);//we may also write a request body to this connection

connection.setUseCaches(false);//don't use cached copies of the response.

connection.setAllowUserInteraction(false); //don't allow user interaction for this connection

//declare the content type of the request as XML/text in UTF-8.

connection.setRequestProperty ("Content-Type","text/xml; charset=\"utf-8\"");

//print message. This will help in tracking the errors in tomcat log file.

System.out.println(" connection Set");

//set the channel to get the input. This is called the input stream. We use the buffered reader for getting the input. Open the input stream using the buffered reader.

BufferedReader in = new BufferedReader( new InputStreamReader( connection.getInputStream()));

String decodedString;//variable to hold the decoded string

String tempstr = ""; //variable to store the single output

System.out.println("Reader Set");//message – stream set

Random rn = new Random(); //random object to get a random integer

int rnval = rn.nextInt() ; //get a random integer.

//create the XML file name

String fname = "file_ncbi_"+ String.valueOf(rnval) + ".xml" ;

//a channel to handle the output from the NCBI and then write it to the XML file specified above.

BufferedWriter bw = new BufferedWriter(new FileWriter(fname));

//loop through all the records line by line

while ((decodedString = in.readLine()) != null)

{

tempstr = tempstr + decodedString;// append the output line to the existing o/p

//out.println(tempstr);

bw.write(decodedString); //write the output line to the XML file.

if ( tempstr.indexOf("/") == -1 ) //add a line break only while no "/" has appeared in the output read so far

{

bw.newLine(); //if yes, then start on a new line

}

}

System.out.println(" output given.Set"); //message – output set.

bw.close(); //close the streams

in.close();

//store the file name and URL in session variables. These values will be passed on to the next program in Step-3)

hs.setAttribute("NCBIfile",fname.toString()); //store the file name in session variable

hs.setAttribute("urlstring",URLString);

//print the values of the variables.

System.out.println("Session variable Set");

System.out.println( "fname="+fname.toString() );

System.out.println( "urlstring="+URLString);

//move the control to the next page.

response.sendRedirect("/ProteomDb/FetchDataFromNCBI.jsp");

}

catch(Exception ex) //catch any exceptions

{

out.println("Exception->"+ex); //print the exception

PrintWriter pw = response.getWriter(); //get the handle to the output stream for this JSP page

ex.printStackTrace(pw); //print the error stack details to the JSP page

}//try catch block ends.

%>
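The part of the program that copies the response into an XML file can also be tried separately. The following sketch is hypothetical and makes two assumptions that differ from the listing above: it uses File.createTempFile instead of a Random number to get a unique file name, and it uses try-with-resources, which needs Java 7 or later (with Java SE 5 you would close the streams in a finally block). The input comes from any Reader, so it can be tested with a plain string instead of a live NCBI connection.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class, used only for this illustration.
class ResponseToFile {

    // Copies the response line by line into a new temporary XML file.
    static File save(Reader source) {
        try {
            File target = File.createTempFile("file_ncbi_", ".xml");
            // Both streams are closed automatically, even if an error occurs.
            try (BufferedReader in = new BufferedReader(source);
                 BufferedWriter bw = new BufferedWriter(new FileWriter(target))) {
                String line;
                while ((line = in.readLine()) != null) {
                    bw.write(line);
                    bw.newLine(); // keep one line per line of the response
                }
            }
            return target;
        } catch (IOException ex) {
            throw new IllegalStateException("Could not save the response", ex);
        }
    }

    public static void main(String[] args) {
        // Made-up response text standing in for the NCBI output.
        File saved = save(new StringReader("<eSearchResult>\n<Count>0</Count>\n</eSearchResult>"));
        System.out.println("Saved " + saved.length() + " bytes to " + saved.getName());
        saved.delete(); // clean up the demonstration file
    }
}
```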

3) Parse the output response from NCBI

So now you have the output data from the previous program. The problem is that this data is in XML format. In an XML file, data is present in the form of tags, and each value sits between an opening and a closing tag. We need to extract the information between each pair of tags. This is what the current Java program does.

Please note that the name of the XML file is needed here. We assume that the XML file is already available. I have attached a snapshot beside each code chunk to explain how it looks.
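To see the idea of reading values between tags in a compact form, here is a self-contained sketch that parses a small XML string instead of a file. It is hypothetical: the class name, the method names and the sample values are made up, and only the tag names (Count, RetMax, RetStart, IdList, Id) are taken from the program in this section. It uses getTextContent() as a shortcut for walking the child nodes by hand.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for this illustration.
class ESearchXmlSketch {

    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            doc.getDocumentElement().normalize();
            return doc;
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse the XML", ex);
        }
    }

    // Returns the text of the first element with this tag name, or null if absent.
    static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Collects the text of every Id element.
    static List<String> ids(Document doc) {
        List<String> result = new ArrayList<String>();
        NodeList idNodes = doc.getElementsByTagName("Id");
        for (int i = 0; i < idNodes.getLength(); i++) {
            result.add(idNodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample; a real response contains more tags.
        String xml = "<eSearchResult><Count>2</Count><RetMax>2</RetMax>"
                + "<RetStart>0</RetStart><IdList><Id>101</Id><Id>202</Id></IdList>"
                + "</eSearchResult>";
        Document doc = parse(xml);
        System.out.println("Count = " + firstText(doc, "Count"));
        System.out.println("Ids = " + ids(doc));
    }
}
```

Returning null for a missing tag lets the caller decide what to do, instead of failing with a NullPointerException when the server response has an unexpected shape.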

<%//@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8"%>

//set the page language to java

<%@ page language = "java" %>

//import the java libraries

<%@ page import = "java.util.*"%>

<%@ page import = "java.io.*"%>

<%@ page import="java.lang.*"%>

<%@ page import="java.net.*"%>

<%@ page import="java.nio.*"%>

<%@ page import = "java.sql.*" %>

<%@ include file = "header.jsp" %>

<%@ page import = "javax.xml.parsers.*" %>

<%@ page import = "org.w3c.dom.*" %>

<%

try

{

//create the object for session.

javax.servlet.http.HttpSession hs = request.getSession();

//get the name of the file that was stored in the session variable.

String fname = (String) hs.getAttribute("NCBIfile");

System.out.println("fname = " + fname.toString() ); //print the file name.

File file = new File(fname); //create the File object for the XML file

//create the DocumentBuilderFactory class object. This will be the handle to the XML file. Create a document object and map it to the XML file.

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

DocumentBuilder db = dbf.newDocumentBuilder();

Document doc = db.parse(file);

doc.getDocumentElement().normalize();

//get the root element and print its name to the tomcat log.

System.out.println("Root element " + doc.getDocumentElement().getNodeName());

//total no. of records retrieved in the search

//get the NodeList Object and get the handle to the tag – Count .

NodeList nodeLst_cnt = doc.getElementsByTagName("Count");

//get the first item for the tag Count.

Element fstNmElmnt_cnt = (Element) nodeLst_cnt.item(0);

//get all the child nodes for the tag Count.

NodeList fstNm_cnt = fstNmElmnt_cnt.getChildNodes();

System.out.println("Document Count : " + ((Node) fstNm_cnt.item(0)).getNodeValue());

//get the node value at first child node and print the value to the Tomcat log file.

int rec_cnt = Integer.valueOf( ((Node) fstNm_cnt.item(0)).getNodeValue() );

if (rec_cnt != 0 ) //check if the no. Of records is > 0 or not.

{

System.out.println("rec_cnt not 0");//print the message to tomcat log.

//Ret Max variable. Return no. of records to be shown per page. here it is 20.

//Now we try to get the value for the tag RetMax in the same way.

NodeList nodeLst_retmax = doc.getElementsByTagName("RetMax");

Element fstNmElmnt_retmax = (Element) nodeLst_retmax.item(0);

NodeList fstNm_retmax = fstNmElmnt_retmax.getChildNodes();

System.out.println("Document RetMax : " + ((Node) fstNm_retmax.item(0)).getNodeValue());

int rec_max = Integer.valueOf( ((Node) fstNm_retmax.item(0)).getNodeValue() );

//Now we try to get the value for the tag RetStart in the same way.

NodeList nodeLst_retstart = doc.getElementsByTagName("RetStart");

Element fstNmElmnt_retstart = (Element) nodeLst_retstart.item(0);

NodeList fstNm_retstart = fstNmElmnt_retstart.getChildNodes();

System.out.println("Document RetStart : " + ((Node) fstNm_retstart.item(0)).getNodeValue());

int rec_start = Integer.valueOf( ((Node) fstNm_retstart.item(0)).getNodeValue() );

//The node below is IdList. This tag contains a list of all the record id tags for the search results. Thus, this will be a loop and have multiple child nodes.

NodeList nodeLst = doc.getElementsByTagName("IdList");//get the handle to tag

System.out.println("Information of all ids WITH IdList Length=" + nodeLst.getLength() ); //print the number of tags present for IdList.

//the tag IdList has several child nodes called Id. We need to first count the number of tags and then parse each of the tags.

for (int s = 0; s < nodeLst.getLength(); s++) //loop through each IdList node.

{

Node fstNode = nodeLst.item(s); //get the handle to the node at position s.

System.out.println("in first for loop<br>"); //print the for loop.

//check if the node is of the type ELEMENT

if (fstNode.getNodeType() == Node.ELEMENT_NODE)

{

System.out.println("in first if condition<br>");

Element fstElmnt = (Element) fstNode; //Assign the node

//get the element to the node Id.

NodeList fstNmElmntLst = fstElmnt.getElementsByTagName("Id");

//GET The string array to store the Ids.

String[] pubmedids = new String[fstNmElmntLst.getLength()];

for (int h = 0; h < fstNmElmntLst.getLength(); h++) //gets all the Id //tag nodes

{

System.out.println("in second for loop<br>");

//get the handle to each element of the node at position h.

Element fstNmElmnt = (Element) fstNmElmntLst.item(h);

NodeList fstNm = fstNmElmnt.getChildNodes();//get the child //nodes.

pubmedids[h] = (String) fstNm.item(0).getNodeValue() ;//store //the value in array.

//print the node value.

System.out.println("Id : " + ( (Node) fstNm.item(0)).getNodeValue() + "<br>");

System.out.println("pubmedids@h=:" + h + "=" + pubmedids[h] + "<br>");

}

//store the variables in the session variable.

hs.setAttribute("pubmedids", pubmedids);

hs.setAttribute("dbname","genome");

hs.setAttribute("retstart", String.valueOf(rec_start) );

hs.setAttribute("retmax", String.valueOf(rec_max) );

hs.setAttribute("retcnt",String.valueOf(rec_cnt));

//print the values of the variables. Next close the loop.

System.out.println("retstart="+rec_start);

System.out.println("retmax="+rec_max);

System.out.println("rec_cnt="+rec_cnt);

}

}

//redirect to the next page to show the output. If there were no records //returned, then redirect to the first client page and ask him to enter the //details again.

response.sendRedirect("/ProteomDb/ShowDataFromNCBI.jsp");

}

else

{

System.out.println("rec_cnt=0");

//if no records were found, then redirect to the client page with a message.

response.sendRedirect("/ProteomDb/ImportFromNCBIDb.jsp?noresultsflag=true&db");

}

//close the try-catch block. If there is any error present, then the description of the //same should be shown.

}

catch (Exception e) //catch the exception.

{

e.printStackTrace(); //print the stack trace

}

finally

{}

%>

4) Display the formatted output to the user

Create a simple HTML page using any HTML editor or write the code yourself. Screenshots below.

The output is now available as a String array. We just need to loop through the elements of the array and display them to the user in a tabular format.
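As a rough sketch of that loop outside a JSP page, the class below turns an array of record IDs into the rows of an HTML table. The class and method names are made up for this example, and the HTML escaping step is an addition that the page below does not perform.

```java
// Hypothetical helper class, not part of the original application.
class IdTableSketch {

    // Escape the characters that have a special meaning in HTML.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Build one table row per ID: serial number, record ID, and a link to the detail page.
    static String toTable(String[] ids, String detailPage) {
        StringBuilder sb = new StringBuilder();
        sb.append("<table>\n<tr><th>Serial No</th><th>Record ID</th><th>URL</th></tr>\n");
        for (int i = 0; i < ids.length; i++) {
            String id = escape(ids[i]);
            sb.append("<tr><td>").append(i + 1).append("</td><td>").append(id)
              .append("</td><td><a href=\"").append(detailPage)
              .append("?loopflag=no&amp;id=").append(id)
              .append("\">View Record</a></td></tr>\n");
        }
        return sb.append("</table>").toString();
    }

    public static void main(String[] args) {
        String[] ids = {"101", "102"}; // sample IDs, invented for the demo
        System.out.println(toTable(ids, "ImportGenomeDetails3.jsp"));
    }
}
```

In the JSP page the same loop is written with scriptlets, so the HTML stays in the page and only the loop control is Java.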

This code is very simple and should be self-explanatory. We just read the values from the session variables and use them to build the link for each database. Please remember that this program lets you browse through all the NCBI databases, so the code has one case for each database.
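The page below handles each database with its own if block. One possible alternative, shown here only as a sketch, is to keep the special cases in a map and build the common eSummary URL from the database name. The class name and the urlFor method are assumptions; the URL strings are copied from the page.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, not part of the original application.
class DbUrlSketch {

    // Databases that the page fetches with eFetch instead of eSummary.
    private static final Map<String, String> FETCH_OVERRIDES = new HashMap<>();
    static {
        FETCH_OVERRIDES.put("genome",
            "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=genome&rettype=gp&retmode=xml&id=");
        FETCH_OVERRIDES.put("taxonomy",
            "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&rettype=gp&retmode=xml&id=");
    }

    // Return the URL prefix for a database; the record ID is appended by the caller.
    static String urlFor(String db) {
        // Trimming guards against stray spaces in the stored database name.
        String key = db.trim().toLowerCase(Locale.ROOT);
        String override = FETCH_OVERRIDES.get(key);
        if (override != null) {
            return override;
        }
        return "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=" + key + "&retmode=xml&id=";
    }

    public static void main(String[] args) {
        System.out.println(urlFor("gene") + "12345");   // 12345 is an invented ID
        System.out.println(urlFor("genome") + "12345");
    }
}
```

With a lookup like this, adding a new database needs no new if block unless it requires a special URL.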

I suggest you go through the tutorials and references mentioned above in the manual before reading the code.

<%@ page language = "java" %>

<%//@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8"%>

<%@ page import = "java.util.*"%>

<%@ page import = "java.io.*"%>

<%@ page import="java.lang.*"%>

<%@ page import="java.net.*"%>

<%@ page import="java.nio.*"%>

<%@ page import = "java.sql.*" %>

<%@ include file = "header.jsp" %>

<%@ page import = "javax.xml.parsers.*" %>

<%@ page import = "org.w3c.dom.*" %>

<%

javax.servlet.http.HttpSession hs = request.getSession();

String pubmedids[] = (String[]) hs.getAttribute("pubmedids");

String next = "";

String previous = "" ;

System.out.println("ShowDataFromNCBI.jsp");

String loopflag = "yes";

String operation = "";

String recordspage = (String) hs.getAttribute("dbname");

String pmid_url_base = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=gp&retmode=xml&id=";

String url = pmid_url_base ; //default: the protein database URL defined above

String pagename = "";

if ( recordspage.equals("genome") )

{

url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=genome&rettype=gp&retmode=xml&id=";

pagename = "ImportGenomeDetails3.jsp";

out.println("db=genome&pagename="+pagename+"<br>");

hs.setAttribute("linkurl",url);

}

//http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id=2&rettype=native&retmode=xml

if ( recordspage.equals("taxonomy") )

{

url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&rettype=gp&retmode=xml&id=";

pagename = "ImportTaxonomyDetails3.jsp";

hs.setAttribute("linkurl",url);

}

// http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=unists&id=254085,254086&retmode=xml

if ( recordspage.equals("unists") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=unists&retmode=xml&id=";

pagename = "ImportUnistsDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//structure

if ( recordspage.equals("structure") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=structure&retmode=xml&id=";

pagename = "ImportStructureDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//-------

if ( recordspage.equals("biosystems") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=biosystems&retmode=xml&id=";

pagename = "ImportBioSystemsDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//books

if ( recordspage.equals("books") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=books&retmode=xml&id=";

pagename = "ImportBooksDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//cancerchromosomes

if ( recordspage.equals("cancerchromosomes") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=cancerchromosomes&retmode=xml&id=";

pagename = "ImportCancerChromosomesDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//cdd

if ( recordspage.equals("cdd") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=cdd&retmode=xml&id=";

pagename = "ImportCddDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//gap

if ( recordspage.equals("gap") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gap&retmode=xml&id=";

pagename = "ImportCancerChromosomesDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//domains

if ( recordspage.equals("domains") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=domains&retmode=xml&id=";

pagename = "ImportDomainsDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//gene

if ( recordspage.equals("gene") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&retmode=xml&id=";

pagename = "ImportGeneDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//genomeprj

if ( recordspage.equals("genomeprj") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=genomeprj&retmode=xml&id=";

pagename = "ImportGenomeprjDetails3.jsp";

hs.setAttribute("linkurl",url);

}

// gensat

if ( recordspage.equals("gensat") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gensat&retmode=xml&id=";

pagename = "ImportGensatDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//geo

if ( recordspage.equals("geo") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=geo&retmode=xml&id=";

pagename = "ImportGeoDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//gds

if ( recordspage.equals("gds") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gds&retmode=xml&id=";

pagename = "ImportGdsDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//homologene

if ( recordspage.equals("homologene") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=homologene&retmode=xml&id=";

pagename = "ImportHomologeneDetails3.jsp";

hs.setAttribute("linkurl",url);

}

// journals

if ( recordspage.equals("journals") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=journals&retmode=xml&id=";

pagename = "ImportJournalsDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//mesh

if ( recordspage.equals("mesh") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=mesh&retmode=xml&id=";

pagename = "ImportMeshDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//ncbisearch

if ( recordspage.equals("ncbisearch") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=ncbisearch&retmode=xml&id=";

pagename = "ImportNcbisearchDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//nlmcatalog

if ( recordspage.equals("nlmcatalog") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nlmcatalog&retmode=xml&id=";

pagename = "ImportNlmcatalogDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//code starts

// omia

if ( recordspage.equals("omia") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=omia&retmode=xml&id=";

pagename = "ImportOmiaDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//omim

if ( recordspage.equals("omim") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=omim&retmode=xml&id=";

pagename = "ImportOmimDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//pepdome

if ( recordspage.equals("pepdome") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pepdome&retmode=xml&id=";

pagename = "ImportPepdomeDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//pmc

if ( recordspage.equals("pmc") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pmc&retmode=xml&id=";

pagename = "ImportPmcDetails3.jsp";

hs.setAttribute("linkurl",url);

}

// popset

if ( recordspage.equals("popset") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=popset&retmode=xml&id=";

pagename = "ImportPopsetDetails3.jsp";

hs.setAttribute("linkurl",url);

}

// probe

if ( recordspage.equals("probe") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=probe&retmode=xml&id=";

pagename = "ImportprobeDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//proteinclusters

if ( recordspage.equals("proteinclusters") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=proteinclusters&retmode=xml&id=";

pagename = "ImportProteinclustersDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//pcassay

if ( recordspage.equals("pcassay") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pcassay&retmode=xml&id=";

pagename = "ImportPcassayDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//code starts

//pccompound

if ( recordspage.equals("pccompound") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pccompound&retmode=xml&id=";

pagename = "ImportPccompoundDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//pcsubstance

if ( recordspage.equals("pcsubstance") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pcsubstance&retmode=xml&id=";

pagename = "ImportPcsubstanceDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//snp

if ( recordspage.equals("snp") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=snp&retmode=xml&id=";

pagename = "ImportSnpdomeDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//sra

if ( recordspage.equals("sra") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=sra&retmode=xml&id=";

pagename = "ImportSraDetails3.jsp";

hs.setAttribute("linkurl",url);

}

// toolkit

if ( recordspage.equals("toolkit") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=toolkit&retmode=xml&id=";

pagename = "ImportToolkitDetails3.jsp";

hs.setAttribute("linkurl",url);

}

// toolkitall

if ( recordspage.equals("toolkitall") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=toolkitall&retmode=xml&id=";

pagename = "ImportToolkitallDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//unigene

if ( recordspage.equals("unigene") )

{

url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=unigene&retmode=xml&id=";

pagename = "ImportUnigeneDetails3.jsp";

hs.setAttribute("linkurl",url);

}

//code ends

//----------

String retstart = hs.getAttribute("retstart").toString();

String retmax = hs.getAttribute("retmax").toString() ;

String retcnt = hs.getAttribute("retcnt").toString() ;

System.out.println("url_string="+ hs.getAttribute("urlstring"));

String urlstring=null;

urlstring = (String)hs.getAttribute("urlstring");

System.out.println("url_sheesecond="+ urlstring);

next = previous = urlstring ;

out.println("retstart="+retstart);

out.println("retmax="+retmax);

out.println("retcnt="+retcnt);

out.println("urlstring="+urlstring);

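String tempvar = (urlstring == null) ? "" : urlstring ; //avoid a NullPointerException when the session has no URL

int cut = tempvar.indexOf("&retstart"); //-1 when the URL has no paging parameters

if ( cut != -1 )

{

tempvar = tempvar.substring(0, cut); //strip the old paging parameters

}

System.out.println("tempvar="+tempvar);

urlstring= tempvar ;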

//String hrefvalue = "txtURLString="+urlstring ;

//retstart is the index of the first record shown on this page, so paging moves in steps of retmax.

int start = Integer.parseInt(retstart);

int pagesize = Integer.parseInt(retmax);

int total = Integer.parseInt(retcnt);

if ( start + pagesize < total ) //more records remain after this page

{

next = urlstring + "&retstart=" + (start + pagesize) + "&retmax=" + retmax;

}

else //last page

{

next = "NA";

}

//find the value for previous

if ( start > 0 )

{

previous = urlstring + "&retstart=" + Math.max(0, start - pagesize) + "&retmax=" + retmax;

System.out.println("previous="+previous);

}

else //first page

{

previous = "NA";

}

%>

<html>

<head>

<title>New Page 1</title>

<style type="text/css">

.style1 {

border: 1px solid #3399FF;

background-color: #99CCFF;

font-size: small;

}

.style2 {

border: 1px solid #3399FF;

font-weight: bold;

background-color: #99CCFF;

font-size: small;

}

.style3 {

border-collapse: collapse;

border: 1px solid #3399FF;

background-color: #FFFFFF;

}

.style4 {

border: 1px solid #3399FF;

}

.style5 {

text-align: right;

}

.style7 {

font-size: large;

font-weight: bold;

}

.style8 {

font-size: large;

}

.style9 {

text-align: center;

}

</style>

</head>

<form method="POST" action="cancer_des.jsp">

<body>

<p align="center" class="style7">&nbsp;</p>

<div align="center">

<center>

<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="68%" id="AutoNumber2" height="58">

<tr>

<td width="150%" height="19">

<p align="center"><span class="style8">Search Results Page</span>

<span class="style8">

for <%=(recordspage)%> Database</span></td>

</tr>

<tr>

<td width="150%" height="19" class="style9">

<%=("Select your choice and click your operation button")%>

</td>

</tr>

</table>

</center>

</div>

<p align="center">The following are the search results of your query. Please

click on the link to view the paper.</p>

<div align="center">

<center>

<table cellpadding="0" cellspacing="0" style="padding: 2 4; width: 71%; height: 82px;" id="AutoNumber1" class="style3">

<tr>

<td align="center" style="width: 13%" class="style2">Serial No</td>

<td align="center" style="width: 23%" class="style2">Record ID</td>

<td align="center" style="width: 73%" class="style1"><strong>URL</strong></td>

</tr>

</font><br>

<%

for (int i=0;i < pubmedids.length ; i++ )

{

%>

<tr>

<td bgcolor="#FFFFFF" align="center" style="width: 13%" class="style4"><%=(i+1)%></td>

<td bgcolor="#FFFFFF" align="center" style="width: 23%" class="style4"><%=( pubmedids[i] )%></td>

<td bgcolor="#FFFFFF" align="center" style="width: 73%" class="style4">

<a href="/<%=( pagename + "?loopflag=no&id=" + pubmedids[i] )%>"> View Record</a>

</td>

</tr>

<% }

%>

<tr>

<td bgcolor="#FFFFFF" align="center" style="width: 13%" class="style4">&nbsp;</td>

<td bgcolor="#FFFFFF" align="center" style="width: 23%" class="style4">&nbsp;</td>

<td bgcolor="#FFFFFF" align="center" style="width: 73%" class="style4">

<p align="right">

<%

System.out.println("next="+next);

System.out.println("previous="+previous);

hs.setAttribute("next",next);

hs.setAttribute("previous",previous);

if ( !"NA".equals(next) ) //compare string contents, not object references

{

%>

<a href="/<%=( "ImportFromNCBI3.jsp?" + "loopflag=next")%>">Next</a>

<% } %>

<%

if ( !"NA".equals(previous) ) //compare string contents, not object references

{

%>

<a href="/<%=( "ImportFromNCBI3.jsp?" + "loopflag=previous")%>">Previous</a>

<% } %>

</td>

</tr>

</table>

</center>

</div>

&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; <a href="/mainpage.jsp">Back</a></p>

</body>

</form>

</html>

Screenshot of the execution of the above file (assuming it has received the output from the previous step of parsing the XML file).

So now we have displayed the list of records. When the user clicks on a particular record, that request is sent to the NCBI server in the same manner as before. We follow the same 4 steps, with Step 1 now being the output file of the last request. The final output will be the complete set of details for the selected record, as shown in this file.

So the first step is the output file holding the list of records from the eSearch utility. The request now goes to the Java program, which sends it to the NCBI server. We use the same program, but with a different URL: the eFetch utility returns the details of a record in XML format.
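The difference between the two requests is only in the URL. The sketch below builds both URLs side by side; the class and method names are assumptions for illustration, while the parameter names (db, term, retstart, retmax, rettype, retmode, id) are the ones used by the E-utilities URLs in this manual. Encoding the values with URLEncoder is an extra precaution that the JSP pages do not take.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, not part of the original application.
class EutilsUrlSketch {

    static final String BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

    // Percent-encode a parameter value so that spaces and commas are safe in a URL.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // eSearch: returns the list of record IDs that match a search term.
    static String esearchUrl(String db, String term, int retstart, int retmax) {
        return BASE + "esearch.fcgi?db=" + enc(db) + "&term=" + enc(term)
                + "&retstart=" + retstart + "&retmax=" + retmax;
    }

    // eFetch: returns the full details of one record (or several IDs joined by commas).
    static String efetchUrl(String db, String id) {
        return BASE + "efetch.fcgi?db=" + enc(db) + "&rettype=gp&retmode=xml&id=" + enc(id);
    }

    public static void main(String[] args) {
        // The search term and the ID are invented sample values.
        System.out.println(esearchUrl("genome", "escherichia coli", 0, 20));
        System.out.println(efetchUrl("genome", "51"));
    }
}
```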

Below is the Java program that sends the request and receives the response. The code should be self-explanatory now that the previous Java program has been explained in the same fashion.

File Name:- ImportGenomeDetails3.jsp

<%//@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8"%>

<%@ page language = "java"%>

<%@ page import = "java.util.*"%>

<%@ page import = "java.io.*"%>

<%@ page import="java.lang.*"%>

<%@ page import="java.net.*"%>

<%@ page import="java.nio.*"%>

<%@ page import= "javax.xml.parsers.*" %>

<%@ page import= "org.w3c.dom.*" %>

<%

System.out.println("ImportGenomeDetails3.jsp");

javax.servlet.http.HttpSession hs = request.getSession();

try

{

String URLString = "";

Properties systemSettings = System.getProperties();

systemSettings.put("http.proxyHost", "proxy.it.iitb.ac.in");

systemSettings.put("http.proxyPort", "80");

systemSettings.put("sun.net.client.defaultConnectTimeout", "10000");

systemSettings.put("sun.net.client.defaultReadTimeout", "10000");

Authenticator.setDefault(new Authenticator()

{

protected PasswordAuthentication getPasswordAuthentication()

{

return new PasswordAuthentication("your-username", "your-password".toCharArray()); // specify the user name and password of your own proxy login

}

});

System.setProperties(systemSettings);

System.out.println(" System Properties Set");

String retstart = "";

String retmax = "";

URLString = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=genome&rettype=gp&retmode=xml&id=" + request.getParameter("id").toString();

System.out.println(" URL variables set="+URLString);

URL url = new URL(URLString); //url string taken from user input.

HttpURLConnection connection = null;

connection = (HttpURLConnection) url.openConnection();

connection.setRequestMethod("POST");

connection.setDoInput(true);

connection.setDoOutput(true);

connection.setUseCaches(false);

connection.setAllowUserInteraction(false);

connection.setRequestProperty ("Content-Type","text/xml; charset=\"utf-8\"");

System.out.println(" connection Set");

BufferedReader in = new BufferedReader( new InputStreamReader( connection.getInputStream()));

String decodedString;

String tempstr = "";

System.out.println("Reader Set");

Random rn = new Random();

int rnval = rn.nextInt() ;

String fname = "genomefile_ncbi_"+ String.valueOf(rnval) + ".xml" ;

//File file = new File(fname);

BufferedWriter bw = new BufferedWriter(new FileWriter(fname));

while ((decodedString = in.readLine()) != null)

{

bw.write(decodedString);

bw.newLine(); //readLine() strips the line break, so write one back after every line

}

System.out.println(" output given.Set");

bw.close();

in.close();

hs.setAttribute("genomencbifile",fname.toString());

hs.setAttribute("urlstring",URLString);

System.out.println("Session variable Set");

System.out.println( "fname="+ fname.toString() );

System.out.println( "urlstring=" + URLString );

response.sendRedirect("/ProteomDb/FetchGenomeDetailsDataFromNCBI.jsp");

}

catch(Exception ex)

{

out.println("Exception->"+ex);

PrintWriter pw = response.getWriter();

ex.printStackTrace(pw);

}

%>

I hope the code is self-explanatory. It is similar to Step 2, which fetched records using the eSearch utility.

The next step is to parse the XML file that was generated. We assume here that you have access to the XML file on your machine.

This file is complicated to parse because its tags are nested, so the code contains nested loops. If you have understood the previous file, you can follow the flow and parse through each tag in the same way. Keep the XML file open in another window so that you can follow the tags we discuss here.
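Before reading the full page, it may help to see the nested-loop idea on a very small input. The sketch below walks every GBReference element and, inside each one, every GBAuthor element. The class name, the method names and the sample XML are assumptions written by hand for this example; a real GBSeq file has many more tags.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the original application.
class NestedReferenceSketch {

    // Text of the first descendant with this tag name, or "" when it is missing.
    static String textOf(Element parent, String tag) {
        NodeList hits = parent.getElementsByTagName(tag);
        return hits.getLength() == 0 ? "" : hits.item(0).getTextContent().trim();
    }

    // One line per reference: the reference label, then its authors.
    static List<String> summarize(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        List<String> lines = new ArrayList<>();
        NodeList refs = doc.getElementsByTagName("GBReference");
        for (int h = 0; h < refs.getLength(); h++) {          // outer loop: references
            Element ref = (Element) refs.item(h);
            // Searching from ref (not from doc) keeps each author with its own reference.
            NodeList authors = ref.getElementsByTagName("GBAuthor");
            List<String> names = new ArrayList<>();
            for (int k = 0; k < authors.getLength(); k++) {    // inner loop: authors
                names.add(authors.item(k).getTextContent().trim());
            }
            lines.add(textOf(ref, "GBReference_reference") + " | " + String.join("; ", names));
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<GBSeq_references><GBReference>"
                + "<GBReference_reference>1</GBReference_reference>"
                + "<GBReference_authors><GBAuthor>Smith,J.</GBAuthor><GBAuthor>Doe,A.</GBAuthor>"
                + "</GBReference_authors></GBReference></GBSeq_references>";
        for (String line : summarize(xml)) {
            System.out.println(line);
        }
    }
}
```

The key design point is that the inner search starts from the current reference element. Searching from the whole document inside the loop would return the authors of every reference each time.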

File -> FetchGenomeDetailsFromNCBI.jsp

<%-- set the language to Java and the encoding type to XML/Text --%>

<%@ page language = "java" %>

<%//@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8"%>

<%-- import the Java libraries. --%>

<%@ page import = "java.util.*"%>

<%@ page import = "java.io.*"%>

<%@ page import="java.lang.*"%>

<%@ page import="java.net.*"%>

<%@ page import="java.nio.*"%>

<%@ page import = "java.sql.*" %>

<% //@ include file = "header.jsp" %>

<%@ page import = "javax.xml.parsers.*" %>

<%@ page import = "org.w3c.dom.*" %>

<%

//get the base URL

String pmid_url_base = "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=";

out.println("FetchGenomeDetailsFromNCBI.jsp");

try

{

//create the session object

javax.servlet.http.HttpSession hs = request.getSession();

//get the file name from the session variable

String fname = (String) hs.getAttribute("genomencbifile");

//print the file name

out.println("fname = " + fname.toString() );

//get the handle of the file name

File file = new File(fname);

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

DocumentBuilder db = dbf.newDocumentBuilder();

Document doc = db.parse(file);

doc.getDocumentElement().normalize();

//get the handle to the parent element GBSet (the root element that holds all the GBSeq records)

NodeList nodeLst_gbset = doc.getElementsByTagName("GBSet");

Element fstNmElmnt_gbset = (Element) nodeLst_gbset.item(0);

NodeList fstNm_gbset = fstNmElmnt_gbset.getChildNodes();

//out.println("Document GBSet : " + ((Node) fstNm_gbset.item(0)).getNodeValue());

out.println("GBSet node ");

//get the value of the node GBSeq

NodeList nodeLst = doc.getElementsByTagName("GBSeq");

//out.println("<br>Information of all ids.Length="+nodeLst.getLength() + "<br>");

Node fstNode = nodeLst.item(0);

// get the value of the node GbSeq_locus

NodeList nodeLst_gbseqlocus = doc.getElementsByTagName("GBSeq_locus");

Element fstNmElmnt_gbseqlocus = (Element) nodeLst_gbseqlocus.item(0);

NodeList fstNm_gbseqlocus = fstNmElmnt_gbseqlocus.getChildNodes();

out.println("<b> Document GBSeqlocus :</b> " + ((Node) fstNm_gbseqlocus.item(0)).getNodeValue() + "<br>");

// GbSeq_locus.

// get the value of the node GbSeq_length

NodeList nodeLst_gbseqlen = doc.getElementsByTagName("GBSeq_length");

Element fstNmElmnt_gbseqlen = (Element) nodeLst_gbseqlen.item(0);

NodeList fstNm_gbseqlen = fstNmElmnt_gbseqlen.getChildNodes();

out.println("<b>Document GBSeqlen : </b>" + ((Node) fstNm_gbseqlen.item(0)).getNodeValue() + "<br>");

// GbSeq_length code ends..

// get the value of the node GBSeq_definition

NodeList nodeLst_gbseqdefinition = doc.getElementsByTagName("GBSeq_definition");

Element fstNmElmnt_gbseqdefinition = (Element) nodeLst_gbseqdefinition.item(0);

NodeList fstNm_gbseqdefinition = fstNmElmnt_gbseqdefinition.getChildNodes();

out.println("<b> Document GBSeq_definition : </b>" + ((Node) fstNm_gbseqdefinition.item(0)).getNodeValue() + "<br>");

//GBSeq_definition ends

// get the value of the node GBSeq_strandedness

NodeList nodeLst_gbseqstrandedness = doc.getElementsByTagName("GBSeq_strandedness");

Element fstNmElmnt_gbseqstrandedness = (Element) nodeLst_gbseqstrandedness.item(0);

NodeList fstNm_gbseqstrandedness = fstNmElmnt_gbseqstrandedness.getChildNodes();

out.println("<b> Document GBSeq_strandedness : </b>" + ((Node) fstNm_gbseqstrandedness.item(0)).getNodeValue() + "<br>");

//GBSeq_strandedness ends

// get the value of the node GBSeq_moltype

NodeList nodeLst_gbseqmoltype = doc.getElementsByTagName("GBSeq_moltype");

Element fstNmElmnt_gbseqmoltype = (Element) nodeLst_gbseqmoltype.item(0);

NodeList fstNm_gbseqmoltype = fstNmElmnt_gbseqmoltype.getChildNodes();

out.println("<b> Document GBSeq_moltype : </b>" + ((Node) fstNm_gbseqmoltype.item(0)).getNodeValue() + "<br>");

//GBSeq_moltype

// get the value of the node GBSeq_topology

NodeList nodeLst_gbseqtopology = doc.getElementsByTagName("GBSeq_topology");

Element fstNmElmnt_gbseqtopology = (Element) nodeLst_gbseqtopology.item(0);

NodeList fstNm_gbseqtopology = fstNmElmnt_gbseqtopology.getChildNodes();

out.println("<b> Document GBSeq_topology : </b>" + ((Node) fstNm_gbseqtopology.item(0)).getNodeValue() + "<br>");

//GBSeq_topology code ends

// get the value of the node GBSeq_division

NodeList nodeLst_gbseqdivision = doc.getElementsByTagName("GBSeq_division");

Element fstNmElmnt_gbseqdivision = (Element) nodeLst_gbseqdivision.item(0);

NodeList fstNm_gbseqdivision = fstNmElmnt_gbseqdivision.getChildNodes();

out.println("<b> Document GBSeq_division : </b>" + ((Node) fstNm_gbseqdivision.item(0)).getNodeValue() + "<br>");

//GBSeq_division code ends

// get the value of the node GBSeq_update-date

NodeList nodeLst_gbsequpdatedate = doc.getElementsByTagName("GBSeq_update-date");

Element fstNmElmnt_gbsequpdatedate = (Element) nodeLst_gbsequpdatedate.item(0);

NodeList fstNm_gbsequpdatedate = fstNmElmnt_gbsequpdatedate.getChildNodes();

out.println("<b> Document GBSeq_updatedate : </b>" + ((Node) fstNm_gbsequpdatedate.item(0)).getNodeValue() + "<br>");

//GBSeq_update-date code ends

// get the value of the node GBSeq_create-date

NodeList nodeLst_gbseqcreatedate = doc.getElementsByTagName("GBSeq_create-date");

Element fstNmElmnt_gbseqcreatedate = (Element) nodeLst_gbseqcreatedate.item(0);

NodeList fstNm_gbseqcreatedate = fstNmElmnt_gbseqcreatedate.getChildNodes();

out.println("<b> Document GBSeq_createdate : </b>" + ((Node) fstNm_gbseqcreatedate.item(0)).getNodeValue() + "<br>");

//GBSeq_create-date code ends

// get the value of the node GBSeq_primary-accession

NodeList nodeLst_gbseqpriacc = doc.getElementsByTagName("GBSeq_primary-accession");

Element fstNmElmnt_gbseqpriacc = (Element) nodeLst_gbseqpriacc.item(0);

NodeList fstNm_gbseqpriacc = fstNmElmnt_gbseqpriacc.getChildNodes();

out.println("<b>Document GBSeq_primary-accession : </b>" + ((Node) fstNm_gbseqpriacc.item(0)).getNodeValue() + "<br>");

//GBSeq_primary-accession ends..

// get the value of the node GBSeq_accession-version

NodeList nodeLst_gbseqpriaccver = doc.getElementsByTagName("GBSeq_accession-version");

Element fstNmElmnt_gbseqpriaccver = (Element) nodeLst_gbseqpriaccver.item(0);

NodeList fstNm_gbseqpriaccver = fstNmElmnt_gbseqpriaccver.getChildNodes();

out.println("<b> Document GBSeq_accession-version: </b>" + ((Node) fstNm_gbseqpriaccver.item(0)).getNodeValue() + "<br>");

//GBSeq_accession-version ends..

// get the value of the node GBSeq_source

NodeList nodeLst_gbseqsource = doc.getElementsByTagName("GBSeq_source");

Element fstNmElmnt_gbseqsource = (Element) nodeLst_gbseqsource.item(0);

NodeList fstNm_gbseqsource = fstNmElmnt_gbseqsource.getChildNodes();

out.println("<b> Document GBSeq_source:</b> " + ((Node) fstNm_gbseqsource.item(0)).getNodeValue() + "<br>");

//GBSeq_source ends.

// get the value of the node GBSeq_organism

NodeList nodeLst_gbseqorg = doc.getElementsByTagName("GBSeq_organism");

Element fstNmElmnt_gbseqorg = (Element) nodeLst_gbseqorg.item(0);

NodeList fstNm_gbseqorg = fstNmElmnt_gbseqorg.getChildNodes();

out.println("<b> Document GBSeq_organism: </b>" + ((Node) fstNm_gbseqorg.item(0)).getNodeValue() + "<br>");

//GBSeq_organism ends

// get the value of the node GBSeq_taxonomy

NodeList nodeLst_gbseqtax = doc.getElementsByTagName("GBSeq_taxonomy");

Element fstNmElmnt_gbseqtax = (Element) nodeLst_gbseqtax.item(0);

NodeList fstNm_gbseqtax = fstNmElmnt_gbseqtax.getChildNodes();

out.println("<b> Document GBSeq_taxonomy: </b>" + ((Node) fstNm_gbseqtax.item(0)).getNodeValue() + "<br>");

//GBSeq_taxonomy

// get the value of the node GBSeq_references

NodeList nodeLst_GbSeqRefs = doc.getElementsByTagName("GBSeq_references");

//out.println("<b> Node GBSeq_references ..Length=</b>"+nodeLst_GbSeqRefs.getLength() + "<br>");

//out.println("Node GBSeq_references <br>");

//loop through all the records of this tag - GBSeq_references.

for (int s = 0; s < nodeLst_GbSeqRefs.getLength(); s++)

{

//get the value for the node item at position s.

Node fstNode_GbSeqRefNode = nodeLst_GbSeqRefs.item(s);

// out.println(" fst_Node_GbSeqRefNode first for loop<br>");

if (fstNode_GbSeqRefNode.getNodeType() == Node.ELEMENT_NODE) //if condition for idlist nodes

{

//out.println(" second if condition <br>");

//out.println("fstNode_GbSeqRefNode in first if condition<br>");

//Element fstElmnt_gbref = (Element) fstNode;

//get the node for GBReference.

NodeList fstNmElmntLst_gbref = doc.getElementsByTagName("GBReference");

//String[] pubmedids = new String[fstNmElmntLst.getLength()];

//loop through all the Gbreference

for (int h = 0; h < fstNmElmntLst_gbref.getLength(); h++) //gets all the gbreference tag nodes

{

//get the node of the item at position h (the index of this inner loop)

Node fstNode_GbSeqRefNode2 = fstNmElmntLst_gbref.item(h);

if (fstNode_GbSeqRefNode2.getNodeType() == Node.ELEMENT_NODE)

//if condition -gets all the gbreference tag nodes

{

// out.println(" fst_Node_GbSeqRefNode2 second for loop <br>");

// out.println(" fstNmElmntLst_gbref loop <br> ");

Element fstNmElmnt_gbref_ref = (Element) fstNode_GbSeqRefNode2;

//get node GBReference_reference

NodeList nodeLst_gbref_ref = fstNmElmnt_gbref_ref.getElementsByTagName("GBReference_reference");

Element nodeLst_gbref_refElement = (Element)nodeLst_gbref_ref.item(0);

NodeList textnodeLst_gbref_ref = nodeLst_gbref_refElement.getChildNodes();

out.println("<b> Document GBReference_reference: </b>" + ((Node) textnodeLst_gbref_ref.item(0)).getNodeValue() + "<br>");

//code ends for GBReference_reference

//get node GBReference_position

NodeList nodeLst_gbref_pos = fstNmElmnt_gbref_ref.getElementsByTagName("GBReference_position");

Element nodeLst_gbref_posElement = (Element)nodeLst_gbref_pos.item(0);

NodeList textnodeLst_gbref_pos = nodeLst_gbref_posElement.getChildNodes();

out.println("<b> Document GBReference_position: </b>" + ((Node) textnodeLst_gbref_pos.item(0)).getNodeValue() + "<br>");

//code ends for GBReference_position

//gbauth loop for authors fields starts

//Element fstElmnt_gbref_auth = (Element) fstNodeauth;

NodeList fstNmElmntLst_gbauth = fstNmElmnt_gbref_ref.getElementsByTagName("GBReference_authors");

for (int k = 0; k < fstNmElmntLst_gbauth.getLength(); k++) //gets all the GBReference_authors tag nodes

{

Node fstNode_GbSeqRefNode3 = fstNmElmntLst_gbauth.item(k);

if (fstNode_GbSeqRefNode3.getNodeType() == Node.ELEMENT_NODE)

{

//out.println(" fst_Node_GbSeqRefNode3 authors second for loop <br>");

//out.println(" fstNmElmntLst_gbauth loop <br> ");

Element fstNmElmnt_gbref_auth = (Element) fstNode_GbSeqRefNode3;

//get node GBAuthor

NodeList nodeLst_gbref_gbauth = fstNmElmnt_gbref_auth.getElementsByTagName("GBAuthor");

//loop for all records.

//out.println("<br><b> No of gbAuthors =</b>" + nodeLst_gbref_gbauth.getLength() + "<br>");

for (int l = 0; l < nodeLst_gbref_gbauth.getLength(); l++) //gets all the GBAuthor tag nodes

{

out.println("<b> Author No=" + l + "=</b>" + nodeLst_gbref_gbauth.item(l).getChildNodes().item(0).getNodeValue() + "<br>");

}//code ends

}

}//gbauth loop ends

//get node GBReference_title

NodeList nodeLst_gbref_title = fstNmElmnt_gbref_ref.getElementsByTagName("GBReference_title");

Element nodeLst_gbref_titleElement = (Element)nodeLst_gbref_title.item(0);

NodeList textnodeLst_gbref_title = nodeLst_gbref_titleElement.getChildNodes();

out.println("<b> Document GBReference_title: </b>" + ((Node) textnodeLst_gbref_title.item(0)).getNodeValue() + "<br>");

//code ends for GBReference_title

//get node GBReference_journal

NodeList nodeLst_gbref_journal = fstNmElmnt_gbref_ref.getElementsByTagName("GBReference_journal");

Element nodeLst_gbref_journalElement = (Element)nodeLst_gbref_journal.item(0);

NodeList textnodeLst_gbref_journal = nodeLst_gbref_journalElement.getChildNodes();

out.println("<b> Document GBReference_journal: </b>" + ((Node) textnodeLst_gbref_journal.item(0)).getNodeValue() + "<br>");

//code ends for GBReference_journal

//out.println("Features - Location/Qualifiers");

//features code starts GBSeq_feature-table

//Element fstElmnt_gbref = (Element) fstNode;

NodeList fstNmElmntLst_gbfeattab = doc.getElementsByTagName("GBSeq_feature-table");

for (int z = 0; z < fstNmElmntLst_gbfeattab.getLength(); z++) //gets all the GBSeq_feature-table tag nodes

{

Node fstNode_GbSeqRefNode7 = fstNmElmntLst_gbfeattab.item(z);

if (fstNode_GbSeqRefNode7.getNodeType() == Node.ELEMENT_NODE) //if condition for GBSeq_feature-table

{

//out.println(" fst_Node_GbSeqRefNode7 features table second for loop <br>");

//out.println(" fstNmElmntLst_gbfeattab loop <br> ");

Element fstNmElmnt_gbfeat = (Element) fstNode_GbSeqRefNode7;

//get node GB feat tab

NodeList nodeLst_gbfeat = fstNmElmnt_gbfeat.getElementsByTagName("GBFeature");

//loop for all records.

//out.println("<br> <b> No of GBFeature =</b>" + nodeLst_gbfeat.getLength() + "<br>");

for (int y = 0; y < nodeLst_gbfeat.getLength(); y++) //gets all the gbfeature tag nodes

{

Node fstNode_GbSeqRefNode8 = nodeLst_gbfeat.item(y);

out.println("<b> GBFeature No=" + y + "=</b>" + nodeLst_gbfeat.item(y).getChildNodes().item(0).getNodeValue() + "<br>");

if (fstNode_GbSeqRefNode8.getNodeType() == Node.ELEMENT_NODE)

{

//out.println(" fst_Node_GbSeqRefNode8 features table second for loop <br>");

Element fstNmElmnt_gbfeat2 = (Element) fstNode_GbSeqRefNode8;

NodeList textnodeLst_gbfeatkey = fstNmElmnt_gbfeat2.getElementsByTagName("GBFeature_key");

out.println("<b>Document GBFeature_key: </b>" + ((Node) textnodeLst_gbfeatkey.item(0)).getChildNodes().item(0).getNodeValue() + "<br>");

out.println("<br>"); //GBFeature_location

NodeList textnodeLst_gbfeatloc = fstNmElmnt_gbfeat2.getElementsByTagName("GBFeature_location");

out.println("<b>Document GBFeature_location: </b>" + ((Node) textnodeLst_gbfeatloc.item(0)).getChildNodes().item(0).getNodeValue() + "<br>");

//code ends

out.println("<br>");

//GBFeature_intervals

NodeList textnodeLst_gbfeatint = fstNmElmnt_gbfeat2.getElementsByTagName("GBFeature_intervals");

//gets all the GBFeature_intervals tag nodes

for (int x = 0; x < textnodeLst_gbfeatint.getLength(); x++) {

Node fstNode_GbSeqRefNode9 = textnodeLst_gbfeatint.item(x);

if (fstNode_GbSeqRefNode9.getNodeType() == Node.ELEMENT_NODE)

{

Element fstNmElmnt_gbfeatint2 = (Element) fstNode_GbSeqRefNode9;

NodeList textnodeLst_gbintfrom = fstNmElmnt_gbfeatint2.getElementsByTagName("GBInterval_from");

out.println("Document GBFeature_Interval From: " + ((Node) textnodeLst_gbintfrom.item(0)).getChildNodes().item(0).getNodeValue() + "<br>");

//GBInterval_to

NodeList textnodeLst_gbintto = fstNmElmnt_gbfeatint2.getElementsByTagName("GBInterval_to");

out.println("Document GBFeature_Interval To: " + ((Node) textnodeLst_gbintto.item(0)).getChildNodes().item(0).getNodeValue() + "<br>");

//GBInterval_accession

NodeList textnodeLst_gbintAcc = fstNmElmnt_gbfeatint2.getElementsByTagName("GBInterval_accession");

out.println("Document GBFeature_ GBInterval_accession: " + ((Node) textnodeLst_gbintAcc.item(0)).getChildNodes().item(0).getNodeValue() + "<br>");

}

}

out.println("<br>");

//GBFeature_intervals code ends

//GBFeature_quals

NodeList textnodeLst_gbfeatquals = fstNmElmnt_gbfeat2.getElementsByTagName("GBFeature_quals");

//out.println("Length="+textnodeLst_gbfeatquals.getLength() + "<br>");

for (int x = 0; x < textnodeLst_gbfeatquals.getLength(); x++) //gets all the GBFeature_quals tag nodes

{

Node fstNode_GbSeqRefNode10 = textnodeLst_gbfeatquals.item(x);

if (fstNode_GbSeqRefNode10.getNodeType() == Node.ELEMENT_NODE)

{

Element fstNmElmnt_gbfeatint2 = (Element) fstNode_GbSeqRefNode10;

//GBQualifier_name

NodeList textnodeLst_gbintfrom = fstNmElmnt_gbfeatint2.getElementsByTagName("GBQualifier_name");

out.println("GBQualifier_name Length="+textnodeLst_gbintfrom.getLength() + "<br>");

//GBQualifier_value

NodeList textnodeLst_gbqualval = fstNmElmnt_gbfeatint2.getElementsByTagName("GBQualifier_value");

//out.println("GBQualifier_name Length="+textnodeLst_gbqualval.getLength() + "<br>");

out.println("<br>");

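//loop through all the GBQualifier_name nodes found above

for (int d = 0; d < textnodeLst_gbintfrom.getLength(); d++) {

//debug output disabled: it mixes the outer index x with d, and getChildNodes().item(d) returns null (causing a NullPointerException) when d is past the last child node

//out.println("Document GBSeqRefNode10: " + ((Node) textnodeLst_gbintfrom.item(x)).getChildNodes().item(d).getNodeValue() + "<br>");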

out.println("<b>GBQualifier_name No= " + d + "=</b>" + textnodeLst_gbintfrom.item(d).getChildNodes().item(0).getNodeValue() + "<br>");

out.println("<b>GBQualifier_value No= " + d + "=</b>" + textnodeLst_gbqualval.item(d).getChildNodes().item(0).getNodeValue() + "<br>");

}

}

}

//GBFeature_quals code ends

}

}//gbfeature tag nodes code ends

}//if condition for GBSeq_feature-table ends here..

}//for loop over GBSeq_feature-table ends here

}//if condition for the GBReference tag nodes ends

}//for loop over the GBReference tag nodes ends

//NodeList fstNmElmntLst_gbsequence = doc.getElementsByTagName("GBSeq_sequence");

NodeList nodeLst_gbsequence = doc.getElementsByTagName("GBSeq_sequence");

Element fstNmElmnt_gbsequence = (Element) nodeLst_gbsequence.item(0);

NodeList fstNm_gbsequence = fstNmElmnt_gbsequence.getChildNodes();

out.println("<b> Document GBsequence :</b> " + ((Node) fstNm_gbsequence.item(0)).getNodeValue() + "<br>");

out.println("<br>");

//GBSeq_sequence code ends here.

}//if condition for GBSeq_references nodes ends

}//for loop over GBSeq_references nodes ends

//GBSeq_references ends

}

catch (Exception e)

{

e.printStackTrace();

}

finally

{}

%>
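The listing above repeats one pattern many times: find a tag, take its first match, then read the first child's node value. Any missing tag in that chain raises a NullPointerException, which ends the whole page through the catch block. The sketch below is one possible way to wrap that pattern in a small null-safe helper. It is not part of the original program: the class name GbXmlDemo, the methods parse and firstText, and the sample XML values are all made up for illustration, and it prints with System.out instead of the JSP out object.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not taken from the original JSP page.
class GbXmlDemo {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the trimmed text of the first descendant of 'parent' named 'tag',
    // or 'fallback' when 'parent' has no such descendant.
    static String firstText(Element parent, String tag, String fallback) {
        NodeList matches = parent.getElementsByTagName(tag);
        if (matches.getLength() == 0) {
            return fallback;
        }
        return matches.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data that only imitates the GBSeq tag names used above.
        String xml = "<GBSeq><GBSeq_organism>Homo sapiens</GBSeq_organism>"
                + "<GBSeq_references>"
                + "<GBReference><GBReference_title>First</GBReference_title></GBReference>"
                + "<GBReference><GBReference_journal>J. Example</GBReference_journal></GBReference>"
                + "</GBSeq_references></GBSeq>";
        Document doc = parse(xml);
        Element root = doc.getDocumentElement();
        System.out.println("Organism: " + firstText(root, "GBSeq_organism", "(none)"));

        // Search inside each GBReference element, so every reference
        // reports its own title rather than the first title in the document.
        NodeList refs = root.getElementsByTagName("GBReference");
        for (int h = 0; h < refs.getLength(); h++) {
            Element ref = (Element) refs.item(h);
            System.out.println("Title " + h + ": " + firstText(ref, "GBReference_title", "(none)"));
        }
    }
}
```

In the JSP page, a call such as firstText(fstNmElmnt_gbref_ref, "GBReference_journal", "") could stand in for the three-line lookup of each field, assuming the helper is declared in a JSP declaration block or in a separate class.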
