Java-for-beginners - Fetch Data from NCBI


Fetch Data from NCBI

I am assuming that you are familiar with the term URL (Uniform Resource Locator).

The following are the four steps in which data is fetched from NCBI:

1) Take input from user.

Fetching data from NCBI uses the Entrez programming utility.

In this step, we present an HTML page to the user to collect the necessary information. Each parameter of the request URL (in this case the NCBI Entrez URL) that the user controls is exposed as an input field. For example, if one of the parameters is the search words, we add a field called “Search Term” to the input form. This field takes the search terms that the user enters.

A snapshot of the form is shown below.

The interface (or user page) lets the user enter the input parameters for the request. For example, to search NCBI we need to ask the user for the search terms (if you have looked at the URL parameters in the Entrez Programming Utilities, you will find a parameter called “term”). We then need to create a URL in Entrez format based on the input parameters provided.

To fetch a list of records for a particular search term, the Entrez utilities provide the eSearch method. The eSearch URL is the base URL that you will use to fetch a list of records from an NCBI database. The URL is http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi? Please check the Entrez documentation for further details. The next task is to find the set of parameters that are to be sent along with this URL.

In the input page, there are two fields defined.

1) Database Name: - This is to tell the NCBI server which database is to be searched.

2) Search Term: - This is to tell the NCBI server the search terms for which the records are to be returned.

Look up the Entrez utility page on NCBI. There in the eSearch page, you will find the key parameters. For the database name, the keyword will be “db” and for search term, the keyword is “term”.

Next, we need to tell the server how many records are to be returned and which part of the result set we want. If “retmax” is not given, eSearch returns only 20 records; here we ask for 50 records per request. Say, for example, 12000 records are found for your search term. At 50 records per page, that makes 12000 / 50 = 240 pages. Thus, you will need to say which page is to be returned, right?

These two parameters are called “retmax” and “retstart”. Their values are decided by us; the client does not control them. Here, we have kept “retmax” at 50 and derive “retstart” from the page number of the search page. Note that “retstart” is the zero-based index of the first record to return, so page n (counting from 0) corresponds to retstart = n × 50.
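The paging arithmetic can be sketched as follows. This is only an illustration: the class and method names are made up here, and it assumes that “retstart” is the zero-based index of the first record, which is how the Entrez documentation describes it.

```java
// Hypothetical helper, not part of the original program.
class Paging {

    // Number of pages needed to show totalRecords at pageSize records per page
    // (rounds up, so a partly filled last page still counts).
    static int pageCount(int totalRecords, int pageSize) {
        return (totalRecords + pageSize - 1) / pageSize;
    }

    // Value of "retstart" for a zero-based page number.
    static int retstartForPage(int page, int pageSize) {
        return page * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(pageCount(12000, 50));    // prints 240
        System.out.println(retstartForPage(3, 50));  // prints 150
    }
}
```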

OK, now you have all the parameters along with the base URL. The next job is to send them to the second program (which will actually send the request to NCBI and get the output).

To make this page, you just need a simple HTML editor and some basic HTML/JavaScript programming. I have attached a simple code file for your reference. You might want to look at www.w3schools.com for further details on learning them.

In the form presented above, the user enters the keyword values and then presses the “Submit” button. This sends the details of the page to the second program (explained in Step 2).
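As a rough sketch of how the pieces fit together, the following builds the eSearch URL from the four parameters discussed above. The class and method names are my own and are not taken from the attached code; the base URL and the parameter names “db”, “term”, “retmax” and “retstart” are the ones mentioned in the text. The values are URL-encoded so that search terms containing spaces still give a valid URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original program.
class EsearchUrlBuilder {

    static final String BASE_URL =
            "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?";

    // Joins the base URL and the parameters into one request URL.
    static String build(String db, String term, int retmax, int retstart) {
        return BASE_URL
                + "db=" + encode(db)
                + "&term=" + encode(term)
                + "&retmax=" + retmax
                + "&retstart=" + retstart;
    }

    // Encodes a value for use in a query string (a space becomes '+').
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("genome", "cancer", 50, 0));
    }
}
```

For the database “genome” and the search term “cancer”, this prints the base URL followed by db=genome&term=cancer&retmax=50&retstart=0.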

2) Send the request and receive the response from NCBI

Now we have received the user input. Assume that the user has entered “cancer” as the search term and “genome” as the database. We now need to put this input into the specific Entrez format and send the request to the NCBI server.

OK, the URL that will be created is something like:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=genome&term=cancer

We need to pass this URL to the Java program, which will send it to the NCBI server. Please note that we use the same URL pattern to get the different pages of the result. For example, once the first 50 records are shown to the user (Step 4), the user may want to browse the next or previous pages too. For this, we send a request to this same Java program. But how do we know whether the request comes from the client page or from the output page? For this, we have used the variable “loopflag”. For requests from the client page, loopflag = no.

Snapshots of the program.

1) First include the libraries.

2) Declare the session variable and start the try-catch block.

3) Define the System properties for connecting to the internet from the home network.
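The snapshot for this step is not reproduced here, so I cannot say exactly which properties the program sets. If your network reaches the internet through an HTTP proxy, one common approach is to set the standard Java properties “http.proxyHost” and “http.proxyPort”. The sketch below assumes that case; the class name, the host name and the port are placeholders.

```java
// Hypothetical helper; the proxy host and port below are placeholders.
class ProxySettings {

    // Tells the standard Java HTTP client to route requests through a proxy.
    static void useHttpProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", String.valueOf(port));
    }

    public static void main(String[] args) {
        useHttpProxy("proxy.example.org", 8080);
        System.out.println(System.getProperty("http.proxyHost")
                + ":" + System.getProperty("http.proxyPort"));
    }
}
```

If you connect to the internet directly, you most likely do not need to set these properties at all.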

4) Now check whether this file has been called from the client or from the output file.

We have placed a flag and check its value. If the request is from the client side, its value will be no. Otherwise, it will be next or previous, as per the value sent from the output file.
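The flag check could look something like the sketch below. The flag values no, next and previous come from the text; the class and method names, the zero-based page counter and the handling of a missing flag are my own assumptions.

```java
// Hypothetical helper, not part of the original program.
class LoopFlag {

    // Returns the zero-based page to request, given the flag and the page
    // that is currently shown to the user.
    static int resolvePage(String loopflag, int currentPage) {
        // Assumption: a missing flag is treated like a fresh client request.
        if (loopflag == null || "no".equals(loopflag)) {
            return 0;
        }
        switch (loopflag) {
            case "next":
                return currentPage + 1;
            case "previous":
                // Never go before the first page.
                return Math.max(0, currentPage - 1);
            default:
                throw new IllegalArgumentException("Unknown loopflag: " + loopflag);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolvePage("no", 5));        // prints 0
        System.out.println(resolvePage("next", 5));      // prints 6
        System.out.println(resolvePage("previous", 0));  // prints 0
    }
}
```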

5) Open the connection to the URL of the NCBI server for the database. Note the settings applied to the connection object: it is set to accept input and send output, with no caching and no user interaction. The connection will also accept text/xml output.
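A minimal sketch of such a connection setup is given below, using the standard java.net.URLConnection methods. The class and method names are my own, and I am assuming that “accept text/xml” means sending an Accept request header; the original code may do this differently. Opening the connection object does not contact the server yet; that only happens when you start reading from it.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper, not part of the original program.
class NcbiConnection {

    // Creates and configures the connection; no network traffic happens here.
    static URLConnection open(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setDoInput(true);                 // we will read the response
        conn.setDoOutput(true);                // as described in the text
        conn.setUseCaches(false);              // no cache storage
        conn.setAllowUserInteraction(false);   // no user interaction
        // Assumption: text/xml is requested through the Accept header.
        conn.setRequestProperty("Accept", "text/xml");
        return conn;
    }

    public static void main(String[] args) throws IOException {
        URLConnection conn = open(
                "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=genome&term=cancer");
        System.out.println(conn.getURL());
    }
}
```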

6) Declare the variables. Create an object of BufferedReader to read the input from the NCBI server.

7) Now we use a Random class object to create the file name. We create a BufferedWriter class object to write the contents to a file. Next, we create a while loop to read the content line by line and write it to the XML file in the Tomcat home directory.
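The read and write loop of steps 6 and 7 can be sketched as below. This is not the original code: the class and method names and the file name pattern are made up, and the example reads from a string instead of the NCBI connection so that it runs without a network. In the real program the reader would wrap the connection's input stream and the writer would wrap a FileWriter for the generated file name.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Random;

// Hypothetical helper, not part of the original program.
class XmlFileWriter {

    // Builds a file name from a random number (the pattern is an assumption).
    static String randomFileName(Random random) {
        return "ncbi_" + random.nextInt(1_000_000) + ".xml";
    }

    // Copies the source to the target line by line and returns the line count.
    // Both streams are closed when the copy is finished.
    static int copyLines(Reader source, Writer target) throws IOException {
        int lines = 0;
        try (BufferedReader reader = new BufferedReader(source);
             BufferedWriter writer = new BufferedWriter(target)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String xml = "<eSearchResult>\n<Count>2</Count>\n</eSearchResult>";
        StringWriter out = new StringWriter();
        int copied = copyLines(new StringReader(xml), out);
        System.out.println(copied + " lines, file name " + randomFileName(new Random()));
    }
}
```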

8) Now we end the loop and close the objects that we created. Next, we store the file name and the variables in the session. Finally, we print the values and redirect to the page that parses the XML file.

Check the XML file that was generated to view the output.

Hope this explains the code. The whole file is given below.

 


