NeighborSearch: A Web Tool for Exploring Molecular Neighborhoods

TJ O'Donnell
Tom Doman
Mike Cibulskis

G.D. Searle, Skokie, IL

Presented at Daylight MUG'97, Laguna Beach, CA

Index to this document

What is NeighborSearch?
Which databases are available?
Why NeighborSearch?
How does NeighborSearch work?
Who uses NeighborSearch?
What do users like about NeighborSearch?
Is NeighborSearch available on the Internet?

What is NeighborSearch?

NeighborSearch is an intranet web tool intended for medicinal and other chemists who have limited experience with computers and little or no familiarity with structural searching techniques. Users with more experience should find it interesting, too. Its purpose is to locate compounds in several databases that are similar to one or more input molecular structures.

Structures are input either by name (usually a corporate database name or number) or by SMILES. There is an interface to Daylight's web tool GRINS for the graphical input of SMILES. Alternatively, users can lasso a SMILES string from ChemDraw and paste it into NeighborSearch. One could also enter the SMILES by hand.

There are several databases available. Each database contains the SMILES and the Daylight fingerprint for each of its compounds. When a user inputs a structure, its fingerprint is computed. Each structure in the selected database(s) is then compared with the input structure(s) using the Tanimoto index. Typically, the 20 most similar structures, those with a similarity index greater than 0.70, are reported. We call these the neighborhood of the input structure(s).
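
The ranking described above can be sketched in a few lines of Python. This is only an illustration, not the NeighborSearch code: it assumes each fingerprint is held as a Python integer whose set bits stand for the fingerprint bits, and the names tanimoto, neighborhood and the toy compounds are hypothetical. Real Daylight fingerprints would come from the Daylight toolkit.

```python
from typing import Dict, List, Tuple


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto index of two fingerprints stored as integers (assumed layout)."""
    common = bin(fp_a & fp_b).count("1")  # bits set in both fingerprints
    union = bin(fp_a | fp_b).count("1")   # bits set in either fingerprint
    return common / union if union else 0.0


def neighborhood(
    query_fp: int,
    database: Dict[str, int],
    max_hits: int = 20,
    min_similarity: float = 0.70,
) -> List[Tuple[str, float]]:
    """Return up to max_hits compounds whose similarity exceeds min_similarity."""
    scored = [(tanimoto(query_fp, fp), name) for name, fp in database.items()]
    kept = [(score, name) for score, name in scored if score > min_similarity]
    # Most similar first; ties are broken by name so the order is reproducible.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in kept[:max_hits]]


# Toy 8-bit fingerprints, invented for illustration only.
toy_db = {
    "cmpd-1": 0b11110000,
    "cmpd-2": 0b11111000,
    "cmpd-3": 0b00001111,
}
hits = neighborhood(0b11110000, toy_db)
```

With these toy values, cmpd-1 shares all four bits with the query, cmpd-2 shares four of five, and cmpd-3 shares none, so only the first two fall inside the neighborhood.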

Which databases are available?

For a database of compounds to be searchable by NeighborSearch, we need to know (or compute) the SMILES and fingerprint of each compound. At Searle, we have done this for the Searle corporate database (SC file), the Monsanto corporate database (CP file), MDL's ACD (Available Chemicals Directory) database, and the CAP (Chemical Acquisitions Program) database of Monsanto. The database format used by NeighborSearch is a gdbm (GNU's dbm) file. Gdbm is freely available. We chose gdbm because of its easy integration with Perl and because other tools at Searle (CLUE, MUG'96) use these gdbm databases. We are considering replacing the gdbm databases with Thor and Merlin. Using Merlin would facilitate the addition of substructure searching to NeighborSearch.
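
As a rough picture of such a keyed store, the sketch below keeps one record per compound name, holding the SMILES and a fingerprint. Everything in it is an assumption made for illustration: the record layout (SMILES, a tab, then the fingerprint in hexadecimal), the function names, the compound names and the toy fingerprint values are hypothetical, and Python's portable dbm.dumb module stands in for gdbm.

```python
import dbm.dumb
import os
import tempfile
from typing import Dict, Tuple


def encode_record(smiles: str, fingerprint: int) -> bytes:
    """Pack SMILES and fingerprint into one value (hypothetical layout)."""
    return f"{smiles}\t{fingerprint:x}".encode("ascii")


def decode_record(raw: bytes) -> Tuple[str, int]:
    """Inverse of encode_record."""
    smiles, fp_hex = raw.decode("ascii").split("\t")
    return smiles, int(fp_hex, 16)


def build_database(path: str, compounds: Dict[str, Tuple[str, int]]) -> None:
    """Create a new keyed file with one record per compound name."""
    with dbm.dumb.open(path, "n") as db:
        for name, (smiles, fingerprint) in compounds.items():
            db[name.encode("ascii")] = encode_record(smiles, fingerprint)


def lookup(path: str, name: str) -> Tuple[str, int]:
    """Fetch the SMILES and fingerprint stored under a compound name."""
    with dbm.dumb.open(path, "r") as db:
        return decode_record(db[name.encode("ascii")])


# Toy data: the names and fingerprint bits are invented for illustration.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "toy_compounds")
build_database(
    db_path,
    {
        "SC-00001": ("c1ccccc1", 0b10110010),
        "SC-00002": ("CCO", 0b00010110),
    },
)
record = lookup(db_path, "SC-00001")
```

Keying the file by compound name is one way to support input by name: a single lookup returns both the SMILES and the stored fingerprint.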

Why NeighborSearch?

There are many tools available to locate interesting, similar compounds. These tools can be quite sophisticated and powerful. Unfortunately, they can also be quite intimidating for chemists unfamiliar with the techniques of substructure specification and searching. Many chemists are so unfamiliar with the variety (and quirks) of graphical user interfaces that they use them infrequently and with a lot of frustration. We decided to make something of a black box to help chemists answer simple questions they ask every day, questions like:

Do we have any compounds like the one I want to make or the one I just read about in J. Med. Chem.?
Are there any compounds commercially available which are like this intermediate?

To make it easier to use, we decided on a familiar web-browser type of interface. We also built in an assumption about what is meant by "like", so that the chemist need not specify details of substructure searching or even the parameters of the similarity search. It is possible to tailor some of the parameters of the similarity search, but reasonable defaults are pre-selected. No substructure searching capabilities are available in NeighborSearch. We are discussing how we might incorporate this feature in an easy-to-use, intuitive way.

How does NeighborSearch work?

NeighborSearch is a CGI (Common Gateway Interface) program written in Perl. It is started by opening a URL in your favorite web browser. We've tested it with Netscape versions 2 and 3 and Microsoft Internet Explorer version 3. You don't want to use Netscape version 1. The URL connects to a local (intranet) web server machine maintained by our group at Searle. It asks for user authorization (username and password). It then presents a form allowing the user to select one or more databases to search and to enter the name or SMILES of one or more input structures. As input structures are accepted and processed by the NeighborSearch Perl script, a GIF depiction (using Daylight's smitogif) is returned on the next page along with several interesting properties (molecular weight, formula, clogp, cmr, number of rings, etc.) computed using a DayPerl script written at Searle.
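
To make the form-handling step concrete, here is a minimal Python sketch of how a submitted search form might be decoded. It is not the Perl script itself. The field names db and structure, the function name parse_search_form and the sample values are hypothetical, and the form body is assumed to arrive URL-encoded, as CGI form submissions usually do.

```python
from typing import List, Tuple
from urllib.parse import parse_qs, urlencode


def parse_search_form(body: str) -> Tuple[List[str], List[str]]:
    """Extract selected databases and input structures from a form body.

    Repeated fields are allowed, so a user can pick several databases and
    submit several structures (names or SMILES) at once.
    """
    fields = parse_qs(body)
    databases = fields.get("db", [])  # hypothetical field name
    structures = [s.strip() for s in fields.get("structure", []) if s.strip()]
    if not databases:
        raise ValueError("select at least one database")
    if not structures:
        raise ValueError("enter at least one name or SMILES")
    return databases, structures


# A sample submission: two databases, one SMILES and one compound name.
body = urlencode(
    [
        ("db", "SC"),
        ("db", "ACD"),
        ("structure", "c1ccccc1O"),
        ("structure", "SC-00001"),
    ]
)
databases, structures = parse_search_form(body)
```

Rejecting an empty selection before any search is launched keeps the error close to the form, where the user can correct it.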

Once the structures have been input, the user presses the "Generate neighbors" button, which causes NeighborSearch to run a background job (neighbor_of) that computes the Tanimoto index and returns the 20 nearest neighbors. This computation takes about 15-90 seconds, depending on the total number of compounds contained in the selected database(s). The neighbors are depicted on the next page of the web browser in a format reminiscent of prado. The user may print this page using the web browser's print function, but the results are poor.

We have added our own print button, which writes a SMI file of the neighbors and has prado create a PostScript file. The PostScript file is returned to the user, who can do with it as she pleases. Typically this file is processed by a helper application, such as lp (UNIX), DropPS (Mac), xpsview or ghostview. The list of neighbors can also be returned to the user as a SMI file or an MDL hit-list file.
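
Writing the neighbors out as a SMI file is simple enough to sketch. The example below assumes the common layout of one record per line, a SMILES followed by a space and an identifier. The function name write_smi and the sample compounds are hypothetical, and the actual files produced by NeighborSearch may carry more information.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_smi(neighbors: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write (SMILES, name) pairs as one 'SMILES name' line each.

    Returns the number of records written.
    """
    count = 0
    for smiles, name in neighbors:
        out.write(f"{smiles} {name}\n")
        count += 1
    return count


# Toy neighbors, invented for illustration.
buffer = io.StringIO()
written = write_smi(
    [("c1ccccc1O", "cmpd-1"), ("c1ccccc1N", "cmpd-2")],
    buffer,
)
smi_text = buffer.getvalue()
```

Because the function takes any text stream, the same code can fill a file on disk for a depiction program or a buffer that is sent straight back to the browser.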

Who uses NeighborSearch?

Of a projected chemist user base of 100, 26 have logged in to NeighborSearch at least once (26%). Of 9 modellers, 4 have logged in at least once (44%). In addition, 3 "computer jocks" have tried NeighborSearch out of curiosity about CGI programs. NeighborSearch is especially popular in the screening group, and among senior chemists who are uncomfortable with complex substructure searching tasks.

What do users like about NeighborSearch?

Users of NeighborSearch identify a number of attractive features. As with most browser-based applications, NeighborSearch is quite easy to use. The interface is simple, and queries are quickly constructed. Multiple chemical inventories may be searched simultaneously. A neighbor search can be configured to produce hits even when substructure searching yields nothing. These neighbors are often quite different from substructure hits, yielding new ideas. Finally, there is no need to enter a complex substructure.

Is NeighborSearch available on the Internet?

We cannot allow the use of NeighborSearch outside the Searle intranet. However, we will release the Perl script, the code for the neighbor_of computation and other associated utility programs to allow you to recreate the NeighborSearch environment on your own web server using your own databases. Watch the Home Shopping Channel for details, or contact one of the authors listed above.