Chapter 12 Working With Search Collections
The server includes a search feature that allows users to search documents
on the server and display results on a web page. Server administrators create
the indexes of documents against which users will search (called collections), and can customize the search interface
to meet the needs of their users.
For more information on querying the search collections, refer to the search online help.
About Search
The search feature is installed with other web components during the
installation of Sun Java System Web Server. Search is configured and managed
at the virtual server level instead of the server instance level.
From the Search tab under the Virtual Servers tab in the Administration
Console, you can:
-
Enable and disable the search feature
-
Create, modify, delete, and re-index search collections
-
Create, modify, and remove scheduled maintenance tasks for
search collections
Information obtained from the administrative interface is stored in
the <server-root>/config/server.xml file, where it is mapped within the VS element.
Server administrators can customize the search query and search results
pages. This might include re-branding the pages with a corporate logo, or
changing the way search results appear. In previous releases this was accomplished
through the use of pattern files.
There is no global “on” or “off” functionality
for search. Instead, a default search web application is provided and then
enabled or disabled on a specific virtual server. This search application
provides the basic web pages used to query collections and view results. The
search application includes sample JSPs that demonstrate how to use the search
tag libraries to build customized search interfaces.
Note –
Sun Java System Web Server does not provide access checking on
search results. Due to the number of potential security models and realms,
it is impossible to perform security checks and filter results from within
the search application. It is the responsibility of the server administrator
to ensure that appropriate security mechanisms are in place to protect content.
Configuring Search Properties
Search is enabled for a virtual server by enabling the search application
included with the server.
Note –
The Java web container must be enabled for search to be enabled.
After ensuring that Java is enabled for the virtual server you want
to configure, enable search by performing the following steps:
-
Click the Configurations tab.
-
Select the configuration from the configuration list.
-
Click the Virtual Servers tab.
-
Select the virtual server from the virtual server list.
-
Click the Search tab.
-
Under the Search Application section,
select the Enabled checkbox to enable the search
application.
Other parameters that you can configure are listed below:
-
URI. If you plan to use a custom
search application, enter the URI; if you are using the default search application,
you do not need to specify a value here.
-
Max Hits. Specify the maximum number of
results retrieved by a search query.
-
Enabled. Select this checkbox to
enable the default search application.
Note –
Using CLI
To set search properties through the CLI, run the following command:
wadm> set-search-prop --user=admin --password-file=admin.pwd --host=serverhost
--port=8888 --no-ssl --rcfile=null --config=config1 --vs=config1_vs_1
enabled=true max-hits=1200
See CLI Reference, set-search-prop(1).
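For scripted setups, the same command can be assembled programmatically. The following is a minimal Python sketch, not part of the product: the helper name set_search_prop_command is hypothetical, the flags and property names are copied from the example above, and invoking wadm non-interactively as "wadm set-search-prop ..." is an assumption to verify against the CLI Reference.

```python
import shlex


def set_search_prop_command(config, vs, props, user="admin",
                            password_file="admin.pwd",
                            host="serverhost", port=8888):
    """Assemble a set-search-prop invocation as an argument list.

    Hypothetical helper: the flags mirror the example in this section;
    nothing is executed here.
    """
    args = [
        "wadm", "set-search-prop",
        f"--user={user}",
        f"--password-file={password_file}",
        f"--host={host}",
        f"--port={port}",
        "--no-ssl",
        "--rcfile=null",
        f"--config={config}",
        f"--vs={vs}",
    ]
    # Properties are passed as name=value operands, for example enabled=true.
    args += [f"{name}={value}" for name, value in props.items()]
    return args


cmd = set_search_prop_command("config1", "config1_vs_1",
                              {"enabled": "true", "max-hits": 1200})
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems if the list is later passed to a process launcher.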
Configuring Search Collections
Searches require a database of searchable data against which users will
search. Server administrators create this database, called a collection, which
indexes and stores information about documents on the server. Once the server
administrator indexes all or some of a server’s documents, information
such as title, creation date, and author is available for searching.
Note –
About Search Collections:
-
Collections are specific to the virtual server being administered
-
Only documents visible from the virtual server are presented
in the administrative interface and available to be indexed
-
There is no limit to the number of collections that can exist
on your server
-
Documents in a search collection are not specific to any one
character encoding, which means that a search collection can be associated
with multiple encodings.
Supported Formats
Files in the following formats can be indexed and searched:
-
HTML documents — .html and .htm
-
ASCII Plain Text — .txt
-
PDF
Adding a Search Collection
To add a new collection, perform the following tasks:
-
Click the Configurations tab.
-
Select the configuration from the configuration list.
-
Click the Virtual Servers tab.
-
Select the virtual server from the virtual server list.
-
Click the Search tab.
-
Under the Search Collections section,
click the Add Search Collection button to
add a new search collection.
The following section describes the fields in the page for creating
a new search collection:
-
Step 1: Provide Search Collection Information
-
Collection Name —
Enter a unique name for the search collection.
Note –
Multibyte characters are not allowed in the collection name.
-
Display Name — (Optional)
This will appear as the collection name in the search query page. If you do
not specify a display name, the collection name serves as the display name.
-
Description — (Optional)
Enter text that describes the new collection.
-
Path — You can either
create the collection in the default location or provide a valid path where
the collection will be stored.
-
Step 2: Provide Indexing Information
-
Directory to Index —
Enter the directory from which documents will be indexed into the collection.
Only the directories visible from this virtual server can be indexed.
-
Sub Directory— Enter
the subdirectory from which documents will be indexed into the collection.
The subdirectory path must be relative to the directory path specified earlier.
-
Pattern — Specify
a wildcard to select the files to be indexed.
Use the wildcard
pattern judiciously to ensure that only specific files are indexed. For example,
specifying *.* might cause even executables and Perl scripts to be indexed.
-
Subdirectories— Enabled/Disabled.
If you select this option, documents within the subdirectories of the selected
directory will also be indexed. This is the default action.
-
Default Document Encoding —
Documents in a collection are not restricted to a single language/encoding.
Every time documents are added, only a single encoding can be specified; however,
the next time you add documents to the collection, you can select a different
default encoding.
-
Step 3: View the Summary
-
View the summary and click the Finish button
to add the new collection.
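The effect of the Pattern field can be previewed before indexing. The following minimal Python sketch uses the standard fnmatch module on a hypothetical directory listing to show why *.* selects far more than a narrower pattern. It assumes shell-style wildcard semantics; the server's own wildcard syntax may differ in detail.

```python
from fnmatch import fnmatchcase


def select_files(filenames, pattern):
    """Return the file names that match a shell-style wildcard pattern."""
    return [name for name in filenames if fnmatchcase(name, pattern)]


# Hypothetical directory listing, for illustration only.
files = ["index.html", "guide.htm", "notes.txt",
         "manual.pdf", "cleanup.pl", "setup.exe"]

# *.* matches every name that contains a dot, scripts and executables included.
print(select_files(files, "*.*"))

# A narrower pattern restricts indexing to the intended document type.
print(select_files(files, "*.html"))
```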
Note –
Using CLI
To add a search collection through the CLI, run the following command:
wadm> create-search-collection --user=admin --password-file=admin.pwd
--host=serverhost --port=8989 --config=config1 --vs=config1_vs_1 --uri=/search_config1
--document-root=../docs searchcoll
See CLI Reference, create-search-collection(1).
Deleting a Search Collection
To delete a search collection, perform the following tasks:
-
Click the Configurations tab.
-
Select the configuration from the configuration list.
-
Click the Virtual Servers tab.
-
Select the virtual server from the virtual server list.
-
Click the Search tab.
-
Under the Search Collections section,
select the collection name and click the Delete button to
delete the collection.
Note –
Using CLI
To delete a search collection through the CLI, run the following command:
wadm> delete-search-collection --user=admin --password-file=admin.pwd
--host=serverhost --port=8989 --config=config1 --vs=config1_vs_1 searchcoll
See CLI Reference, delete-search-collection(1).
Scheduling Collection Update
You can schedule maintenance tasks to be performed on collections at
regular intervals. The tasks that can be scheduled are re-indexing and updating.
The administrative interface is used to schedule the tasks for a specific
collection. You can specify the:
-
Task to perform (re-indexing or updating)
-
Time of day to perform the task
-
Day(s) of the week to perform the task
To schedule events for the collection, perform the following tasks:
-
Click the Configurations tab.
-
Select the configuration from the configuration list.
-
Click the Virtual Servers tab.
-
Select the virtual server from the virtual server list.
-
Click the Search tab.
-
Click the Scheduled Events tab.
-
Under the Search Events tab,
click the New button.
The following table describes the fields in the New Search Event Schedule
page:
Table 12–1 Field Descriptions for the New Search Event Schedule Page
Field: Collection
Description: Select the collection for which you need to schedule maintenance
from the drop-down list.
Field: Event
Description:
-
Re-index Collection: This scheduled event will re-index
the specified collection at the specified time.
-
Update Collection: You can add or remove files after
a collection has been created. Documents can be added only from under the
directory that was specified during collection creation. If you are removing
documents, only the entries for the files and their metadata are removed from
the collection. The actual files themselves are not removed from the file
system. This scheduled event will update the collection at the specified time.
-
Pattern: Specify a wildcard to select the files to be
indexed.
-
Subdirectories included: If you select this option, documents
within the subdirectories of the selected directory will also be indexed.
This is the default action.
-
Encoding: Specify the character encoding for the documents
to be indexed. The default is ISO-8859-1. The indexing
engine tries to determine the encoding of HTML documents from the embedded
meta tag. If this is not specified, the default encoding is used.
Field: Time
Description: The configured time when the event will start. Select the hour
and minutes values from the drop-down box.
Every Day: Starts the specified event every day at the specified time.
Specific Days: Starts the specified event on specific days.
1. Days: Specify any day from Sunday to Saturday.
2. Dates: Specify any day of the month from 1 to 31 as comma-separated
entries, for example, 4,23,9.
Specific Months: Starts the specified event at the specified time in the
specified month. Specify a month from January to December.
Field: Interval
Description: Start the specified event after this time period.
1. Every Hours: Select the number of hours from the drop-down box.
2. Every Seconds: Select the number of seconds from the drop-down box.
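The Dates field takes comma-separated days of the month. The following minimal Python sketch shows how such an entry could be checked before it is typed into the form. The function name parse_month_days is hypothetical, and the checks the Administration Console itself applies are not documented here.

```python
def parse_month_days(text):
    """Parse a comma-separated list of month days such as "4,23,9".

    Returns the distinct days in ascending order and raises ValueError
    for entries that are not integers in the range 1 to 31.
    """
    days = []
    for entry in text.split(","):
        day = int(entry.strip())  # raises ValueError for non-numeric entries
        if not 1 <= day <= 31:
            raise ValueError(f"day of month out of range: {day}")
        days.append(day)
    return sorted(set(days))


print(parse_month_days("4,23,9"))  # [4, 9, 23]
```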
Performing a Search
Users are primarily concerned with asking questions of the data in the
search collections, and getting a list of documents in return. The search
web application installed with Sun Java System Web Server provides default
search query and search results pages. These pages can be used as they are,
or customized using a set of JSP tags as described in Customizing
Search Pages.
Users search against collections that have been created by the server
administrator. They can:
-
Input a set of keywords and optional query operators on which
to search
-
Search only collections that are visible to the virtual server
-
Search against a single collection, or across a set of collections
visible to the virtual server
Server administrators must provide users with the URL needed to access
the search query page for a virtual server.
The Search Page
The default URL that end users can use to access the search functionality is:
http://<server-instance>:<port-number>/search
Example:
http://plaza:8080/search
When the end-user invokes this URL, the Search page, which is a Java
web application, is launched.
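When the URL is handed out to users from a script or a generated page, it can be composed from the host, the port, and the search application URI. A minimal Python sketch, assuming the default /search URI (the function name search_url is hypothetical):

```python
def search_url(host, port, uri="/search"):
    """Compose the URL of the search query page for a virtual server."""
    return f"http://{host}:{port}{uri}"


print(search_url("plaza", 8080))  # http://plaza:8080/search
```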
Note –
For more detailed information about conducting basic and advanced
searches, including information about keywords and optional query operators,
see the online Help provided with the search engine. To access this information,
click the Help link on the Search page.
Making a Query
A search query page is used to search against a collection. Users input
a set of keywords and optional query operators, and then receive results on
a web page displayed in their browser. The results page contains links to
documents on the server that match the search criteria.
Note –
Server administrators can customize this search query page, as
described in “Customizing Search Pages.”
To make a query, perform the following steps:
-
Access the Search web application by entering its URL in the Location
bar of your browser, in the following format:
http://<server-instance>:<port-number>/search
-
In the search query page that appears, select the checkbox representing
the collection you want to search in the "Search in" field.
-
Type a few words that describe your query and press the Enter
key (or click the Search button) for a list of relevant web pages.
For a more fine-tuned search, you can use the search parameters provided
in the Advanced Search page, described in the following section.
Advanced Search
Users can increase the accuracy of their searches by adding operators
that fine-tune their keywords. These options can be selected from the Advanced
Search page.
To make an advanced search query, perform the following steps:
-
Access the Search web application by entering its URL in the Location
bar of your browser, in the following format:
http://<server-instance>:<port-number>/search
-
Click the Advanced link.
-
Enter any or all of the following information:
Document Field
Sun Java System Web Server maintains an index of documents. The index
contains an entry for each document. Each index entry contains one or more
fields such as Title, Author, and URL. Queries can be limited to specific
document fields, and documents are only found if they match your criteria
in the specified fields.
For example, if you simply search for Einstein, you will find all documents
that have the word Einstein in any one of the Title, Author, or Keywords fields.
This will include documents about Einstein, documents that make reference
to Einstein, and documents written by Einstein. But if you specify Author
= "Albert Einstein", you will only find documents written by Albert
Einstein.
By default, the index fields that you can search are:
-
Author — The author,
authors, or organization that created the document as specified with an <author> meta tag.
-
Keywords — The keywords
as specified with a <keywords> meta tag.
-
Date — The date that
this document was last edited or modified.
-
Title — The document's
title as specified with the HTML <title> tag.
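The difference between an unrestricted query and a query limited to one document field can be illustrated with a small model. The following Python sketch is only a conceptual illustration over a hypothetical in-memory index; it does not reflect how the search engine stores or matches entries.

```python
# Hypothetical index entries, for illustration only.
INDEX = [
    {"Title": "Relativity: The Special and General Theory",
     "Author": "Albert Einstein", "Keywords": "physics, relativity"},
    {"Title": "Einstein: His Life and Universe",
     "Author": "Walter Isaacson", "Keywords": "biography"},
    {"Title": "A Brief History of Time",
     "Author": "Stephen Hawking", "Keywords": "cosmology, Einstein"},
]

SEARCHABLE_FIELDS = ("Title", "Author", "Keywords")


def search(entries, term, field=None):
    """Match term in one named field, or in any searchable field if none is given."""
    fields = (field,) if field else SEARCHABLE_FIELDS
    needle = term.lower()
    return [entry for entry in entries
            if any(needle in entry.get(name, "").lower() for name in fields)]


print(len(search(INDEX, "Einstein")))                         # 3
print(len(search(INDEX, "Albert Einstein", field="Author")))  # 1
```

The unrestricted query returns documents about, referring to, and written by Einstein, while the Author query returns only the document he wrote.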
Search Query Operators
For a detailed list of search query operators, refer to the Administration
Console Search Online Help.
Viewing Search Results
Search results are displayed in the user’s browser on a web page
that contains HTML hyperlinks to documents on the server that match the search
criteria. Each page displays 10 records (hits) by default, which are sorted
in descending order based on relevance. Each record lists information such
as file name, size, date of creation, and so on. The matched words are also
highlighted.
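The paging and ordering described above can be sketched in a few lines of Python. This is an illustrative model only: the score key and the function names are hypothetical, and the page size of 10 is the default stated in this section.

```python
def sort_by_relevance(hits):
    """Order hits from most to least relevant (descending score)."""
    return sorted(hits, key=lambda hit: hit["score"], reverse=True)


def page_range(total_hits, page, per_page=10):
    """Return the 1-based (first, last) record numbers shown on a page.

    Returns None when the page lies beyond the end of the result set.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    first = (page - 1) * per_page + 1
    if first > total_hits:
        return None
    last = min(page * per_page, total_hits)
    return first, last


print(page_range(23, 1))  # (1, 10)
print(page_range(23, 3))  # (21, 23)
```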
Customizing Search Pages
Sun Java System Web Server includes a default search application that
provides basic search query and search results pages. These web pages can
be used as is, or customized to meet your specific needs. Such customizing
might be as simple as re-branding the web pages with a different logo, or
as complex as changing the order in which search results are displayed.
The default search application provides sample JSPs that demonstrate
how to use the search tag libraries to build customized search interfaces.
You can take a look at the default search application located at /bin/https/webapps/search as a sample application that illustrates the use of customizable
search tags.
The default search interface consists of four main components: header,
footer, query form, and results.
These basic elements can be easily customized simply by changing the
values of the attributes of the tags. More detailed customizing can be accomplished
using the tag libraries.
Search Interface Components
The Search interface consists of the following components:
Header
The header includes a logo, title, and a short description.
Footer
The footer contains copyright information.
Form
The query form contains a set of check boxes representing search collections,
a query input box, and submit and Help buttons.
Results
The results are listed by default in 10 records per page. For each record,
information such as the title, a passage, size, date of creation, and URL
are displayed. A passage is a short fragment of the page with matched words
highlighted.
Customizing the Search Query Page
The query form contains a list of check boxes for search collections,
a query input box, and submit button. The form is created using the <s1ws:form> tag along with <collElem>, <queryBox>, and <submitButton> tags with default values:
<s1ws:form>
<s1ws:collElem />
<s1ws:queryBox /> <s1ws:submitButton />
</s1ws:form>
The query form can be placed anywhere in a page, in the middle, on a
side bar, and so on. It can also be displayed in different formats such as
with a cross bar where the collection select box, the query string input box,
and the Submit button are lined up horizontally, or in a block where the collections
appear as check boxes, and the query input box and Submit button are placed
underneath.
The following examples show how the <searchForm> set
of tags may be used to create query forms in different formats.
In a horizontal bar
The sample code below would create a form with a select box of all collections,
a query input box and a submission button all in one row.
<s1ws:form>
  <table cellspacing="0" cellpadding="3" border="0">
    <tr class="navBar">
      <td class="navBar"><s1ws:collElem type="select" /></td>
      <td class="navBar">
        <s1ws:queryBox size="30" />
        <s1ws:submitButton class="navBar" style="padding: 0px; margin: 0px; width: 50px" />
      </td>
    </tr>
  </table>
</s1ws:form>
In a Sidebar Block
You can create a form block in which the form elements are arranged in a
sidebar under the title "Search", using the same format as other items
on the sidebar. The effect of such an arrangement is shown in the following
figure:
Customized Query Page with Form Elements in a Sidebar
In the sample code given below, the form body contains three check boxes
arranged in one column listing the available search collections. The query
input box and the Submit button are placed underneath:
<s1ws:searchForm>
  <table>
    <!--... other sidebar items ... -->
    <tr class="Title"><td>Search</td></tr>
    <tr class="Body">
      <td>
        <table cellspacing="0" cellpadding="3" border="0">
          <tr class="formBlock">
            <td class="formBlock"> <s1ws:collElem type="checkbox" cols="1" values="1,0,1,0" /> </td>
          </tr>
          <tr class="formBlock">
            <td class="formBlock"> <s1ws:queryBox size="15" maxlength="50" /> </td>
          </tr>
          <tr class="formBlock">
            <td class="formBlock"> <s1ws:submitButton class="navBar" style="padding: 0px; margin: 0px; width: 50px" /> </td>
          </tr>
        </table>
      </td>
    </tr>
  </table>
</s1ws:searchForm>
Customizing the Search Results Page
Search results are generated as follows:
-
The <formAction> tag retrieves values
from all of the form elements and conducts basic validations.
-
The <search> tag, the <resultIteration> tag and other tags occur inside the <formAction> tag
and have access to the values of all of the form elements.
-
The <search> tag executes the search
with the query string and collections from the <formAction> and
saves the search results in pageContext.
-
The <resultIteration> tag then retrieves
and iterates through the result set.
You can customize the search results page simply by changing the attribute
values of the tags.
The following sample code starts with a title bar, and then displays
a number of records as specified, and finally, a navigation bar. The title
bar contains the query string used in the search along with the range of total
records returned, for example, 1–10. For each record, the records section
shows the title with a link to the file, up to three passages with keywords
highlighted, the URL, the date of creation, and the size of the document.
At the end of the section, the navigation bar provides links to the
previous and next pages, as well as direct links to eight additional pages
before and after the current page.
<s1ws:formAction />
<s1ws:formSubmission success="true" >
<s1ws:search scope="page" />
<!--search results-->
(...html omitted...)
<s1ws:resultStat formId="test" type="total" /></b> Results Found, Sorted by Relevance</span></td><td>
<span class="body"><a href="/search/search.jsp?">Sort by Date</a></span></td>
<td align="right"><span class="body">
<s1ws:resultNav formId="test" type="previous" caption="<img border=0 src=\"images/arrow-left.gif\" alt=\"Previous\">" />
<s1ws:resultStat formId="test" type="range" />
<s1ws:resultNav formId="test" type="next" caption="<img border=0 src=\"images/arrow-right.gif\" alt=\"Next\">" />
<!img alt="Next" src="images/arrow-right.gif" border="0" WIDTH="13" HEIGHT="9">
(...html omitted...)
<table border=0>
<s1ws:resultIteration formId="test" start="1" results="15">
  <tr class=body>
    <td valign=top>
      <s1ws:item property='number' />.
    </td>
    <td>
      <b><a href="<s1ws:item property='url' />"><s1ws:item property='title' /></a></b>
      <br>
      <s1ws:item property='passages' />
      <font color="#999999" size="-2">
        <s1ws:item property='url' /> -
        <s1ws:item property='date' /> -
        <s1ws:item property='size' /> KB
      </font><br><br>
    </td>
  </tr>
</s1ws:resultIteration>
</table>
(...html omitted...)
<s1ws:resultNav formId="test" type="previous" />
<s1ws:resultNav formId="test" type="full" offset="8" />
<s1ws:resultNav formId="test" type="next" />
(...html omitted...)
</s1ws:formSubmission>
The following figure shows the customized search results page:
Customized Search Results Page
The basic search result interface can be easily customized by manipulating
the tags and modifying the HTML. For example, the navigation bar may be copied
and placed before the search results. Users may also choose to show or not
show any of the properties for a search record.
Besides being used along with a form, the <search>, <resultIteration>, and related tags may be used to list specific
topics. The following sample code lists the top ten articles on Java Web Services
on a site:
<s1ws:search collection="Articles" query="Java Web Services" />
<table cellspacing="0" cellpadding="3" border="0">
<tr class="Title"><td>Java Web Services</td></tr>
</table>
<table cellspacing="0" cellpadding="3" border="0">
<s1ws:resultIteration>
<tr>
<td><a href="<s1ws:item property='URL' />"> <s1ws:item property='Title' /></a></td>
</tr>
</s1ws:resultIteration>
</table>
Customizing Form and Results in Separate Pages
If you need the form and results pages to be separate, you must create
the form page using the <form> set of tags and the results
pages using the <formAction> set of tags.
A link to the form page needs to be added in the results page for a
smooth flow of pages.
Tag Conventions
Note the following tag conventions:
-
Classes for tags belong to the package com.sun.web.search.taglibs.
-
All the pageContext attributes have the
prefix com.sun.web. For example, the attribute for search results
is com.sun.web.searchresults.form_id, where form_id is the name of the form.
-
Tag libraries are referenced with the prefix s1ws.
Names of tags and their attributes are in mixed case with the first letter
of each internal word capitalized, for example, pageContext.
Tag Specifications
Sun Java System Web Server includes a set of JSP tags that can be used
to customize the search query and search results pages in the search interface.
For a complete list of JSP tags that you can use to customize your search
pages, refer to the Sun Java System Web Server 7.0 Developer’s
Guide to Web Applications.