Chapter 27 Using Java To Add Entries to the Search Engine Database
The program rdmgr is used to add data to the database from the
command line. This chapter describes how to create input data for rdmgr so
that it can be added to the database.
rdmgr Command
The rdmgr utility can add new data as well as replace, modify,
or retrieve existing data. All data input and output is done using SOIF, with UTF-8
character encoding for character fields. Note that SOIF also supports binary-valued
fields and they can be added or retrieved too.
For more information on rdmgr, see the Technical Reference Guide.
In the simplest case, rdmgr can be used to add a file containing
multiple SOIF Resource Descriptions (RDs) to the database. To do so, create
a SOIF file with the search SDK, or by other means, and add the data with the command rdmgr soif_input_file. The rdmgr utility also accepts resource
description submit requests as input. Submit requests also use SOIF format and include
a request header in addition to the normal body, which consists of the SOIF data to be
added to or retrieved from the database.
SOIF Object
A SOIF object consists of a schema name (such as REQUEST or DOCUMENT), a
URL, and a list of attribute-value pairs. The com.sun.portal.search.soif package in the Search Server Java SDK is used to build SOIF objects and
write them to a file.
Constructing and Submitting a Request
You can use the SOIF classes to create an RD submit request for input to rdmgr.
Constructing a Request
Here is an example of constructing a submit request that can be used as input
to rdmgr. Request headers do not have an associated URL and use "-" instead.
SOIF req = new SOIF("REQUEST", "-");
A submit request can have the following attributes:
submit-csid
submit-database
submit-type
submit-operation
submit-view
Add values for these attributes to the request header. This example
shows an update operation into the default database. The database attribute is optional;
the default database is used if none is supplied. The submit view restricts which
attributes are updated; by default, all of the supplied input attributes are updated
for each resource description.
req.insert("submit-database", "default");
req.insert("submit-type", "nonpersistent");
req.insert("submit-operation", "update");
req.insert("submit-view", "title,author,description");
Now we create the body part of the submit request. We’ll be updating the
resource description of a document, whose URL is http://www.sesta.com/~jocelyn/resdogs.index.htm, whose title is “Saving English Springer Spaniels,” whose author
is Jocelyn Becker, and whose description is “English Springer Spaniels in need
of homes.”
SOIF data = new SOIF("DOCUMENT", "http://www.sesta.com/~jocelyn/resdogs.index.htm");
data.insert("title", "Saving English Springer Spaniels");
data.insert("author", "Jocelyn Becker");
data.insert("description", "English Springer Spaniels in need of homes");
Now, the request is saved to a file for input to rdmgr:
SOIFOutputStream sos = new SOIFOutputStream("soif_file");
sos.write(req);
sos.write(data);
sos.close();
At this point soif_file should contain:
@REQUEST { -
submit-database{7}: default
submit-type{13}: nonpersistent
submit-operation{6}: update
submit-view{24}: title,author,description
}
@DOCUMENT { http://www.sesta.com/~jocelyn/resdogs.index.htm
title{32}: Saving English Springer Spaniels
author{14}: Jocelyn Becker
description{42}: English Springer Spaniels in need of homes
}
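The number in braces after each attribute name is the size of the value in bytes. The following is a minimal sketch, using only the standard Java library, of how text in this layout could be produced. The names SoifTextSketch and render are hypothetical and are not part of com.sun.portal.search.soif; the separator whitespace is assumed from the listing above, and the SDK's own output may differ in detail.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT the SDK's SOIF class.
final class SoifTextSketch {

    // Renders one object as "@SCHEMA { url", one line per attribute, then "}".
    static String render(String schema, String url, Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder();
        sb.append('@').append(schema).append(" { ").append(url).append('\n');
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            // The size is counted in UTF-8 bytes, not in characters.
            int size = e.getValue().getBytes(StandardCharsets.UTF_8).length;
            sb.append(e.getKey()).append('{').append(size).append("}: ")
              .append(e.getValue()).append('\n');
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the attributes in insertion order.
        Map<String, String> header = new LinkedHashMap<>();
        header.put("submit-operation", "update");
        header.put("submit-view", "title,author,description");
        System.out.print(render("REQUEST", "-", header));
    }
}
```

Because the size is a byte count, a value containing non-ASCII characters has a size larger than its character count.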
Submitting a Request
When rdmgr processes this input, it updates the attributes of the
RD shown in the database and indexes them. The rdmgr utility supports other types of requests as well.
Example 27–1 rdmgr Submit
// submit header fields
String SUBMIT_CSID = "submit-csid";
String SUBMIT_TYPE = "submit-type";
String SUBMIT_OPER = "submit-operation";
String SUBMIT_VIEW = "submit-view";
String SUBMIT_DB = "submit-database";
String SUBMIT_MESSAGE = "message";
String SUBMIT_ERROR = "error";
// submit operations
String SUBMIT_RETRIEVE = "retrieve";
String SUBMIT_INSERT = "insert";
String SUBMIT_DELETE = "delete";
String SUBMIT_UPDATE = "update";
// submit types
String SUBMIT_PERSISTENT = "persistent";
String SUBMIT_NONPERSISTENT = "nonpersistent";
String SUBMIT_MERGED = "merged";
Submit Operations
The submit operations are as follows:
- retrieve: Retrieves the requested fields (the submit view) for the requested
RDs. In this case, the data is a list of RDs that can be specified by their URLs only.
The server returns the requested fields for these RDs.
- insert: The server adds the supplied RDs to the database, or replaces them if
they already exist.
- delete: The server deletes the RDs. As with retrieve, it is sufficient to
list the RDs by URL alone; it is not necessary to supply values for the fields of
the RDs.
- update: The server modifies the RDs in the database by merging any existing
fields with the fields supplied in the data. If an attribute view list is supplied,
only those attributes are updated. If a view is not supplied, all of the given
input attributes are updated for each RD.
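The effect of an update with and without a submit view can be modeled with plain maps. The following sketch only illustrates the semantics described above; it is not the server's implementation. The class name UpdateViewSketch and the method update are hypothetical, and exact attribute-name matching is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the update operation; not part of the Search SDK.
final class UpdateViewSketch {

    // Merges supplied fields into existing ones. A null view means
    // "update every supplied attribute"; otherwise only attributes
    // named in the comma-separated view are updated.
    static Map<String, String> update(Map<String, String> existing,
                                      Map<String, String> supplied,
                                      String view) {
        Set<String> allowed =
            (view == null) ? null : new HashSet<>(Arrays.asList(view.split(",")));
        Map<String, String> result = new LinkedHashMap<>(existing);
        for (Map.Entry<String, String> e : supplied.entrySet()) {
            if (allowed == null || allowed.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new LinkedHashMap<>();
        existing.put("title", "Old title");
        existing.put("keywords", "dogs");

        Map<String, String> supplied = new LinkedHashMap<>();
        supplied.put("title", "Saving English Springer Spaniels");
        supplied.put("author", "Jocelyn Becker");

        // With the view "title", the supplied author is ignored.
        System.out.println(update(existing, supplied, "title"));
        // With no view, both supplied attributes are applied.
        System.out.println(update(existing, supplied, null));
    }
}
```

In both cases, existing fields that are absent from the supplied data (keywords here) are kept, which is what distinguishes update from insert.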
Submit Types
The submit types are as follows:
- persistent: The operation is applied to the persistent part of each RD in the
database. When an RD is retrieved from the database, or indexed, any persistent fields
take precedence over non-persistent fields. This allows you to manually edit the fields
of an RD without having to worry that your edits will be lost the next time the RD
is submitted by the robot, for example.
- nonpersistent: This is the default type. Data is normally added as non-persistent
data. Note that RDs are only indexed and searchable if they have a non-empty non-persistent
component.
- merged: This is the default for retrieval. When data is retrieved, the persistent
and non-persistent fields are merged together, with the persistent fields taking precedence
over the non-persistent fields. You can view this as the persistent fields “covering”
the non-persistent fields. You can also retrieve just the persistent
fields, or just the non-persistent fields.
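The way persistent fields cover non-persistent fields in a merged retrieval can be sketched with two maps. This is an illustrative model only, not the server's implementation; MergedViewSketch and merged are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the merged submit type; not part of the Search SDK.
final class MergedViewSketch {

    // Starts from the non-persistent fields, then overlays the persistent
    // ones, so a persistent value wins wherever both parts define a field.
    static Map<String, String> merged(Map<String, String> persistent,
                                      Map<String, String> nonPersistent) {
        Map<String, String> result = new LinkedHashMap<>(nonPersistent);
        result.putAll(persistent);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> nonPersistent = new LinkedHashMap<>();
        nonPersistent.put("title", "Title found by the robot");
        nonPersistent.put("author", "Jocelyn Becker");

        Map<String, String> persistent = new LinkedHashMap<>();
        persistent.put("title", "Manually edited title");

        // The manually edited title covers the robot's title; author is unchanged.
        System.out.println(merged(persistent, nonPersistent));
    }
}
```

In this model, a later robot submission changes only the non-persistent map, so the manual edit continues to appear in the merged result.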