Chapter 19 Writing New Robot Application Functions

Overview of Writing Robot Application Functions

When you write robot application functions, make sure that the file that defines them includes robotapi.h. You will also find many useful functions in csinfo.h.

All robot application functions use parameter blocks (pblocks) to receive and set parameter values. A parameter block stores parameters as name-value pairs in a hash table that is keyed on the name portion of each parameter.

RAF Prototype

All robot application functions have the following prototype:

    int (*RobotAPIFn)(pblock *pb, CSFilter *csf, CSResource *csr);

pb is the parameter block containing the parameters for this function invocation. csf is the pointer to an enumeration or generation filter. csr is a data structure that contains information about the resource being processed.

Note – The pb parameter is read-only, and any data modification should be performed on copies of the data. Doing otherwise is unsafe in threaded server architectures and will yield unpredictable results in multiprocess server architectures.

Writing Functions for Specific Directives

Write each function for a particular stage in the filtering process (setup, metadata, data, enumeration, generation, or shutdown). The function should use only the data sources that are available at that stage. See the section Sources and Destinations (in the Administration Guide) for a list of the data sources available at each stage.

At the Setup stage, the filter is preparing for setup and cannot get information about the resource’s URL or content.

At the MetaData stage, the robot has encountered a URL for a resource but has not downloaded the resource’s content. Consequently, information is available about the URL and about data that is derived from other sources, such as the filter.conf file. Information about the content of the resource is not yet available.
At the Data stage, the robot has downloaded the content of the URL, so information is available about the content, such as the description and the author.

At the Enumeration and Generation stages, the same data sources are available as for the Data stage.

At the Shutdown stage, the filter has completed its processing and shuts down. Although functions written for this stage can use the same data sources as those available at the Data stage, shutdown functions typically restrict their operations to shutdown and cleanup activities.

Passing Parameters to Robot Application Functions

You must use parameter blocks (pblocks) to pass arguments into robot application functions and to extract data from them. For example, the following directive in the filter.conf file invokes the filter-by-exact function:

    Data fn=filter-by-exact src=type deny=text/plain

The fn parameter names the function to invoke, which in this case is filter-by-exact. The src and deny arguments are parameters of that function. They are passed to the function in a parameter block, and the function must extract the parameter names and values from that block.

The three structures that hold parameters are libcs_pb_param, libcs_pb_entry, and libcs_pblock. These structures are defined in the header file PortalServer-base/sdk/robot/include/libcs/pblock.h.
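To make the mapping from a directive line to name-value pairs concrete, here is a minimal sketch that splits a line such as the filter-by-exact directive into its directive name and parameters. It does not use the robot SDK: demo_param and demo_parse_directive are hypothetical names invented for this illustration, the fixed buffer sizes are arbitrary, and quoted values are not handled.

```c
#include <assert.h>
#include <string.h>

#define MAX_PARAMS 8
#define MAX_TOKEN  64

/* Hypothetical stand-in for one name-value pair. The real structures
 * are defined in libcs/pblock.h and are not reproduced here. */
typedef struct {
    char name[MAX_TOKEN];
    char value[MAX_TOKEN];
} demo_param;

/* Split "Directive name=value ..." into the directive and its pairs.
 * directive must have room for MAX_TOKEN bytes.
 * Returns the number of pairs stored, or -1 on malformed input. */
int demo_parse_directive(const char *line, char *directive,
                         demo_param *params, int max_params)
{
    char buf[256];
    char *tok;
    int count = 0;

    assert(line != NULL && directive != NULL && params != NULL);

    if (strlen(line) >= sizeof buf)
        return -1;
    strcpy(buf, line);              /* strtok modifies its input, so work on a copy */

    tok = strtok(buf, " \t");
    if (tok == NULL || strlen(tok) >= MAX_TOKEN)
        return -1;
    strcpy(directive, tok);

    while ((tok = strtok(NULL, " \t")) != NULL) {
        char *eq = strchr(tok, '=');
        if (eq == NULL || eq == tok || count == max_params)
            return -1;              /* no '=', empty name, or too many pairs */
        *eq = '\0';
        if (strlen(tok) >= MAX_TOKEN || strlen(eq + 1) >= MAX_TOKEN)
            return -1;
        strcpy(params[count].name, tok);
        strcpy(params[count].value, eq + 1);
        count++;
    }
    return count;
}
```

Called on the directive shown above, this sketch yields the directive Data and three pairs: fn, src, and deny.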
Working with Parameter Blocks

A parameter block stores parameters and values as name-value pairs. Many predefined functions are available for working with parameter blocks: extracting parameter values, changing parameter values, and so on. For example, libcs_pblock_findval(paramname, returnPblock) uses the given return pblock to return the value of the named parameter in the RAF’s input pblock. For an example, see RAF Definition Example.

To add, remove, edit, and create name-value pairs for parameters, your robot application functions can use the functions declared in the pblock.h header file (in the PortalServer-base/sdk/robot/include/libcs directory). The names of these functions are all prefixed by libcs_.

The following table lists the parameter manipulation functions and describes each one. See the PortalServer-base/sdk/robot/include/libcs/pblock.h header file for the full function signatures, including return types and arguments.
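The idea of a parameter block as a hash table keyed on parameter names can be sketched as follows. This is an illustration only: demo_pblock and the demo_ functions are hypothetical stand-ins, not the libcs_ functions from pblock.h, and their argument order and return conventions are this sketch's own choices. The lookup returns a const pointer to reflect the rule that a function should treat its input parameters as read-only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BUCKETS 16

/* Hypothetical stand-ins; the real libcs_pb_param, libcs_pb_entry, and
 * libcs_pblock structures are defined in libcs/pblock.h. */
typedef struct demo_entry {
    char *name;
    char *value;
    struct demo_entry *next;    /* next entry in the same bucket */
} demo_entry;

typedef struct {
    demo_entry *buckets[DEMO_BUCKETS];
} demo_pblock;

static char *demo_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Simple string hash (djb2) reduced to a bucket index. */
static unsigned demo_hash(const char *name)
{
    unsigned long h = 5381;
    while (*name != '\0')
        h = h * 33 + (unsigned char)*name++;
    return (unsigned)(h % DEMO_BUCKETS);
}

/* Insert copies of name and value. Returns 0 on success, -1 if out of memory. */
int demo_pblock_insert(demo_pblock *pb, const char *name, const char *value)
{
    demo_entry *e;
    unsigned i;

    assert(pb != NULL && name != NULL && value != NULL);
    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->name = demo_strdup(name);
    e->value = demo_strdup(value);
    if (e->name == NULL || e->value == NULL) {
        free(e->name);
        free(e->value);
        free(e);
        return -1;
    }
    i = demo_hash(name);
    e->next = pb->buckets[i];
    pb->buckets[i] = e;
    return 0;
}

/* Return the value stored under name, or NULL if the name is absent. */
const char *demo_pblock_findval(const demo_pblock *pb, const char *name)
{
    const demo_entry *e;

    assert(pb != NULL && name != NULL);
    for (e = pb->buckets[demo_hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e->value;
    return NULL;
}

/* Release every entry and leave the block empty. */
void demo_pblock_free(demo_pblock *pb)
{
    size_t i;

    assert(pb != NULL);
    for (i = 0; i < DEMO_BUCKETS; i++) {
        demo_entry *e = pb->buckets[i];
        while (e != NULL) {
            demo_entry *next = e->next;
            free(e->name);
            free(e->value);
            free(e);
            e = next;
        }
        pb->buckets[i] = NULL;
    }
}
```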
Getting Information on the Processed Resource

As mentioned in RAF Prototype, all robot application functions have the following prototype:

    int (*RobotAPIFn)(pblock *pb, CSFilter *csf, CSResource *csr);

where csr is a data structure that contains information about the resource being processed. The CSResource structure is defined in the header file robotapi.h.

Each resource is in SOIF syntax. Objects in SOIF syntax have a schema name, an associated URL, and a set of attribute-value pairs. In Example 19–1, the schema name is @DOCUMENT, the URL is http://developer.siroe.com/docs/manuals/htmlguid/index.htm, and the SOIF contains attribute-value pairs for title, author, and description.

Example 19–1 SOIF Syntax Example
A CSResource structure has a url field, which contains the URL for the SOIF. It also has an rd field, whose value is the SOIF for the resource. Once you have the SOIF for the resource, you can use the SOIF functions defined in the PortalServer-base/sdk/rdm/include/soif.h file to get more information about the resource. (The file robotapi.h includes soif.h.)

For example, the macro SOIF_Findval(soif, attribute) gets the value of the given attribute in the given SOIF. Example 19–2 uses this macro to print the value of the META attribute if it exists for the resource being processed.

Example 19–2 SOIF_Findval Macro Example
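As a rough illustration of this kind of lookup (not the guide's own listing), the sketch below models a resource with a url field and an rd field and prints the META attribute when it is present. Everything prefixed with demo_ is a hypothetical stand-in: the real CSResource layout is in robotapi.h, the real SOIF type and the SOIF_Findval macro are in soif.h, and the exact-match comparison of attribute names is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for a SOIF object and for CSResource. */
typedef struct {
    const char *attribute;
    const char *value;
} demo_avpair;

typedef struct {
    const char *schema_name;    /* for example "@DOCUMENT" */
    const char *url;
    const demo_avpair *pairs;   /* attribute-value pairs */
    size_t npairs;
} demo_soif;

typedef struct {
    const char *url;            /* URL of the resource */
    const demo_soif *rd;        /* SOIF for the resource */
} demo_resource;

/* Return the value of the named attribute, or NULL if it is absent. */
const char *demo_soif_findval(const demo_soif *soif, const char *attribute)
{
    size_t i;

    assert(soif != NULL && attribute != NULL);
    for (i = 0; i < soif->npairs; i++)
        if (strcmp(soif->pairs[i].attribute, attribute) == 0)
            return soif->pairs[i].value;
    return NULL;
}

/* Print the META attribute if the resource has one.
 * Returns 1 if a line was printed, 0 otherwise. */
int demo_print_meta(const demo_resource *csr, FILE *out)
{
    const char *meta;

    assert(csr != NULL && csr->rd != NULL && out != NULL);
    meta = demo_soif_findval(csr->rd, "META");
    if (meta == NULL)
        return 0;
    fprintf(out, "%s: META = %s\n", csr->url, meta);
    return 1;
}
```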
Review the CSResource structure in the file robotapi.h for more information on other fields and macros. For more information about the routines to use with SOIF objects, see Memory Buffer Management.

Returning a Response Status Code

When your robot application function has finished processing, it must return a code that tells the server how to proceed with the request. These codes are defined in the header file PortalServer-base/sdk/robot/include/robotapi.h.

The following list describes each response status code that a function can return after it has completed processing:
Reporting Errors to the Robot Log File

When problems occur, robot application functions should return an appropriate response status code (such as REQ_ABORTED) and should also log an error in the error log file.

To use the error-logging functionality, you must include the file log.h, which is in the PortalServer-base/sdk/robot/include/libcs directory. After you have ensured that log.h exists in the correct place, you can use the cslog_error macro to report errors. The prototype has the following format:

    cslog_error(int n, int loglevel, char* errorMessage)

The first parameter is not currently used, although it may be used in the future. You can pass any integer.

The second parameter is the log level. When this value is less than or equal to the log level setting in the file process.conf, the error message is written to robot.log.

The third parameter is the error message to print. It has the same form as the argument to the standard printf() function. For example:
This invocation of cslog_error would generate the following error message in the robot log file:
For another example:
This invocation of cslog_error would generate the following error message in the robot log file:
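The log-level rule and the printf-style message described in this section can be sketched with a stand-in logger. Nothing here is the real cslog_error macro: demo_log_error, demo_configured_level (standing in for the setting in process.conf), and the DEMO_OK and DEMO_ABORTED codes with their values are all hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical status codes; the real ones come from the robot headers. */
enum { DEMO_OK = 0, DEMO_ABORTED = -1 };

/* Hypothetical stand-in for the log level configured in process.conf. */
static int demo_configured_level = 1;

/* Last message written, kept so that callers can inspect it. */
static char demo_last_message[256];

/* Write a printf-style message if loglevel does not exceed the configured
 * level. The first argument is ignored, mirroring the unused first
 * parameter described above. Returns 1 if written, 0 if suppressed. */
int demo_log_error(int unused, int loglevel, const char *fmt, ...)
{
    va_list ap;

    (void)unused;
    assert(fmt != NULL);
    if (loglevel > demo_configured_level)
        return 0;

    va_start(ap, fmt);
    vsnprintf(demo_last_message, sizeof demo_last_message, fmt, ap);
    va_end(ap);
    fprintf(stderr, "%s\n", demo_last_message);
    return 1;
}

/* Report a missing parameter and abort, or proceed if it is present. */
int demo_check_dst(const char *dst)
{
    if (dst == NULL) {
        demo_log_error(0, 1, "%s: missing parameter '%s'",
                       "demo_check_dst", "dst");
        return DEMO_ABORTED;
    }
    return DEMO_OK;
}
```

The pattern is the one this section recommends: log the problem first, then return a status code that tells the caller the request was aborted.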
RAF Definition Example

This section shows an example definition for a robot application function. The function copies specified source data to a multi-valued field in an RD. For example, the search engine stores category or classification information in the classification field of an RD. The copy_mv function allows the robot to get the value of an HTML <META> tag of any name and store the value in the classification field in the database.

For example, using this function, you could instruct the robot to get the content of the <META NAME="topic"> tag and store it as the classification of the resource. You would invoke this function with a directive such as the following:

    Generate fn=copy_mv src=topic dst=classification

Example 19–3 shows a sample function definition.

Example 19–3 Robot Application Function Example
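The behavior described for copy_mv can be approximated with stand-in types, as in the sketch below. It is not the guide's listing and does not use the robot SDK: demo_pair, demo_mvfield, demo_rd, and the DEMO_ status codes are hypothetical, the fixed array sizes are arbitrary, and treating a missing source value as a successful no-op is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_FIELDS 4
#define DEMO_MAX_VALUES 4

/* Hypothetical status codes; the real ones come from the robot headers. */
enum { DEMO_OK = 0, DEMO_ABORTED = -1 };

typedef struct {
    const char *name;
    const char *value;
} demo_pair;

/* One multi-valued field of the stand-in RD. */
typedef struct {
    const char *name;
    const char *values[DEMO_MAX_VALUES];
    size_t count;
} demo_mvfield;

typedef struct {
    const demo_pair *meta;      /* source data: META tag name and content */
    size_t nmeta;
    demo_mvfield fields[DEMO_MAX_FIELDS];
    size_t nfields;
} demo_rd;

static const char *demo_find(const demo_pair *pairs, size_t n, const char *name)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(pairs[i].name, name) == 0)
            return pairs[i].value;
    return NULL;
}

/* Copy the source value named by "src" into the multi-valued field named
 * by "dst". Strings are borrowed, not copied, so they must outlive rd. */
int demo_copy_mv(const demo_pair *params, size_t nparams, demo_rd *rd)
{
    const char *src, *dst, *value;
    demo_mvfield *field = NULL;
    size_t i;

    assert(params != NULL && rd != NULL);
    src = demo_find(params, nparams, "src");
    dst = demo_find(params, nparams, "dst");
    if (src == NULL || dst == NULL)
        return DEMO_ABORTED;        /* both parameters are required */

    value = demo_find(rd->meta, rd->nmeta, src);
    if (value == NULL)
        return DEMO_OK;             /* nothing to copy for this resource */

    for (i = 0; i < rd->nfields; i++)
        if (strcmp(rd->fields[i].name, dst) == 0)
            field = &rd->fields[i];
    if (field == NULL) {            /* first value for this field */
        if (rd->nfields == DEMO_MAX_FIELDS)
            return DEMO_ABORTED;
        field = &rd->fields[rd->nfields++];
        field->name = dst;
        field->count = 0;
    }
    if (field->count == DEMO_MAX_VALUES)
        return DEMO_ABORTED;
    field->values[field->count++] = value;
    return DEMO_OK;
}
```

With parameters equivalent to src=topic dst=classification and a resource whose topic value is set, the stand-in RD ends up with one classification value.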
Compiling and Linking your Code

You can compile your code with any ANSI C compiler. See the makefile in the PortalServer-base/sdk/robot/example directory for an example. The makefile assumes the use of gmake.

This section lists the linking options you need to use to create a shared object that the robot can be instructed to load by commands in the filter.conf configuration file. Note that you can link object files into a shared object. In Table 19–1, the compiled object files t.o and u.o are linked to form a shared object called test.so.

Table 19–1 Options for linking
Loading Your Shared Object

The robot uses the filters defined in filter.conf to filter resources that it encounters. If the file filter.conf uses your customized robot application functions, it must load the shared object that contains the functions. To load the shared object, add a line to filter.conf:

    Init fn=load-modules shlib=[path]filename.so funcs="function1,function2,...,functionN"

This initialization function opens the given shared object file and loads the functions function1, function2, and so on. You can then use the functions function1 and function2 in the robot configuration file (filter.conf). Remember to use the functions only with the directives you wrote them for, as described in the following section.

Using your New Robot Application Functions

When you have compiled and arranged for the loading of your functions, you need to provide for their execution. All functions are called as follows:

    Directive fn=function [name1=value1] ... [nameN=valueN]
The directive name and the fn parameter are both mandatory. In addition, there may be an arbitrary number of function-specific parameters, each of which is a name-value pair.

You must specify your function in the directive for which it was written. For example, the following line uses a plug-in function called wordcount that can be used in the Data stage. This function counts the words in a resource and assigns the count to a destination specified by a parameter called dst.

    Data fn=wordcount dst=word-count
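The core of a word-counting function like the one named in that directive might look like the following sketch. It is not taken from the SDK: demo_count_words and demo_wordcount are hypothetical names, words are assumed to be separated by whitespace, and writing the result as a "name=count" string into a caller-supplied buffer stands in for assigning the count to the destination named by dst.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Count whitespace-separated words in text. */
size_t demo_count_words(const char *text)
{
    size_t count = 0;
    int in_word = 0;

    assert(text != NULL);
    for (; *text != '\0'; text++) {
        if (isspace((unsigned char)*text)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;            /* first character of a new word */
            count++;
        }
    }
    return count;
}

/* Format "dst=count" into out. Returns 0 on success, or -1 if the
 * result does not fit in outsize bytes. */
int demo_wordcount(const char *dst, const char *content,
                   char *out, size_t outsize)
{
    int n;

    assert(dst != NULL && content != NULL && out != NULL);
    n = snprintf(out, outsize, "%s=%lu", dst,
                 (unsigned long)demo_count_words(content));
    return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}
```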