TNS Internal:NDR/gSearch
From NSDLWiki
Overview of gSearch for NDR
References
- subsecond delay between changes to fedora and updates to the index
- primarily used to index full datastreams
Constructing the Search Document
Code Location: CVS/repository/NDRgenericSearch
- code from fedora
- stripped down
- comes with four configurations
- can administer simultaneously (not needed)
- designed to be an active web project (plugin)
- look at .settings (sets up active web project)
- no ant file
Directories/Files of Interest
- README (at root of project) - written by Tim
- lots of comments about what was changed
- /src
- location of java source code from fedora
- /WebContent
- web
- /nsdlMods/java/dk/defxws/fgslucene/IndexDocumentHandler
- changed IndexDocumentHandler class to accept structured xml
- gsearch used to try to parse the structured xml and would die
- the class now passes the structured xml on as a field value, so the xml itself goes into the index
- /WebContent/WEB-INF/classes/config/index/NDRFoxmlToLucene.xslt
- takes a Fedora object and turns it into an index object that goes into lucene
- each fedora object document becomes one index object document
- look for <IndexField>, which defines the fields
- look for <xsl:variable>, which defines how info is pulled from other objects to include in this object
- /WebContent/WEB-INF/classes/config/index/basicIndex/GFindObjectsToResultPage.xslt
- added a namespace to the output stream
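The IndexDocumentHandler change described above (structured xml passed through as a field value instead of being parsed) can be sketched as follows. This is an illustration in Python, not the actual Java code in /nsdlMods; the function name `field_value` and the sample element contents are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def field_value(index_field: ET.Element) -> str:
    """Return the value to store in the index for one <IndexField> element.

    Hypothetical helper: plain text is stored as-is, while structured
    content is serialized so the xml markup itself becomes the value.
    """
    children = list(index_field)
    if not children:
        # Simple case: the field holds only text.
        return (index_field.text or "").strip()
    # Structured case: keep the child markup verbatim as the field value.
    return "".join(ET.tostring(child, encoding="unicode") for child in children)


# Illustrative inputs (element contents are made up for this sketch).
plain = ET.fromstring('<IndexField IFname="title">Rocks</IndexField>')
structured = ET.fromstring(
    '<IndexField IFname="metadata"><record><title>Rocks</title></record></IndexField>'
)

print(field_value(plain))
print(field_value(structured))
```

The point of the sketch is only the branch on child elements: when markup is present, nothing tries to interpret it, and the serialized xml is what lands in the index.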
Focuses on metadata
- does all relsext for all object types
- processes metadata datastreams for metadata object types
- commented out getting
- does not index ncs:collectRecord
UI
URL: http://ndr.nsdlib.org:8580/ndrsearch/rest?operation=browseIndex
- problem with namespace change (See README)
- Field name: the drop-down list shows all fields that have been indexed
Integrate with NDR API
- takes the gsearch output and stuffs it into jaxb objects; jaxb then uses a different schema to produce the output
- http call to get the gsearch document
- gfind api call (gsearch api)
- getting stored copy from lucene (as opposed to getting anything directly from NDR)
- an object is constructed in gsearch so that a collection object can be returned as a whole
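The flow above (fetch the gsearch document, then load it into typed objects) might look roughly like the sketch below. The NDR API does this in Java with jaxb; this Python version only illustrates the idea. The xml shape in `SAMPLE`, the element and attribute names, and the `Hit` class are all assumptions, and the real NDR output also carries a namespace that is left out here.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Hypothetical result document; the real gsearch output may differ.
SAMPLE = """
<resultPage>
  <gfindObjects hitTotal="2">
    <objects>
      <object no="1">
        <field name="PID">demo:1</field>
        <field name="re.nsdl.objectType">Metadata</field>
      </object>
      <object no="2">
        <field name="PID">demo:2</field>
        <field name="re.nsdl.objectType">Metadata</field>
      </object>
    </objects>
  </gfindObjects>
</resultPage>
"""


@dataclass
class Hit:
    """One stored index document, as field name -> stored value."""
    fields: dict


def parse_hits(xml_text: str) -> list:
    """Turn a result document into Hit objects (stand-in for the jaxb step)."""
    root = ET.fromstring(xml_text)
    hits = []
    for obj in root.iter("object"):
        fields = {f.get("name"): (f.text or "") for f in obj.iter("field")}
        hits.append(Hit(fields))
    return hits


hits = parse_hits(SAMPLE)
print(len(hits), hits[0].fields["PID"])
```

Note that the values come from the copy stored in lucene, so whatever the XSLT chose to store is all that is available at this stage.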
gSearch API
- couldn't find any documentation at fedora-commons.org
- look at rest call from UI
- gfind is the api call
- pass in lucene query to gfind
Example: Find all metadata objects, returning 10 at a time: http://ndr.nsdlib.org:8580/ndrsearch/rest?operation=gfindObjects&restXslt=copyXml&query=re.nsdl.objectType:%22Metadata%22&hitPageSize=10&snippetsMax=0&fieldMaxLength=0
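Since no gSearch API documentation was found, building the request from the example URL is probably the safest route. A minimal sketch in Python, assuming the parameters in the example are the ones that matter; the helper name `build_gfind_url` is made up for illustration, and only parameters that appear in the example above are used.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the example above.
BASE_URL = "http://ndr.nsdlib.org:8580/ndrsearch/rest"


def build_gfind_url(query: str, hit_page_size: int = 10) -> str:
    """Build a gfindObjects REST URL for a lucene query (hypothetical helper)."""
    params = {
        "operation": "gfindObjects",
        "restXslt": "copyXml",
        "query": query,
        "hitPageSize": hit_page_size,
        "snippetsMax": 0,
        "fieldMaxLength": 0,
    }
    # Keep ':' readable in the lucene query; quotes are encoded as %22.
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe=":")


url = build_gfind_url('re.nsdl.objectType:"Metadata"')
print(url)
```

The resulting URL can be pasted into a browser or fetched with any http client; the lucene query is the only part that normally changes.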