TNS Internal:NDR/gSearch

Overview of gSearch for NDR

References

  • sub-second delay between changes to Fedora and the corresponding updates to the index
  • primarily used for full stream datastream indexer


Constructing the Search Document

Code Location: CVS/repository/NDRgenericSearch

  • code from fedora
  • stripped down
    • comes with four configurations
    • can administer simultaneously (not needed)
  • designed to be active web project (plugin)
    • look at .settings (sets up active web project)
    • no ant file


Directories/Files of Interest

  • README (at root of project) - written by Tim
    • lots of comments about what was changed
  • /src
    • location of java source code from fedora
  • /WebContent
    • web
  • /nsdlMods/java/dk/defxws/fgslucene/IndexDocumentHandler
    • changed IndexDocumentHandler class to accept structured xml
    • previously gSearch tried to parse the structured xml itself and would fail
    • the handler now passes the structured xml through as a field value, so the xml itself is stored in the index
  • /WebContent/WEB-INF/classes/config/index/NDRFoxmlToLucene.xslt
    • takes Fedora object and turns it into an index object that goes into lucene
    • each fedora object document becomes an index object document
    • look for <IndexField> which defines fields
    • look for <xsl:variable> which defines getting info from other objects to include in this object
  • /WebContent/WEB-INF/classes/config/index/basicIndex/GFindObjectsToResultPage.xslt
    • added a namespace to the output stream
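
The IndexDocumentHandler change described above (keeping structured xml as a field value rather than parsing it) can be illustrated with a minimal sketch. This is not the Java code in NDRgenericSearch; the `IFname` attribute, the field names, and the sample document are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def index_field_value(field_elem):
    """Return the value to store for one <IndexField> element.

    Plain-text fields are returned as-is. If the field contains child
    elements, they are serialized back to a string so the XML itself
    becomes the stored value instead of being parsed further.
    """
    children = list(field_elem)
    if not children:
        return field_elem.text or ""
    return "".join(ET.tostring(child, encoding="unicode") for child in children)


# Hypothetical index document; the real one is produced by NDRFoxmlToLucene.xslt.
doc = ET.fromstring(
    "<IndexDocument>"
    '<IndexField IFname="PID">2200/test.001</IndexField>'
    '<IndexField IFname="metadata"><record><title>Example</title></record></IndexField>'
    "</IndexDocument>"
)

fields = {f.get("IFname"): index_field_value(f) for f in doc.findall("IndexField")}
print(fields["metadata"])
```

The point of the sketch is only the branch on child elements: a field holding markup is stored verbatim, which is what allows the NDR API layer to read the stored copy back out of lucene later.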


Focuses on metadata

  • does all relsext for all object types
  • processes metadata datastreams for metadata object types
  • commented out getting
  • does not index ncs:collectRecord


UI

URL: http://ndr.nsdlib.org:8580/ndrsearch/rest?operation=browseIndex

  • problem with namespace change (See README)
  • the "Field name" drop-down list shows all fields that have been indexed


Integrate with NDR API

  • takes the gsearch output and loads it into jaxb objects; jaxb then uses a different schema to produce the output
  • http call to get the gsearch document
  • gfind api call (gsearch api)
  • getting stored copy from lucene (as opposed to getting anything directly from NDR)
  • an object is constructed in gsearch so that a collection object can be returned as a whole
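
As a rough illustration of the flow above (fetch the gsearch document, then map its stored fields into objects), the following sketch parses a result page into plain dictionaries. The NDR API does this in Java with jaxb; the element names, the namespace, and the sample response here are assumptions, not the actual gsearch output format.

```python
import xml.etree.ElementTree as ET

# Hypothetical response; the real namespace is the one described in the README.
SAMPLE = """<resultPage xmlns="http://example.org/hypothetical-ns">
  <gfindObjects hitTotal="2">
    <objects>
      <object no="1">
        <field name="PID">2200/test.001</field>
        <field name="re.nsdl.objectType">Metadata</field>
      </object>
      <object no="2">
        <field name="PID">2200/test.002</field>
        <field name="re.nsdl.objectType">Metadata</field>
      </object>
    </objects>
  </gfindObjects>
</resultPage>"""


def local(tag):
    """Strip a '{namespace}' prefix so matching works whatever the namespace is."""
    return tag.rsplit("}", 1)[-1]


def parse_hits(xml_text):
    """Return one dict of field name -> stored value per <object> element."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():
        if local(elem.tag) != "object":
            continue
        fields = {}
        for child in elem:
            if local(child.tag) == "field":
                fields[child.get("name")] = (child.text or "").strip()
        hits.append(fields)
    return hits


hits = parse_hits(SAMPLE)
print(len(hits), hits[0]["PID"])
```

Matching on local names keeps the sketch independent of the namespace that was added to the output stream.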


gSearch API

  • couldn't find any documentation at fedora-commons.org
  • look at rest call from UI
  • gfind is the api call
  • pass in lucene query to gfind
Example:

  Find all metadata objects returning 10 at a time...
  
  http://ndr.nsdlib.org:8580/ndrsearch/rest?operation=gfindObjects
    &restXslt=copyXml&query=re.nsdl.objectType:%22Metadata%22
    &hitPageSize=10&snippetsMax=0&fieldMaxLength=0
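
A minimal sketch of building the same request programmatically. The parameter names and values are copied from the example URL above; the function name `gfind_url` and its defaults are hypothetical.

```python
from urllib.parse import quote, urlencode

# REST endpoint taken from the example URL on this page.
GSEARCH_REST = "http://ndr.nsdlib.org:8580/ndrsearch/rest"


def gfind_url(query, hit_page_size=10, base=GSEARCH_REST):
    """Build a gfindObjects URL for the given lucene query string."""
    params = {
        "operation": "gfindObjects",
        "restXslt": "copyXml",
        "query": query,
        "hitPageSize": hit_page_size,
        "snippetsMax": 0,
        "fieldMaxLength": 0,
    }
    # quote (rather than the default quote_plus) encodes '"' as %22;
    # safe=":" leaves the lucene field separator unescaped.
    return base + "?" + urlencode(params, quote_via=quote, safe=":")


url = gfind_url('re.nsdl.objectType:"Metadata"')
print(url)
```

Passing the query through `urlencode` avoids hand-escaping the quotes in the lucene query, which is the only part of the example URL that needs encoding.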