Community:OnRamp/documentation/design/caching

From NSDLWiki

Caching

Overview

This page describes the proposed design for caching. The basic design includes the following changes...

  • New object type: Cached Result
  • New XSDs in support of the Cached Result object type
  • D_Cache class - registers cache results in the OnFire MySQL db
  • cache and fetch scripts that the user calls
  • Processing for standard 'generate' script and 'customResult' scripts.
  • Processing for handling of scheduling types
    • Production vs. Test versions of cached results
    • Auto-caching
  • List results in Browse Destinations
  • UI changes to support destination admin management screens


Object Type: Cached Result

  1. Edit fez_object_type table in Fez MySQL db.
  2. Insert id=7 title=Cached Result


XSDs for Cached Result object type

New XSDs:

XSD | Display | Comments
FezACML | FezACML for Onfire Cached Result | Same as other FezACML displays
FezMD | FezMD Onfire Cached Result | Sets object type=7; otherwise, same as other FezMD displays
RELS-EXT | Fedora Onfire Cached Result RELS-EXT Display | Establishes an isMemberOf relationship between the Destination and the Cached Result
FedoraObject XML | Onfire Cached Result | Creates a Fedora Object with the datastreams in this table + others.


D_Cache class

DB Access

The D_Cache class writes to the destination_cache table in the OnFire MySQL db. This table includes the following fields...

Field | Description
dc_id | auto generated unique id
dc_d_id | id for destination for which result is generated
dc_dr_id | id for range for which result is generated
dc_script_id | id for script that generated the result being cached
dc_uid | user specified id for retrieving the cache
dc_pid | pid of the object where the cached result is stored
dc_dsid | dsid of the datastream where the cached result is stored
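
As a rough illustration of the table above, the following sketch builds an equivalent table in an in-memory SQLite database. Only the column names come from this page. The column types, the NOT NULL constraints, the use of SQLite in place of the OnFire MySQL db, and the register() helper are assumptions made for the example.

```python
import sqlite3

# Column names follow the destination_cache field list; the types and
# constraints are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE destination_cache (
    dc_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    dc_d_id      INTEGER NOT NULL,
    dc_dr_id     INTEGER,
    dc_script_id TEXT NOT NULL,
    dc_uid       TEXT,
    dc_pid       TEXT NOT NULL,
    dc_dsid      TEXT NOT NULL
)
"""


def open_cache_db():
    """Open an in-memory stand-in for the OnFire database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def register(conn, d_id, dr_id, script_id, uid, pid, dsid):
    """Hypothetical helper: insert one row and return its generated dc_id."""
    cur = conn.execute(
        "INSERT INTO destination_cache "
        "(dc_d_id, dc_dr_id, dc_script_id, dc_uid, dc_pid, dc_dsid) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (d_id, dr_id, script_id, uid, pid, dsid),
    )
    return cur.lastrowid
```

The pid and dsid values passed to register() would come from the object and datastream that hold the cached result.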

Utilities

Several utilities in this class facilitate the caching process, including...

Function | Description
generateCacheID($d_id,$dr_id,$script_id,$uid) | Generates a unique cache id for a result, based on the information that is unique to that result.
cache($result,$cache_id) | Creates or updates a cached result.
findCache($d_id,$cache_id) | Finds the cache location if the cache already exists. Returns the cache location (pid and dsid) if found; returns null if not found.
createCacheObject() | Creates an object of the Cached Result type. May take some setup information as parameters. Returns the pid of the new object.
addCacheResult($result,$cache_id,$pid) | Adds a datastream holding the cached result. Returns the dsid.
updateCacheResult($result,$cache_id,$pid,$dsid) | Updates the datastream holding the cached result.
registerCacheResult($cache_id) | Deciphers the cache id and registers the cached result in the destination_cache table.
queryCache($d_id,$dr_id,$script_id,$uid) | Returns an array of cache information (cache_id, pid, dsid, etc.) for all cached results matching the passed-in information. The only required parameter is $d_id. Certain parameters may require other parameters to be specified.
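
The sketch below shows one way the cache id and the create-or-update flow could fit together. It is illustrative only and is written in Python rather than PHP. The id format (four URL-quoted fields joined by ":"), the CacheSketch class, the "sketch:N" pids, and the single "result" dsid are all assumptions; this page does not define the real encoding or the object layout.

```python
from urllib.parse import quote, unquote

SEP = ":"  # assumed delimiter; the real cache id format is not specified


def generate_cache_id(d_id, dr_id=None, script_id="generate", uid=None):
    """Join the identifying fields into one reversible string."""
    parts = [str(d_id), "" if dr_id is None else str(dr_id), script_id, uid or ""]
    # Quoting each part keeps the delimiter unambiguous.
    return SEP.join(quote(part, safe="") for part in parts)


def decipher_cache_id(cache_id):
    """Reverse generate_cache_id, as registerCacheResult would need to."""
    d_id, dr_id, script_id, uid = (unquote(part) for part in cache_id.split(SEP))
    return {
        "d_id": int(d_id),
        "dr_id": int(dr_id) if dr_id else None,
        "script_id": script_id,
        "uid": uid or None,
    }


class CacheSketch:
    """In-memory stand-in for the object store and the destination_cache table."""

    def __init__(self):
        self.objects = {}   # pid -> {dsid: result}
        self.registry = {}  # cache_id -> (pid, dsid)

    def find_cache(self, cache_id):
        """Return (pid, dsid) if the cache exists, otherwise None."""
        return self.registry.get(cache_id)

    def cache(self, result, cache_id):
        """Create the cached result on first use, update it afterwards."""
        location = self.find_cache(cache_id)
        if location is None:
            pid = "sketch:%d" % (len(self.objects) + 1)  # hypothetical pid
            dsid = "result"                              # hypothetical dsid
            self.objects[pid] = {dsid: result}
            self.registry[cache_id] = (pid, dsid)
            return pid, dsid
        pid, dsid = location
        self.objects[pid][dsid] = result
        return location
```

Calling cache() twice with the same cache id leaves a single object whose datastream holds the most recent result, which matches the re-caching behavior described later on this page.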


Cache and Fetch scripts

Cache

Examples:  
  1) cache.php?destination=bppb&uid=November2008
  2) cache.php?destination=bppb&uid=November2008&script=browse
  3) cache.php?destination=ncore&uid=ncore_web&test
  4) cache.php?destination=nsdlheadlines&date=11/03/2008
  • process url and parameters
    • uses the destination parameter to get information like d_id, scheduling type, etc.
    • uses uid to get information like range_id, end_date, etc.
    • default script_id = 'generate'
    • default date = today
  • generates the results based on passed in parameters and values from processing parameters
  • calls D_Cache::generateCacheID()
  • calls D_Cache::cache()

NOTE: The cache.php script accepts a test parameter. The effect of this parameter depends on the scheduling type of the destination. See Processing for Handling of Scheduling Types for more information.
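
The parameter handling described above can be sketched as follows. This is an illustration in Python, not the cache.php code. The parse_request function and its return shape are hypothetical, and treating destination as mandatory is an assumption based on the examples. The defaults (script 'generate', date today) and the mm/dd/yyyy date format follow this page.

```python
from datetime import date, datetime
from urllib.parse import parse_qs, urlsplit


def parse_request(url, today=None):
    """Extract cache/fetch parameters from a request URL, applying defaults."""
    # keep_blank_values=True keeps the bare 'test' flag, which has no value.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if "destination" not in query:
        raise ValueError("destination parameter is required")  # assumption
    raw_date = query.get("date", [""])[0]
    if raw_date:
        when = datetime.strptime(raw_date, "%m/%d/%Y").date()
    else:
        when = today or date.today()  # default date = today
    return {
        "destination": query["destination"][0],
        "uid": query.get("uid", [None])[0],
        "script": query.get("script", ["generate"])[0],  # default script
        "date": when,
        "test": "test" in query,
    }
```

The same parsing would apply to fetch.php, since both scripts accept the same parameters.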

Fetch

Examples:  
  1) fetch.php?destination=bppb&uid=November2008
  2) fetch.php?destination=bppb&uid=November2008&script=browse
  3) fetch.php?destination=ncore&uid=ncore_web&test
  4) fetch.php?destination=nsdlheadlines&date=11/03/2008
  • process url and parameters
    • uses the destination parameter to get information like d_id, scheduling type, etc.
    • uses uid to get information like range_id, end_date, etc.
    • default script_id = 'generate'
    • default date = today
  • calls D_Cache::generateCacheID()
  • calls D_Cache::queryCache()
  • use results of queryCache to generate and forward to the result's eserv URL

NOTE: The fetch.php script accepts a test parameter. The effect of this parameter depends on the scheduling type of the destination. See Processing for Handling of Scheduling Types for more information.
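
The last two fetch steps, querying the cache and forwarding to the result, can be sketched as below. The records are plain dictionaries keyed by the destination_cache field names. The eserv URL shape (pid and dsID query parameters), the base URL, and the choice to take the first match are assumptions for this example and are not defined on this page.

```python
from urllib.parse import urlencode


def query_cache(records, d_id, dr_id=None, script_id=None, uid=None):
    """Return all records for d_id that match every criterion that was given."""
    wanted = {"dc_dr_id": dr_id, "dc_script_id": script_id, "dc_uid": uid}
    return [
        record for record in records
        if record["dc_d_id"] == d_id
        and all(value is None or record[key] == value for key, value in wanted.items())
    ]


def eserv_url(base_url, record):
    """Build the URL of the datastream holding the cached result (assumed shape)."""
    params = {"pid": record["dc_pid"], "dsID": record["dc_dsid"]}
    return base_url + "?" + urlencode(params)


def fetch_location(records, base_url, **criteria):
    """Return the URL to forward to, or None when nothing is cached."""
    matches = query_cache(records, **criteria)
    return eserv_url(base_url, matches[0]) if matches else None
```

Only d_id is required here, mirroring the queryCache description; any other criterion left as None is ignored.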



Processing for Standard 'generate' Script and 'customResult' Scripts

  • Script id will be encoded in the cache_id.
  • Script id will be stored in the destination_cache table.
  • Some custom scripts use a different scheduling type than the base 'generate' script.
  • Extend onfire_template_behaviors to hold mimetype of result and extension to use with the result.
Example:
  Destination 'Beyond Penguins and Polar Bears' specifies range of dates for issues.  

  The 'generate' script operates on one issue at a time.  
    Scheduling Type: 'Destination Specifies Range of Dates'

  A custom result script 'browse' allows for browsing across issues and operates on all issues.  
    Scheduling Type: 'All Distributions'

In this specific example, it is acceptable to cache the customResult per range. The latest range holds the cached result that is currently valid. One advantage is that it is easy to look back and see what the cached result used to be. In this use case, it also allows the new browse to be tested while the older browse, which was valid during the previous issue, remains accessible.

Processing for Handling of Scheduling Types

Scheduling Types

    Distribution
ID | Scheduling Type | hasRangeID | hasReleaseDate | hasEndDate
0 | Destination specifies range of dates | YES | YES | NO
1 | Distribution specifies range of dates | NO | YES | YES
2 | All Distributions | NO | YES | Optional
3 | One Distribution | NO | YES | NO
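
For reference, the table can be captured as a small lookup structure. This is a sketch only. The SchedulingType class and the uses_range_id helper are hypothetical names, and the values are copied from the table as strings so that "Optional" is preserved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulingType:
    type_id: int
    name: str
    has_range_id: str       # "YES" or "NO"
    has_release_date: str   # "YES" or "NO"
    has_end_date: str       # "YES", "NO", or "Optional"


SCHEDULING_TYPES = {
    0: SchedulingType(0, "Destination specifies range of dates", "YES", "YES", "NO"),
    1: SchedulingType(1, "Distribution specifies range of dates", "NO", "YES", "YES"),
    2: SchedulingType(2, "All Distributions", "NO", "YES", "Optional"),
    3: SchedulingType(3, "One Distribution", "NO", "YES", "NO"),
}


def uses_range_id(type_id):
    """Hypothetical helper: True when distributions of this type carry a range id."""
    return SCHEDULING_TYPES[type_id].has_range_id == "YES"
```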

Processing for each type proceeds as follows.

Destination Specifies Range of Dates

Standard 'generate' processing...

  • There is one cache object per range. Each object holds the cached results for a range.
  • Re-caching
    • Re-caching the result for a range replaces the previous result for that range.
  • Auto-caching:
    • Any time a new distribution is published, the cache of the range that includes the distribution is regenerated.
    • Any time a distribution is modified, the cache of the range that includes the distribution is regenerated.
  • Production vs. Test - There is no difference in the way results are generated between production and test. Conceptually, a result is considered production or test based on the following criteria.
    • Production Result: Any range whose end date is today or earlier. Each range's result is a separate production result.
    • Test Result: Any range whose range (end) date is later than the current date. Each range's result in this case is a test result.
  • UI
    • Destination Management screen:
      • Auto-Caching: New configuration field for the destination (default=off)
      • In list of destinations, when the destination is of this scheduling type, the Manage link in the new Results column takes you to a page to manage results with fields...
        • list all ranges where each line has...
          • Range Title
          • select list of scripts
          • Execute link - execute calls precache.php which brings up a popup with a message to be patient and then forwards to cache.php
          • View link - view calls fetch.php and opens the results in a new window
    • Browse Results screen:
      • List all results sorted by range end date
      • Name: Range Title, Range UID OR Range Title, Range UID -- Script UID
      • Clicking a result calls fetch.php and opens the results in a new window
  • Fetching by URL:
fetch.php?did=d_id&rid=dr_id
fetch.php?did=33&rid=46

fetch.php?did=d_id&rid=dr_id&behid=dtbeh_id
fetch.php?did=33&rid=46&behid=14

fetch.php?did=d_id&rid=dr_id&behid=dtbeh_id&test
fetch.php?did=33&rid=46&behid=14&test
* NOTE: 'test' parameter is required if end date for requested range is later than today
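
The production/test rule for this scheduling type reduces to a comparison between the range end date and the current date. A minimal sketch follows. The function names are hypothetical, and treating a missing test parameter as a refused request is an assumption, since this page only states that the parameter is required.

```python
from datetime import date


def is_test_result(range_end_date, today):
    """A range's result is a test result while its end date is still in the future."""
    return range_end_date > today


def fetch_allowed(range_end_date, test_param, today):
    """The 'test' parameter is required when the range ends after today."""
    return test_param or not is_test_result(range_end_date, today)
```

A range whose end date is today counts as production, so it can be fetched without the test parameter.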

Distribution Specifies Range of Dates

Standard 'generate' processing...

  • There is one cache object per date (one for each day).
  • Re-caching
    • Re-caching the result for a date replaces the previous result for that date.
  • Auto-caching:
    • Any time a new distribution is published, the caches for all dates within the range of dates of the distribution are regenerated.
    • Any time a distribution is modified, the caches for all dates within both the old range of dates and the new range of dates of the distribution are regenerated.
  • Production vs. Test - There is no difference in the way results are generated between production and test. Conceptually, a result is considered production or test based on the following criteria.
    • Production Result: All days that have already passed, including today, can have a cache result generated for that day.
    • Test Result: Any day that is later than the current date. Each day can have a cache result generated for that day.
  • UI
    • Destination Management screen:
      • Auto-Caching: New configuration field for the destination (default=off)
      • In list of destinations, when the destination is of this scheduling type, the Manage link in the new Results column takes you to a page to manage results with fields...
        • date field - allows specification of which date's cache is being generated
        • select list of scripts
        • Execute link - execute calls precache.php which brings up a popup with a message to be patient and then forwards to cache.php
        • View link - view calls fetch.php and opens the results in a new window
    • Browse Results screen:
      • List all results sorted by date field which is the result_id in the cache table
      • Name: Destination Title, Date OR Destination Title, Date -- Script UID
      • Clicking a result calls fetch.php and opens the results in a new window
  • Fetching by URL: (today if date is missing or blank)
fetch.php?did=d_id
fetch.php?did=37

fetch.php?did=d_id&date=mm/dd/yyyy
fetch.php?did=37&date=10/15/2008

fetch.php?did=d_id&date=mm/dd/yyyy&test
fetch.php?did=37&date=01/15/2009&test

* NOTE: 'test' parameter is required if date is later than today
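
The auto-caching rule for this scheduling type means that a modified distribution can invalidate dates in both its old and its new range. The sketch below computes the set of dates to regenerate. The function names are hypothetical, and each range is assumed to be an inclusive (start, end) pair of dates.

```python
from datetime import date, timedelta


def dates_in_range(start, end):
    """All dates from start to end, inclusive."""
    return {start + timedelta(days=offset) for offset in range((end - start).days + 1)}


def dates_to_regenerate(new_range, old_range=None):
    """Dates whose caches must be rebuilt after a publish or a modification.

    For a newly published distribution, pass only new_range. For a modified
    distribution, pass both ranges so that the union is regenerated.
    """
    dates = dates_in_range(*new_range)
    if old_range is not None:
        dates |= dates_in_range(*old_range)
    return sorted(dates)
```

For example, moving a distribution from Nov 1-3 to Nov 3-5 yields the five dates Nov 1 through Nov 5, with the overlapping day regenerated once.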

All Distributions

Standard 'generate' processing...

  • There is one object with two datastreams, production and test, that hold the production result and the test result, respectively.
  • Re-caching
    • Re-caching of production replaces the production datastream.
    • Re-caching of test replaces the test datastream.
  • Auto-caching:
    • Any time a new distribution is published, the cache is regenerated.
    • Any time a distribution is modified, the cache is regenerated.
  • Production vs. Test - Production and test results are generated by selecting one or the other to generate. The contents of the results of the two modes are based on the following criteria.
    • Production Result: A single cache exists that includes all distributions up to and including today's date.
    • Test Result: A second cache exists that includes all distributions regardless of release date.
  • UI
    • Destination Management screen:
      • Auto-Caching: New configuration field for the destination (default=off)
      • In list of destinations, when the destination is of this scheduling type, the Manage link in the new Results column takes you to a page to manage results with fields...
        • two lines, one for Production and one for Test
          • either Production or Test
          • select list of scripts
          • Execute link - execute calls precache.php which brings up a popup with a message to be patient and then forwards to cache.php
          • View link - view calls fetch.php and opens the results in a new window
    • Browse Results screen:
      • List two items: production result and test result
      • Name: Destination Title (Production) OR Destination Title (Test) OR Destination Title (Production|Test) -- Script UID
      • Clicking a result calls fetch.php and opens the results in a new window
  • Fetching by URL:
fetch.php?did=d_id
fetch.php?did=34

fetch.php?did=d_id&test
fetch.php?did=34&test
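
For this scheduling type, production and test differ in which distributions are included in the result. A minimal sketch follows. The function names and the dictionary shape are hypothetical, and the datastream ids "production" and "test" are assumed from the description above.

```python
from datetime import date


def select_distributions(distributions, test, today):
    """Test mode includes everything; production stops at today's date."""
    if test:
        return list(distributions)
    return [d for d in distributions if d["release_date"] <= today]


def datastream_for(test):
    """Each mode is written to its own datastream on the single cache object."""
    return "test" if test else "production"
```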


One Distribution

There isn't a use case for this yet, so it will not be implemented at this time. But here is what it will most likely look like.


Standard 'generate' processing...

  • There is one object per distribution. Each object holds the cached results for a single distribution.
  • Re-caching
    • Re-caching the result for a particular distribution will replace the previous result for that distribution.
  • Auto-caching:
    • If auto-caching is turned ON, re-generate the cache for a distribution any time the distribution is updated.
  • Production vs. Test - There is no difference in the way results are generated between production and test. Conceptually, a result is considered production or test based on the following criteria.
    • Production Result: For each distribution whose release date has already passed, including today, the cache result is considered production.
    • Test Result: For each distribution whose release date is later than the current date, the cache result is considered test.
  • UI
    • TBA

List Results in Browse Destinations

Under Browse Destinations, you can select Browse Results. This presents the user with a list of all cached results for the destination. Specifics related to the scheduling types of destinations are described in the Processing for Handling of Scheduling Types section.

Destination Admin UI

This section gives a general description of the administration screens that destination admins will have access to and the changes expected for those screens to accommodate the new caching design. Specifics related to the scheduling types of destinations are described in the Processing for Handling of Scheduling Types section.

Identifying Destination Admin

Originally it was envisioned that a new function would be written to identify whether the current user is assigned as Destination Admin for any destination. Instead, this was implemented by making Destination Admins OnRamp admins. OnRamp admins can perform the following administrative tasks...

  • Manage Groups
    • Create, modify, and delete any group
    • Add and remove users from any group.
  • Manage Authors (affects all of OnRamp)
  • Manage Destinations (limited to destinations where the admin is assigned as Destination Admin)
    • Edit Configuration
    • Manage Results
    • Create, modify, and delete basic information about destinations (Creating a destination does not automatically assign the user as Destination Admin, so the new destination won't show up on their list. This should probably change.)
  • Manage Destination Ranges (limited to destinations where the admin is assigned as Destination Admin)
    • Create, modify, and delete ranges
    • Configure ranges


UI Changes

Updates to administration UI to allow Destination Admins to perform administrative tasks for destinations.

  • manage ranges
    • add new range (aka issue)
    • modify range configuration
  • manage destinations
    • edit content definitions
    • edit post processing
    • edit configuration
    • manage results (specifics of this screen are defined under each scheduling type)
      • run 'generate'
        • production mode
        • test mode
      • run 'customResult'
        • select which custom script to run
        • production mode
        • test mode
      • fetch results