
Implementation of Metadata Standards by Different NSDL-Funded Collections



Moderator: Brandon Muramatsu, MERLOT/California State University

Grace Agnew, Rutgers University

Karon Kelly, DLESE

Kimberly Lightle, ENC/The Ohio State University


Recorder: Lyndsay Greer, ENC/The Ohio State University


Questions for panelists: What metadata standard did you choose and why? What modifications have you made to your chosen standard? What are the costs, software, timelines, and manpower associated with creating digital libraries using the standard?


Session Notes - Implementation of Metadata Standards by Different NSDL-Funded Collections


October 14, 2002

Question to Grace Agnew, Rutgers

If you found something on this (Moving Image Collections) search, you then have to go and contact the archive (to get the video clip)?

No. We haven't loaded National Geographic yet, but they gave us 4,000 outtakes that will be streamable. CNN will also be streamable.

Who will stream it? MIC?

The resource providers will stream it themselves; the archive would provide a link to the source.

Finding video clips is normally a needle-in-a-haystack problem; MIC will be the only place you can discover these resources. Currently, very few of these archives have public catalogs. Until the resources are pulled into the NSDL, this (MIC) will be the only place they are available. CNN, for example, doesn't make its archives available to the general public.


Comment

IMS is no longer an acronym; it's just IMS.


Questions to Karon Kelly, University Corporation for Atmospheric Research (UCAR)


Questions to Kim Lightle, Eisenhower National Clearinghouse (ENC) at the Ohio State University

Can you say a little more about the Standards Plus project?

Standards Plus is part of the Ohio Resource Center, which is funded through the Ohio Board of Regents. ORC is a project to build a collection, a learning object repository, of objects correlated to state standards for math, reading, and science.

The Standards Plus project is to start adding career information and other material to each of those standards. There is information about Standards Plus on the website, http://www.ohiorc.org. Peggy Kasten is the project leader.

Comment about transforming metadata: DLESE went with a semantic schema that was maximized to keep the rich metadata. If you use a semantic registry that's schema-independent, you tend not to lose so much data.

But if you export, you still lose data.

Only if you export to a standard that holds less information. This way we can export rich metadata to MARC.

But if something has been described with just Dublin Core, the data isn't there to convert to IMS LOM; you'd have to add it.
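
A rough sketch in Python of why export direction matters; the field names and mapping here are invented for illustration, not any panelist's actual schema. Mapping a rich record down to simple Dublin Core is easy but lossy, and the dropped fields cannot be recovered when converting back up to IMS LOM.

    # Sketch: crosswalking a rich record down to simple Dublin Core is lossy.
    # Field names are illustrative placeholders, not a real NSDL schema.
    RICH_TO_DC = {
        "title": "dc:title",
        "author": "dc:creator",
        "description": "dc:description",
        # Richer fields with no Dublin Core home get dropped:
        # "gradeLevel", "interactivityType", "typicalLearningTime", ...
    }

    def to_dublin_core(rich_record):
        """Keep only fields that map to a DC element; everything else is lost."""
        return {dc: rich_record[src]
                for src, dc in RICH_TO_DC.items() if src in rich_record}

    rich = {"title": "Plate Tectonics Lab", "author": "J. Smith",
            "gradeLevel": "6-8", "typicalLearningTime": "PT2H"}
    dc = to_dublin_core(rich)
    # dc now has 2 fields; gradeLevel and typicalLearningTime are gone.
    # Converting dc back "up" to IMS LOM cannot restore them; a cataloger
    # would have to add that data by hand, which is the point made above.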


One of the things DLESE is looking at is normalizing some of our values.
We're tracking schemas and creating placeholder schemas for controlled vocabularies. Are you doing that?

We've thought about it.

DLESE would like to normalize roles - director vs. direction, etc.
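
A minimal sketch in Python of the value normalization DLESE describes; the variant strings and canonical forms below are invented examples, not DLESE's actual vocabulary.

    # Sketch: fold free-text role variants into one canonical value.
    CANONICAL_ROLES = {
        "director": "Director",
        "direction": "Director",
        "dir.": "Director",
        "author": "Author",
        "writer": "Author",
    }

    def normalize_role(raw):
        key = raw.strip().lower()
        # Leave values we don't recognize untouched rather than guessing.
        return CANONICAL_ROLES.get(key, raw)

    assert normalize_role("Direction") == "Director"
    assert normalize_role("Cinematographer") == "Cinematographer"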

We need to make the distinction between archives and libraries. We should be streamlining the metadata. I'm sitting on an 80-year-old museum. Just last year we sent out a team to photograph specimens that you can't see in the wild any more, only in museums. There are certain areas of the metadata that should be archived with those specimens, because that's the only record of them.

ENC hasn't delved into archiving, though there are people who are working on it. At the San Diego Supercomputer Center, Reagan Moore has a spider that will crawl your metadata if it is being ingested; it will capture the content and put it on their servers. I'm not sure whether it archives the rich materials or just the metadata. So the question becomes: will the NSDL be responsible for your archiving, or will you do it yourself? There are no good archiving stories yet; nobody has done a good job.
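
The capture pattern just described, sketched in Python and hedged heavily: this is not Reagan Moore's actual spider, the endpoint URL is a placeholder, and it assumes the collection exposes records over OAI-PMH with dc:identifier holding the resource URL.

    # Sketch: harvest metadata records, then fetch the content each record
    # points to, so the metadata still means something if the source server
    # disappears. BASE is a placeholder endpoint, not a real service.
    import urllib.request
    import xml.etree.ElementTree as ET

    DC_NS = "http://purl.org/dc/elements/1.1/"
    BASE = "http://example.org/oai"  # placeholder OAI-PMH endpoint

    url = BASE + "?verb=ListRecords&metadataPrefix=oai_dc"
    with urllib.request.urlopen(url) as resp:
        tree = ET.parse(resp)

    for ident in tree.iter("{%s}identifier" % DC_NS):
        resource_url = ident.text or ""
        if resource_url.startswith("http"):
            with urllib.request.urlopen(resource_url) as r:
                content = r.read()
            # ... write the metadata record and content to archival storage ...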

My point wasn't about where we store it, but that as we push toward applying the metadata, we don't want to forget to store the stuff itself.

Archives create very rich metadata: 900 fields, with records 3-4 screens long, if not 9-10. The metadata is critical to recreation, so build a solid metadata foundation. Archives are concerned with losing too much richness. Phase II will bring in the preservation aspects.

It costs so much to fill in metadata. If it's there you should keep it, but if not, you have a limited amount of money to spend on metadata creation; if people aren't using fields...

Preservation of content through metadata is being done in the archival community, and it is recreation-centered rather than user-centered. It's cognizant of the transitional nature of what we're doing today, and designed to support the migration of content. NSDL's archiving efforts are fairly primitive at this point, with the notion of being able to capture stuff so that you can still get to it if the server is down.

Everybody talks about descriptive metadata as the be-all and end-all, but you also need structural and administrative metadata. You have to delve deep into many kinds of metadata that weren't discussed in the presentations.

There are a number of organizations whose sole purpose is archiving, and that might be a whole other area of information.

We've been recording sounds for 75 years. We have all the classic descriptive fields, but to make the metadata more useful there are now fields that must be added.

New projects coming to this wondering what kind of metadata they need may make the mistake of slavishly following a standard, rather than figuring out what kinds of metadata they need to do what they want to do. Make sure you cover that. Internally you probably have a much richer metadata standard of your own, and that's fine, because you really only have to use standards when you talk to someone else.

There are tradeoffs, and you have to make decisions. We don't focus on helping people decide what to capture in recording one digital resource over another. Since looking at the standards is daunting, one option we settled on very early was a minimum set of metadata that would be applied to every object that comes into the collection; this is a critical part of user comprehension of a resource. We decided on a minimum set of ten fields, and beyond that we defined required, recommended, and robust fields, thinking that users would be eager to robustly describe these resources. It turns out we're lucky to get even the required fields, yet what you really want to capture for archival purposes is the robust metadata. Look at what the minimal set is, then look at the other metadata you'd like to capture, in ways that users are eager to supply.
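
A sketch in Python of the required/recommended/robust tiering just described; the field names are placeholders standing in for the actual ten-field minimum set.

    # Sketch: check a contributed record against tiered field requirements.
    REQUIRED    = {"title", "url", "description", "subject"}   # the minimum set
    RECOMMENDED = {"gradeLevel", "resourceType", "cost"}
    ROBUST      = {"pedagogy", "standardsAlignment", "typicalLearningTime"}

    def assess(record):
        present = {k for k, v in record.items() if v}
        missing = REQUIRED - present
        if missing:
            return "reject: missing " + ", ".join(sorted(missing))
        if ROBUST <= present:
            return "accept (robust)"
        if RECOMMENDED <= present:
            return "accept (recommended)"
        return "accept (required-only)"

    print(assess({"title": "Fractions Game", "url": "http://example.org/game",
                  "description": "...", "subject": "math"}))
    # Prints "accept (required-only)"; in practice most contributors stop here.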

We have a different problem; we're dealing with people who are eagerly providing metadata. That metadata was too complex to give to end users and must be translated for the general user.

Can a user find something? Can they interpret it? Can they obtain the item once they find it? Can they select one resource over another? A PhD student will test people on this: selecting the best record for a given situation and obtaining the resource. We didn't find any existing robust metadata assessment.

You have to choose the minimum set that you can get by with. We're lucky to get authors to select even a few keywords to describe what they're doing. You should construct your metadata so that it works for you.

Let the users tell us. We asked: if you had to select between two resources, what would the most critical data elements be? The results were consistent.

The other side of that is how our metadata is going to work with everyone else's. We developed a controlled vocabulary for our subject areas in hopes that other people will adopt it; we want people to either adopt it or tell us how to do it. It's posted on the web and will be discussed tomorrow.

Do you relate your vocab to any other vocabulary?

Externally we don't, though we drew on the Journal of Chemical Education keyword list.

Are there definitions or semantics?

They have just been added and will be coming online soon.

These are reusable.

We are using ISO 11179, and it would be useful if all the NSDL projects would map to ISO.

Dublin Core is doing vocabulary registries, which should be starting in January. The tool is developed and the processes are developing. There are concerns about liability and the assignment of URIs. Vocabulary owners are going to register themselves and provide their own URIs.
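
The registry idea, sketched in Python: each vocabulary owner mints URIs for its terms so a record can cite a term unambiguously. The URIs and entries below are invented placeholders, not the Dublin Core registry's actual identifiers.

    # Sketch: a term registry keyed by URI (all URIs invented for illustration).
    REGISTRY = {
        "http://example.org/vocab/enc/subjects#fractions": {
            "label": "Fractions",
            "definition": "Rational numbers expressed as a part-whole relationship.",
            "owner": "ENC",
        },
    }

    def lookup(uri):
        """Resolve a term URI to its label, definition, and owner."""
        return REGISTRY.get(uri)

    # A record cites the URI rather than the bare string "fractions", so two
    # collections using different labels can still be recognized as matching.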


The business of interoperability is a huge one. If you have homegrown vocabularies, they do need to be registered and crosswalked, so that anyone crawling or looking at these records can see how they would be crosswalked.

My feeling is that the more you can apply more than one vocabulary, the more someone else can use that to build relationships later. Being able to apply more than one vocabulary or term makes an enormous amount of difference for what can be built on top of it.
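
A small sketch in Python of that point: if each record carries terms from several vocabularies, a later service can relate records across collections wherever any one vocabulary overlaps. The vocabulary names and terms are invented.

    # Sketch: records tagged with terms from more than one vocabulary.
    rec_a = {"subjects": {"enc": ["Fractions"], "dlese": []}}
    rec_b = {"subjects": {"enc": ["Fractions"], "aaas": ["Number sense"]}}

    def related(r1, r2):
        """Related if any vocabulary both records use shares a term."""
        s1, s2 = r1["subjects"], r2["subjects"]
        return any(set(s1.get(v, [])) & set(s2.get(v, [])) for v in s1)

    print(related(rec_a, rec_b))  # True: both carry the ENC term "Fractions"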

I accumulated lots of different lists of terms.

Need a relationship between semantic structures and terms.

The animal behavior community is getting together and wants to put its data up. There's been discussion as to whether or not to come up with a metadata standard, an ontology, for animal behavior. We've been given a small grant to have a couple of meetings. People have come up with ontologies. Is that the way to go?

You may be asking the wrong people. Your data model should be based on your user community.

Are the users the people who will be at your meetings?

Don't make assumptions about what your users need; you may turn your assumptions on their heads.
We were told everyone is doing MARC, but it turned out that only 10 out of 40 were. If you observe the users, it turns what you thought you knew about your community on its head. Those interested in metadata aren't the people who are using it.

Kim mentioned users aren't using the advanced search for the most part. Why?

We've talked to users and watched videos of our users. I think a lot of our users are not very sophisticated, and that's part of it. We've had them go through scenarios, and our basic search allows grade level, web, non-web, and cost, so I think people don't need to go to the advanced search. It depends on what you call your simple search.

Would that argue that you don't need all those fields in your LOM?

They may be using fields once they get to the record, though they aren't searching on them.

What DLESE heard from master teachers is that they're unlikely to search by standards. They're interested in seeing the metadata that asserts that relationship, though, and they might look for like records based on it. Curriculum developers and pre-service teachers would be likely to search by standards. We need to do some study of what users are looking at when they examine the full metadata record; in our case it's probably the description.

It's good to distinguish between searching and filtering.

We're not sure what they're filtering on.

We may never be able to know.

That's what our assessment project is doing, actually.


