Nova Scotia's
Geographic Information
Standards
Chapter 4
Data Quality and Accuracy
(Continued)
4.12 Theme and Data Set Lineage
Material presently before the
Database Directory and Catalogue Task Group
for final review
4.13 Feature Cataloguing
4.13.1 Background:
Within the GIS environment, the feature code is one of the basic tags
used to describe the make-up of the feature type being presented
in the digital file. Such tags can have different meanings for different
users. For instance, a feature tagged with the feature type ROAD could be
a gravel road, a paved road, a bumpy road or a six-lane superhighway.
The feature definition for ROAD is the primary mechanism used
to describe exactly what type of road is being depicted.
For a given data set, all feature types should be described. Such
descriptions provide critical information for the user as to how they might
be expected to use (or not use) the data. A feature catalogue can bring all
of these feature descriptions together.
4.13.2 Standard: Feature Cataloguing
All data providers will supply their clients with a Feature Catalogue
depicting the contents of the data for which they are responsible. The
presentation of the Feature Catalogue will follow the same structure as
that presented for the Primary Data Products of the Province of Nova Scotia
(See Chapter 6).
4.14 Topic: Digital Feature Cataloguing
4.14.1 Background:
Once a data provider has a standard feature catalogue, it needs to supply
that catalogue to its clients in an effective manner. Data providers may elect
to give their clients periodic updates to their catalogue in either hardcopy
or digital format. Such updates, however, do not allow the user to see changes
or additions to the feature lists. As an alternative, some data providers
have elected to provide their clients with listings of feature types with
each data set obtained. This too has its limitations, as it does not provide
an accompanying description. A more appropriate alternative would be to
provide the client with a digital copy of the feature catalogue each time
data is obtained.
4.14.2 Standard: Client Copies of Feature Catalogues
All data providers will supply their clients with digital copies of their
feature catalogue with each product procurement.
4.15 Completeness
4.15.1 Introduction
Completeness may be defined as relating to or dealing with the assessment of a data set's content
and/or coverage. Completeness documentation might include an assessment of content (e.g. what
percentage of the data fields within a given record type are actually populated with valid data) and
coverage (i.e. a definition of the required spatial domain for the dataset).
There are two types of completeness associated with geographic data: spatial
completeness, which refers to the degree to which geographic data covers a geographic region,
and content completeness, which refers to the degree to which graphic data is meaningful.
Documenting data completeness, whether spatial or content, can be associated with varying levels
of detail. Five levels of data must be considered in relation to completeness:
- Data / map series - a group of map sheets or data sets having the same scale and
cartographic specifications and collectively identified by the producer.
- Dataset - a collection of similar and related data records that are recorded for use by a
computer.
- Theme - data having similar characteristics and contained in the same data
set.
- Element - a fundamental geographical unit of information, such as a point, line, area, or
pixel. An element does not include attribute information.
- Attribute - a defined characteristic or item of information that describes an
element.
In the following sections, guidelines and standards will be outlined indicating the extent to which a
data provider should document data completeness.
4.16 Topic: Completeness - Spatial and Content
4.16.1 Purpose:
To provide data consumers with completeness information at all levels. Such information will
allow the user to formulate opinions on the applicability of the data to their given application.
4.16.2 Background:
The issue of data completeness was presented during a 1995 Data Quality and Accuracy
workshop. As a result of user discussions and follow-up questionnaires, the following
observations were noted:
- The issues surrounding completeness and the need for information regarding completeness are
very important.
- The users believe there is a high correlation between cost of data and completeness of that data.
- Data providers have distributed data to clients before it has been checked; as a result, pre-releases
have contained little or no information regarding completeness status. Historically this
has been a problem when the user revisits that data.
- Some themes are easier to complete or are more complete than others. As a result, theme-based
completeness is an issue.
- The majority of the users would like to be notified as to data set status as soon as the data is
available. Such notification would allow the user to determine if they should pursue the "new"
information. It was further noted, however, that transaction-based updates of selected data may
make such notification impossible.
- Attribute, Feature/theme, and dataset completeness were of interest to users.
- There appears to be a distinction between "geographical completeness" and "production
completeness".
4.16.2.1 Spatial Completeness
Spatial completeness refers to the degree to which geographic data covers a geographic region.
At certain levels of data detail, completeness statements are not necessary, as the existence of the
feature is intuitive (i.e. it is there or it isn't). Such is the case for elements. If the element is
non-existent, then spatial attributes for that element are also non-existent.
Spatial completeness at the theme level refers to the number of captured elements relative to the
actual number of elements for a given theme. At the dataset level, spatial completeness refers to
the number of captured themes relative to the actual number of themes. For example, in the
primary database there are 10 primary themes. If a given dataset has only 8 of those 10 collected,
the dataset is only 80% complete. At the data / map series level, spatial completeness is based
upon the number of data sets completed out of the actual number of data sets in the series. Again
by way of an example, there are 1587 datasets (map sheets) at a scale of 1:10 000 which make up
a portion of the NSTDB. At the end of 1997, this portion of the NSTDB is scheduled to be 100%
complete (initial lift only).
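The ratios described above can be sketched in a few lines of Python. This is an illustration only, not part of the standard: the function name spatial_completeness and its argument names are assumptions, and the figures are the ones used in the examples above (8 of 10 primary themes, and 1587 of 1587 map sheets once the initial lift is complete).

```python
def spatial_completeness(captured: int, expected: int) -> float:
    """Return the captured share of the expected items as a percentage.

    The same ratio applies at each level: elements per theme,
    themes per dataset, or datasets per data / map series.
    """
    if expected <= 0:
        raise ValueError("expected must be a positive count")
    if not 0 <= captured <= expected:
        raise ValueError("captured must lie between 0 and expected")
    return 100.0 * captured / expected


# Dataset level: 8 of the 10 primary themes have been collected.
dataset_pct = spatial_completeness(8, 10)

# Series level: all 1587 map sheets at 1:10 000 completed.
series_pct = spatial_completeness(1587, 1587)

print(f"Dataset: {dataset_pct:.0f}% complete")
print(f"Series:  {series_pct:.0f}% complete")
```

Keeping the raw counts alongside the percentage lets a provider report either form without losing information.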
4.16.2.2 Standard: Spatial Completeness
Data providers must supply data completeness information pertaining to the data's spatial
details. These spatial completeness statements will reflect data at the Thematic level; Dataset
level; and the Data/Map Series level.
4.16.2.3 Standard: Spatial Extent of Data
Data providers must ensure, as part of the metadata documentation process (See Chapter 10
of the Standards Manual), that section 5.1 is complete. The completion of section 5.1 of the
metadata standard will provide the user with additional information regarding spatial
completeness.
4.16.2.4 Content Completeness
Content completeness refers to the degree to which graphic information is meaningful.
Documenting content completeness is hierarchical. If data does not exist at one level,
content completeness documentation at higher levels must reflect that loss at the lower levels.
At the attribute level, content completeness refers to the number of fields to be provided for the
element along with the information found within those fields. For example, if the element called
ROAD has three associated attributes (Length, Surface type, and Condition) and all possible
entries are made for Length and Surface type, but Condition is only 10% complete, a content
completeness statement would be provided to reflect the overall completeness of the ROAD
element.
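As a rough sketch of the ROAD example, the Python below counts populated fields per attribute and overall. The standard does not prescribe how the per-attribute figures should be combined; the overall figure used here (populated fields divided by all expected fields) is an assumption, as are the function name, the field names, and the ten sample records.

```python
def attribute_completeness(records, attributes):
    """Return (per-attribute percentages, overall percentage of populated fields).

    A field counts as populated when its value is neither None nor an
    empty string. This rule is an assumption made for the sketch.
    """
    if not records or not attributes:
        raise ValueError("records and attributes must be non-empty")
    per_attribute = {}
    populated_total = 0
    for name in attributes:
        populated = sum(1 for r in records if r.get(name) not in (None, ""))
        per_attribute[name] = 100.0 * populated / len(records)
        populated_total += populated
    overall = 100.0 * populated_total / (len(records) * len(attributes))
    return per_attribute, overall


# Ten hypothetical ROAD elements: length and surface type are always
# filled in, but condition is recorded for only one element (10%).
roads = [
    {
        "length": 120.0 + i,
        "surface_type": "paved",
        "condition": "good" if i == 0 else None,
    }
    for i in range(10)
]

per_attribute, overall = attribute_completeness(
    roads, ["length", "surface_type", "condition"]
)
for name, pct in per_attribute.items():
    print(f"{name}: {pct:.0f}%")
print(f"ROAD overall: {overall:.0f}%")
```

Reporting the per-attribute figures next to the overall figure shows the user which attribute is responsible for the shortfall.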
At the element level, content completeness is tied to spatial completeness. The element may be
spatially complete (i.e. it exists) yet incomplete from a content perspective, at
which point the data provider may wish to reference the element's collection specifications.
Content completeness for theme data relates to both the existence of all elements within that
theme and the level of content completeness for each of the elements contained within the theme.
A theme may have half of the required elements contained within it, and have half of those
elements with associated attribution. The user of the data should be aware of such circumstances
prior to applying the data to their specific application.
For dataset-level data, content completeness refers to both the existence of all themes for the
dataset and the level of content completeness for each theme. The same rules hold true for data /
map series content completeness. Such completeness statements for the data/map series level
must reflect both the existence of all data sets for the series as well as the level of completeness
for each of the datasets within that series.
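One possible way to report both parts of such a statement (existence of the lower-level items, and the content completeness of those that exist) is sketched below in Python. The function roll_up and its result keys are hypothetical, and counting a missing item as 0% in the combined figure is an assumption, not a rule from this standard. The sample values follow the theme example given earlier: half of the required elements are present, and half of those carry attribution.

```python
def roll_up(required, child_content_pct):
    """Summarise one level from the items captured beneath it.

    required          -- number of child items the specification calls for
    child_content_pct -- content completeness (0-100) of each captured child
    """
    if required <= 0:
        raise ValueError("required must be a positive count")
    present = len(child_content_pct)
    if present > required:
        raise ValueError("more children captured than required")
    existence = 100.0 * present / required
    # Average over the captured children only.
    content_of_present = sum(child_content_pct) / present if present else 0.0
    # Average over all required children; missing ones count as 0%.
    combined = sum(child_content_pct) / required
    return {
        "existence_pct": existence,
        "content_pct_of_present": content_of_present,
        "combined_pct": combined,
    }


# A theme requiring 4 elements: 2 were captured, and 1 of those 2
# has its attribution filled in.
theme = roll_up(required=4, child_content_pct=[100.0, 0.0])
print(theme)
```

The same function could be applied one level up, passing the combined figure of each theme as a child of the dataset, and again for the datasets of a data / map series.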
4.16.2.5 Standard: Content Completeness
Data providers must supply data completeness information pertaining to the data's content
details. These content completeness statements will reflect data at the Data/Map Series level;
Dataset level; Thematic level; Element level; and the Attribute level.
4.16.3 Guidelines: Spatial and Content Completeness Documentation
When providing a user with completeness information the data provider must keep in mind the
need for the user to make basic decisions as to whether or not to apply the data to a given
application. There are a number of ways in which a data provider might supply completeness
statements. The simplest, but least recommended, method is to state only whether or not the
data is complete. The basic problem is consistency of meaning: one person's incompleteness may
be another's completeness.
A second method of documentation is to expound on the brief complete / not complete statement:
provide the user with as much detail as possible regarding what makes the data complete, and relate
that to the data being provided. Such documentation can, however, become verbose, and caution should
be taken in providing textual completeness documentation.
Providing documentation in the form of statistics is a third possibility. Indicating to the users that
X out of Y are complete for specific data is one method of quantifying completeness.
Percentages of completeness are another.
Graphical depictions are a fourth method of documenting completeness. Such graphics may take the
form of a pie chart, a bar chart, or a map index. This is a particularly effective approach when
attempting to document data / map series spatial completeness.
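As a minimal illustration of the statistical and graphical methods, the Python sketch below prints an "X out of Y" count, a percentage, and a simple text bar for each item. The function name text_bar, the bar width, and the sample theme names and counts are all assumptions made for the example; a provider would more likely produce a proper chart or map index.

```python
def text_bar(label, complete, total, width=20):
    """Return one line: label, a text bar, the X/Y count, and the percentage."""
    if total <= 0:
        raise ValueError("total must be a positive count")
    if not 0 <= complete <= total:
        raise ValueError("complete must lie between 0 and total")
    filled = round(width * complete / total)
    bar = "#" * filled + "." * (width - filled)
    pct = 100.0 * complete / total
    return f"{label:<12} [{bar}] {complete}/{total} ({pct:.0f}%)"


# Hypothetical per-theme counts of completed map sheets.
status = [("Roads", 8, 10), ("Hydrography", 10, 10), ("Buildings", 3, 10)]
for label, complete, total in status:
    print(text_bar(label, complete, total))
```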