GeoStor Version 2.00.00
The On-Line Spatial Data Infrastructure
 

Design and Architecture V 1.5

Executive Summary

GeoStor Enterprise Architecture

GeoStor Architectural Design

Server Side

The server-side of GeoSurf is primarily responsible for providing an interface between the client-side applet and the Oracle database. Consequently, it consists of several servlets, each supporting a specific element of the graphical user interface presented by the applet. Each of these servlets is accessed via an HTTP request and responds with an HTTP response. Java objects are streamed back by some of the servlets, but only within the constraints of HTTP. As with the client-side, the server-side of GeoSurf consists of both proprietary and third party software.

A J2EE compliant web server is needed in which to run all the servlets. Currently, we are running the reference implementation from Sun because it was easy (comes as part of the J2EE Development Kit) and free. Sun does not consider this implementation of production quality but we have had no problems with it so far. We've also successfully run GeoSurf under the Apache Web Server with the Jakarta package and are considering moving to that platform in the near future.

We are using MapXtremeJava version 3.0 (MXJ) to provide visual selection of the geographic area of interest. Like GeoSurf, MXJ has both server-side and client-side components: a servlet, and a group of classes used in the applet to provide the graphical user interface.
We are also using MapInfo's MapMarkerJ product to provide geocoding of street addresses in the Geographic Filter. Again, there is a servlet and there is a group of classes used in the applet.

Several servlets were developed by CAST to support the GeoSurf interface:

GeoSurfServlet. This is the primary entrance to GeoSurf. It takes an HTTP GET request with parameters and generates an HTML stream to start the applet in the user's browser plug-in.

LayerListerServlet. The LayerLister provides a list of layers based on filter information provided by a client. In the applet, the LayerListerClient2 catches events from the Metadata Filter and the Geographic Filter. Each time a filter is changed, the LayerListerClient2 asks the LayerListerServlet to perform a search on the database. The LayerListerClient2 always passes both the metadata filter and the geographic filter to the LayerListerServlet, so every search is done on both criteria.

Searching for layers that meet the search criteria is done in two steps. First, the list of layers is narrowed by filtering out layers that do not meet the metadata criteria. Second, the remaining layers are filtered out if they do not fall in the specified geographic region. Filtering by geographic region is CPU intensive, so it is important to apply the Metadata Filter first to minimize the list of layers on which the Geographic Filter is run. To further speed up geographic filtering, the metadata contains a minimum distance figure. If the side of a box with the same area as the search area of the Geographic Filter is larger than the minimum distance figure, then data is assumed to be present and no geographic search is done for that layer.
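The two-step search can be sketched roughly as follows. This is a minimal illustration in Python, not the servlet's actual code: the Layer class, the keyword-based metadata test, and the intersects callback (standing in for the CPU-intensive geographic search) are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical stand-in for one row of the metadata table."""
    name: str
    title: str
    keywords: set
    min_distance: float  # the "minimum distance figure" described in the text


def filter_layers(layers, keyword, search_area, intersects):
    """Return (name, title) pairs for layers that pass both filters.

    intersects is a caller-supplied function that performs the expensive
    geographic test for a single layer (assumed here, not a GeoSurf API).
    """
    # Step 1: cheap metadata filter, applied first to shrink the list.
    candidates = [layer for layer in layers if keyword in layer.keywords]

    # Side of a square box with the same area as the search region.
    side = math.sqrt(search_area)

    results = []
    for layer in candidates:
        if side > layer.min_distance:
            # Search area is large enough: data is assumed to be present,
            # so the geographic search is skipped for this layer.
            results.append((layer.name, layer.title))
        elif intersects(layer):
            # Step 2: expensive geographic filter, only when needed.
            results.append((layer.name, layer.title))
    return results
```

Passing the geographic test in as a function keeps the ordering explicit: the costly check is reached only for layers that survive the metadata filter and the minimum distance shortcut.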

The resulting layer names and the layer titles are packaged and sent to the LayerListerClient2.

MetadataServlet. The MetadataServlet provides layer metadata. In the applet, the MetadataClient sends a layer name to the MetadataServlet. The MetadataServlet then queries the metadata table for the entire row associated with the table name. The metadata row is packaged and returned to the MetadataClient.

PolyPanelServlet. Serves the Existing Polygon panel of the Geographic Filter. Queries a specialized metadata table in the Oracle database for the available clip-by polygons, bundles this information up and streams it to the applet as a serialized object.

DownloadServlet. Takes the parameters passed to it in the HTTP request and uses them to generate the requested data layer. Depending on whether the layer requested is vector or raster, the servlet calls the appropriate Perl script to do the actual data manipulations. Each Perl script responds with a zip file containing the requested data. The servlet moves the zip file to a location in the html doc tree and emails the user with this URL. Details of data translation are discussed below.

ReportServlet. The ReportServlet provides a GeoSurf activity record. An HTML page collects the date range of interest and sends the data to the ReportServlet. The ReportServlet queries the activity log in the database for the date range and returns an HTML page of the activity.

AnalysisServlet. Provides a mechanism for estimating the final size of a data layer, given the specified geographic filter. This is used by the Download Wizard to limit the size of each download.

All Java-Oracle interactions use JDBC calls with drivers obtained from Oracle.

Data Translation

We use FME and PCI to handle the actual data translation and conversion. These are accessed by the Download servlet via Perl scripts developed by CAST.

Vector Data

All raw vector data is stored in the Oracle database. To process this data, the DownloadServlet calls a Perl script, passing the various data format and projection parameters. The Perl script creates the necessary transformation files for FME, then calls the command-line version of FME with these files. FME performs its magic and returns control to the Perl script, which zips the results and returns the zip file name to the servlet. The servlet then moves the zip file to a location in the html doc tree and emails the user with the URL of this location.

Raster Data

The sequence for raster data is essentially the same except that, since Oracle does not yet support raster formats, the raster data is not stored entirely in the database as the vector data is. The database does contain the metadata for the raster data but not the actual raster image. Instead, the database contains a vector footprint of the raster data. This footprint is used to geographically filter the data. The actual raster is stored outside the database in one or more GeoTIFF files.

A large raster layer may be cut into smaller sections for easier handling. The sections are reassembled during the final production of the output image. All the files for a single raster layer are stored in a directory along with a contents text file. The name of the directory is the layer name. The contents file contains a list of all the files that make up the raster layer. All the layer directories reside in a single specific directory, either physically or through a symbolic link.

Using parameters passed to it by the DownloadServlet, the Perl script first creates an EASI script (EASI is PCI Geomatics' scripting language) based on an EASI script template. This EASI script is saved in the directory where the layer image will be created. The Perl script then calls the EASI script it has created. The EASI script first reads the contents file in the directory with the same name as the layer. It checks each listed file for an overlap between the file and the boundary box. If a file does overlap, the overlapping section is cut out and copied to a working directory. When all the sections have been cut out, they are merged. The merged file is reprojected to the requested projection. At this point, if the source files have a pseudo color table, it is copied to the reprojected file. Finally, a TIFF file is created, and control returns to the Perl script. The Perl script then zips the results and returns the name of the zip file to the servlet. The servlet moves the zip file to a location in the html doc tree and emails the user with the URL of this location.
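The overlap-and-cut step can be illustrated with a short sketch. It is written in Python rather than EASI, and it assumes that the extent of each file listed in the contents file is already known as a (min_x, min_y, max_x, max_y) tuple; the function names and file names are hypothetical.

```python
def overlaps(a, b):
    """True when two boxes (min_x, min_y, max_x, max_y) share some area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def clip_window(extent, box):
    """Part of a file's extent that falls inside the boundary box, or None."""
    if not overlaps(extent, box):
        return None
    return (max(extent[0], box[0]), max(extent[1], box[1]),
            min(extent[2], box[2]), min(extent[3], box[3]))


def select_sections(contents, box):
    """Map each overlapping file name to the section that would be cut out.

    contents maps file names (as listed in the contents file) to extents.
    """
    sections = {}
    for file_name, extent in contents.items():
        window = clip_window(extent, box)
        if window is not None:
            sections[file_name] = window
    return sections
```

For example, with three side-by-side files and a boundary box that straddles the first two, only those two files contribute sections to the merge:

```python
contents = {
    "a.tif": (0, 0, 10, 10),
    "b.tif": (10, 0, 20, 10),
    "c.tif": (20, 0, 30, 10),
}
sections = select_sections(contents, (5, 2, 15, 8))
```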

Caveat

The Perl scripts are a bit of a sore thumb in terms of maintenance and deployment. We are currently looking at FMEObjects, a new Java product from Safe Software, to replace the vector Perl script. When Oracle implements raster functionality, we'll look into replacing the raster Perl script as well.

II. GeoStor Oracle Database Configuration

GeoStor Oracle Database Memory Configuration

System Global Area (SGA)
General rules to follow in setting up the Oracle SGA:
Do not create an SGA that is larger than two-thirds of the size of your database server's physical RAM.
Avoid excessive paging, which can result from an SGA that is too large and competes with other applications for memory.
SWAP space needs to be two to four times the size of your physical RAM.

Configuration of the GeoStor SGA:

SGA Parameter    Size       Initialization Parameter
----------------------------------------------------------
Shared Pool      75 MB      SHARED_POOL_SIZE=78643200
Buffer Cache     1624 MB    DB_BLOCK_BUFFERS=207885
Large Pool       800 MB     LARGE_POOL_SIZE=614400
Java Pool        20 MB      JAVA_POOL_SIZE=20971520

All Oracle memory parameters, plus several other critical configuration parameters, are defined in the initialization file. The file is read at database startup. Several of these parameters will be discussed in the following section.

SHARED_POOL_SIZE: Component of the Oracle SGA that holds both the data dictionary cache and the library cache. The recommended size range is 55MB - 200MB based on activity of the geodatabase.

DB_BLOCK_BUFFERS: Component used with the DB_BLOCK_SIZE to determine the size of the Oracle SGA buffer cache. The buffer cache stores the most recently used data blocks. For optimum performance, increase the size of the buffer cache as far as possible without causing SGA paging. Oracle recommends that the SGA not be larger than two-thirds of RAM, and that the buffer cache hit ratio be at least 95 percent. In other words, 95 percent of the time the data blocks read should be found in the buffer cache and not read from disk.
SGA memory max = physical RAM * 2/3
Buffer cache max = (SGA memory max - (shared_pool_size + log_buffers)) * 0.9
Db_block_buffers = buffer cache max / db_block_size
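
As a worked illustration of these three formulas, the following Python sketch computes the upper bound on DB_BLOCK_BUFFERS and converts a buffer count back to a cache size. The function names are made up for this example, and the 4 GB of physical RAM used in the usage note is a hypothetical figure, not the GeoStor server's actual memory.

```python
def max_db_block_buffers(physical_ram, shared_pool_size, log_buffer, db_block_size):
    """Apply the sizing formulas from the text; all sizes are in bytes."""
    sga_memory_max = physical_ram * 2 // 3
    # Integer form of "* 0.9" to avoid floating-point rounding.
    buffer_cache_max = (sga_memory_max - (shared_pool_size + log_buffer)) * 9 // 10
    return buffer_cache_max // db_block_size


def buffer_cache_mb(db_block_buffers, db_block_size):
    """Buffer cache size in MB implied by a DB_BLOCK_BUFFERS setting."""
    return db_block_buffers * db_block_size / (1024 * 1024)
```

Calling max_db_block_buffers(4 * 1024**3, 78643200, 10485760, 8192) gives the ceiling for a hypothetical 4 GB server using GeoStor's shared pool, log buffer, and block size. In the other direction, buffer_cache_mb(207885, 8192) reproduces the roughly 1624 MB buffer cache listed in the SGA table above.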

Other applications' memory requirements must be taken into consideration before setting the buffer cache to two-thirds of the server's physical RAM. GeoStor uses only one-eighth of the physical RAM due to the memory requirements of the raster software PCI.

DB_BLOCK_SIZE: Data blocks are the Oracle atomic unit of data transfer. GeoStor uses an 8 KB block size. Geodatabases with mostly linear or area features may deliver higher overall performance with a 16 KB block size.

SORT_AREA_SIZE: This is memory set aside for sorting. The minimum sort_area_size should be 512 KB. GeoStor's sort_area_size is 20 MB. The larger the sort_area_size, the less likely it is that I/O to the temporary tablespace will be needed.

LARGE_POOL_SIZE: The large pool allocation heap is used in multi-threaded server systems for session memory.

LOG_BUFFER: Component of the Oracle SGA that holds uncommitted changes to the database. Oracle recommends this parameter be set to 500 KB multiplied by the number of CPUs on the database server. The GeoStor setting is LOG_BUFFER=10485760.

CONTROL_FILES: Oracle recommends at least two control files on separate disks. GeoStor is configured with three control files on different disks.

ROLLBACK_SEGMENTS: GeoStor uses 28 rollback segments.

GeoStor Oracle Database Storage Configuration

Disk I/O contention constitutes the most challenging performance bottleneck. There is no complete solution for I/O contention but one can minimize the problem by reducing the disk I/O and balancing the I/O load across the file system. GeoStor minimizes disk I/O by creating a relatively large data buffer cache, therefore reducing disk read/writes. GeoStor distributes the data file load over multiple disk partitions spanning two RAID controllers and six RAID subsystems.

GeoStor logical and physical data distribution.

Oracle divides data into logical storage units (tablespaces), which are used to group data together logically. Each tablespace is made up of one or more physical data files. The GeoStor geodata is composed of four tablespaces, each with one or more data files distributed over multiple disk partitions. The following table describes the GeoStor geodatabase distribution.

Tablespace     Total Size   Initial Size   Next Size   Increment Size   Data File                      File Size
----------------------------------------------------------------------------------------------------------------
geodata        12 GB        256 KB         256 KB      0                /db/d03…/geodata01.dbf         2 GB
                                                                        /db/d04…/geodata02.dbf         2 GB
                                                                        /db/d03…/geodata03.dbf         2 GB
                                                                        /db/d04…/geodata04.dbf         2 GB
                                                                        /db/d03…/geodata05.dbf         2 GB
                                                                        /db/d04…/geodata06.dbf         2 GB
Geodata_indx   1 GB         32 KB          32 KB       0                /db/d05…/geodata_indx01.dbf    1 GB
Geodata_sindx  8 GB         16 KB          16 KB       0                /db/d05…/geodata_sindx01.dbf   2 GB
                                                                        /db/d06…/geodata_sindx02.dbf   2 GB
                                                                        /db/d05…/geodata_sindx03.dbf   2 GB
                                                                        /db/d06…/geodata_sindx04.dbf   2 GB
Geodata_temp   4 GB         16 KB          16 KB       0                /db/d06…/geodata_temp01.dbf    2 GB
                                                                        /db/d05…/geodata_sindx02.dbf   2 GB

The GeoStor tablespaces provide functional grouping of the data.

Geodata_temp: Used to store the spatial data as it is loaded into the Oracle database. The data in geodata_temp is validated here before being moved into the permanent spatial warehouse.

Geodata: The permanent storage location of spatial data. As data is moved to this tablespace the table storage parameters are sized so that the initial extent contains the complete data set for that table which reduces data fragmentation.

Geodata_indx: The storage location of the primary key index and any other attribute indexes.

Geodata_sindx: The storage location of the spatial index.

Oracle8i Database Security Summary

Oracle8i provides and supports scalable and flexible user authentication, audit, encryption, discretionary access control and a host of security features geared toward robust Internet computing. Documentation can be found at http://technet.oracle.com/deploy/security/oracle8i/htdocs/overview.htm.

Server Password-based Authentication
Oracle Database server-side architecture uses password-based schemes. Oracle8i provides built-in, password management facilities to enable administrators to:
Enforce minimal password length.
Ensure password complexity.
Disallow passwords that are easily guessed words.
Lock accounts automatically after a certain number of incorrect password entries.

Certificate-based Authentication
Oracle Advanced Security, an option to Oracle8i, offers enhanced PKI-based single sign-on to Oracle8i through the use of interoperable X.509 (version 3) certificates for authentication over Secure Sockets Layer (SSL), the standard for Internet authentication. In addition to strong user authentication, SSL also provides network data confidentiality and data integrity for multiple types of connections: LDAP (Lightweight Directory Access Protocol), IIOP (Internet Inter-ORB Protocol), and Net8.

The primary component of the PKI infrastructure offered by Oracle is the Oracle Wallet Manager, which provides secure management of PKI-based user credentials. Once users have securely opened their wallets, they can then connect to multiple Oracle8i servers over SSL, without providing additional passwords. Such a technology provides the benefit of strong authentication as well as single sign-on.

Host-based Authentication
Oracle8i also allows users to be authenticated by the underlying host, or operating system mechanisms, thereby consolidating username and password information.

Oracle8i Third Party Authentication
Oracle Advanced Security, an option to Oracle8i, supports multiple third-party authentication technologies, such as Kerberos, DCE, smart cards, biometric authentication (Identix), and RADIUS.

N-tier Authentication
For applications and systems that rely on a middle tier, Oracle8i offers n-tier authentication, that is, "lightweight session" creation via the Oracle Call Interface (OCI), so that applications can have multiple user sessions within a single database server session. These "lightweight sessions" allow each user to be authenticated by a database password without the overhead of a separate database connection, while preserving the identity of the real user through the middle tier.

Audit
Oracle8i provides a number of features and functions to enable accountability for actions taken by users of the database. Its accounting and auditing features are designed to be as granular and flexible as possible, so that exactly what needs to be accounted for and audited, as dictated by the application or system security policy, is recorded, and nothing more.

Encryption
Oracle8i offers server-based encryption (and decryption) via PL/SQL packages using industry-standard Data Encryption Standard (DES) in exportable keylengths.

Access Control
Oracle8i provides a strong set of access control security mechanisms through privileges. Oracle8i enforces the Principle of Least Privilege, that is, granting a user only those privileges needed to perform his or her job functions, and no more.


III. GeoStor Spatial Database Model

Schema (user) in GeoStor

GeoStor uses three schemas: gtemp, gmeta, and gdata. Gtemp is a temporary holding area. All validation checks are performed in this schema before data is moved to the production environment. Gmeta holds the metadata for the spatial data layers and the logs of download activity. Gdata holds the spatial data and indexes.

Spatial Data Table Naming Scheme

Oracle limits data table names to 32 characters. Each table and index must have a unique name. Each table is named using the format "DESCRIPTION_SOURCE_DATE_SCALE". Scale is usually not associated with vector data and is therefore not used in vector table names. Example table names are "ALL_ROADS_AHTD" and "VOTING_DIST_TIG2000". These names are generally for internal use, and they provide a reasonably sized name for the data files on the end user's computer when downloaded. The full title of the spatial layer is provided in the metadata.
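
A small sketch of the naming scheme, written in Python for illustration only. The helper name table_name and the choice to raise an error on over-long names are assumptions made for this example; the 32-character limit is the one stated above.

```python
MAX_TABLE_NAME_LENGTH = 32  # limit stated in the text


def table_name(description, source, date=None, scale=None):
    """Build a name in the DESCRIPTION_SOURCE_DATE_SCALE format.

    date and scale are optional; scale is normally omitted for vector data.
    """
    parts = [description, source, date, scale]
    name = "_".join(str(part).upper() for part in parts if part)
    if len(name) > MAX_TABLE_NAME_LENGTH:
        raise ValueError(f"{name!r} exceeds {MAX_TABLE_NAME_LENGTH} characters")
    return name
```

With this helper, table_name("all_roads", "ahtd") yields the "ALL_ROADS_AHTD" example, and table_name("voting_dist", "tig2000") yields "VOTING_DIST_TIG2000".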

Table Structure of the Spatial Data

ID            number
Attribute_1   varchar2(10)
Attribute_2   number
…
Geometry      SDO_GEOMETRY

All tables contain an ID field of type number. This field contains a unique numeric value. This field is the primary key unique index. The field can be used as a relational key to any ancillary tables associated with that geometry. All tables contain one GEOMETRY field of type SDO_GEOMETRY. This field contains the spatial element. The other fields are attribute information of any data type.

Metadata Table Structure

The metadata table drives the GeoStor download application. For every spatial data table loaded in GeoStor, an entry in the metadata table is required to make that data available for download. The metadata table is a flattened structure of the minimum required FGDC metadata information. Several additional internal management fields have also been added to this table. The metadata table structure follows. The fields added for GeoStor have short descriptions; the others are minimum FGDC information.

Field Name            Type              Description
------------------------------------------------------------------------------------------
DB_OBJECT_NAME        VARCHAR2(32)      Spatial data table name/Raster data directory name
DB_RELATE_NAME        VARCHAR2(32)      Ancillary tables
DB_SCHEMA_NAME        VARCHAR2(32)      Schema, if other than the standard
DATA_ID               NUMBER            Primary key
DATE_TIME             DATE
TITLE                 VARCHAR2(120)
ABSTRACT              VARCHAR2(4000)
PURPOSE               VARCHAR2(4000)
PROJECTION            VARCHAR2(60)
DATUM                 VARCHAR2(60)
SCALE                 NUMBER
RESOLUTION            VARCHAR2(30)
RES_UNIT              VARCHAR2(20)
WEST                  NUMBER
EAST                  NUMBER
NORTH                 NUMBER
SOUTH                 NUMBER
DATA_TYPE             VARCHAR2(12)
SOURCE_DATE           DATE
BEG_DATE              DATE
END_DATE              DATE
PUB_DATE              DATE
CONSTRAINTS           VARCHAR2(400)
FILE_NAME             VARCHAR2(32)
DIR_LOCAT             VARCHAR2(100)
GRAPHIC_FILE          VARCHAR2(100)
GRAPHIC_FORMAT        VARCHAR2(10)
CATEGORY              VARCHAR2(500)
THEME_KEY             VARCHAR2(1000)    Multiple comma delimited
PLACE_KEY             VARCHAR2(400)     Multiple comma delimited
PROJECT_NAME          VARCHAR2(60)
DATA_FORMAT           VARCHAR2(32)
FILE_SIZE_MB          NUMBER
DISTRIB               VARCHAR2(60)
DISTRIB_LIST          VARCHAR2(100)
CONTACT               VARCHAR2(210)
DATA_CREATOR          VARCHAR2(210)
METADATA_FILE_NAME    VARCHAR2(120)
STATEWIDE             NUMBER(1)         Statewide flag, not spatially searched
SEARCH_SIZE           NUMBER(5)         Spatial search if search area larger than value
COVER_TABLE           VARCHAR2(32)
PROCESS_DESC          VARCHAR2(4000)
MOVED_TO_CLEARH       CHAR(1)           Flag for FGDC Clearinghouse entry
DATE_TO_CLEARH        DATE              Date moved to FGDC Clearinghouse
DATA_DENSITY          NUMBER            Value to estimate raster file size of requested area

Spatial Data Index

All spatial data tables are indexed using the R-tree indexing method. The spatial indexes are stored in the geodata_sindx tablespace. The index names use the format "IS_TABLE_NAME". The index name must be 20 characters or less because Oracle Spatial appends 12 characters to this name to form the names of the associated index tables.

IV. Spatial Data Loading and Tuning Process

 

1. Data collection.
2. Prepare merged (seamless) version of the layer.
3. Safe Software Inc. FME is used to translate the data to the Oracle Spatial temporary schema (gtemp).
4. Validation checks are run on the layer. These include running the Oracle Spatial function "SDO_GEOM.VALIDATE_LAYER" and a visual check with the FME Universal Viewer.
5. The size of the data layer is determined by summing the byte-size of the extents used by the spatial table.
6. New table is created in the gdata schema. Primary key created in ID field. Storage parameters are tuned for the specific table (initial extent sized).
7. Entry made in the USER_SDO_GEOM_METADATA table.
8. Spatial Index created (R-Tree).
9. Layer information is added to the metadata table.

V. Implementing OpenGIS Standards and Interoperability Compliance

The GeoStor system is designed to be fully compliant with OpenGIS Consortium (OGC) and ISO standards for interoperability. OGC is an international industry consortium of more than 220 companies, government agencies and universities whose goal is to develop publicly available geoprocessing specifications. Open interfaces and protocols defined by OpenGIS Specifications support interoperable solutions that geo-enable the Web, wireless and location-based services, and mainstream IT. Implementation of these interfaces and protocols in turn allows technology developers to make complex spatial information and services available to all kinds of applications.

OGC Core Services include interfaces that are typically required regardless of application area or business domain. The OpenGIS Coordinate Transformation Specification, the OpenGIS Catalog Specification, and the Draft OpenGIS Services Registry Specification are examples of Core Services. OpenGIS Web Mapping Services is a family of specifications that enable servers to dynamically query, access, process, and combine different types of spatial information over the web with OpenGIS Specification conformant servers developed by other companies and organizations. To date, OGC has developed three Web Mapping Service specifications: the OpenGIS Web Map Server Specification (WMS 1.1.1), the OpenGIS Web Feature Server Specification (WFS), and an OpenGIS Web Coverage Server Specification (WCS, in approval process). OpenGIS Geography Markup Language (GML 2.1) is an XML encoding for the transport and storage of geographic information, including both the geometry and properties of geographic features.

The Federal Geographic Data Committee, a central player in efforts to promote open standards, coordinates the development of the National Spatial Data Infrastructure (NSDI). The NSDI encompasses policies, standards, and procedures for organizations to cooperatively produce and share geographic data. The 17 federal agencies that make up the FGDC are developing the NSDI in cooperation with organizations from state, local and tribal governments, the academic community, and the private sector. The NSDI is defined as the technology, policies, criteria, standards and people necessary to promote geospatial data sharing throughout all levels of government, the private and non-profit sectors, and academia. It provides a set of practices and relationships that facilitates data sharing and use among data producers and consumers. It addresses issues such as accessing, sharing and using geographic data while eliminating redundant and costly data production. One aspect of enabling data discovery is FGDC-funded efforts to establish and support a national clearinghouse network of searchable geospatial metadata. GeoStor data are documented in compliance with the FGDC Content Standard for Digital Geospatial Metadata. This documentation is stored in the GeoStor database and is accessible online via the Arkansas GeoLibrary, a node of the NSDI Clearinghouse.

VI. Processing Protocols for GeoStor Data Development

Raster Data
Raster data utilized by GeoStor is stored in GeoTIFF format in a UNIX directory structure organized by layer. A layer may be composed of a single TIFF file or multiple, edge-matched TIFF files (tiles). All TIFF files that make up a single layer are stored in a single directory with an index file in text format that stores each TIFF filename. The GeoTIFF format contains projection information read by PCI. Files are stored without an associated world file (.tfw) because world files interfere with PCI's ability to read projection information from the GeoTIFF.

TIFF files in layers requiring multiple files should be created in such a manner that they match exactly along adjacent edges with no overlap. Files should be scrutinized carefully to ensure that a continuous grid of cells is maintained across tiles. There should be no break in grid continuity and no overlap in coverage when two adjacent tiles are visually inspected at a small display scale.

All raster layers in GeoStor are stored in a common projection and datum, with one exception. All layers except the 1:24,000 USGS Digital Raster Graphics layer are stored in Universal Transverse Mercator Zone 15, North American Datum of 1983 because it is the most commonly used coordinate system in Arkansas, thereby reducing the amount of processing time used by the system for reprojection. The 1:24,000 USGS DRG layer is stored in UTM Zone 15, North American Datum of 1927 to match the native coordinate system of the source data because of issues related to the size of the statewide mosaic and prior software limitations.

Two levels of metadata are maintained for mosaicked raster data. One set of metadata is maintained for the combined mosaic, and another set of metadata is maintained based on the vector boundaries of the files included in the mosaic. For example, an overview set of metadata is developed for all DRGs in one layer, and another set is developed based on the quad boundaries, which includes source date, contour interval, date published, and other pertinent attributes. These source attributes are stored in the feature attribute table of the vector quad boundaries.

Display Imagery - Color Composites

Several products were created with the idea that end users will benefit from imagery that does not require processing to produce visually appealing results for display in geospatial systems. Using PCIWorks V6.3.0, CAST staff created color composites from Landsat Thematic Mapper imagery mosaics using standard band combinations to produce true color (RGB 3,2,1), color infrared (RGB 4,3,2), pseudo-color (RGB 5,4,2), and Tasseled Cap (wetness, greenness, brightness) composites for multiple seasons in the same year. Since this data was not freely distributable in its original form, three-band composites were compressed into single-band (8 bit) GeoTIFF files so that the original data values could not be extracted from the final products.

Display Imagery - DOQQs

Because of the storage space ramifications involved in processing and creating mosaics of USGS Digital Orthophoto Quarter Quadrangles (each B/W DOQQ is approximately 50 MB, while each color DOQQ is 150 MB), staff investigated possible alternative software solutions and chose ER Mapper because of its unique image algorithm storage method. ER Mapper allows the user to create an algorithm (< 5 MB) that specifies how files are to be treated in the mosaic and does not require the creation of intermediate processing files. The mosaic process is completed through the use of an Image Display and Mosaic wizard that has the ability to mosaic files with different resolutions. Contrast balancing is completed using an Image Balancing wizard, which attempts to align the histogram of each individual DOQQ to a common histogram.

Upon further investigation, it was determined that the color-matching routines used in ER Mapper were unsatisfactory for use with black and white photography covering large areas. Instead of color-matching, the contrast-balanced mosaic (which now included data values ranging from -150 to 470) was forced into a range of 0 to 255 using a simple linear transformation with a 99% clip of the original data values (top-most and bottom-most ½ percent written to 255 and 0, respectively).
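
The linear transformation with a 99% clip can be sketched as follows. This is a plain-Python illustration, not the procedure run in ER Mapper: the function name, the use of a flat list of pixel values, and the exact way the ½ percent cut-off positions are chosen from the sorted values are assumptions.

```python
def stretch_to_byte(values, clip_fraction=0.005):
    """Linearly rescale values to 0..255 after clipping both tails.

    clip_fraction is the share of values clipped at each end
    (0.005 at each end gives the 99% clip described in the text).
    """
    ordered = sorted(values)
    count = len(ordered)
    # Values at or below `low` become 0; values at or above `high` become 255.
    low = ordered[int(count * clip_fraction)]
    high = ordered[min(count - 1, int(count * (1 - clip_fraction)))]
    if high == low:
        return [0] * count  # degenerate input: nothing to stretch

    scale = 255.0 / (high - low)
    stretched = []
    for value in values:
        if value <= low:
            stretched.append(0)
        elif value >= high:
            stretched.append(255)
        else:
            stretched.append(int(round((value - low) * scale)))
    return stretched
```

Applied to values spanning -150 to 470, the extremes map to 0 and 255, and everything between the two cut-offs is spread linearly across the byte range.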

To facilitate processing of the final mosaic (183 GB) by GeoStor, the mosaic needed to be cut into a series of equal-sized tiles. A 30 X 30 grid was overlain on the mosaic, and tiles were cut along the grid boundaries. These tiles were then converted to GeoTIFF format, with each file using approximately 190 MB of storage space. When users select their area of interest, the individual tiles (where needed) are joined and the area is clipped from the seamless data set. The user sees the output as a seamless product.
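
To illustrate how a regular grid makes it cheap to find the tiles needed for an area of interest, here is a minimal Python sketch. It assumes the mosaic and the area of interest are given as (min_x, min_y, max_x, max_y) boxes in the same coordinate system, and it numbers rows from the south and columns from the west; the function name and the numbering are choices made for this example, not a description of the GeoStor file layout.

```python
import math


def tiles_for_area(mosaic, area, rows=30, cols=30):
    """Return the (row, col) indexes of the grid tiles that the area touches."""
    mosaic_min_x, mosaic_min_y, mosaic_max_x, mosaic_max_y = mosaic

    # Clamp the area of interest to the mosaic extent.
    min_x, min_y = max(area[0], mosaic_min_x), max(area[1], mosaic_min_y)
    max_x, max_y = min(area[2], mosaic_max_x), min(area[3], mosaic_max_y)
    if min_x >= max_x or min_y >= max_y:
        return []  # no overlap with the mosaic

    tile_width = (mosaic_max_x - mosaic_min_x) / cols
    tile_height = (mosaic_max_y - mosaic_min_y) / rows

    first_col = int((min_x - mosaic_min_x) // tile_width)
    last_col = min(cols - 1, math.ceil((max_x - mosaic_min_x) / tile_width) - 1)
    first_row = int((min_y - mosaic_min_y) // tile_height)
    last_row = min(rows - 1, math.ceil((max_y - mosaic_min_y) / tile_height) - 1)

    return [(row, col)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]
```

Because the tiles are equal-sized, the indexes follow from simple arithmetic on the coordinates, so no tile file has to be opened to decide whether it is needed.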

Work is now underway to provide the DOQQs in their original, un-modified formats and in compressed formats. In the un-modified form any DOQQ falling within the search area will be provided to the user. The operation will essentially involve providing the FTP location for those selected. In the compressed approach the user will select the area as before but will also indicate the desired amount of compression (plans are to provide 1:5, 1:10, 1:20 and perhaps, 1:50) and all the normal processes will occur. When the seamless product is created, however, it will then be passed to the LizardTech MrSID compression engine prior to delivery to the user.

Analysis Imagery - Landsat TM

Landsat Thematic Mapper imagery that is freely distributable is georeferenced and converted directly to TIFF format without being included in a mosaic or being altered in any other way. This allows the end user to make use of the original data for classification purposes. Each band from each scene is offered for download separately to maximize the efficiency of Internet bandwidth usage.

Display Thematic Rasters - DRGs

USGS Digital Raster Graphics are available at 1:250,000 scale, 1:100,000 scale, and 1:24,000 scale. A mosaic was created for each scale using PCIWorks V6.3.0, with the mosaic area for each file defined by a vector boundary for the quadrangle. By defining the mosaic area in this manner, collar information (scalebar, north arrow, etc.) is removed during the mosaic process but map neatlines or white areas may be included at the edges. To minimize the appearance of these 'seams' in the mosaic, a 1 X 7 pixel mode filter was applied along each north-south oriented quadrangle boundary, and a 7 X 1 pixel mode filter was applied along each east-west oriented quadrangle boundary.
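
The seam treatment can be illustrated with a small mode filter. This Python sketch is not the PCIWorks routine: it assumes the image is a list of pixel rows, that the seam along a north-south boundary is a single pixel column, and that "1 X 7" means a window one pixel high and seven pixels wide centered on that column. The 7 X 1 case for east-west boundaries would be the same operation with rows and columns swapped.

```python
from collections import Counter


def mode_filter_column(image, seam_col, width=7):
    """Replace each pixel in column seam_col with the most common value
    in a horizontal window of `width` pixels centered on it."""
    half = width // 2
    filtered = [row[:] for row in image]  # leave the input untouched
    for row_index, row in enumerate(image):
        start = max(0, seam_col - half)
        stop = min(len(row), seam_col + half + 1)
        window = row[start:stop]
        filtered[row_index][seam_col] = Counter(window).most_common(1)[0][0]
    return filtered
```

In a DRG, where pixel values are color indexes rather than measurements, taking the mode replaces a stray white neatline pixel with the dominant neighboring color instead of blending values as an averaging filter would.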

Because the total file sizes of the 1:250,000 and 1:100,000 scale mosaics were manageable (< 2 GB), these files were kept intact and converted to a single TIFF file for each scale. The 1:24,000 scale mosaic, however, would be much too large (approximately 26 GB) to be handled in a timely and effective manner by GeoStor and was therefore created in a series of tiles based on 1:100,000 scale quadrangle boundaries. These tiles required the same edge-matching routine (1 X 7 and 7 X 1) as the individual files in the mosaics. Each of these tiles was still a bit too large to process through GeoStor in a reasonable amount of time, so they were split into 16 tiles per 1:100,000 quadrangle and converted to TIFF format.

GRID Data

While GeoTIFF is an effective format for many raster data types, it is not the format of choice for data whose value range exceeds 255. As a result, a new data storage type has been added to GeoStor for raster data with larger dynamic ranges; a common example is elevation data. GRID raster data is stored in PCI PIX file form. Multiple raster compression options, including ECW and MrSID, will be added to GeoStor in early Q2 2002. Users will be able to select the compression level (5, 10, 20, etc.), and the selected data will be compressed and distributed. Because these compression methods are lossy, users will be permitted to apply them to imagery but not to GRID data. In other respects, however, image and GRID data are managed transparently.
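The two rules in this paragraph can be sketched as a pair of small checks. This is an illustration only: it assumes integer cell values and reads "larger than 255" as "does not fit in the range 0 to 255". The function names and the data-kind labels are hypothetical, not GeoStor identifiers.

```python
# Lossy compression methods named in the text.
LOSSY_METHODS = ("ECW", "MrSID")


def storage_type(min_value, max_value):
    """Choose a storage type from the value range of a raster.

    Assumption: data fitting 0..255 is stored as GeoTIFF; anything with a
    larger dynamic range (e.g. elevation) is stored as GRID in PCI PIX form.
    """
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    fits_8bit = 0 <= min_value and max_value <= 255
    return "GeoTIFF" if fits_8bit else "GRID (PCI PIX)"


def lossy_compression_allowed(data_kind, method):
    """Return True only for imagery; GRID data must stay uncompressed."""
    if method not in LOSSY_METHODS:
        raise ValueError("unknown compression method: " + method)
    return data_kind == "imagery"
```

Keeping the policy check separate from the storage decision mirrors the text: the two data kinds are otherwise handled the same way.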

Vector Data

Vector features in GeoStor are stored as Oracle Spatial Objects in an Oracle database. Features are loaded into the database from multiple file formats using Feature Manipulation Engine (FME) from Safe Software, Inc. All vector features are stored in Universal Transverse Mercator Zone 15, North American Datum of 1983 because it is the most commonly used coordinate system in Arkansas, thereby reducing the amount of processing time used by the system for reprojection. Coordinate system conversions and datum transformations for vector data are handled during the database loading process by FME.
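For readers unfamiliar with UTM zones, the sketch below shows the standard rule for 6-degree zones, under which Zone 15 covers longitudes from 96°W to 90°W. It is not part of the FME loading process; the function name is hypothetical and the sample longitude is only an approximate value for central Arkansas.

```python
import math


def utm_zone(longitude_deg):
    """Return the standard 6-degree UTM zone number for a longitude given
    in decimal degrees (west longitudes are negative).

    The special-case zones around Norway and Svalbard are ignored.
    """
    if not -180.0 <= longitude_deg < 180.0:
        raise ValueError("longitude must be in [-180, 180)")
    return int(math.floor((longitude_deg + 180.0) / 6.0)) + 1


# Approximate longitude of central Arkansas (illustrative value only).
zone = utm_zone(-92.3)
# NAD83 / UTM zone 15N is registered as EPSG:26915.
```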

Tile- or county-based vector layers are combined using a merge or append operation. Topology and attribute consistency must be checked across adjacent tile or county boundaries prior to appending. There should be no overlapping polygons or slivers along tile boundaries. All files in a layer should have the same attribute columns with the same properties so that no attributes are lost during the append operation.
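The attribute-consistency requirement can be sketched as a simple schema comparison run before the append. The data layout (a mapping from file name to column definitions) and all names below are hypothetical; the topology checks are not covered here.

```python
def schema_mismatches(schemas):
    """Compare every file's attribute schema with the first file's.

    `schemas` maps a file name to a dict of column name -> (type, width).
    Returns a dict of file name -> sorted list of columns that are
    missing, extra, or defined differently.
    """
    names = list(schemas)
    if not names:
        return {}
    reference = schemas[names[0]]
    problems = {}
    for name in names[1:]:
        current = schemas[name]
        diffs = [
            col
            for col in sorted(set(reference) | set(current))
            if reference.get(col) != current.get(col)
        ]
        if diffs:
            problems[name] = diffs
    return problems


# Hypothetical county files; the third has a narrower NAME and no CLASS.
example = {
    "county_a": {"NAME": ("str", 30), "CLASS": ("int", 4)},
    "county_b": {"NAME": ("str", 30), "CLASS": ("int", 4)},
    "county_c": {"NAME": ("str", 20)},
}
report = schema_mismatches(example)
```

An empty report means the files can be appended without losing attributes.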

Vector datasets that contain multiple feature categories are split into layers by extracting features by category after the appending of the source files. For example, State Highways, U.S. Highways, and Interstates are all features extracted from the same source data, USGS Transportation DLGs. Multi-feature datasets are also offered in a combined manner, such as All Roads from USGS DLGs.
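A minimal sketch of the category split follows, assuming each feature is a record with a category attribute. The field name "category" and the sample records are hypothetical.

```python
from collections import defaultdict


def split_by_category(features, key="category"):
    """Group appended features into one layer per category value."""
    layers = defaultdict(list)
    for feature in features:
        layers[feature[key]].append(feature)
    return dict(layers)


# Hypothetical features extracted from appended transportation DLGs.
roads = [
    {"id": 1, "category": "Interstates"},
    {"id": 2, "category": "State Highways"},
    {"id": 3, "category": "Interstates"},
]
layers = split_by_category(roads)
all_roads = roads  # the combined layer is simply the unsplit feature list
```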

In general, features are limited to the extent of the state boundary. Exceptions to this rule are tiled layers with feature extents that overlap the state boundary. Layers in this category contain features beyond the state boundary because there is no reason not to include these features as they will be processed with the rest of the tile anyway. Other exceptions include layers whose features extend beyond the state boundary but need to be preserved as whole features (e.g. hydrologic basins). In this case, all features contained by or intersecting the state boundary are included in the final layer.

Two levels of metadata are maintained for appended vector data. One set of metadata is maintained for the overall combined layer, and another set of metadata is maintained based on the vector boundaries of the files included in the combined layer. For example, an overview set of metadata is developed for all DLGs in one layer, and another set is developed based on the quad boundaries which includes source date, publishing date, original filename, and other pertinent attributes. These source attributes are stored in the feature attribute table of the vector quad boundaries. In some cases, metadata attributes such as source filename are maintained with each feature in a vector layer.

Tiled Vectors - DLGs

USGS Digital Line Graphs are created by digitizing features from paper maps and other sources, and therefore have problems with connectivity along tile boundaries. In order to maximize the usefulness of the data while minimizing the introduction of error during processing, features crossing tile boundaries were snapped together when nodes were within a specified tolerance value. The tolerance value was determined through a visual inspection of features along tile boundaries throughout the state. For 1:100,000 scale DLGs, the tolerance value used for snapping nodes along tile boundaries was 5 meters. The snapping was accomplished using the 'snapcover' command in ArcInfo 7.1.1.
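The idea of tolerance-based snapping can be illustrated with the sketch below. It is not the ArcInfo 'snapcover' algorithm: it simply moves each node to the nearest target node within the tolerance and assumes coordinates in meters. The function name and sample coordinates are hypothetical.

```python
import math


def snap_nodes(nodes, targets, tolerance=5.0):
    """Move each node onto the nearest target node within `tolerance`.

    `nodes` and `targets` are lists of (x, y) tuples in meters. Nodes with
    no target inside the tolerance are returned unchanged.
    """
    snapped = []
    for x, y in nodes:
        best = None
        best_dist = tolerance
        for tx, ty in targets:
            dist = math.hypot(tx - x, ty - y)
            if dist <= best_dist:
                best, best_dist = (tx, ty), dist
        snapped.append(best if best is not None else (x, y))
    return snapped


# Hypothetical edge nodes from two adjacent tiles (UTM meters).
tile_a_edge = [(100.0, 200.0)]
tile_b_edge = [(100.0, 203.0), (500.0, 500.0)]
result = snap_nodes(tile_b_edge, tile_a_edge, tolerance=5.0)
```

A larger tolerance closes more gaps but risks joining features that are genuinely separate, which is why the value was chosen by visual inspection.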

Prior to appending the tiles, the feature attribute tables were adjusted to a consistent structure to ensure proper attribution. Additional columns containing file source information were added to each feature attribute table to allow the end user to determine the source file and date of each individual feature. After snapping, the tiles were appended into a single file and topology was rebuilt. Slivers and overlaps in polygon layers were identified and eliminated. Features were then extracted by feature class (e.g. State Highways) and written to individual files. When dealing with polygon features, the tile boundaries were left intact within polygon features to allow separate source information attributes to be retained for each feature part.

Features in the final layer include those that fall outside the extent of the state boundary but are contained in a tile that intersects the state boundary.

County-based Vectors - TIGER/Line files

TIGER/Line data was extracted from the original download files from the U.S. Census Bureau using TGR2SHP from GISTools (www.gistools.com). At the time, this software had no restrictions on the distribution of files resulting from its use in processing. Recently, GISTools introduced distribution restrictions that will not allow the resulting data to be distributed freely. Other TIGER/Line data translators are available for download that will allow distribution of the resulting data.

TIGER/Line files are created using a methodology that ensures correct topological relationships along county boundaries, so no snapping is necessary when processing the translated files. TGR2SHP creates files of individual features for each county with a consistent naming structure and feature attribute table structure. The individual county files were appended by feature into a single statewide file using the 'merge' functionality contained in ArcView 3.2a.

County-based Vectors - FEMA Q3 Flood data

FEMA Q3 Flood data is created by digitizing county-based FEMA flood zone maps, and therefore has problems with connectivity and attribution across county boundaries. To alleviate the problems associated with overlapping polygons, the FEMA county files were clipped using the TIGER/Line County boundary file for 1990. While this left slivers of 'no data' between polygons along the county borders, it was acceptable because of the topological problems it fixed. FEMA county files were then merged into a single statewide file for distribution.

State-overlapping Vectors - Hydrologic Basins

Hydrologic basins are good examples of polygon features that extend beyond the geographic extent of the state boundary and should be maintained in their entirety. Hydrologic unit data downloaded from the Natural Resources Conservation Service was extracted for GeoStor by selecting all hydrologic units contained by, intersecting, or within 5 miles of the state boundary.
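The selection rule can be approximated with bounding boxes, as in the sketch below. This is a deliberately crude stand-in for a true polygon buffer and intersection test: it assumes coordinates in meters, and the names and sample boxes are hypothetical.

```python
METERS_PER_MILE = 1609.344


def bbox_intersects(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes touch or overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def select_units(units, state_bbox, buffer_miles=5.0):
    """Select hydrologic units whose box intersects the state box after
    the state box is expanded by `buffer_miles` on every side."""
    pad = buffer_miles * METERS_PER_MILE
    expanded = (
        state_bbox[0] - pad,
        state_bbox[1] - pad,
        state_bbox[2] + pad,
        state_bbox[3] + pad,
    )
    return [name for name, box in units.items() if bbox_intersects(box, expanded)]


# Hypothetical state and unit extents in meters.
state = (0.0, 0.0, 100000.0, 100000.0)
units = {
    "inside": (10000.0, 10000.0, 20000.0, 20000.0),
    "near": (104000.0, 0.0, 110000.0, 5000.0),   # 4 km outside the state
    "far": (120000.0, 0.0, 130000.0, 5000.0),    # 20 km outside the state
}
selected = select_units(units, state)
```

Selected units are kept whole rather than clipped, which matches the goal of preserving basins in their entirety.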

Cell point features - CAD data

Point features are often represented in CAD data files (AutoCAD or MicroStation) with cells or graphics designed to give the data user visual clues about the nature of the features. Cells can be thought of as being very similar to marker symbols used to symbolize points in more traditional geographic information systems. Unfortunately, these cells cannot be handled as features in most GIS packages. Instead, a cell is represented by a point derived from the origin point of the cell. Features are selected by cell type or cell name and the origin points of the cells are extracted. These points are the features loaded into GeoStor for distribution.
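The extraction step can be sketched as a filter over cell records. The record layout (a name, an origin and some graphics) and the cell names are hypothetical; reading real CAD files requires a translator and is not shown.

```python
def cell_origins(cells, wanted_names):
    """Return one point record per cell whose name is in `wanted_names`.

    Each cell is a dict with "name", "origin" (x, y) and "graphics"; only
    the origin and name are kept, and the graphics are discarded.
    """
    points = []
    for cell in cells:
        if cell["name"] in wanted_names:
            x, y = cell["origin"]
            points.append({"x": x, "y": y, "cell_name": cell["name"]})
    return points


# Hypothetical cells read from a CAD file.
cells = [
    {"name": "HYDRANT", "origin": (512.0, 1024.0), "graphics": ["circle", "line"]},
    {"name": "MANHOLE", "origin": (600.0, 980.0), "graphics": ["circle"]},
    {"name": "HYDRANT", "origin": (530.5, 1100.0), "graphics": ["circle", "line"]},
]
hydrants = cell_origins(cells, {"HYDRANT"})
```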

 

 

VII. GeoStor Hardware Configuration

The hardware used in the GeoStor system consists of the following major components: a Sun Enterprise 4500 server, a Sun StorEdge L3500 Tape Library and a Sun StorEdge A5200 RAID disk system.

The Enterprise 4500 server has four power/cooling modules, four CPU/Memory boards, eight 400MHz/8MB-cache UltraSPARC CPU modules, 16GB RAM, two SBus I/O boards, one SBus Graphics I/O board, two FC-AL 100MB/sec Dual Channel SBus host adapters with one GBIC module on each, four SBus Ultra Differential Fast/Wide Intelligent SCSI Host Adapters, two Netra st D130 2x18GB disk subsystems, one SBus Gigabit Ethernet Adapter, one Creator3D Series Graphics Adapter and one DVD-ROM drive.

Of these components, the following were included in the base Enterprise 4500 server package: four power/cooling modules, four CPU/Memory boards, eight 400MHz/8MB-cache UltraSPARC CPU modules, eight 1GB memory options, two SBus I/O boards and one DVD-ROM drive. Memory was increased to 16GB by adding four 2GB (8x256MB) memory modules. An SBus Graphics I/O board was added for additional I/O connections and to enable the addition of the Creator3D Series Graphics Adapter. The two Netra st D130 2x18GB disk subsystems were added to act as mirrored boot devices; they are mounted inside the Enterprise 4500 chassis and connected to the server via two single-ended SCSI connections. The four SBus Ultra Differential Fast/Wide Intelligent SCSI Host Adapters were added to connect the Enterprise 4500 server to the StorEdge L3500 Tape Library. The two FC-AL 100MB/sec Dual Channel SBus host adapters with one GBIC module on each were added to connect the Enterprise 4500 server to the StorEdge A5200 RAID disk system. Note that each of the three I/O boards includes an FC-AL port preinstalled. The SBus Gigabit Ethernet Adapter was added to enable high bandwidth connections to the network.

The Sun StorEdge L3500 is a Tape Library with a native capacity of 3.5TB. It is a robotic tape library system with 7 DLT7000 tape drives and 100 tape slots. The base StorEdge L3500 package included two tape drives; five were added. Connection to the server is made with four Ultra Differential Fast/Wide SCSI channels.

The Sun StorEdge A5200 is a 4800GB RAID disk system. It has six 800GB arrays. Each array is dual-ported and has twenty-two 36GB 10000RPM low-profile FC-AL drives. The system has four controller hubs. Each hub has four GBIC ports; three of these ports connect to the arrays and one connects to the server. Thus, four FC-AL connections from the server are used, one to each of the four hubs, and each hub connects to three of the six arrays. Because each array is dual-ported, each array is served by redundant connections.
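The figures in this paragraph can be checked with a little arithmetic. The sketch below suggests that the quoted 800GB and 4800GB values are rounded from 792GB and 4752GB of raw capacity, and that the hub wiring yields two links per array if the links are spread evenly (an assumption; the text does not give the exact wiring). The names are hypothetical.

```python
# Figures taken from the text.
ARRAYS = 6
DRIVES_PER_ARRAY = 22
DRIVE_GB = 36
HUBS = 4
ARRAY_PORTS_PER_HUB = 3


def raw_capacity_gb():
    """Return (raw GB per array, raw GB for the whole system)."""
    per_array = DRIVES_PER_ARRAY * DRIVE_GB
    return per_array, per_array * ARRAYS


def hub_links_per_array():
    """Hub-to-array links per array, assuming an even distribution."""
    return (HUBS * ARRAY_PORTS_PER_HUB) // ARRAYS


per_array_gb, total_gb = raw_capacity_gb()
links = hub_links_per_array()
```

Two links per array agrees with each array being dual-ported.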

The server runs the Solaris 8 Operating System. Additionally, the Veritas Volume Manager and Veritas Filesystem software are used to manage the data on the disks. The Sun Netbackup software is used to perform backups to the Tape Library.

The software and data are placed on the disks as follows. The Operating System and core software components (Veritas Volume Manager, Veritas Filesystem and Sun Netbackup in particular) are mirrored across two Netra disk subsystems using the Veritas Volume Manager. Each Netra subsystem has two 18GB disks. On the first disk, the root and swap filesystems are installed. On the other disk, the /opt, /usr, /var and an additional software partition are installed. Note that the Netra systems were used since the 4500 server has no internal disk and it would probably be unwise to place the boot partitions on a complex RAID system such as the A5200. All other data and software partitions are placed on the StorEdge A5200 RAID system using the Veritas Volume Manager and Veritas Filesystem. For performance and reliability, all partitions placed on the RAID system are RAID-5 volumes that have six data columns spread across all six of the array subsystems in the StorEdge A5200.

To better illustrate the I/O connections on the Enterprise 4500 server, a description referencing Figure Fig-4500-IO follows. Box #1 is integrated into the 4500 backplane.

 

· Slot #1A is blank.
· Slots #1B and #1D are unused serial ports.
· Slot #1C is connected to the system keyboard.
· Boxes #2-#4 are I/O boards; box #2 is the SBus Graphics I/O board and boxes #3 and #4 are the two SBus I/O boards. All three I/O boards include an MII port (A), a twisted-pair 10/100 network port (B), a single-ended SCSI port (C), an FC-AL 100MB/sec dual channel port (D) and three SBus slots (E, F & G) that were empty as delivered.
· The Creator3D graphics adapter was added at slot #2F.
· The Gigabit Ethernet adapter was added at slot #4G.
· The four Differential SCSI adapters were added at slots #2E, #3E, #3G and #4E; they are used to connect to the L3500 Tape Library.
· Two FC-AL adapters were added at slots #3F and #4F. The preinstalled FC-AL ports at slots #2D, #3D and #4D and the FC-AL adapter at slot #4F are used to connect to the A5200 RAID system; the FC-AL adapter at slot #3F is installed as a spare.
· The single-ended SCSI ports at slots #3C and #4C are used to connect to the two Netra systems. The single-ended SCSI port at slot #2C is available as a spare.
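The slot assignments in the list above can be captured in a small lookup table, which makes it easy to see which connections serve each device. The table is transcribed from the list; the dictionary layout and function names are hypothetical, and the empty-slot result is derived from the list rather than stated in it.

```python
# Key: board number plus slot letter. Value: (component, connected device),
# with None where the slot is a spare or has no external device listed.
SLOT_USE = {
    "2C": ("single-ended SCSI port", None),        # spare
    "3C": ("single-ended SCSI port", "Netra"),
    "4C": ("single-ended SCSI port", "Netra"),
    "2D": ("FC-AL port", "A5200"),
    "3D": ("FC-AL port", "A5200"),
    "4D": ("FC-AL port", "A5200"),
    "2E": ("Differential SCSI adapter", "L3500"),
    "3E": ("Differential SCSI adapter", "L3500"),
    "3G": ("Differential SCSI adapter", "L3500"),
    "4E": ("Differential SCSI adapter", "L3500"),
    "2F": ("Creator3D graphics adapter", None),
    "3F": ("FC-AL adapter", None),                 # spare
    "4F": ("FC-AL adapter", "A5200"),
    "4G": ("Gigabit Ethernet adapter", "network"),
}


def slots_for(device):
    """Return the sorted slots whose connection goes to `device`."""
    return sorted(slot for slot, (_, target) in SLOT_USE.items() if target == device)


def empty_sbus_slots():
    """Return SBus slots (E, F, G on boards 2-4) with nothing installed."""
    all_sbus = {board + letter for board in "234" for letter in "EFG"}
    return sorted(all_sbus - set(SLOT_USE))
```

For example, `slots_for("A5200")` lists the four FC-AL connections to the RAID system, matching the four hub connections described earlier.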


GeoStor Design and Operational Considerations
April 12, 2002

By
Douglas Meredith, Deborah Harmon, John Wilson, Robert Harris, James Sullins
and W. Fredrick Limp

Center for Advanced Spatial Technologies
University of Arkansas, Fayetteville
