Assessing the Intermediate Database
Please note: This post originally appeared on Extol.com (EXTOL has been acquired by Cleo).
The types of databases employed can often determine whether a secondary database is necessary. This secondary database is typically referred to as a staging or intermediate database because it resides outside the base application.
Should the company Enterprise Resource Planning (ERP) application support all required business transactions – both inbound and outbound with customers – then the need to have an intermediate database is lessened. However, the company ERP might provide support for “core” business transactions but might be limited for “extended” business transactions. This creates a business problem – where to store the extended business transaction data.
Example 1: The ability for a common ERP to process purchase orders and generate invoices and shipping notices is expected as this represents the minimum data set required to do business. However, as the customer base expands and more business transactions require automated electronic integration (removal of manual processing), the need for this intermediate/staging area is exposed.
Example 2: The customer might require that purchase orders are acknowledged or invoices paid electronically. If the company ERP does not support this data or these processes then the intermediate database must be developed … there must be a place where the data is stored to where the ERP can have access to that data as needed.
Generally, an integration tool (translator or data transformation product) would be implemented to introduce data to the intermediate files. At that point, the company would develop custom programs to extract data from the intermediate files and introduce data to the ERP. More advanced integration tools not only possess the ability to move data from a standard business format to the intermediate database but also from the intermediate database to the company ERP. By developing a “double-hop” approach, data can be moved from EDI to intermediate database to ERP in a single process, with the second hop being a “database to database” data translation.
When building an intermediate database, determining the type (structure/format) to be developed is a combination of access requirements and preference. In an effort to make an informed decision on which to deploy, the application developer should be familiar with the types of databases as well as the positive and negative impacts of each.
We will consider two general types of databases as the basis for the intermediate application…”Flat File” and “(Relational) Database.”
“Flat File” databases consist of formats of single or multiple record types, and can either be fixed-length (all fields expanded to full length) or variable length/delimited (all fields have a maximum length but only display to the length of the data and terminated by a separation character).
The simplest form of flat file is a standard text file and consists of a single record definition. The record or “row” may occur from one to many times. Every row consists of a horizontal list of fields and the same field definition can be applied to every row in the file. More complex forms of flat files might consist of multiple record definitions where one or more rows/records would have identical columns, yet some rows would vary in layout and content. These types of multiple record flat files will use a template designed to provide the attributes of each record and of each field in the record. These types of multiple record flat files will also include a record identifier to enable the translation tool to correctly identify the record template to be used (since each record could be one of many types). The record identifier makes it possible for the integration tool to know which record format (template) should be used.
When determining whether or not a “Flat File” structure is appropriate, consideration should be given to the number of different record types and whether the records will have fixed (maximum length) fields, or will be variable length fields. While flat files are easy to develop and simple to use, limitations do exist. One advantage is the ability for an integration tool to parse through flat-file data. This is relatively trivial as the integration tool is configured to search either for a fixed length field or to search for data found between (or before) the data delimiters. One disadvantage is applying “sort” functions over this type of data. This often becomes an issue when a flat file is used as a report and the records require re-sequencing. Because the “template” only describes the structure of the records and fields, using this to re-sort data (or identify a specific record) is challenging (unless handled programmatically).
Simplicity of the “Flat File” also means a limitation of functionality, and for that reason, true “relational” databases are often the better option.
“Relational” databases take on the challenge of storing data of differing definitions or formats separately in “tables.” Each table will consist of two groups of columns identified as key values and stored values. The key values make it possible to relate records in one table to another and in a parent-child relationship or dependency.
Flat file databases are typically independent of each other and require no outside architecture to define or store the data for later interpretation. They can be easily printed or the data edited without interpretation beyond the use of the file specification.
Relational databases will, however, require a structuring “container” often referred to as a database “server”, which stores and interprets the “metadata” defining the content.
Although there are sets of standards that each relational database can be structured for, they typically require a tool or product referred to as an “interpreter.” This makes it possible to view and/or edit the table data. These tools are often included in the database server but other proprietary tools exist to accomplish the same. These tools make it possible to view/edit not only the data in a single table, but can function in a way that multiple database servers and tables can be included in the process. Where flat files were limited on their ability to be sorted (without other secondary processes), relational databases (through these interpretation tools) can have elaborate sorting, sequencing, and selection functions applied. These database and interpretation tools can often be used to eliminate secondary transformations (or custom programs) that might be needed if trying to accomplish the same through a flat file structure.
Some common flat-file formats include Electronic Data Interchange (EDI) standard formats such as X12 and EDIFACT, eXtensible Markup Language or XML, and CSV or comma-delimited files often representing spreadsheets. Common relational database servers include Microsoft SQL Server, Oracle Database Server, MySQL, and others.
Flat file databases are most often used when processing an entire file is necessary. Relational databases are generally implemented in data warehousing implementations where direct record access is essential…where the ability to search-for and locate a single piece of information (or single record or related records) is required.
When an intermediate database becomes necessary, the database ultimately developed will depend on how the data is to be used. Factors, such as requirements for database access, cost and budget limitations, existing company (in-house) expertise, and the other applications that have need to access this data will all contribute to the database selected and deployed. Flat-file databases are simple and essentially “free” but limit data access to manual processes and structured programs. Relational databases are generally more complex with higher/varying costs but provide advanced capabilities and more efficient access options.