Selecting a Database: Flat File vs. Relational
lease note: This post originally appeared on Extol.com (EXTOL has been acquired by Cleo).
Determining the type of database to be deployed for a project is a combination of access requirements and preference. In an effort to make an informed decision on which to deploy, the application engineer should be familiar with the types of databases as well as the pros/cons of each. In this entry we will consider two general types of databases and explore some of their applications and key points in the decision cycle when faced with a project.
“Flat File” databases consist of formats including single or multiple record types, and come in flavors of fixed-length definitions and delimited. The simplest form of flat file is a standard text file and consists of a single record definition. The record or “row” (as commonly referred to) repeats from one to many times, with each successive row representing a common definition. Every row is made up of a horizontal list of fields and the same definition of the row can be applied to every row in the file.
More complex forms of flat files would consist of multiple record definitions, where one or more records would have similar columns to others, yet some rows would vary in layout and content. These multiple record formatted files will use a template or record identifier to distinguish how the remaining items in the row should be interpreted.
Fixed length definitions structure each column across the row with a defined start and stop position. Delimited structures may have a predefined minimum and maximum length but the start and stop position of each column is indicated by a “special” character value. These characters imply the end of a column and therefore can only exist within the data content of a column under special circumstances such as within quoted values (values surrounded by typically double quotes “”).
“Relational” databases take on the challenge of storing data of differing definitions or formats separately in what are referred to most often as “tables.” Each table will consist of two groups of columns identified as key values and stored values. The key values make it possible to relate records in one table to another in a parent-child relationship or dependency.
Flat file databases are typically independent of each other or self-contained, and require no outside architecture to define or store the data for later interpretation. They can easily be edited and printed directly without interpretation beyond the file specification (layout). Relational databases will, however, require a structuring “container” often referred to as a database “server” which stores and interprets the “metadata” defining the content.
Although there are sets of standards each relational database can be structured for, they typically require an “interpreter” tool to edit or view the data. These interpretation tools are quite often made available as part of the database server or as an additional related product and can be proprietary in nature. There are also some “generic” tools that function using a specific standard and can be shared over varying database servers.
Some common flat file formats would include Electronic Data Interchange (EDI) standard formats such as X12 and EDIFACT, eXtensible Markup Language or XML, and CSV or comma-delimited files often representing spreadsheets. Common relational database servers would include Microsoft SQL Server, Oracle Database Server and MySQL.
Flat file databases are most often used in a “transactional” nature and when entire file processing is required, where Relational Databases are generally found in data warehousing implementations where direct record access is essential.
The database ultimately adopted will largely depend on the nature and purpose of the data, the database access requirements, any cost/budget variations, company in-house technical expertise, and knowledge of the other applications needing to access that database. Flat-file databases are simple and are essentially “free” but limit data access to manual processes and/or structured programs. Relational databases are generally more complex with varying costs but provide advanced capabilities and more efficient access options.