Here's a continuation of my outline of the 2016 edition of Craig Ball's Electronic Discovery Workbook which I last posted about on February 25, 2018.
XVI. Databases in E-Discovery A. General a. Most electronic evidence is not in document form. Assembled on the fly from different sources. b. In re Ford Motor Co. (11th Cir.) reversal of trial court order allowing access to Ford databases. Leaves open the possibility to gain access if discovery obligations are not met. c. Every web-based email or Outlook email is a report from a database query. d. Flat File Database – no relationships between records except the table structure – order of fields in each record. e. Relational Database – identifies relationships between common fields of data. E.g., Access or enterprise level applications such as Oracle or SAP. f. Database Management System (DBMS) – software to manage and integrate tables. g. MySQL – open source database application
B. Relational Databases a. Each table has a primary key that cannot repeat within the table. Used for indexing the table in order to make it more efficient to sort, search, and perform other operations. b. Tuple – content of each row in a database c. Attribute – each column in a table.. d. Relation – set of tuples that have the same attributes e. Child tables are pointed to parent tables to maintain referential integrity of a database. Each child table can only have one parent table. f. Contents of fields should conform to certain constraints to insure data integrity. i. Field size – number of characters ii. Data type – masks for text, currency, integers, etc. iii. Unique fields – primary keys iv. Group or member lists – fields only populated with certain options. v. Validation rules – range of value; number format, e.g. for credit cards vi. Required data – mandatory fields g. To find a given record in a table, the database software simply starts accessing data for that record at a distance (also called an “offset”) from the start of the table equal to the number of records times the total length allocated to each record. h. Variable length fields use pointer fields that redirect data to a point in a memo file where the field data begins. i. Query language – commands used to retrieve information from a database
C. SQL a. SQL computer language that can’t be used to construct applications, but only to create and manage databases. b. SQL uses subset command sets 1. Data Control (DCL) 2. Data Manipulation (DML) 3. Data Definition (DDL) c. SQL server – Microsoft application
D. Schemas, Data Dictionaries, System Catalogs, and ERDs a. Logical schema – design of table structures, attributes, fields and joins. b. Physical schema – hardware and software implementation on storage devices and networks c. System catalog – compendium of metadata d. Enterprise data maps – graphical depictions of database structures. e. Entry-Relationship Modeling (ERM) shows the conceptual and logical schema of a relational database. Generate ERDs (Entry-Relationship Diagrams).
E. Ball Example of Database Discovery from Experience Serving as Special Master a. Defendants have to prepare spreadsheet listing i. Names of systems ii. Applications iii. Date ranges of data iv. Size of database v. User groups vi. Available documentation (ERDs and data dictionaries) b. Special meet and confer phone calls involving database admins that focus exclusively on databases. c. Plaintiffs able to propose specific SQL queries d. Technical personnel were able to collaborate and find solutions where the attorneys only found conflicts. e. Persons asked to perform queries may not know how to run searches, or may not have rights to all data. May not be aware of what privileges they have. f. Older records may be routinely purged. g. Tables may not be joined in ways needed to gather the needed data.
F. Preservation of Databases a. Many databases are accretive e.g., those used for accounting. b. Snapshots of data or full backup sensible approach, but it’s better to duplicate the particular dynamic data needed for a legal matter. c. Retain ability to generate reports that may be lost because of dynamic processes. d. Mirroring data environment for enterprise level systems such as Oracle or SAP may be too difficult, but Infrastructure as a Service cloud solutions may make it possible. e. Secure a test export before undertaking a high volume export. f. Demanding production of an entire database is often not a reasonable request. If the database is produced, the information may not be usable.