Data collection during e-discovery is critically important because a significant number of court sanctions are the result of inadequate or improper data collection. Here are two examples:
Clearly, improper data collection can result in potentially significant sanctions.
Courts tend to approve of “casting a wide net”
There are two basic approaches to collecting data during an e-discovery exercise:
Courts tend to approve of the “casting a wide net” approach to data collection because it provides the assurance that a party is collecting everything that might potentially be relevant. Attorneys also often favor this approach because it saves them time by reducing the amount of effort required to gather data and because gathering everything possible is often faster than taking a more selective approach.
The reality is that it’s better to do the opposite
The best practice for data collection is still to collect a large amount of data, but also to cull it so that only relevant data remains during e-discovery. For example, instead of making forensic copies of the contents of a large number of hard drives, it is more advantageous to produce only the relevant content from these hard drives through appropriate culling processes. While there are some limited situations in which courts seek production of very large quantities of data, this is not the norm.
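As a rough illustration of what culling can look like in practice, the following Python sketch filters a collected set down to documents that match a custodian list, a date range, and a set of keywords. These three criteria are assumptions chosen for the example, and the Document record and cull function are hypothetical names, not part of any particular e-discovery product. Real culling criteria would be set with counsel for each matter.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Set


@dataclass
class Document:
    """Hypothetical record describing one collected item."""
    path: str
    custodian: str
    modified: date
    text: str


def cull(documents: Iterable[Document], custodians: Set[str],
         start: date, end: date, keywords: Iterable[str]) -> List[Document]:
    """Keep only documents that pass all three illustrative filters."""
    wanted = [keyword.lower() for keyword in keywords]
    kept = []
    for doc in documents:
        if doc.custodian not in custodians:
            continue  # not a custodian named in the matter
        if not (start <= doc.modified <= end):
            continue  # outside the relevant date range
        body = doc.text.lower()
        if any(keyword in body for keyword in wanted):
            kept.append(doc)  # at least one keyword appears in the text
    return kept
```

Calling cull on the full collected set returns the smaller subset that would move on to processing and review, which is where the savings in review and hosting costs arise.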
The primary advantage of collecting information in a more focused manner is that it saves substantially on attorney and paralegal costs and processing fees, since there is less information to examine during document processing. Content hosting fees are also lower because less data is stored during the litigation process. Moreover, given the tighter timelines that the new FRCP amendments going into effect in December 2015 will impose on e-discovery, minimizing the amount of data collected may make it easier to work within those more restrictive timeframes.
The majority of e-discovery cases do not generate enormous amounts of data; however, it is essential to keep in mind that:
There is wide variability in organizations’ technology proficiency
Most smaller organizations do not have the technology proficiency or specialized skill sets required to adequately address the various data collection issues involved in e-discovery. This not only tends to drive up the costs of data collection, but it also increases the risk of over- or under-collecting data, spoliation of data, or data being rendered inadmissible.
There are several best practices that organizations should consider when addressing mid-range data collections.
Assemble the right personnel for data collection
First and foremost, a team with the right knowledge and skill set is key to reducing risk in data collection. The point person should have a strong background in IT because some of the content that may need to be collected will be from sources that require more specialized collection skills, such as proprietary CRM systems, Microsoft SharePoint, or databases.
The importance of having a data collection lead from IT who is skilled in finding and collecting ESI cannot be overstated. For example, in the case of Green v. Blitz USA, Inc., the manager who was put in charge of the defendant’s data collection efforts described himself as “about as computer ... illiterate as they get.” While there are risks inherent in self-collection, these risks can largely be mitigated if the leader of the collection effort is technically competent.
Ideally, if the resources and personnel are available, a team consisting of IT, legal, and business staff members should be assembled to manage the data collection process. These skill sets will permit a more thorough understanding of what is being collected and the relevance of the collected data to ensure further mitigation of risk in the collection process. While many in the legal profession are opposed to organizations’ self-collection of data during e-discovery, having a team of competent professionals with the right technical skills can mitigate much of the risk during data collection.
Create a data map
The next step should be to create a data map that inventories corporate data and identifies the location and type of all data that may be subject to collection. The benefit of a data map is that it can guide data collectors and speed the data collection process. It can also satisfy a court’s requirement that an organization make a good faith assessment of where all potentially relevant data is located.
In an ideal world, creating a data map would be a relatively simple exercise, but it won’t be in many organizations. Potentially relevant data can be found on corporate desktops, laptops, mobile phones, and tablets; corporate email systems; SharePoint and other collaboration systems; employee-owned laptops, mobile phones, and tablets; employee-managed file sync and share solutions like Dropbox; corporate file shares; USB drives; and corporate- and employee-managed cloud storage and backup systems. Data types can include email, files, text messages, social media posts, photographs, and a wide range of other data types.
There are two challenges inherent in creating a data map. First, when data is distributed across an organization and among many different platforms – only some of which are under IT’s control – data collected for e-discovery is a moving target and can be difficult to find. Second, some data may be difficult to locate at all. For example, a corporate business record created by an employee on his or her personally owned tablet and saved to a personal file sync and share tool may be “invisible” to those charged with collecting data for e-discovery. However, it is essential to collect data from all relevant sources, even those that are under the control of individual employees. For example, in the case of Small v. University Medical Center of Southern Nevada, the special master assigned to the case recommended a default judgment in favor of more than 600 plaintiffs because data from personally owned mobile devices, among other data sources, was not retained properly by the defendant.
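A data map does not need to be elaborate to be useful. The Python sketch below shows one possible shape for it: a list of records, each noting where a data source lives, what types of data it holds, and whether it is under IT's control. The DataSource fields, the helper functions, and the sample entries are all hypothetical and chosen only for illustration. An actual data map would be built from the organization's own inventory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSource:
    """One entry in a (hypothetical) data map."""
    name: str
    location: str
    data_types: List[str]
    it_managed: bool  # False for employee-controlled sources
    custodians: List[str] = field(default_factory=list)


def sources_outside_it_control(data_map: List[DataSource]) -> List[DataSource]:
    """Sources that IT cannot collect from without the employee's help."""
    return [source for source in data_map if not source.it_managed]


def sources_holding(data_map: List[DataSource], data_type: str) -> List[DataSource]:
    """Sources known to contain a given type of data."""
    return [source for source in data_map if data_type in source.data_types]


# Illustrative entries only; names and locations are made up for this example.
DATA_MAP = [
    DataSource("Corporate email", "on-premises mail server", ["email"], True),
    DataSource("SharePoint", "collaboration server", ["files"], True),
    DataSource("Employee-owned tablet", "with employee", ["files", "text messages"],
               False, ["employee A"]),
    DataSource("Personal Dropbox account", "third-party cloud", ["files"],
               False, ["employee A"]),
]
```

Listing the entries returned by sources_outside_it_control(DATA_MAP) gives the collection team an early view of the "invisible" sources described above, so they can be addressed before they become a preservation problem.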
Ensure that metadata is preserved
It is essential to collect data properly so that metadata is preserved throughout the data collection process. Metadata – which is data within files that provides information about these files, such as the author and last accessed date – is an essential element that must be retained intact and unmodified during the data collection process in order for information to be defensible. For example, simple drag-and-drop of data during the collection process can alter the metadata of the copied files, potentially rendering the data inadmissible for e-discovery. So important is metadata in the context of discoverable information that the Supreme Courts of Arizona and Washington State have determined that metadata must be retained as part of the information that an organization archives.
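To make the drag-and-drop risk more concrete, here is a minimal Python sketch of a copy routine that records a file's timestamps before touching it, copies it with the standard library's shutil.copy2 (which attempts to carry file metadata along with the content), reapplies the recorded timestamps, and then verifies the copy with a SHA-256 hash. The function names and the shape of the returned record are assumptions for this example. It covers only basic file system timestamps and is not a substitute for purpose-built forensic collection tools, which handle creation dates, embedded metadata, and chain-of-custody logging.

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_file(source: Path, dest_dir: Path) -> dict:
    """Copy one file and report whether content and modified time survived."""
    # Record timestamps first: reading the file may update its access time.
    before = source.stat()
    source_hash = sha256_of(source)

    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / source.name
    shutil.copy2(source, dest)  # copies content and, where possible, metadata

    # Reapply the access and modified times recorded before the read.
    os.utime(dest, ns=(before.st_atime_ns, before.st_mtime_ns))

    after = dest.stat()
    return {
        "source": str(source),
        "destination": str(dest),
        "sha256": source_hash,
        "content_matches": sha256_of(dest) == source_hash,
        "modified_time_preserved": after.st_mtime_ns == before.st_mtime_ns,
    }
```

The returned record can be written to a collection log. Whether timestamps survive exactly also depends on the destination file system, so the two boolean fields are worth checking for every file rather than being assumed.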
Focus on the “low-hanging fruit” first
Another best practice is to concentrate first on the “low-hanging fruit”: the repositories that contain the largest volumes of data that will be relevant during e-discovery. In most organizations, this will include the corporate email system (usually Microsoft Exchange on the backend and Outlook at the desktop or laptop) and employees’ personal directories on their hard drives. Email is typically the largest single repository of corporate business records, largely because the typical information worker spends at least 150 minutes per day working in email. One useful step in the data collection process is to extract the necessary content into .pst files or equivalents for loading into review platforms, although other repositories must also be processed.
Data collection is an essential element of the e-discovery process because of the important ramifications it can have on the admissibility of evidence and the mitigation of risk during litigation. Organizations involved in mid-range data collection efforts should take special care to follow appropriate best practices so that collected data is defensibly gathered, the costs of data collection are kept as low as possible, and risk is minimized.
Kyle Sparks is a CEDS Certified Speaker. Kyle’s 22-year career in the legal discovery profession has traversed firm and vendor leadership roles. From paper discovery in big tobacco litigation to building a litigation support department focused on e-discovery for an Am Law 200 firm, Kyle has obtained a comprehensive understanding of the discipline. Serving as an IT and lit support manager has provided a wide scope of industry software and legal knowledge. Today, as a Senior e-discovery Specialist and subject matter expert for Thomson Reuters, Kyle specializes in educating clients on all phases of the EDRM model as well as rules of civil procedure.