Data-tagging Tools

Tools that help a data holder choose an appropriate DataTag and sharing policy for a dataset are currently available, and others are under development. Data-tagging tools help a user navigate a complex body of privacy laws, contractual agreements, consent forms, and institutional policies; assess the risks associated with a given set of data; and select a custom data handling policy tailored to the unique risk profile and legal restrictions governing the dataset.

Example: How One Data-tagging Tool Works

There is no single prescribed way to tag data; several approaches are possible. For example, some data-tagging tools follow a three-step process that automates the assessment of the data handling rules that apply to an individual dataset. The output is a DataTag and a custom sharing policy that describes how the dataset can be stored, transmitted, and used over time.

Step 1: Questionnaire. The person tagging the data answers a series of questions from a dynamic interview application designed to elicit the key properties of a given dataset while minimizing the number of questions presented to the user.

Step 2: Assessment. Based on the user’s responses, these kinds of data-tagging tools apply inference rules to determine which handling requirements are relevant to the dataset.

Step 3: Assignment. The data-tagging tools assign simple, iconic DataTags and a custom policy that indicate how the dataset can be stored, transmitted, or used based on its properties and the applicable restrictions.
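The three steps above can be illustrated with a minimal Python sketch. It is not the implementation of any actual data-tagging tool: the question identifiers, tag levels ("open", "restricted", "controlled"), rules, and handling requirements are all hypothetical and chosen only to show how a dynamic questionnaire, a set of inference rules, and a tag assignment might fit together.

```python
# All names, questions, tag levels, and rules below are hypothetical.

# Hypothetical tag levels, ordered from least to most restrictive.
TAG_LEVELS = ["open", "restricted", "controlled"]

# Step 1: a tiny dynamic questionnaire. Each answer names the next question
# to ask; None ends the interview, so follow-up questions are skipped when
# an earlier answer makes them irrelevant.
QUESTIONS = {
    "has_personal_data": {
        "text": "Does the dataset contain information about individuals?",
        "next": {"yes": "has_medical_records", "no": None},
    },
    "has_medical_records": {
        "text": "Does it include medical records?",
        "next": {"yes": "has_consent", "no": "has_consent"},
    },
    "has_consent": {
        "text": "Did the individuals consent to sharing?",
        "next": {"yes": None, "no": None},
    },
}
FIRST_QUESTION = "has_personal_data"


def run_interview(answer_fn):
    """Ask questions until the tree ends; answer_fn(question_id, text) returns 'yes' or 'no'."""
    answers = {}
    question_id = FIRST_QUESTION
    while question_id is not None:
        question = QUESTIONS[question_id]
        answer = answer_fn(question_id, question["text"])
        answers[question_id] = answer
        question_id = question["next"][answer]
    return answers


# Step 2: inference rules as (condition, tag level, handling requirement).
RULES = [
    (lambda a: a.get("has_personal_data") == "yes", "restricted", "store encrypted"),
    (lambda a: a.get("has_medical_records") == "yes", "controlled", "share only under a signed agreement"),
    (lambda a: a.get("has_consent") == "no", "controlled", "require approval before each release"),
]


def assess(answers):
    """Return the (level, requirement) pairs of every rule that applies."""
    return [(level, req) for condition, level, req in RULES if condition(answers)]


# Step 3: assign the most restrictive applicable tag plus a custom policy.
def assign(requirements):
    level = max((lvl for lvl, _ in requirements), key=TAG_LEVELS.index, default="open")
    policy = [req for _, req in requirements]
    return level, policy


# Usage with scripted answers in place of an interactive user.
scripted = {"has_personal_data": "yes", "has_medical_records": "yes", "has_consent": "yes"}
tag, policy = assign(assess(run_interview(lambda qid, text: scripted[qid])))
print(tag, policy)
```

In this sketch a dataset with no information about individuals ends the interview after one question, which mirrors the goal of minimizing the number of questions presented to the user.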

Current Research

Data management activities interact with DataTag systems in different ways and at different levels of engagement, from theory to practice. Among the many data-tagging tools in development are decision-tree and rule-based interview applications, modular license generators, and questionnaire validation and visualization tools.

To support the development of data-tagging tools, research teams are currently analyzing the laws that constrain the sharing of sensitive research data, common contractual approaches to research data sharing, and approaches to assessing and mitigating risk when sharing sensitive data. Researchers are also exploring the use of logic programming languages as alternative representations of data-tagging surveys. These ongoing research efforts underlie the conceptual frameworks and practical tools being developed to implement DataTag systems, and some of the theoretical research extends beyond the tool implementations detailed below.

Beta Tools

Beta versions of a number of data-tagging tools are currently available. A demo user interface for an interview application is available through this web site. It provides a sample decision tree-based survey for determining whether provisions of select laws related to the privacy of medical, educational, and government records apply to a given set of data. A command-line interview tool is also available for debugging decision tree-based surveys. Related validation tools enable the detection of unused questions and tags as well as duplicate answers to questions, and visualization tools can be used to produce decision tree and flowchart diagrams for each survey. These tools make use of a custom language for defining tags and surveys, as well as an embeddable Java library that enables the integration of the interview applications into any application running on a Java virtual machine.
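The kinds of checks performed by the validation tools can be sketched as follows. This Python example does not use the tools' custom language or the Java library; it assumes a hypothetical in-memory survey format (a dictionary of questions, each with a list of answer and next-question pairs and a list of tags it sets) purely to illustrate what detecting unused questions, unused tags, and duplicate answers might involve.

```python
# Hypothetical survey representation; not the format used by the actual tools.
# Each question lists (answer label, next question id or None) pairs and the
# tags it can set.
SURVEY = {
    "q_medical": {"answers": [("yes", "q_consent"), ("no", None)], "sets": ["medical"]},
    "q_consent": {"answers": [("yes", None), ("no", None), ("yes", None)], "sets": []},
    "q_orphan": {"answers": [("yes", None)], "sets": ["education"]},
}
FIRST_QUESTION = "q_medical"
DECLARED_TAGS = ["medical", "education", "government"]


def find_unused_questions(questions, first):
    """Return questions that cannot be reached from the first question."""
    reachable = set()
    pending = [first]
    while pending:
        question_id = pending.pop()
        if question_id is None or question_id in reachable:
            continue
        reachable.add(question_id)
        for _, next_id in questions[question_id]["answers"]:
            pending.append(next_id)
    return sorted(set(questions) - reachable)


def find_duplicate_answers(questions):
    """Return, per question, any answer label that appears more than once."""
    duplicates = {}
    for question_id, question in questions.items():
        labels = [label for label, _ in question["answers"]]
        repeated = sorted({label for label in labels if labels.count(label) > 1})
        if repeated:
            duplicates[question_id] = repeated
    return duplicates


def find_unused_tags(declared_tags, questions):
    """Return declared tags that no question ever sets."""
    used = {tag for question in questions.values() for tag in question["sets"]}
    return sorted(set(declared_tags) - used)


unused_questions = find_unused_questions(SURVEY, FIRST_QUESTION)
duplicate_answers = find_duplicate_answers(SURVEY)
unused_tags = find_unused_tags(DECLARED_TAGS, SURVEY)
print(unused_questions, duplicate_answers, unused_tags)
```

Answers are stored as a list of pairs rather than a dictionary so that a duplicated answer label remains visible to the check instead of being silently overwritten.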

Ongoing work is focused on adding features to the custom language, including support for questions with multiple answers and conditional branching, as well as developing new interview inspection and interactive visualization tools. In addition, beta versions of rule-based interview applications and modular license generators using the logic programming language Prolog are in development. Ongoing work also includes development of end-to-end systems for specific kinds of data and uses.