|Identifying information was never collected. This data cannot be linked across time or measures.
|Individual data that is summarized at a group level.
|vertical join, join columns
|Stacking datasets on top of each other (matching variables).
|The transfer of data to a facility, such as a repository, that preserves and stores data long-term.
|The loss of study units from the sample, often seen in longitudinal studies.
|Raw data that has been manipulated or modified for the purposes of correcting and clarifying information.
|A group of participants recruited into the study at the same time.
|pseudonymized data, indirectly identifiable, confidential data
|Personally identifiable information (PII) has been removed and names are replaced with a code. The only way to link the data back to an individual is through that code. The identifying code file (linking key) is stored separately from the research data.
|This data is protected from unauthorized disclosure. This data either contains personally identifiable information or can still be linked back to an individual through other means (e.g., identifiable data or coded data).
|Confidentiality concerns data, ensuring participants agree to how their private and identifiable information will be managed and disseminated.
|business as usual (BAU)
|The individual or group does not receive the intervention.
|Data is collected on participants at a single time point.
|The recorded factual material commonly accepted in the scientific community as necessary to validate research findings. (OMB Circular A-110)
|A storage location for researchers to deposit data and supporting materials associated with their research.
|A way of organizing data to allow for more efficient processing and storage. In particular, repeated measures data can be structured in either long or wide format.
|measurement unit, variable format, variable class
|A classification that specifies what types of values are contained in a variable and what kinds of operations can be performed on that variable. Examples of types include numeric, character, logical, or datetime.
|An organized collection of related data stored in tables that can be linked together by a common identifier.
|database schema, data modeling
|A collection of decisions regarding how tables, or datasets, will be organized and related to one another.
|data set, data frame, spreadsheet, rectangular data, tabular data, table
|A structured collection of data usually stored in tabular form. A research study usually produces one final dataset per entity/unit (e.g., teacher dataset, student dataset).
|Identifying information has been removed or distorted and the data can no longer be re-associated with the underlying individual (the linking key no longer exists).
|Data created through transformations of existing data (e.g., mean scores).
|These variables are unique to an individual and can be used to directly identify a participant (e.g., name, email address).
|file structure, file tree
|A cataloging structure for files and folders on your computer.
|The risk of re-identifying a participant and the harm that may come from that disclosure.
|Data collected from a study where researchers randomly introduce an intervention and study the effects.
|secondary data, administrative data, third-party data
|Existing data generated/collected by external organizations at an earlier point in time (e.g., school records).
|The Family Educational Rights and Privacy Act is a federal law governing the disclosure of personally identifiable information in education records (e.g., name, address, DOB). The law applies to all public elementary and secondary schools, as well as post-secondary institutions.
|file type, file extension
|A way that information is encoded for storage on a computer. There are both proprietary (e.g., SPSS, XLSX) and non-proprietary formats (e.g., CSV, TXT).
|One or more variables associated with unique values in another table.
|The Common Rule (45 CFR 46) definition of a human subject is a living individual about whom an investigator conducting research obtains: 1) data through intervention or interaction with the individual, or 2) identifiable private information.
|The Health Insurance Portability and Accountability Act is a federal law covering the protection of sensitive health information.
|Data that includes personally identifiable information.
|These variables do not alone identify a particular individual (e.g., ethnicity, gender), but if combined with other information, they could be used to identify a participant.
|A mechanism designed to collect original data (e.g., observation form, questionnaire, assessment).
|Limited data set
|Under the HIPAA Privacy Rule, a limited data set is one in which 16 of the 18 HIPAA protected identifiers have been removed. Age, dates, and city/state/ZIP Code can remain. A limited data set may be disclosed to external parties without authorization for specified purposes, and a data use agreement is often required. This data set is not considered de-identified and must be safeguarded against unauthorized access.
|The same information is collected from the same subjects at multiple time points.
|In this book, I use the term measure broadly to refer to a collection of items used to measure an outcome (e.g., an existing scale, an existing academic assessment).
|horizontal join, join rows, link
|Combining datasets together in a side-by-side manner (matching on one or more unique identifiers).
|Occurs when there is no data stored in a variable for a particular observation/respondent.
|In this book, the term normalize is used to refer to returning a value to its normal, or expected, state.
|Data collected from a study where researchers are observing the effect of an intervention without manipulating who is exposed to the intervention.
|First hand data that are generated/collected by the research team as part of the research study.
|study roster, master list, master key, linking key, tracking database
|This database, or spreadsheet, includes any identifiable information on your participants as well as their assigned study ID. It is your only means of linking your confidential research study data to a participant’s true identity. It is also used to track data collected across time and measures as well as participant attrition.
|A string of characters used to locate files in your directory system.
|Personally identifiable information
|PII, personal data
|This includes direct identifiers (e.g., name and email), as well as indirect identifiers that, if combined with other variables or if in small enough numbers, could identify a participant (e.g., full birthdate and place of birth).
|One or more variables that uniquely define rows in your data.
|Privacy concerns people, ensuring they are given control over access to themselves and their information.
|Highly restricted and typically not publicly shared, or shared only with limited access (e.g., passwords, illegal behaviors, medical records, financial information).
|Protected health information
|The HIPAA Privacy Rule provides protections for 18 identifiers held by covered entities providing health care services.
|Non-numeric data typically made up of text, images, video, or other artifacts.
|Numerical data that can be analyzed with statistical methods.
|Randomized controlled trial
|A study design that randomly assigns participants to a control or treatment condition. In education research you often hear about two types of RCTs. The first is the Individual-Level Randomized Controlled Trial (I-RCT), in which individuals (such as students) are randomized directly to the treatment or control group. The second is the Cluster Randomized Controlled Trial (C-RCT), sometimes also called group-randomized, in which clusters of students (such as classrooms) are randomized.
|Unprocessed data collected directly from a source.
|Being able to produce the same results if the same procedures are used with different materials.
|Being able to produce the same results using the same materials and procedures.
|The Common Rule (45 CFR 46) definition of research is a systematic investigation, including research development, testing, and evaluation, designed to develop or contribute to generalizable knowledge.
|non-public data, controlled data, managed access data
|A dataset that cannot be publicly released due to containing sensitive information or a combination of variables that could enable identification. These data require controlled access conditions and may be shared through data use agreements or other application processes.
|Safe harbor method
|Under the HIPAA Privacy Rule, there are two methods of de-identification. The Safe Harbor method allows covered entities to treat data as de-identified if all 18 PHI variables are removed.
|An umbrella term that encompasses proprietary, ethical, contractual, or private information that should be protected from unwarranted disclosure. There are varying levels of data sensitivity.
|Developing and implementing a set of consistent procedures.
|A single funded research project resulting in one or more datasets to be used to answer a research question.
|case, participant, site, record
|A person or place that participates in research and has one or more pieces of data collected on them.
|code, program, script
|Programming statements written in a text editor. The statements are machine-readable instructions processed by your computer.
|A means used to collect data using an instrument (e.g., a paper form, an online survey platform).
|The individual or group receives the intervention.
|Unique participant identifier
|study ID, site ID, unique identifier (UID), subject ID, participant code, record ID
|This is a unique numeric or alphanumeric identifier, assigned to every participant or site, and used to create confidential and de-identified data. These identifiers allow researchers to link data across time or measures.
|column, field, question, data element
|Any phenomenon you are collecting information on/trying to measure. These variables will make up columns in your datasets or databases.
|A shortened symbolic name given to the variable in your data to represent the information it contains.
|time period, time point, event, session
|Intervals of data collection over time.
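
Two of the operations defined in this glossary, appending (stacking datasets with matching variables) and merging (combining datasets side by side on a unique identifier), can be illustrated with a minimal sketch. It uses only the Python standard library. The variable name `stu_id`, the helper functions `append_rows` and `merge_on_key`, and all values are hypothetical and invented for illustration; they are not taken from any real study.

```python
# Hypothetical records: every name and value here is invented for illustration.
wave1 = [
    {"stu_id": 101, "score": 12},
    {"stu_id": 102, "score": 15},
]
wave2 = [
    {"stu_id": 103, "score": 9},
]
demographics = [
    {"stu_id": 101, "grade": 3},
    {"stu_id": 102, "grade": 4},
    {"stu_id": 103, "grade": 3},
]


def append_rows(top, bottom):
    """Stack two datasets on top of each other; variables must match."""
    if top and bottom and set(top[0]) != set(bottom[0]):
        raise ValueError("datasets must have matching variables to append")
    return [dict(row) for row in top] + [dict(row) for row in bottom]


def merge_on_key(left, right, key):
    """Combine two datasets side by side, matching rows on a unique identifier.

    Rows in `left` with no match in `right` are kept with only their own
    variables (a left join).
    """
    lookup = {row[key]: row for row in right}
    if len(lookup) != len(right):
        raise ValueError(f"{key} is not unique in the right-hand dataset")
    return [{**row, **lookup.get(row[key], {})} for row in left]


# Append: three rows, same two variables.
scores = append_rows(wave1, wave2)

# Merge: same three rows, now with a third variable added from demographics.
combined = merge_on_key(scores, demographics, key="stu_id")

for row in combined:
    print(row)
```

In this sketch `stu_id` plays the role of the unique participant identifier. Appending adds rows without adding variables, while merging adds variables without adding rows. The uniqueness check in `merge_on_key` guards against accidentally duplicating rows when the identifier repeats in the right-hand dataset.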