THE VDA FRAMEWORK
We increasingly record social life – be it on mobile phones, CCTV, drone cameras, or body-worn cameras. As the availability of video data continues to proliferate in societies across the world, the relevance of visual data for researchers increases in kind. Video Data Analysis (VDA) facilitates the use of 21st century video in social science research.
What is VDA?
VDA is a curated multi-disciplinary collection of tools, techniques, and quality criteria intended for analyzing the content of visuals to study driving dynamics of social behavior and events ‘in the wild.’ It often uses visual data in combination with other data types.
At its core, Video Data Analysis (VDA) focuses on the content of recordings to study situational dynamics in social processes and outcomes. How do teams collaborate successfully in organizations? How do families interact with their toddlers and how does this shape learning? How do victory celebrations in sports change over time? What does racial discrimination in the housing market look like on the level of interactions? In more fundamental terms, VDA aims to facilitate studying how patterns of behaviors, actions, and interactions shape the fabric of social life, and how they impact, and are impacted by, structural factors. VDA uses videos, other (audio-)visual data, and complementary non-video data to analyze the driving dynamics of such processes and outcomes at the micro level.
The main sources of data in Video Data Analysis (VDA) studies are ready-made or custom-made videos that capture events or situations in real life (i.e., outside laboratory settings). Ready-made data are produced as by-products of non-academic activities. Custom-made data are collected by researchers for the purpose of a given study. Each has distinct strengths and drawbacks for VDA, but both share a number of affordances: they comprise direct and detailed observations of driving dynamics at the micro level during real-life social processes and events, are easy to re-watch and share with colleagues and readers, and often enable highly efficient data collection.
How 21st century video data reshapes social science research
VDA was developed in response to recent social and technological shifts. In 2011, anti-government protests and uprisings erupted in Northern Africa and the Middle East in what is often called the ‘Arab Spring’. For the first time, smartphone recordings and social media posts played a crucial role in the spread of protests and their global visibility. Thousands of citizens filmed the protests with their phones, often capturing the same instance from multiple camera angles. Egypt’s self-proclaimed ‘Facebook generation’ in particular uploaded large numbers of these videos to social media platforms. As a result, everyone could see first-hand footage of the protests, taken from the perspective of those involved in the marches. Major media outlets began to incorporate such videos in their investigations and coverage to find out and convey what happened on the ground. As one journalist describes: ‘In most cases citizens capture the breaking news moments first. The Arab spring was really the tipping point when it all came together’ (Batty 2011).
Since then, video footage from smartphones, surveillance cameras, and other non-news-media sources has formed a natural part of reporting on, and our perception of, uprisings and civil war, from the 2021 storming of the US Capitol to the war in Ukraine and many other large-scale events. Ubiquitous video cameras also capture rare events such as mass panics and natural disasters, including the panic at a 2010 music festival in Germany that left 21 people dead and 652 injured. Thousands of videos on social media sites show behavior and interactions during floods and earthquakes. On May 25, 2020, the brutal murder of George Floyd was filmed by bystanders’ mobile phones, CCTV cameras, and officers’ body-worn cameras.
But not only the extraordinary, dramatic, or terrible is being captured. People use smartphones to document more common events, such as weddings and funerals, or to capture mundane events, such as playing video games or spending time with friends. CCTV cameras capture everyday behavior in public places, on school yards, and even in workplaces. Livestreams capture press conferences, business meetings, trials, and a wide array of other events and occasions. And with the Covid-19 pandemic, we saw large parts of social life move to video chat platforms; job talks, breakups, marriage proposals, kindergarten kids drawing and chatting; it was all filmed as we lived our lives from a distance. We saw the creative ways in which teachers engaged their students through online platforms, and the hot mess that was the Handforth Parish Council meeting, a UK regional government body meeting that was conducted via Zoom and went viral for its utterly chaotic opening sequence.
Parallel to this increase in third-hand video data, advances in camera and data storage technology also enabled new ways of collecting first-hand videos for research, by researchers. In short, we find ourselves in a new era of how social life is captured.
For social scientists, who are inherently interested in how social life ‘works,’ these videos can generate completely new insights. Over the past decade, using 21st-century video data has allowed researchers across sociology, criminology, communication studies, anthropology, psychology, education research, political science, and other disciplines to make groundbreaking discoveries. Researchers can now use video data to find out how we talk to each other, how we express emotions, how we fight, or how we learn. We can use video to study what successful workplace meetings look like, how family interactions shape pre-school learning, or how people do racial discrimination in education or in the housing market. Thanks to video data, we can look at situations step-by-step and frame-by-frame to understand sequences, interaction routines, communication patterns, social hierarchies, or other aspects of culture and social life. Moreover, through video data, we have access to permanent first-hand recordings of situations and events that we did not necessarily observe ourselves. We can look at the same situation over and over again, as a team or alone.
Using these new sources of video data and building on established theoretical perspectives and methodological approaches, new lines of inquiry have emerged across the social sciences, in which researchers use video data to study situational driving dynamics of social processes and events. Situational dynamics refer to things that happen when people interact with one another while being in temporal proximity and physical or mediated co-presence. Recent studies use video data to understand the situational dynamics of real-life events and processes by studying human behavior, interactions, and emotions in situations, and how they impact social outcomes. From these developments, Video Data Analysis has emerged as a methodological framework in an effort to systematize methodological development, formulate open questions and solutions, and increase inter-disciplinary dialogue.
VDA use cases
Studies of what happens “on the ground” during events or situations can take three perspectives (for details, see Nassauer and Legewie 2022): seeing such micro-level processes as consequences or manifestations of “structural” or “macro” phenomena (e.g., the education system); seeing micro-level processes as phenomena in their own right (e.g., to understand how we maintain order in everyday life); or perceiving micro-level processes as driving forces behind macro phenomena (e.g., micro-level processes may produce differences in the frequency of officer-involved shootings in the United States).
One of the most prominent among recent applications in sociology is Collins’ (2008) analysis of pictures and videos to study emotional dynamics in a variety of violent and near-violent situations. Collins focuses on the minutes and seconds before and during violent behaviors and identifies emotions in actors’ facial muscles and body postures. This approach allows Collins to challenge core assumptions of conventional theories of violence, by showing that situational emotions, instead of actors’ prior strategies or motivations, trigger violent behaviors. Visual data are instrumental in enabling Collins to develop his argument and corroborate his findings. His study suggests that “there is crucial causality at the micro-level” (Collins 2016), which can be uncovered using visual data. Further examples from studies on deviant behavior include analyses of massacres (Klusemann 2009) and protest violence (Nassauer 2019). Criminologists increasingly use visual data to study crime as it unfolds (Lindegaard and Bernasco 2018; Stickle et al. 2020). In the field of policing (McCluskey et al. 2019; Sytsma, Chillar, and Piza 2021), videos can capture how real-life police-citizen encounters unfold and thereby contribute to our understanding of police work as part of social order, the state monopoly of force, procedural justice, and police use of force. In law studies, scholars have used video data to, among other things, better understand the jury deliberation process (Diamond et al. 2003) and successful interrogation techniques (Alison et al. 2013).
In business and organization studies, scholars have researched behavior and teamwork in emergency call centers (Fele 2008; Mondada 2008) or nuclear power plant control rooms (Waller, Gupta, and Giambatista 2004). In primary care research, videos have been used to study topics such as staff-patient interactions in nursing centers (Caldwell and Atwal 2005), training and simulation exercises of health care workers (Hunziker et al. 2011), and cooperation in anesthesia teams (Burtscher et al. 2010; for an overview, see Heath, Hindmarsh, and Luff 2010:8ff; Asan and Montague 2014). Research in education, too, has a tradition of studying visuals to examine non-verbal and verbal aspects of social interaction, and videos are now a prominent tool in the field (Derry et al. 2010:4). Examples include classroom interactions (Andersson and Sørvik 2013) and peacemaking among children (Verbeek 2008). In political science, videos have been used to study the impact of candidate debates on audiences’ voting behavior (Brierley, Kramon, and Ofosu 2020) and cooperation and outgroup perception (Chang and Peisakhin 2019; Choi, Poertner, and Sambanis 2019). You can find links to these and many other VDA studies in the research articles section.
As these examples illustrate, researchers across disciplines use Video Data Analysis to study how people talk to each other, express emotions, fight, or learn in real-life situations, drawing on recordings that can be examined frame-by-frame and revisited as often as needed.
Social life is now ubiquitously captured on camera. For researchers interested in social life, these recordings offer a new world of insights. Paraphrasing Carl Sagan, we are still at the shore of an ocean of possibilities that video data offer. Over the next decades these possibilities will benefit scientific research and our understanding of human behavior.
How to use VDA: analytical dimensions and procedures
Analytical dimensions refer to the content of visual data that are of interest when analyzing situations: facial expressions and body posture, interactions, and context. Facial expressions and body postures are any nonverbal information that a person’s face and body convey. Interactions refer to anything people do or say that is geared toward or affects their environment or the people within it. Context means information on the physical and social setting of a situation. These dimensions should be understood as lenses that help derive information from visual recordings and that might help to understand situational dynamics, provided they draw on a thorough theoretical reflection and employ clear, detailed coding schemes.
VDA can be used in inductive and deductive approaches, in qualitative in-depth and quantitative large-N studies, and even in computational analyses.
Although these approaches differ in many ways, VDA approaches are united by a number of analytical procedures. First, coding of video data plays a central role in analysis. Coding means to tag a section of data with labels that synthesize content as relevant to a given research project. Some researchers conduct coding in their analysis without using the term itself, and studies differ in whether they develop a coding scheme first, then code data (a deductive approach), or whether they use an iterative approach of data collection, coding, and analysis (an inductive or abductive approach). Still, all types of qualitative and quantitative analysis include some type of data coding in order to make sense of it and identify patterns.
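As a rough illustration of what coding can look like in practice, the following Python sketch tags time-stamped sections of a video with labels from a codebook that was fixed in advance (the deductive variant). The codebook entries, the `CodedSegment` class, and the `tag` helper are hypothetical names chosen for this example; they are not part of VDA itself or of any particular software.

```python
from dataclasses import dataclass

# Hypothetical codebook: label -> definition shared by all coders.
CODEBOOK = {
    "approach": "Actor moves toward another person.",
    "step_back": "Actor increases distance to another person.",
    "raise_voice": "Actor audibly raises their voice.",
}


@dataclass(frozen=True)
class CodedSegment:
    """One section of a video tagged with a single code."""
    video_id: str
    start_s: float  # segment start, in seconds from the beginning of the video
    end_s: float    # segment end, in seconds
    actor: str
    code: str

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def tag(video_id, start_s, end_s, actor, code, codebook=CODEBOOK):
    """Create a coded segment, rejecting labels that are not in the codebook."""
    if code not in codebook:
        raise KeyError(f"'{code}' is not defined in the codebook")
    if end_s < start_s:
        raise ValueError("a segment cannot end before it starts")
    return CodedSegment(video_id, start_s, end_s, actor, code)


# Illustrative (made-up) codes for one short clip.
segments = [
    tag("clip_01", 0.0, 2.5, "A", "approach"),
    tag("clip_01", 2.5, 4.0, "B", "step_back"),
    tag("clip_01", 4.0, 6.0, "A", "raise_voice"),
]

for seg in segments:
    print(f"{seg.video_id} {seg.start_s:>4.1f}-{seg.end_s:>4.1f}s {seg.actor}: {seg.code}")
```

In an inductive or abductive project, the codebook would instead grow and change while coding, so the strict check in `tag` would be relaxed or the codebook updated between coding rounds.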
Second, the above figure shows six analytic lenses (for more information, see Ch. 6 of our book), which move researchers from labeling the data to identifying and interpreting patterns or driving dynamics: counts and quantifications, timing and sequence, rhythm and turn-taking, actors, networks and relations, and spacing. These procedures can help in analyzing video data, regardless of whether the aim is to describe patterns at the micro level, or to study causal links within situations or events. The six procedures all build on coding of the data (indicated by the black bi-directional arrows), and they are all interconnected (indicated by the grey lines). For instance, one could produce counts and quantifications based on video data that help study social relations and networks. In other words, the six procedures should not be understood as discrete analytical steps or mutually exclusive ways to analyze video data. Rather, they are a non-exhaustive toolbox from which researchers can pick any combination of tools that work well for what they try to accomplish in their VDA.
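To make some of these lenses more concrete, here is a minimal Python sketch that derives counts, a time-ordered sequence, code-to-code transitions, and a simple turn-taking measure from coded events. The event list, the code labels, and the function names are invented for illustration; real projects would typically work with richer coding data and more careful measures.

```python
from collections import Counter

# Made-up coded events: (time in seconds, actor, code).
events = [
    (4.0, "A", "raise_voice"),
    (0.0, "A", "approach"),
    (2.5, "B", "step_back"),
    (6.0, "B", "raise_voice"),
    (9.0, "A", "step_back"),
]


def code_counts(events):
    """Counts and quantifications: how often each code occurs."""
    return Counter(code for _, _, code in events)


def code_sequence(events):
    """Timing and sequence: codes in the order in which they occurred."""
    return [code for _, _, code in sorted(events, key=lambda e: e[0])]


def transitions(events):
    """Sequence patterns: how often one code is directly followed by another."""
    seq = code_sequence(events)
    return Counter(zip(seq, seq[1:]))


def actor_changes(events):
    """Turn-taking: number of times consecutive events switch actors."""
    actors = [actor for _, actor, _ in sorted(events, key=lambda e: e[0])]
    return sum(1 for a, b in zip(actors, actors[1:]) if a != b)


print(code_counts(events))
print(code_sequence(events))
print(transitions(events))
print(actor_changes(events))
```

All four functions start from the same coded events, which mirrors the point that the lenses build on coding and can be combined freely.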
As the above figure suggests, researchers can also use additional analytical tools together with VDA to produce interesting findings. These can be anything from regression analyses to sequence analysis and other approaches concerned with sequential patterns, to simulations, to comparative configurational methods, or any other approach that allows gleaning additional insights from the analyzed video data. Depending on how one sets up the research in terms of theoretical perspective, object of study, and analytical foci, it may or may not make sense to use one of these tools together with analysis of your video data.
Criteria for validity include neutral or balanced data sources, optimal capture, and natural behavior. Neutral or balanced data sources should not reflect an adherence to particular interests that could lead to the concordant publication or provision of access to biased data; if sources that demonstrate a propensity for specific interests are used, researchers should seek to triangulate various sources representing divergent interests. Optimal capture means visual data should cover the duration of a situation or event, its space, and all actors involved. Natural behavior refers to an actor’s unaltered behavior in a given situation, that is, the researcher should consider the degree to which actors recorded in visual data behave the same way that they would have otherwise behaved, were a camera not present.
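One aspect of optimal capture, temporal coverage, can be checked roughly by computing how much of an event's duration is covered by the available clips. The Python sketch below is only one possible operationalization under simplifying assumptions: the function name and the example times are hypothetical, and the sketch says nothing about spatial coverage or about whether all actors are visible.

```python
def coverage_fraction(event, clips):
    """Share of the event interval covered by at least one clip.

    event: (start, end) of the situation, in seconds on a shared clock.
    clips: iterable of (start, end) intervals for the available recordings.
    """
    ev_start, ev_end = event
    if ev_end <= ev_start:
        raise ValueError("event must have a positive duration")

    # Keep only the parts of each clip that overlap the event, sorted by start.
    clipped = sorted(
        (max(s, ev_start), min(e, ev_end))
        for s, e in clips
        if e > ev_start and s < ev_end
    )

    covered = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this clip: close the current merged interval.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping clip: extend the current merged interval.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start

    return covered / (ev_end - ev_start)


# Made-up example: a 100-second event filmed by three overlapping recordings.
print(coverage_fraction((0, 100), [(0, 40), (30, 60), (80, 120)]))
```

A low value flags gaps in the footage that may need to be filled with additional recordings or complementary non-video data.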
Research ethics concern the application of ethical principles to the research process as a reflection of moral rules and values, with core goals being protecting participants from harm, avoiding conflict of interest and misrepresentation, respecting common laws, and adhering to standards such as professional competence and nondiscrimination. These provide the overall frame of reference for reflections on online video research. In the context of VDA, five ethical areas are specifically relevant: (1) informed consent; (2) privacy; (3) unique opportunities; (4) potential harm; and (5) transparency.
Informed consent means that people should know that they are being researched, receive relevant information on the planned research in a comprehensible format, and should then voluntarily agree to participate, or decline to do so. When assessing issues of informed consent, we suggest asking three questions of the data: Does the space filmed require informed consent? If not: Was the focus on the space or on people? And if the space filmed requires consent: did people give consent?
Privacy means respecting people’s private information and anonymity, both of which are complex concepts that need to be re-evaluated for online contexts. To assess issues of privacy in livestreams and online video research, we suggest reflecting on the online and situational context in which the footage was found and taken, and the content of the video.
Unique opportunities focuses on the potential benefits of a study. We suggest asking: does a study offer unique potential for scientific insights and/or real-life benefits, and could other data replace the video fully or in parts of the analysis?
Potential harm refers to possible ways in which a study may hurt the study subjects, researchers, or third parties. Since one of the main goals in research ethics is to minimize harm, a rigorous assessment of potential harm is essential for any study. We suggest three questions for assessing potential harm specific to video research: What kind of behavior is depicted? Could data harm people or groups depicted? How publicly available is the video prior to research?
Finally, transparency refers to making goals, procedures, and data as accessible to the public as possible, thereby improving traceability and openness of scientific processes and findings. We suggest four questions to assess whether a given study can actualize the immense potential for transparency inherent in video research: Can permanent access to the video be assured? Can the researcher share the data with reviewers? Can the researcher share the data with the broader research community? And can the researcher share the data during talks at conferences or workshops?
These ethical areas and principles always have to be evaluated in relation to each other and weighed against each other in the context of a specific study. Of course, an outcome of such an assessment may very well be that a study is too unethical to be implemented, or needs substantial revisions.
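The questions from the five ethical areas can be collected in a simple checklist so that none is overlooked when documenting a study. The Python sketch below is only an illustrative aid: the dictionary layout and the `open_questions` helper are assumptions made for this example, the privacy prompts are paraphrased from the reflection points above, and the weighing of the areas against each other remains a matter of judgment that code cannot replace.

```python
# Questions taken or paraphrased from the five ethical areas described above.
ETHICS_CHECKLIST = {
    "informed consent": [
        "Does the space filmed require informed consent?",
        "If not: was the focus on the space or on people?",
        "If the space filmed requires consent: did people give consent?",
    ],
    "privacy": [
        "In what online context was the footage found?",
        "In what situational context was the footage taken?",
        "What is the content of the video?",
    ],
    "unique opportunities": [
        "Does the study offer unique potential for scientific insights and/or real-life benefits?",
        "Could other data replace the video fully or in parts of the analysis?",
    ],
    "potential harm": [
        "What kind of behavior is depicted?",
        "Could data harm people or groups depicted?",
        "How publicly available is the video prior to research?",
    ],
    "transparency": [
        "Can permanent access to the video be assured?",
        "Can the researcher share the data with reviewers?",
        "Can the researcher share the data with the broader research community?",
        "Can the researcher share the data during talks at conferences or workshops?",
    ],
}


def open_questions(answers):
    """Return (area, question) pairs that have no written answer yet.

    answers: dict mapping a question to the researcher's free-text answer.
    """
    return [
        (area, question)
        for area, questions in ETHICS_CHECKLIST.items()
        for question in questions
        if not answers.get(question, "").strip()
    ]


# Hypothetical, partially completed assessment.
answers = {
    "What kind of behavior is depicted?": "Everyday interactions at a public square.",
}
for area, question in open_questions(answers):
    print(f"[{area}] {question}")
```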
When not to use VDA
VDA is not suited for all types of research questions and theoretical approaches and, like all methodological approaches, it entails limitations and challenges. First, the type of data used by VDA implies limited access to video recordings from private events, such as funerals in Western societies. Second, VDA does not offer the tacit knowledge and immersion in a social context that comes with continuous direct participant observation, and it does not offer the same potential as ethnography for studying the cultural knowledge or narratives of a specific community or group of people. Third, interpretation of certain elements, such as gestures, may be context dependent, making VDA less suitable to study social contexts that a researcher is unfamiliar with. Fourth, a number of research ethics questions remain unclear with the new types of video data VDA often employs; e.g., what types of video from which platforms are admissible to use as research data.