This R notebook serves as an example, demonstrating the analysis and visualization of micro-scenario-based studies. Micro-scenarios provide an approach for evaluating the social acceptance of technologies and its determining factors, along with visuo-spatial mappings of the results. They enable a) the simultaneous assessment of multiple technologies and their ranking based on different criteria, and b) the analysis of how individual factors and technology-based attributions correlate with the overall assessment of technologies. Utilizing synthetic survey data (generated in a separate notebook), this notebook illustrates how to recode the data, aggregate scenario scores as user factors, calculate topic scores, and visualize them using the R programming language, along with ggplot2 and the tidyverse.
The micro-scenario approach simplifies measuring people’s opinions on different topics (see overview). Based on a single survey, the approach combines the participants’ responses into 0) a grand mean that serves as a general evaluation of the whole context, 1) individual user factors as reflective measurements of latent constructs (research perspective 1), and 2) a technology evaluation to rank topics and to create a visual map that pinpoints conflicting issues (research perspective 2).
For instance, consider analysing risk-utility trade-offs among various technologies: Do individuals attribute varying risks and utilities to distinct technologies? Are people predisposed to different risk or utility perceptions? Is the comparability of risk-utility trade-offs consistent across different technologies, and can these trade-offs be quantified? Figure 1 illustrates the overall approach.
Figure 1: The micro-scenario approach involves consolidating evaluations of diverse topics in a single survey. These evaluations are treated as topic assessments and spatially mapped to analyze the interrelationships among them.
The main article provides comprehensive insights into this approach and outlines the methodology for designing and analyzing studies. You can locate and cite the main article here:
Brauner, Philipp (2024) Mapping acceptance: micro scenarios as a dual-perspective approach for assessing public opinion and individual differences in technology perception. Frontiers in Psychology 15:1419564. doi: 10.3389/fpsyg.2024.1419564
This notebook demonstrates the calculation of data for the two research perspectives of micro-scenario-based surveys (grand mean, Perspective 1: user factors, and Perspective 2: topic factors) using R. Note that all transformations and calculations can also be performed with other software.
In this example, we utilize synthetic data generated to resemble real survey data. This choice simplifies following the approach, eliminating the need to clean the data of irrelevant variables or erroneous participant inputs. Additionally, the synthetic data adheres to pre-specified properties. The creation of the synthetic data is detailed in the companion notebook within the same folder (see the linked notebook for details).
The rest of this notebook is organized as follows: Firstly, we load the necessary packages, followed by loading the synthetic data as our input (replace this with your actual data). Secondly, we transform the data into the long format (refer to, for instance, https://tidyr.tidyverse.org/reference/pivot_longer.html), proceed to analyse the data as a user factor (research perspective 1), and subsequently as a topic factor which includes visualizing the outcomes (research perspective 2).
Preparation
Load required libraries
In our analysis, we mainly use the tidyverse and ggplot2 packages.
library(tidyverse)
Warning: package 'lubridate' was built under R version 4.3.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(scales) # percent_format()
Attaching package: 'scales'
The following object is masked from 'package:purrr':
discard
The following object is masked from 'package:readr':
col_factor
library(ggplot2) # graphics
library(ggrepel) # label placement in the scatter plot
library(knitr)   # tables
Warning: package 'knitr' was built under R version 4.3.3
Load Data
In this demonstration, we will load the synthetic data that emulates the properties found in real survey data. The other notebook demonstrates the creation of the synthetic data. Figure 2 illustrates the structure of a standard dataset from survey tools, where each row represents the responses from an individual participant.
Figure 2: Illustration of typical survey data utilizing the micro-scenario approach, featuring user demographics, additional user factors, and topic evaluations.
The data structure closely resembles the data export from the Qualtrics survey tool. The process of generating synthetic data is documented in the companion notebook within the same folder.
The loaded dataset has various variables. Initially, there’s a unique user identifier (id), followed by a user variable (e.g., attitude towards a topic). Subsequently, there are an arbitrary number of topic assessments (N in our example) with variables for each evaluation dimension. In this instance, we use perceived risk and perceived utility as examples for the topic evaluations. However, one can employ different or additional evaluation dimensions (as detailed in the article).
The variables for the topic evaluations adhere to a standardized naming scheme, e.g., a01_matrix_02, where 01 denotes the ID of the queried topic, 02 represents the queried evaluation dimension, and matrix stands for the name of the variable block in the survey tool. This naming scheme is employed by Qualtrics.
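The import itself is not shown here; a minimal sketch, assuming the synthetic data is stored in a CSV file named synthetic_data.csv (a hypothetical file name), could look like this:

# Load the (synthetic) survey data; replace with your own export, e.g., from Qualtrics
data <- readr::read_csv("synthetic_data.csv")   # hypothetical file name

# Inspect the column names; with the scheme above they follow the pattern
# id, uservariable, a01_matrix_01, a01_matrix_02, a02_matrix_01, ...
names(data)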
Analysis of the data
Once the (synthetic) survey data is loaded into the variable data, we can commence the actual analysis.
Setup
Firstly, read the list of queried topics and their labels from a .csv file (adjustable based on your needs). Secondly, define the queried evaluation dimensions. In this instance, we have a vector of two dimensions, but you can define more depending on your research questions and survey structure.
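A minimal sketch of this setup, assuming a hypothetical file topics.csv with the columns question, label, and shortlabel, and assuming that evaluation dimension 01 corresponds to risk and 02 to utility:

# Read the list of queried topics and their labels (hypothetical file and column names)
topics <- readr::read_csv("topics.csv")

# Queried evaluation dimensions, in the order of the matrix items in the survey
# (assumption: 01 = risk, 02 = utility)
DIMENSIONS <- c("risk", "utility")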
Next, the topic evaluations from the survey data are transformed into the long format using pivot_longer (one row with a single value for each participant, topic, and evaluation dimension; one row per observation). For this, we rely on the systematic naming convention of the topic-evaluation variables in the original data table (see above).
The resulting data set contains a participant identifier, identifiers for the topic and the evaluation dimension, and a column for the value. We use this format as the foundation for the later transformation steps.
evaluationsLong <- data %>%
  # select the columns id and "aNUMBER_matrix_NUMBER" (scheme from loop & merge)
  dplyr::select(id, matches("a\\d+\\_matrix\\_\\d+")) %>%
  tidyr::pivot_longer(
    cols = matches("a\\d+\\_matrix\\_\\d+"),   # topics and their evaluations
    names_to = c("question", "dimension"),
    names_pattern = "(.*)_matrix_(.*)",        # separate topic ID and evaluation ID
    values_to = "value",
    values_drop_na = FALSE) %>%
  dplyr::mutate(dimension = as.numeric(dimension)) %>%
  dplyr::mutate(dimension = DIMENSIONS[dimension]) %>%   # change to readable dimension names
  dplyr::mutate(value = -(((value - 1) / 3) - 1))        # rescale value from [ 1...7 ] to [ -100%...100% ]

# Recode some of the evaluation dimensions if necessary
evaluationsLong <- evaluationsLong %>%
  dplyr::mutate(value = if_else(dimension != "risk", value, -value))
Perspective 1: As user factor
The initial perspective provides a straightforward view of the data. The different presented scenarios serve as a basis for the repeated measurement of the same latent construct and the resulting score can be interpreted as a user factor (or individual differences).
For each evaluation dimension (e.g., risk and utility), we compute average scores across all queried topics. Using these scores one can, for instance, investigate whether the overall attributions differ among participants or whether they correlate with other queried user factors. For example, one can explore whether the average risk attributed to all topics relates to a general disposition to risk measured with other psychometric scales.
Subsequently, we rejoin these user factors with the original data using, for instance, dplyr::left_join(). Afterwards, the calculated average evaluations can be regarded as individual differences and correlated with other user factors obtained from the survey.
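The original aggregation code is not reproduced here; a minimal sketch, assuming the long-format data from above and the additional user factor uservariable in the survey data, could look like this:

# Per-participant mean for each evaluation dimension (perspective 1)
evaluationByParticipant <- evaluationsLong %>%
  dplyr::group_by(id, dimension) %>%
  dplyr::summarise(mean = mean(value, na.rm = TRUE), .groups = "drop") %>%
  tidyr::pivot_wider(names_from = dimension,
                     values_from = mean,
                     names_glue = "{dimension}_mean") %>%
  dplyr::left_join(data %>% dplyr::select(id, uservariable), by = "id")

# Correlate the averaged utility attribution with the additional user factor
cor.test(evaluationByParticipant$utility_mean,
         evaluationByParticipant$uservariable)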
Pearson's product-moment correlation
data: evaluationByParticipant$utility_mean and evaluationByParticipant$uservariable
t = 2.1131, df = 98, p-value = 0.03713
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.01286523 0.38921481
sample estimates:
cor
0.2087558
Perspective 2: Topic factors
Next, we switch to the analysis of topic evaluations: Instead of looking at how individuals perceive the topics as a whole (individuals across all topics), we now interpret how all individuals assess the respective topics (topics across all individuals), for example, to rank the technologies in terms of the evaluation dimensions. We start with reporting the average evaluations (e.g., risk and utility) across all queried topics.
Calculate Average evaluations
Using the long format, we group by evaluation dimension and aggregate across all topics and participants. Note: For a complete sample, the results are equivalent to perspective 1 (see above). Table 1 and Figure 3 show the outcome of this calculation.
# MEAN and SD of all evaluation dimensions across all queried topics
evaluationByDimension <- evaluationsLong %>%
  dplyr::group_by(dimension) %>%
  dplyr::summarise(mean = mean(value, na.rm = TRUE),
                   sd   = sd(value, na.rm = TRUE),
                   .groups = "drop")
Table 1: Averages for each evaluation dimension across all queried topics and across all participants.
dimension   mean    sd
risk       -0.07   0.5
utility     0.29   0.4
overallDimension <- ggplot(evaluationByDimension,
                           aes(x = dimension, y = mean)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = percent_format(), limits = c(-1, +1)) +
  labs(x = "Evaluation Dimension",
       y = "Values",
       title = "Average Evaluation across all Dimensions and Participants")
overallDimension
Figure 3: Mean evaluation across all topics and aggregated across all participants.
This can also be illustrated as a violin plot combined with a boxplot. This graph nicely illustrates the distribution of the topic evaluations and its key parameters (median, quartiles).
dataByDimensionQuestion <- evaluationsLong %>%
  dplyr::group_by(question, dimension) %>%
  dplyr::summarise(mean = mean(value, na.rm = TRUE),
                   sd   = sd(value, na.rm = TRUE),
                   .groups = "drop")

overallDimension <- ggplot(dataByDimensionQuestion,
                           aes(x = factor(dimension, levels = c("risk", "utility")),
                               y = mean)) +
  geom_violin() +
  geom_boxplot(width = 0.2, position = position_dodge(width = 0.75)) +
  scale_y_continuous(labels = percent_format(), limits = c(-1, +1)) +
  scale_x_discrete(labels = c("risk" = "Risk", "utility" = "Utility")) +
  labs(x = "Evaluation Dimension",
       y = "Values",
       title = "Average Evaluation across all Dimensions and Participants")
overallDimension
Figure 4: Mean evaluation across all topics aggregated across all participants illustrated as violin plot (showing the distribution of the topic evaluations).
Prepare Individual Topics
Now, we compute the average evaluations for each topic across all participants. The resulting data frame contains one row for each of the N queried topics and columns for the arithmetic mean and standard deviation of each evaluated dimension (e.g., risk and utility). Finally, we associate labels with each topic using dplyr::left_join(). Figure 5 illustrates the structure of the resulting data.
Figure 5: The resulting data format displays the evaluation of topics. Each row contains the mean evaluation (along with its dispersion) for a specific topic. This structured data can be subjected to further analysis.
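The original aggregation code is not reproduced here; a minimal sketch, assuming the long-format data from above and the topic list topics loaded in the Setup section, could look like this:

# Mean and SD per topic and evaluation dimension, spread into one row per topic
evaluationByTopic <- evaluationsLong %>%
  dplyr::group_by(question, dimension) %>%
  dplyr::summarise(mean = mean(value, na.rm = TRUE),
                   sd   = sd(value, na.rm = TRUE),
                   .groups = "drop") %>%
  tidyr::pivot_wider(names_from  = dimension,
                     values_from = c(mean, sd),
                     names_glue  = "{dimension}_{.value}") %>%
  dplyr::left_join(topics, by = "question")   # attach readable topic labels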
The output can be tabulated, sorted, or filtered based on highest/lowest evaluations, and visualized. Table 2 displays the unsorted and unfiltered results.
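For instance, sorting by the highest attributed utility and printing the result as a table could look like this (a sketch using knitr::kable()):

# Tabulate the topic evaluations, sorted by attributed utility (highest first)
evaluationByTopic %>%
  dplyr::arrange(dplyr::desc(utility_mean)) %>%
  dplyr::select(label, risk_mean, risk_sd, utility_mean, utility_sd) %>%
  knitr::kable(digits = 2)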
Table 2: Average evaluation of the queried topics.
label                          risk_mean   risk_sd   utility_mean   utility_sd
Topic 1                            -0.70      0.27           0.72         0.21
Topic 10                            0.48      0.29           0.14         0.27
Topic 11                            0.51      0.29          -0.02         0.24
Topic 12                            0.58      0.25          -0.13         0.26
Topic 2                            -0.60      0.26           0.71         0.23
Topic 3                            -0.48      0.29           0.61         0.26
Topic 4                            -0.42      0.30           0.50         0.27
Topic 5 (deliberate outlier)       -0.25      0.26          -0.23         0.23
Topic 6                            -0.14      0.25           0.38         0.24
Topic 7                            -0.05      0.23           0.27         0.27
Topic 8                             0.13      0.27           0.35         0.31
Topic 9                             0.10      0.30           0.15         0.26
Topic Correlations
Next, we analyse the correlation between the evaluation dimensions across the topics. In the example in Table 3, we investigate if the attributed risk is related to the attributed utility for the different topics under consideration. In this example, we have only two target variables for the topic evaluations. With more variables, more complex analyses become possible, such as determining whether and to what degree a linear model with risk and utility explains the overall valence towards the queried topics.
Note: Our analysis focuses on the correlations between the topics as attributed by the participants, rather than individual differences among participants.
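The original code is not reproduced here; a minimal sketch of this topic-level correlation, assuming the evaluationByTopic data frame from above, could look like this:

# Correlation between attributed risk and attributed utility across the topics
cor.test(evaluationByTopic$risk_mean,
         evaluationByTopic$utility_mean)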
Table 3: Correlations between the evaluation dimensions across all topics
Parameter1   Parameter2     r            CI     CI_low       CI_high      t           df_error   p          Method                n_Obs
risk_mean    utility_mean   -0.7490801   0.95   -0.9252278   -0.3072758   -3.575657   10         0.005048   Pearson correlation   12
Visualize the Topics
Finally, the results are presented through a scatter plot. The plot in Figure 6 allows for the visual identification of the dispersion of topics on a spatial map defined by the evaluation dimension. It helps assess if there is a (linear) relationship between the queried evaluation dimensions of the topics, the slope and intercept of that relationship, and if some topics exhibit significantly different evaluations compared to others (outliers).
scatterPlot <- evaluationByTopic %>%
  ggplot(aes(x = risk_mean,
             y = utility_mean,
             label = shortlabel)) +
  coord_cartesian(clip = "on") +
  geom_vline(xintercept = 0, size = 0.25, color = "black", linetype = 1) +
  geom_hline(yintercept = 0, size = 0.25, color = "black", linetype = 1) +
  # diagonal line indicating where both dimensions are congruent
  annotate("segment",
           x = -1, y = +1,
           xend = +1, yend = -1,
           colour = "black",
           linewidth = 0.25,
           linetype = 2) +
  # annotate the quadrants
  geom_label(aes(x = -1, y = -1, label = "LOW RISK & LOW UTILITY"),
             vjust = "middle", hjust = "inward",
             size = 1.75, label.size = NA, color = "black", fill = "#7CBF6C") +
  geom_label(aes(x = -1, y = +1, label = "LOW RISK & HIGH UTILITY"),
             vjust = "middle", hjust = "inward",
             size = 1.75, label.size = NA, color = "black", fill = "#7CBF6C") +
  geom_label(aes(x = +1, y = -1, label = "HIGH RISK & LOW UTILITY"),
             vjust = "middle", hjust = "inward",
             size = 1.75, label.size = NA, color = "black", fill = "#7CBF6C") +
  geom_label(aes(x = +1, y = +1, label = "HIGH RISK & HIGH UTILITY"),
             vjust = "middle", hjust = "inward",
             size = 1.75, label.size = NA, color = "black", fill = "#7CBF6C") +
  # add the topic labels
  geom_label_repel(max.time = 3,
                   color = "black",
                   fill = "gray95",
                   force_pull = 0,
                   max.overlaps = Inf,
                   ylim = c(-Inf, Inf),
                   xlim = c(-Inf, Inf),
                   segment.color = "#3A6B2E",
                   segment.size = 0.25,
                   min.segment.length = 0,
                   size = 2.5,
                   label.size = NA,
                   label.padding = 0.105,
                   box.padding = 0.125) +
  geom_smooth(method = "lm", se = TRUE, color = "#3A6B2E") +
  geom_point() +   # geom for the data points
  labs(title = "Illustration of the risk-utility tradeoff ...",
       caption = "Based on synthetic data for illustrative purposes. See linked companion notebook under https://osf.io/96ep5/",
       x = "AVERAGE ESTIMATED RISK\n(without risk — very risky)",
       y = "AVERAGE ESTIMATED UTILITY\n(useless — useful)") +
  scale_x_continuous(labels = percent_format(), limits = c(-1, +1)) +
  scale_y_continuous(labels = percent_format(), limits = c(-1, +1)) +
  theme_bw()

scatterPlot

ggsave("simulatedriskutility.pdf",
       plot = scatterPlot,
       width = 8, height = 6,
       units = "in")
Figure 6: Scatter plot of the evaluations of the micro scenarios.
Closing remarks
This notebook showcases the analysis and visualization of surveys using the micro-scenario approach. It includes executable code for examining both research perspectives (individual differences and topic evaluation), which can be adjusted to suit your own survey and data. Ensure that the input variables are coded correctly and that their polarity is consistent.
It is crucial to recognize the limitations of this approach (e.g., whether point estimates are acceptable, and the potential bias from the sampling of the topics); I refer to the main article for further guidance and strategies for bias mitigation.
A recent and decent article building on the method can be found here:
Acknowledgements:
This approach evolved over time and through several research projects. I would like to thank all those who have directly or indirectly, consciously or unconsciously, inspired me to take a closer look at this approach and who have given me the opportunity to apply it in various contexts. In particular, I would like to thank: Ralf Philipsen, without whom the very first study with that approach would never have happened, as we developed the crazy idea to explore the benefits and barriers of using “side-by-side” questions in Limesurvey. Julia Offermann, for indispensable discussions about this approach and so much encouragement and constructive comments during the last meters of the manuscript. Martina Ziefle for igniting scientific curiosity and motivating me to embark on a journey of boundless creativity and exploration. Felix Glawe, Luca Liehner, and Luisa Vervier for working on a study that took this concept to another level. Julian Hildebrandt for in-depth discussions on the approach and for validating the accompanying code. Tim Schmeckel for feedback on the draft of this article.
Throughout the process I received feedback from editors and reviewers that helped to question and improve the foundation of this approach. No scientific method of the social sciences alone will fully answer all of our questions. I hope that this method provides a fresh perspective on exciting and relevant questions.
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC- 2023 Internet of Production – 390621612.