Automated decision-making systems are becoming increasingly ubiquitous, which creates an immediate need for their interpretability and explainability. However, it remains unclear whether users know what insights an explanation offers and, more importantly, what information it lacks. To answer this question, we conducted an online study with 200 participants, which allowed us to assess explainees’ ability to realise explicated information – i.e., factual insights conveyed by an explanation – and unspecified information – i.e., insights that are not communicated by an explanation – across four representative explanation types: model architecture, decision surface visualisation, counterfactual explainability and feature importance. Our findings reveal that highly comprehensible explanations, e.g., feature importance and decision surface visualisation, are exceptionally susceptible to misinterpretation since users tend to infer spurious information that is outside the scope of these explanations. Additionally, while users gauge their confidence accurately with respect to the information explicated by these explanations, they tend to be overconfident when misinterpreting them. Our work demonstrates that human comprehension can be a double-edged sword: highly accessible explanations may convince users of their truthfulness while simultaneously leading to various misinterpretations. Machine learning explanations should therefore carefully navigate the complex relation between their full scope and limitations to maximise understanding and curb misinterpretation.
2024
Characterizing Information Seeking Processes with Multiple Physiological Signals
Information access systems are becoming increasingly complex, and our understanding of user behavior during information seeking processes is mainly drawn from qualitative methods, such as observational studies or surveys. Leveraging the advances in sensing technologies, our study aims to characterize user behaviors with physiological signals, particularly in relation to cognitive load, affective arousal, and valence. We conduct a controlled lab study with 26 participants, and collect data including Electrodermal Activities, Photoplethysmogram, Electroencephalogram, and Pupillary Responses. This study examines informational search with four stages: the realization of Information Need (IN), Query Formulation (QF), Query Submission (QS), and Relevance Judgment (RJ). We also include different interaction modalities to represent modern systems, e.g., QS by text-typing or verbalizing, and RJ with text or audio information. We analyze the physiological signals across these stages and report outcomes of pairwise non-parametric repeated-measures statistical tests. The results show that participants experience significantly higher cognitive load at IN with a subtle increase in alertness, while QF requires higher attention. QS involves a more demanding cognitive load than QF. Affective responses are more pronounced at RJ than at QS or IN, suggesting greater interest and engagement as knowledge gaps are resolved. To the best of our knowledge, this is the first study that explores user behaviors in a search process employing a more nuanced quantitative analysis of physiological signals. Our findings offer valuable insights into user behavior and emotional responses in information seeking processes. We believe our proposed methodology can inform the characterization of more complex processes, such as conversational information seeking.
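The abstract mentions pairwise non-parametric repeated-measures tests without naming one; the Wilcoxon signed-rank test is a common choice for paired comparisons across stages. A minimal sketch under that assumption, with entirely synthetic per-participant values (the feature, its units, and the effect size are illustrative, not from the study):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(42)
n_participants = 26

# Hypothetical per-participant mean pupil diameter (mm) at two search
# stages; values are synthetic and for illustration only.
pupil_in = rng.normal(4.2, 0.3, n_participants)              # Information Need
pupil_qf = pupil_in - rng.normal(0.2, 0.1, n_participants)   # Query Formulation

# Paired, non-parametric comparison across stages (repeated measures).
stat, p_value = wilcoxon(pupil_in, pupil_qf)
print(f"Wilcoxon W={stat:.1f}, p={p_value:.4f}")
```

With real data this comparison would be repeated for each signal feature and stage pair, typically with a multiple-comparison correction.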
E-Scooter Dynamics: Unveiling Rider Behaviours and Interactions with Road Users through Multi-Modal Data Analysis
Electric scooters (e-scooters), characterised by their small size and lightweight design, have revolutionised urban commuting experiences. Their adaptability to multiple mobility infrastructures introduces advantages for users, enhancing the efficiency and flexibility of urban transit. However, this versatility also gives rise to potential challenges, including increased interactions and conflicts with other road users. Previous research has primarily focused on historical trip data, leaving a gap in our understanding of real-time e-scooter user behaviours and interactions. To bridge this gap, we propose a novel multi-modal data collection and integrated data analysis methodology, aimed at capturing the dynamic behaviours of e-scooter riders and their interactions with other road users in real-world settings. We present the study setup and the analysis approach we used for an in-the-wild study with 15 participants, each traversing a pre-determined route equipped with commercially available off-the-shelf devices (e.g., cameras, bike computers) and eye-tracking glasses.
Towards Detecting and Mitigating Cognitive Bias in Spoken Conversational Search
Spoken Conversational Search (SCS) poses unique challenges in understanding user-system interactions due to the absence of visual cues and the complexity of less structured dialogue. To tackle the impact of cognitive bias in today’s information-rich online environment, especially as SCS becomes more prevalent, this paper integrates insights from information science, psychology, cognitive science, and wearable sensor technology to explore potential opportunities and challenges in studying cognitive biases in SCS. It then outlines a framework for experimental designs, ranging from various experiment setups to multimodal instruments. It also analyzes data from an existing dataset as a preliminary example to demonstrate the potential of this framework and discusses its implications for future research. Finally, it discusses the challenges and ethical considerations associated with implementing this approach. This work aims to provoke new directions and discussion in the community and enhance understanding of cognitive biases in Spoken Conversational Search.
Walert: Putting Conversational Information Seeking Knowledge into Action by Building and Evaluating a Large Language Model-Powered Chatbot
Creating and deploying customized applications is crucial for operational success and enriching user experiences in the rapidly evolving modern business world. A prominent facet of modern user experiences is the integration of chatbots or voice assistants. The rapid evolution of Large Language Models (LLMs) has provided a powerful tool to build conversational applications. We present Walert, a customized LLM-based conversational agent able to answer frequently asked questions about computer science degrees and programs at RMIT University. Our demo aims to showcase how conversational information-seeking researchers can effectively communicate the benefits of using best practices to stakeholders interested in developing and deploying LLM-based chatbots. These practices are well-known in our community but often overlooked by practitioners who may not have access to this knowledge. The methodology and resources used in this demo serve as a bridge to facilitate knowledge transfer from experts, address industry professionals’ practical needs, and foster a collaborative environment. The data and code of the demo are available at https://github.com/rmit-ir/walert.
2023
Designing and Evaluating Presentation Strategies for Fact-Checked Content
With the rapid growth of online misinformation, it is crucial to have reliable fact-checking methods. Recent research on finding check-worthy claims and automated fact-checking has made significant advancements. However, limited guidance exists regarding the presentation of fact-checked content to effectively convey verified information to users. We address this research gap by exploring the critical design elements in fact-checking reports and investigating whether credibility and presentation-based design improvements can enhance users’ ability to interpret the report accurately. We co-developed potential content presentation strategies through a workshop involving fact-checking professionals, communication experts, and researchers. The workshop examined the significance and utility of elements such as veracity indicators and explored the feasibility of incorporating interactive components for enhanced information disclosure. Building on the workshop outcomes, we conducted an online experiment involving 76 crowd workers to assess the efficacy of different design strategies. The results indicate that the proposed strategies significantly improve users’ ability to accurately interpret the verdict of fact-checking articles. Our findings underscore the critical role of effective presentation of fact reports in addressing the spread of misinformation. By adopting appropriate design enhancements, the effectiveness of fact-checking reports can be maximized, enabling users to make informed judgments.
How Crowd Worker Factors Influence Subjective Annotations: A Study of Tagging Misogynistic Hate Speech in Tweets
Crowdsourced annotation is vital to both collecting labelled data to train and test automated content moderation systems and to support human-in-the-loop review of system decisions. However, annotation tasks such as judging hate speech are subjective and thus highly sensitive to biases stemming from annotator beliefs, characteristics and demographics. We conduct two crowdsourcing studies on Mechanical Turk to examine annotator bias in labelling sexist and misogynistic hate speech. Results from 109 annotators show that annotator political inclination, moral integrity, personality traits, and sexist attitudes significantly impact annotation accuracy and the tendency to tag content as hate speech. In addition, semi-structured interviews with nine crowd workers provide further insights regarding the influence of subjectivity on annotations. In exploring how workers interpret a task - shaped by complex negotiations between platform structures, task instructions, subjective motivations, and external contextual factors - we see annotations not only impacted by worker factors but also simultaneously shaped by the structures under which they labour.
Are footpaths encroached by shared e-scooters? Spatio-temporal Analysis of Micro-mobility Services
Micro-mobility services (e.g., e-bikes, e-scooters) are increasingly popular among urban communities, being a flexible transport option that brings both opportunities and challenges. As a growing mode of transportation, insights gained from micro-mobility usage data are valuable in policy formulation and improving the quality of services. Existing research analyses patterns and features associated with usage distributions in different localities, and focuses on either temporal or spatial aspects. In this paper, we employ a combination of methods that analyse both spatial and temporal characteristics related to e-scooter trips at a more granular level, enabling observations at different time frames and local geographical zones that prior analyses could not provide. The insights obtained from anonymised, restricted data on shared e-scooter rides show the applicability of the employed method to regulated, privacy-preserving micro-mobility trip data. Our results showed that population density is the most important feature and is positively associated with e-scooter usage. Population owning motor vehicles is negatively associated with shared e-scooter trips, suggesting a reduction in e-scooter usage among motor vehicle owners. Furthermore, we found that the effect of humidity is more important than precipitation in predicting hourly e-scooter trip count. Buffer analysis showed that nearly 29% of trips ended, and 27% of trips started, on the footpath, revealing higher utilisation of footpaths for parking e-scooters in Melbourne.
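The abstract reports a feature-importance ranking for predicting trip counts but does not name the model here; an impurity-based importance from a tree ensemble is one common way to obtain such a ranking. A sketch under that assumption, with a synthetic toy dataset whose feature names mirror the abstract (coefficients and data are invented):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Synthetic zone-level features (names are illustrative, from the abstract).
population_density = rng.uniform(0, 10_000, n)
vehicle_ownership = rng.uniform(0, 1, n)      # share of population with motor vehicles
humidity = rng.uniform(20, 90, n)
precipitation = rng.exponential(1.0, n)

# Toy trip-count signal: positive in density, negative in ownership/humidity.
trips = (0.05 * population_density
         - 100 * vehicle_ownership
         - 0.5 * humidity
         + rng.normal(0, 10, n))

X = np.column_stack([population_density, vehicle_ownership, humidity, precipitation])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, trips)

for name, imp in zip(
    ["population_density", "vehicle_ownership", "humidity", "precipitation"],
    model.feature_importances_,
):
    print(f"{name}: {imp:.3f}")
```

On this toy data the density feature dominates by construction; the paper's actual ranking comes from the real trip data, not a simulation like this.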
Examining the Impact of Uncontrolled Variables on Physiological Signals in User Studies for Information Processing Activities
Physiological signals can potentially be applied as objective measures to understand the behavior and engagement of users interacting with information access systems. However, the signals are highly sensitive, and many controls are required in laboratory user studies. To investigate the extent to which controlled or uncontrolled (i.e., confounding) variables such as task sequence or duration influence the observed signals, we conducted a pilot study where each participant completed four types of information-processing activities (READ, LISTEN, SPEAK, and WRITE). Meanwhile, we collected data on blood volume pulse, electrodermal activity, and pupil responses. We then used machine learning approaches as a mechanism to examine the influence of controlled and uncontrolled variables that commonly arise in user studies. Task duration was found to have a substantial effect on the model performance, suggesting it represents individual differences rather than giving insight into the target variables. This work contributes to our understanding of such variables in using physiological signals in information retrieval user studies.
Combining Worker Factors for Heterogeneous Crowd Task Assignment
Optimising the assignment of tasks to workers is an effective approach to ensure high quality in crowdsourced data - particularly in heterogeneous micro tasks. However, previous attempts at heterogeneous micro task assignment based on worker characteristics are limited to using cognitive skills, despite literature emphasising that worker performance varies based on other parameters. This study is an initial step towards understanding whether and how multiple parameters such as cognitive skills, mood, personality, alertness, comprehension skill, and social and physical context of workers can be leveraged in tandem to improve worker performance estimations in heterogeneous micro tasks. Our predictive models indicate that these parameters have varying effects on worker performance in the five task types considered – sentiment analysis, classification, transcription, named entity recognition and bounding box. Moreover, we note 0.003 - 0.018 reduction in mean absolute error of predicted worker accuracy across all tasks, when task assignment is based on models that consider all parameters vs. models that only consider workers’ cognitive skills. Our findings pave the way for the use of holistic approaches in micro task assignment that effectively quantify worker context.
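The reported 0.003 - 0.018 improvement is a reduction in mean absolute error (MAE) of predicted worker accuracy. A minimal sketch of that comparison with invented numbers (the accuracies below are illustrative, not the study's data):

```python
import numpy as np

# Invented actual vs. predicted worker accuracies for two models:
# one using only cognitive skills, one combining all worker factors.
actual = np.array([0.82, 0.75, 0.91, 0.60])
pred_cognitive_only = np.array([0.78, 0.80, 0.85, 0.66])
pred_all_factors = np.array([0.80, 0.77, 0.89, 0.63])

def mae(pred):
    """Mean absolute error against the actual worker accuracies."""
    return np.abs(pred - actual).mean()

print(f"cognitive-only MAE: {mae(pred_cognitive_only):.4f}")
print(f"all-factors MAE:    {mae(pred_all_factors):.4f}")
```

A lower MAE for the all-factors model is what the abstract's reported reduction corresponds to.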
Mapping 20 years of accessibility research in HCI: A co-word analysis
We employ hierarchical clustering, strategic diagrams, and network core–periphery analysis to assess and visualise the intellectual progress of accessibility research within HCI in the past two decades. The study quantifies and explains the development of accessibility research and its thematic evolution based on 1,535 papers published at TACCESS, ASSETS, IJHCS, and CHI and their respective 3,470 author-assigned keywords. The novelty of this work is based on employing a quantitative methodological approach to provide an overview of accessibility research progress and insights into its driving and trending themes through the period 2001–2021. In addition, we identify declining, emerging, and core backbone themes of accessibility research. Finally, we discuss the opportunities for research that arise from our findings. These contributions provide a roadmap for researchers working on accessibility.
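Co-word analysis typically starts from a keyword co-occurrence matrix that is converted to distances and clustered hierarchically. A toy sketch of that pipeline (the four-paper corpus and keywords are invented; the study uses 1,535 papers and 3,470 author-assigned keywords):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy keyword sets, one per paper (illustrative only).
papers = [
    {"screen reader", "blindness", "web accessibility"},
    {"screen reader", "blindness", "audio interfaces"},
    {"sign language", "deafness", "captioning"},
    {"captioning", "deafness", "video"},
]
keywords = sorted(set().union(*papers))
idx = {k: i for i, k in enumerate(keywords)}

# Co-occurrence matrix: how often two keywords appear in the same paper.
co = np.zeros((len(keywords), len(keywords)))
for kws in papers:
    for a in kws:
        for b in kws:
            if a != b:
                co[idx[a], idx[b]] += 1

# Turn co-occurrence into a distance and cluster hierarchically.
dist = co.max() - co
np.fill_diagonal(dist, 0)
condensed = squareform(dist, checks=False)
clusters = fcluster(linkage(condensed, method="average"), t=2, criterion="maxclust")
for k, c in zip(keywords, clusters):
    print(k, c)
```

On this toy corpus the two thematic groups (blindness-related vs. deafness-related keywords) separate cleanly; real co-word studies also normalise co-occurrence counts (e.g., with an equivalence index) before clustering.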
This report describes the participation of the RMIT IR group at the NTCIR-17 FairWeb-1 task. We submitted five runs with the aim of exploring the role of explicit search result diversification (SRD) and ranking fusion to generate fair rankings considering multiple fairness attributes. We also explored the use of a linear combination-based technique (LC) to take into consideration the relevance while re-ranking. In this report, we compared results from all our submitted runs against each other and the retrieval baselines along each topic type separately (i.e., Researcher, Movie, YouTube). Overall, our results show that neither the SRD-based runs nor the linear combination-based runs show any statistically significant improvement over the retrieval baselines. The source code of the framework for generating group memberships is made available at https://github.com/rmit-ir/fairweb-1/
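The linear combination (LC) technique is described only at a high level in this abstract; one plausible form interpolates a relevance score and a fairness score with a weight λ. A sketch under that assumption, with made-up document scores:

```python
# Hypothetical documents with retrieval-relevance and fairness scores,
# both assumed normalised to [0, 1]; values are illustrative only.
docs = {
    "d1": {"relevance": 0.9, "fairness": 0.2},
    "d2": {"relevance": 0.7, "fairness": 0.9},
    "d3": {"relevance": 0.5, "fairness": 0.8},
}

def lc_score(doc, lam=0.5):
    """Linear combination of relevance and fairness; lam trades them off."""
    return lam * doc["relevance"] + (1 - lam) * doc["fairness"]

# Re-rank by the combined score (lam=1.0 recovers the relevance-only ranking).
ranking = sorted(docs, key=lambda d: lc_score(docs[d]), reverse=True)
print(ranking)
```

How λ is tuned, and how fairness scores are derived from group memberships, is where the actual runs differ; see the linked repository for the group-membership framework.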
Towards Detecting Tonic Information Processing Activities with Physiological Data
In Adjunct Proceedings of the 2023 ACM International Joint Conference on Pervasive and Ubiquitous Computing & the 2023 ACM International Symposium on Wearable Computing (UbiComp/ISWC ’23 Adjunct), 2023
Characterizing Information Processing Activities (IPAs) such as reading, listening, speaking, and writing, with physiological signals captured by wearable sensors can broaden the understanding of how people produce and consume information. However, sensors are highly sensitive to external conditions that are not trivial to control – not even in lab user studies. We conducted a pilot study (N = 7) to assess the robustness and sensitivity of physiological signals across four IPAs (READ, LISTEN, SPEAK, and WRITE) using multiple sensors. The collected signals include Electrodermal Activities, Blood Volume Pulse, gaze, and head motion. We observed consistent trends across participants, and ten features with statistically significant differences across the four IPAs. Our results provide preliminary quantitative evidence of differences in physiological responses when users encounter IPAs, revealing the necessity to inspect the signals separately according to the IPAs. The next step of this study moves into a specific context, information retrieval, and the IPAs are considered as the interaction modalities with the search system, for instance, submitting the search query by speaking or typing.
Helpful, Misleading or Confusing: How Humans Perceive Fundamental Building Blocks of Artificial Intelligence Explanations
Explainable artificial intelligence techniques are developed at breakneck speed, but suitable evaluation approaches lag behind. With explainers becoming increasingly complex and a lack of consensus on how to assess their utility, it is challenging to judge the benefit and effectiveness of different explanations. To address this gap, we take a step back from sophisticated predictive algorithms and instead look into explainability of simple decision-making models. In this setting, we aim to assess how people perceive comprehensibility of their different representations such as mathematical formulation, graphical representation and textual summarisation (of varying complexity and scope). This allows us to capture how diverse stakeholders – engineers, researchers, consumers, regulators and the like – judge intelligibility of fundamental concepts that more elaborate artificial intelligence explanations are built from. This position paper charts our approach to establishing appropriate evaluation methodology as well as a conceptual and practical framework to facilitate setting up and executing relevant user studies.
Towards Measuring Sensitivity of Psychometrics in Crowdsourcing Tasks: Engaging with Fact-checked Content Online
Stanislaus Krisna, Danula Hettiachchi, and Damiano Spina
In Proceedings of the CSCW 2023 Workshop on Understanding and Mitigating Cognitive Biases in Human-AI Collaboration (CSCW ’23 Workshop), 2023
This study explores the relationships between psychometric test outcomes and how participants rate the truthfulness of online news articles, with the aim of supporting the identification of misinformation. We examine the relationship between political orientation and several other psychometrics. We then ask participants to rate both a social media article and a fact-checked article, and analyse the relationship between these variables and the differences in ratings across articles. We also discuss the challenges we faced in aggregating results across participants and in computing distances on an ordinal scale (magnitudes given values in a limited range).
Quality improvement methods are essential to gathering high-quality crowdsourced data, both for research and industry applications. A popular and broadly applicable method is task assignment that dynamically adjusts crowd workflow parameters. In this survey, we review task assignment methods that address: heterogeneous task assignment, question assignment, and plurality problems in crowdsourcing. We discuss and contrast how these methods estimate worker performance, and highlight potential challenges in their implementation. Finally, we discuss future research directions for task assignment methods, and how crowdsourcing platforms and other stakeholders can benefit from them.
Does a Face Mask Protect My Privacy?: Deep Learning to Predict Protected Attributes from Masked Face Images
Contactless and efficient systems are being implemented rapidly to advocate preventive methods in the fight against the COVID-19 pandemic. Despite the positive benefits of such systems, there is potential for exploitation by invading user privacy. In this work, we analyse the privacy invasiveness of face biometric systems by predicting privacy-sensitive soft-biometrics from masked face images. We train and apply a CNN based on the ResNet-50 architecture with 20,003 synthetic masked images and measure the privacy invasiveness. Despite the popular belief that wearing a mask protects privacy, we show that there is no significant difference in privacy invasiveness when a mask is worn. In our experiments we were able to accurately predict sex (94.7%), race (83.1%) and age (MAE 6.21 and RMSE 8.33) from masked face images. Our proposed approach can serve as a baseline utility to evaluate the privacy-invasiveness of artificial intelligence systems that make use of privacy-sensitive information. We open-source all contributions for reproducibility and broader use by the research community.
REGROW: Reimagining Global Crowdsourcing for Better Human-AI Collaboration
Crowdworkers silently enable much of today’s AI-based products, with several online platforms offering a myriad of data labelling and content moderation tasks through convenient labour marketplaces. The HCI community has been increasingly interested in investigating the worker-centric issues inherent in the current model and seeking potential improvements that could be implemented in the future. This workshop explores how a reimagined perspective on crowdsourcing platforms could provide a more equitable, fair, and rewarding experience. This includes not only the workers but also the platforms, which could benefit, e.g., from better processes for worker onboarding, skills development, and growth. We invite visionary takes in various formats on this topic to spread awareness of worker-centric research and developments to the CHI community. As a result of interactive ideation work in the workshop, we articulate a future direction roadmap for research centred around crowdsourcing platforms. Finally, as a specific interest area, the workshop seeks to study crowdwork in the context of the Global South, which has been emerging as an important but critically understudied crowdsourcing market in recent years.
2021
The Challenge of Variable Effort Crowdsourcing and How Visible Gold Can Help
We consider a class of variable effort human annotation tasks in which the number of labels required per item can greatly vary (e.g., finding all faces in an image, named entities in a text, bird calls in an audio recording, etc.). In such tasks, some items require far more effort than others to annotate. Furthermore, the per-item annotation effort is not known until after each item is annotated since determining the number of labels required is an implicit part of the annotation task itself. On an image bounding-box task with crowdsourced annotators, we show that annotator accuracy and recall consistently drop as effort increases. We hypothesize reasons for this drop and investigate a set of approaches to counteract it. Firstly, we benchmark on this task a set of general best-practice methods for quality crowdsourcing. Notably, only one of these methods actually improves quality: the use of visible gold questions that provide periodic feedback to workers on their accuracy as they work. Given these promising results, we then investigate and evaluate variants of the visible gold approach, yielding further improvement. Final results show a 7% improvement in bounding-box accuracy over the baseline. We discuss the generality of the visible gold approach and promising directions for future research.
Effect of Conformity on Perceived Trustworthiness of News in Social Media
A catalyst for the spread of fake news is the existence of comments that users make in support of, or against, such articles. In this article, we investigate whether critical and supportive comments can induce conformity in how readers perceive trustworthiness of news articles and respond to them. We find that individuals tend to conform to the majority’s opinion of an article’s trustworthiness (58%), especially when challenged by larger majorities who are critical of the article’s credibility, or when less confident about their personal judgment. Moreover, we find that individuals who conform are more inclined to take action: to report articles they perceive as fake, and to comment on and share articles they perceive as real. We conclude with a discussion on the implications of our findings for mitigating the dispersion of fake news on social media.
Team Dynamics in Hospital Workflows: An Exploratory Study of a Smartphone Task Manager
BACKGROUND: Although convenient and reliable modern messaging apps like WhatsApp enable efficient communication among hospital staff, hospitals are now pivoting toward purpose-built structured communication apps for various reasons, including security and privacy concerns. However, there is limited understanding of how we can examine and improve hospital workflows using the data collected through such apps as an alternative to costly and challenging research methods like ethnography and patient record analysis. OBJECTIVE: We seek to identify whether the structure of the collected communication data provides insights into hospitals’ workflows. Our analysis also aims to identify ways in which task management platforms can be improved and designed to better support clinical workflows. METHODS: We present an exploratory analysis of clinical task records collected over 22 months through a smartphone app that enables structured communication between staff to manage and execute clinical workflows. We collected over 300,000 task records between July 2018 and May 2020 completed by staff members including doctors, nurses, and pharmacists across all wards in an Australian hospital. RESULTS: We show that important insights into how teams function in a clinical setting can be readily drawn from task assignment data. Our analysis indicates that predefined labels such as urgency and task type are important and impact how tasks are accepted and completed. Our results show that both task sent-to-accepted (P<.001) and sent-to-completed (P<.001) times are significantly higher for routine tasks when compared to urgent tasks. We also show how task acceptance varies across teams and roles and that internal tasks are more efficiently managed than external tasks, possibly due to increased trust among team members. 
For example, task sent-to-accepted time (minutes) is significantly higher (P<.001) for external assignments (mean 22.10, SD 91.45) when compared to internal assignments (mean 19.03, SD 82.66). CONCLUSIONS: Smartphone-based task assignment apps can provide unique insights into team dynamics in clinical settings. These insights can be used to further improve how well these systems support clinical work and staff.
Investigating and Mitigating Biases in Crowdsourced Data
Spatial experience, or how humans experience a given space, has been a pivotal topic especially in urban-scale environments. On the human scale, HCI researchers have mostly investigated personal meanings or aesthetic and embodied experiences. In this paper, we investigate the human scale as an ensemble of individual spatial features. Through large-scale online questionnaires we first collected a rich set of spatial features that people generally use to characterize their surroundings. Second, we conducted a set of field interviews to develop a more nuanced understanding of the feature identified as most important: perceived safety. Our combined quantitative and qualitative analysis contributes to spatial understanding as a form of context information and presents a timely investigation into the perceived safety of human scale spaces. By connecting our results to the broader scientific literature, we contribute to the field of HCI spatial understanding.
2020
“Hi! I am the Crowd Tasker” Crowdsourcing through Digital Voice Assistants
Inspired by the increasing prevalence of digital voice assistants, we demonstrate the feasibility of using voice interfaces to deploy and complete crowd tasks. We have developed Crowd Tasker, a novel system that delivers crowd tasks through a digital voice assistant. In a lab study, we validate our proof-of-concept and show that crowd task performance through a voice assistant is comparable to that of a web interface for voice-compatible and voice-based crowd tasks for native English speakers. We also report on a field study where participants used our system in their homes. We find that crowdsourcing through voice can provide greater flexibility to crowd workers by allowing them to work in brief sessions, enabling multi-tasking, and reducing the time and effort required to initiate tasks. We conclude by proposing a set of design guidelines for the creation of crowd tasks for voice and the development of future voice-based crowdsourcing systems.
CrowdCog: A Cognitive Skill based System for Heterogeneous Task Assignment and Recommendation in Crowdsourcing
While crowd workers typically complete a variety of tasks in crowdsourcing platforms, there is no widely accepted method to successfully match workers to different types of tasks. Researchers have considered using worker demographics, behavioural traces, and prior task completion records to optimise task assignment. However, optimum task assignment remains a challenging research problem due to limitations of proposed approaches, which in turn can have a significant impact on the future of crowdsourcing. We present ’CrowdCog’, an online dynamic system that performs both task assignment and task recommendations, by relying on fast-paced online cognitive tests to estimate worker performance across a variety of tasks. Our work extends prior work that highlights the effect of workers’ cognitive ability on crowdsourcing task performance. Our study, deployed on Amazon Mechanical Turk, involved 574 workers and 983 HITs that span four typical crowd tasks (Classification, Counting, Transcription, and Sentiment Analysis). Our results show that both our assignment method and recommendation method result in a significant performance increase (5% to 20%) as compared to a generic or random task assignment. Our findings pave the way for the use of quick cognitive tests to provide robust recommendations and assignments to crowd workers.
How Context Influences Cross-Device Task Acceptance in Crowd Work
Although crowd work is typically completed through desktop or laptop computers by workers at their home, literature has shown that crowdsourcing is feasible through a wide array of computing devices, including smartphones and digital voice assistants. An integrated crowdsourcing platform that operates across multiple devices could provide greater flexibility to workers, but there is little understanding of crowd workers’ perceptions of taking up crowd tasks across multiple contexts through such devices. Using a crowdsourcing survey task, we investigate workers’ willingness to accept different types of crowd tasks presented on three device types in different scenarios of varying location, time and social context. Through analysis of over 25,000 responses received from 329 crowd workers on Amazon Mechanical Turk, we show that when tasks are presented on different devices, the task acceptance rate is 80.5% on personal computers, 77.3% on smartphones and 70.7% on digital voice assistants. Our results also show how different contextual factors such as location, social context and time influence workers’ decisions to accept a task on a given device. Our findings provide important insights towards the development of effective task assignment mechanisms for cross-device crowd platforms.
2019
Effect of Cognitive Abilities on Crowdsourcing Task Performance
Matching crowd workers to suitable tasks is highly desirable as it can enhance task performance, reduce costs for requesters, and increase worker satisfaction. In this paper, we propose a method that considers workers’ cognitive abilities to predict their suitability for a wide range of crowdsourcing tasks. We measure cognitive ability via fast-paced online cognitive tests with a combined average duration of 6.2 minutes. We then demonstrate that our proposed method can effectively assign or recommend workers for five popular crowd tasks: Classification, Counting, Proofreading, Sentiment Analysis, and Transcription. Using our approach, we demonstrate a significant improvement in expected overall task accuracy. While previous methods require access to worker history or demographics, our work offers a quick and accurate way to determine which workers are more suitable for which tasks.
Crowdsourcing Perceptions of Fair Predictors for Machine Learning: A Recidivism Case Study
The increased reliance on algorithmic decision-making in socially impactful processes has intensified calls for algorithms that are unbiased and procedurally fair. Identifying fair predictors is an essential step in the construction of equitable algorithms, but the lack of ground truth in fair predictor selection makes this a challenging task. In our study, we recruit 90 crowd workers to judge the inclusion of various predictors for recidivism. We divide participants across three conditions with varying group composition. Our results show that participants were able to make informed decisions on predictor selection. We find that agreement with the majority vote is higher when participants are part of a more diverse group. The presented workflow, which provides a scalable and practical approach to reach a diverse audience, allows researchers to capture participants’ perceptions of fairness in private while simultaneously allowing for structured participant discussion.
Measuring the Effects of Stress on Mobile Interaction
Research shows that environmental factors such as ambient noise and cold ambience can render users situationally impaired, adversely affecting interaction with mobile devices. However, an internal factor known to negatively impact cognitive abilities – stress – has not been systematically investigated in terms of its impact on mobile interaction. In this paper, we report a study where we use the Trier Social Stress Test to induce stress in participants and investigate its effect on three aspects of mobile interaction: target acquisition, visual search, and text entry. We find that stress reduces completion time and accuracy during target acquisition tasks, as well as completion time during visual search tasks. Finally, we are able to directly contrast the magnitude of these effects with previously published effects of environmentally-caused impairments. Our work contributes to the growing body of literature on situational impairments.
Towards Effective Crowd-Powered Online Content Moderation
Content moderation is an important element of social computing systems that facilitates positive social interaction on online platforms. Current solutions for moderation, including human moderation via commercial teams, are not effective and have failed to meet the demands of growing volumes of online user-generated content. Through a study where we ask crowd workers to moderate tweets, we demonstrate that crowdsourcing is a promising solution for content moderation. We also report a strong relationship between the sentiment of a tweet and its appropriateness to appear in public media. Our analysis of worker responses further reveals several key factors that affect the judgement of crowd moderators when deciding on the suitability of text content. Our findings contribute towards the development of future robust moderation systems that utilise crowdsourcing.