Crowd Sourced Research Projects

There are many “citizen science” initiatives, and many of them are variations on crowd sourcing. One prominent example, Zooniverse, is a veritable cottage industry creating one crowd sourced project after another. These projects employ ordinary people, AKA “citizens”, in real scientific research.

These collaborations can be very effective, magnifying the efforts of our few remaining professional scientists and research dollars. Unfortunately, in most cases, the civilians are employed in routine, low skill roles. In the case of Zooniverse, the projects are almost exclusively visual (or aural) recognition tasks, asking people to look for significant objects in visual (or sound) data. These internet volunteers occupy the ecological niche that we used to pay students to fill, back when we had money for scientific research.

Is it possible to have more people participate in science in more interesting ways?


In the last couple of years, a Stanford-Santa Cruz project has deployed digital collaboration tools to create “Crowd Research: Open and Scalable University Laboratories” [1]. The idea is to involve volunteers from around the globe in the full array of research activities, including decision making, problem solving, and professional publishing.

Most important of all, the projects were not reduced to “Mechanical Turk” microtasks, but functioned more like actual science labs. The projects were organized like conventional university research groups, directed by a professional Principal Investigator, with institutional technical support. The participants were recruited through open calls, and invited to study, investigate, propose, and critique the research problems.

The Crowd Research project uses techniques and tools familiar from virtual organizations and collaborative on-line work. Each project developed milestones, which were reviewed in periodic (weekly) meetings. These milestones might involve many familiar research activities, including reading papers, interviewing informants, generating ideas, or prototyping.

The large number of responses is peer evaluated to select a handful to discuss in the video conference. This process is essentially the same as reddit-style upvoting. Interestingly, in a randomized double-blind assignment, they found that “anonymous feedback was needlessly negative and evaluative” ([1], p. 834), so they use completely public reviews.
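To make the selection step concrete, here is a minimal Python sketch of this kind of upvote tallying, assuming each peer reviewer simply upvotes the submissions they find strongest. The function and the example votes are my own illustration, not the project's actual code.

    from collections import Counter

    def top_submissions(votes, k=5):
        """Pick the k most-upvoted milestone submissions for the weekly call.

        votes: (voter_id, submission_id) pairs collected from peer review.
        """
        tally = Counter(submission for _voter, submission in votes)
        return [submission for submission, _count in tally.most_common(k)]

    # Hypothetical example: three participants upvote each other's submissions.
    votes = [("ana", "s1"), ("bo", "s1"), ("ana", "s3"), ("chen", "s2"), ("bo", "s3")]
    print(top_submissions(votes, k=2))  # ['s1', 's3']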

A small group of participants connects to the live video discussion; others can participate through digital comments, and anyone can view the archived meeting. The weekly meeting discusses the top submissions and decides what to do next. The PI may assign reading or other training activities. In some cases, an individual may be designated to lead execution of a particular milestone, e.g., when multiple efforts need to be coordinated.

I note that participating in the video conference is a “prize” for submitting a highly rated response to the milestone. This converts the mandatory “oh, no, not another meeting” situation into a sought-after opportunity to meet the PI and top colleagues. I.e., this is an improvement over many collaborations, virtual or physical.

The project results are written up to meet professional style and standards. The contributions of individuals are visible in the digital collaborations, so the paper can assign credit as due. This is a significant opportunity for the participants to achieve visible academic credentials that usually are garnered only by students at elite schools.

The Crowd Research project created a decentralized system to assign credit to the contributions of each person. This helps the PI write letters of recommendation, even when the research group is too large and distributed to know every individual.

The Crowd Research Initiative has evaluated these techniques in a metastudy [1]. The digital infrastructure makes it possible to track not only participation, but also who did what. They document that most of the final ideas originated from “the crowd”, and most of the writing also was done by the crowd. It is important to note that this is about the same as a university lab, except that the participants are not limited to selected enrolled students.

While there was little formal screening of participants, there was high attrition that filtered out the majority of initial sign-ups. Many could not commit enough time because of other obligations, though there are also indications that some lost interest in the work as it developed.

The researchers document the relatively democratic spread of access and benefit from the experience. With publications and letters from PIs, many students gained admission to programs of study that they otherwise would not have.

The reputation system was correlated with the assignment of authorship and acknowledgement on the publications. Their algorithm (similar to PageRank) tended to reflect concrete contributions (such as checkins), though it was still possible to game the system to increase personal credit.
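As an illustration of the general idea (not the authors' actual algorithm), here is a small PageRank-style sketch in Python: participants point credit at one another, and the resulting scores reward people who are credited by well-credited peers. The endorsement graph, names, and parameters are hypothetical.

    def credit_scores(endorsements, damping=0.85, iterations=50):
        """PageRank-style credit over a graph of peer endorsements.

        endorsements: dict mapping each participant to the list of
        participants whose contributions they credit.
        """
        people = list(endorsements)
        score = {p: 1.0 / len(people) for p in people}
        for _ in range(iterations):
            new = {p: (1.0 - damping) / len(people) for p in people}
            for p, credited in endorsements.items():
                if not credited:
                    continue
                share = damping * score[p] / len(credited)
                for q in credited:
                    new[q] += share
            score = new
        return score

    # Hypothetical endorsement graph: who credits whose contributions.
    graph = {"ana": ["bo"], "bo": ["ana", "chen"], "chen": ["ana"]}
    print(credit_scores(graph))

A scheme like this tends to reflect concrete, visible contributions, but, as the authors observed, any reputation mechanism can be gamed to some extent.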

In their recent paper, they draw conclusions about “How to run a bad Crowd Research project” ([1], p. 838). They note the need to expect dropouts and conflict, and suggest that the project be carefully selected to match the strengths of the format. Also, as noted, they don’t recommend a competitive vibe.


This is an interesting and somewhat heroic project, harking back to the good old days when university researchers were generously supported and could tackle ambitious projects involving dozens of students.

One very important point to emphasize is that these projects were much more like “regular research”, and absolutely not the usual trivial crowd sourcing tasks. I would also say that they strongly resemble many software projects, as well as collaborative non-profit projects (e.g., organizing a community workshop). I think this is not a coincidence: these virtual collaborations are similar kinds of social groups. As such, the lessons of Crowd Research should probably apply well to other digitally enhanced collaborations.

There are a couple of important caveats about this approach.

First, as they intimate in their anti-patterns, not every research topic or project is a good match for crowd research (or digital collaboration). A good project should “leverage scale and diversity to achieve more ambitious goals” (p. 838). I would also say that the project needs to have primarily digital deliverables. Obviously, it would be difficult to coordinate and share a single physical prototype, or physical materials, with any digital technology.

Second, the high satisfaction of the participants, professional and non-professional, has to be taken with a grain of salt. In particular, the participants were self-selected at the beginning, and through attrition. Crowd Research is well designed to create a sense of commitment and ownership in the project, at least in those who persist. However, it isn’t possible to extrapolate these results to people in general.

Even in these experiments, more than half of the initial recruits dropped out. Whatever the reason for leaving (generally, lack of time), these dropouts did not benefit and could not have had very high satisfaction with the experience. This was a great experience for a tiny, select group of people. The successful participants were highly motivated, with matching skills and interests. This is a natural feature of collaborative research, and crowd technology neither can nor should change that.

A third point to consider is that these young participants (mostly undergraduate students) were surely digital natives, quite used to social media and communication tools such as reddit and reputation systems. This study showed that these technologies can be used effectively, at least for a self-selected group who are proficient and comfortable with these digital interactions.

It isn’t clear how universal this sort of digital literacy may be, or whether there are different styles. The study had to deal with cultural and personal conflicts, but it could only address them within the digital arena. People who could not or would not play the game were simply not in the sample.

Obviously, technical and language limitations could preclude effective participation. In addition, people with limited vision or motor skills would be at a disadvantage. And, of course, people who lack confidence or are just shy will be hard to recruit.

These challenges are important issues for all digital life and digital work. Indeed, at its best, Crowd Research is a great approach, because the PI and RAs offer positive and encouraging leadership. My own view is that the attention and leadership of the PI probably spells the difference between a successful CR project and the hundreds of failed digital collaborations. In this, CR recreates one of the ways that university education succeeds: through mentoring and exposure to professional role models.


  1. Rajan Vaish, Snehalkumar S. Gaikwad, Geza Kovacs, Andreas Veit, Ranjay Krishna, Imanol Arrieta Ibarra, Camelia Simoiu, Michael Wilber, Serge Belongie, Sharad Goel, James Davis, and Michael S. Bernstein, “Crowd Research: Open and Scalable University Laboratories,” in Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology (UIST 2017), ACM, Québec City, QC, Canada, pp. 829-843. http://hci.stanford.edu/publications/2017/crowdresearch/crowd-research-uist2017.pdf

 
