Platforms to make online experiments with counterbalancing

There are a number of different platforms for putting experiments online, e.g. those discussed in the questions "Software for online psychological experiments that don't require users to download anything" and "Open source software for running Internet psychological experiments that collect reaction time data".

We have an experimental design that we'd like to run online, but we've found that one feature seems to be quite rare - namely counterbalancing. We have 4 conditions, and we'd like participants to be allocated to these conditions so that we end up with more or less equal numbers in each. This can be difficult for web studies, since the allocation should only count participants who successfully completed the study.

Free and open source would be best, as we don't have a designated budget for setting up an experiment.


You could try www.gorilla.sc. It's cloud-based software specifically designed for running cognitive psychology experiments online without needing to code. It's not free, but it is affordable - £0.75 per participant.

Counterbalancing of various sorts is built in: randomisers, stimuli set counterbalancers, order counterbalancers. It sounds like the simple randomiser set to balanced will do what you want. As long as you stay on top of rejecting participants who drop out, you'll get a balance in each condition.

There is a GUI questionnaire builder, task builder and experiment tree builder, so you don't need to do any coding. The vast majority of cognitive science tasks can be created in it very easily.

Full disclosure, I created Gorilla.


Qualtrics (survey software) and Inquisit (more sophisticated stimulus presentation software) both run online. Both tools support counterbalancing. Inquisit has extensive support for both randomisation of stimuli in a variety of ways and between-subjects allocations to orders and so on. They both require payment, but some universities have subscriptions to one or both services (especially Qualtrics).

I'm not sure how you'd solve the issue of dropout to get perfectly even numbers in each group or ordering.


4 thoughts on “Conducting Online Experiments – The How To Guide”

I’m currently looking at running memory experiments online, and JavaScript + Qualtrics looks like a go-er. Just wondered how you integrated your JavaScript into Qualtrics—in particular, how have you been saving your data in a way that you can access it later? The way I have things set up now means that the raw responses from all trials for a participant end up in a single CSV field when I download from Qualtrics!

I have only ever run experiments using JavaScript or Qualtrics, never together, although that does seem like a great option! Unfortunately that means that I am not sure what might be the best way to integrate the two. Good luck!

Thanks a lot for the advice, I’ll definitely be using it since I’m looking at running online experiments as well.

Existing platforms like Qualtrics are not flexible enough for the experiments I have in mind, so I started educating myself on HTML, CSS and JavaScript. However, these are general-purpose resources, not really tailored for running experiments. It would be great if you could point out some resources that would speed up learning – like GitHub accounts with example experiments together with the code, etc.

https://github.com/EoinTravers/PsychScript is a very useful resource with scripts and libraries for running online psychology experiments.

Otherwise, if you have Firefox and install Firebug, you can view the code for any experiment online. Looking at how others programmed their experiments can be a great way to get a feel for how to do it yourself, although it can be tricky trying to read others’ scripts. Firebug is also great for troubleshooting when you write your own script.


Many libraries and museums have digitized their special collections, such as rare books, manuscripts, photographs, pamphlets, news clippings and musical scores, to create collections of digital assets that can be displayed online through a digital exhibition. Digital exhibits such as these offer unprecedented access to organizational treasures that might otherwise be seen only by those with local physical access to the museum or library. A new breed of free and open-source software tools has recently emerged, making it possible to catalog and manage digital collections and create robust narratives and layouts for display online.

Below are the main software applications used by libraries and museums to create digital exhibits and manage digital assets. The industry leader in this space is a proprietary application called CONTENTdm (http://www.contentdm.org/), created by OCLC. CONTENTdm is digital collection management software that allows for the upload, description, management and access of digital collections. This application offers robust cataloging features and an easy-to-use interface but is cost-prohibitive for many non-profit organizations. Entry-level CONTENTdm options start at $4,300 annually; mid-size licenses start at a $10,000 one-time fee with ongoing annual maintenance starting at $2,000.

A CONTENTdm digital collection

Free and Open Source Tools

However, there are many free and open source alternatives to Contentdm for creating online interactive digital exhibits.

Omeka
http://omeka.org/
Omeka is a free, open source web publishing system for online digital archives. Its main focus/strength is producing websites and online exhibitions. Both the Web interface and the back-end cataloging system are one unified application. Users can build attractive websites and exhibits using templates and page layouts, without having to adjust code, although more robust displays can be created by customizing the CSS and HTML files and moving around some PHP snippets. Omeka has a plugin available for OAI support to make collections harvestable by major search engines. Although Omeka is a bit more limited than some other applications, such as Collective Access (see below), in terms of cataloging and metadata capabilities, it offers fast and easy creation of online exhibits through a Web interface, a low learning curve, many plugins with added functionality, and a large developer community.

Metadata Supported: Omeka uses Dublin Core and MODS metadata, and offers customizable item type cataloging. There are many templates and plugins which offer added functionality, such as displaying items on Google Maps and providing LCSH for cataloging.

Hosted Version and/or Downloadable Code Available? Omeka offers both a hosted, Web-based version and a downloadable application which can be installed and hosted on-site by the organization.

Recommended for: Libraries, Museums

Collective Access
http://collectiveaccess.org/
Collective Access is a free, open source cataloging tool and web-based application for museums, archives and digital collections. Its main focus/strength is cataloging and metadata. You can create very robust cataloging records, create relationships between items, create profiles of creators and subjects of items and link them to objects, etc. Collective Access offers multiple metadata schemas. The Web component, called Pawtucket, is a separate installation and necessitates editing PHP files in order to build and adjust websites. A front-end PHP programmer would be necessary with this solution, and quite possibly one to set up the back-end templates as well.

Hosted Version and/or Downloadable Code Available? The application is downloadable and must be hosted by the organization, no hosted version is available.

Metadata Supported: Dublin Core, VRA, CDWA/CCO, MARC (planned), and others, plus the ability to create in-house standards and to customize existing standards. Ability to access external data sources and services such as LCSH, the Getty Art & Architecture Thesaurus, and Google Maps, Google Earth or GeoNames for geospatial cataloguing.

Recommended for: Libraries, Museums

CollectionSpace
http://www.collectionspace.org/
CollectionSpace is a free, open-source collections management application for museums, libraries, historical societies, and other organizations with special collections. The application is administered by the Museum of the Moving Image, in partnership with the division of Information Services and Technology at the University of California, Berkeley and the Centre for Applied Research in Educational Technologies at the University of Cambridge. The software is made up of a suite of modules and services for managing collections of digital assets; however, it has no native ability to create digital exhibits. Instead, it enables users to connect with other open-source applications already in use in the cultural sector for online exhibition creation. The application allows for the creation of a customized controlled vocabulary for describing collections.

Hosted Version and/or Downloadable Code Available? The application is downloadable and must be hosted by the organization, no hosted version is available.

Metadata Supported: CollectionSpace supports multiple metadata schemas, including Dublin Core and customized schemas.

Recommended for: Libraries, Museums

Open Exhibits
http://openexhibits.org/
Open Exhibits is a multitouch, multi-user tool kit that allows you to create custom interactive exhibits. The strength of this application has less to do with cataloging collections of digital assets than with developing online and interactive exhibits from digital objects. The multi-touch piece comes into play with the ability to specify that certain types of user behaviors will result in various outcomes, e.g. if a user drags a certain section of an image, the entire image will move and readjust along with the movement. Users without technical expertise can work with pre-existing templates and modules, while developers can create their own with the SDK. The application uses a combination of its own markup languages, Creative Mark-up Language (CML) and Gesture Mark-up Language (GML), along with CSS libraries.

Hosted Version and/or Downloadable Code Available? The application is downloadable and must be hosted by the organization, no hosted version is available.

Metadata Supported: Not applicable.

Recommended for: Museums

Pachyderm
http://pachyderm.nmc.org
Pachyderm is a free, open-source and easy-to-use multimedia authoring tool created by the New Media Consortium (NMC). It's been designed for people with little technology or multimedia experience and involves little more than filling out a web form. Authors place their digital assets (images, audio clips, and short video segments) into pre-designed templates, which can play video and audio, link to other templates, zoom in on images, and more. Completed templates result in interactive, Flash-based presentations that can include images, sounds, video, and text; these can be downloaded and displayed on websites, or kept on the Pachyderm server and linked to directly from there.

Hosted Version and/or Downloadable Code Available? The NMC has stated that they are no longer offering hosted accounts at this time, so the application must be downloaded and hosted by the organization or individual.


Introduction

Behavioral research and experimental psychology are increasing their use of web browsers and the internet to reach larger (Adjerid & Kelley, 2018) and more diverse (Casler, Bickel, & Hackett, 2013) populations than has previously been feasible with lab-based methods. However, unique variables are introduced when working within an online environment. The experience of the user is the result of a large number of connected technologies, including the server (which hosts the experiment), the internet service provider (which delivers the data), the browser (which presents the experiment to the participant and measures their responses), and the content itself—which is determined by a mixture of media (e.g., audio/pictures/video) and code in different programming languages (e.g., JavaScript, HTML, CSS, PHP, Java). Linking these technologies is technically difficult, time-consuming, and costly. Consequently, until recently, online research was generally carried out—and scrutinized—by those with the resources to overcome these barriers.

The purpose of this article is threefold: first, to explore the problems inherent to running behavioral experiments online with web programming languages, the issues this can create for timing accuracy, and recent improvements that can mitigate these issues; second, to introduce Gorilla, an online experiment builder that uses best practices to overcome these timing issues and makes reliable online experimentation accessible and transparent to the majority of researchers; third, to demonstrate the timing accuracy and reliability provided by Gorilla. We achieved this last goal using data from a flanker task—which requires high timing fidelity—collected from a wide range of participants, settings, equipment, and internet connection types.

JavaScript

The primary consideration for online experimenters at the present time is JavaScript, the language most commonly used to generate dynamic content on the web (such as an experiment). Its quirks (which are discussed later) can lead to problems with presentation timing, and understanding it forms a large part of the access barrier.

JavaScript is at the more dynamic end of the programming language spectrum. It is weakly typed and allows core functionality to be easily modified. Weak typing means that variables do not have declared types: the user simply declares a variable and then uses it in their code. This is in contrast to strongly typed languages, in which the user must specify whether a variable they declare should be an integer, a string, or some other structure. This can lead to unnoticed idiosyncrasies—if a user writes code that attempts to divide a string by a number, or assign a number to a variable that was previously assigned to an array, JavaScript allows this to proceed. Similarly, JavaScript allows users to call functions without providing all the arguments to that function. This dynamic nature gives more flexibility, but at the cost of allowing mistakes or unintended consequences to creep in. By contrast, in a strongly typed language, incorrect assignments or missing function arguments would be marked as errors that the user should correct. This results in a more brittle, but safer, editing environment. JavaScript also allows a rare degree of modification of core structures—even the most fundamental building blocks (such as arrays) can have extra methods added to them. This can prove useful in some cases, but can easily create confusion as to which parts of the code are built-in and which parts are user defined. Together, these various factors create a programming environment that is very flexible, but one in which mistakes are easy to make and their consequences can go undetected by the designer (Richards, Lebresne, Burg, & Vitek, 2010). This is clearly not ideal for new users attempting to create controlled scientific experiments. Below we discuss two significant hurdles when building web experiments: inaccuracies in the timing of various experiment components in the browser, and the technical complexities involved in implementing an online study, including JavaScript’s contributions. These complexities present an access barrier to controlled online experiments for the average behavioral researcher.
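For readers new to the language, the snippet below illustrates the behaviors described above. The variable and function names are invented for the example; none of these lines raises an error when run in a browser console or Node.js, which is precisely the point.

```javascript
// Illustrative only: each of these silently "works" in JavaScript.
let score = "12";              // a string, perhaps read from a form field
let trials = 4;
console.log(score / trials);   // 3      -- the string is coerced to a number
console.log(score + trials);   // "124"  -- but + concatenates instead

let data = [1, 2, 3];
data = 7;                      // reassigning an array variable to a number is allowed

function logTrial(subject, condition, rt) {
  console.log(subject, condition, rt);   // rt is undefined if not supplied
}
logTrial("s01", "congruent");  // missing argument: no error is raised

// Even built-in structures can be extended at run time:
Array.prototype.last = function () { return this[this.length - 1]; };
console.log([10, 20, 30].last());        // 30
```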

History of timing concerns

Timing concerns have been expressed regarding online studies (for an overview, see Woods, Velasco, Levitan, Wan, & Spence, 2015), and although many of these concerns are now historic for informed users—because solutions exist—they are still an issue for new users who may not be aware of them. These concerns can be divided into the timing of stimuli—that is, an image or sound is not presented for the duration you want—and the timing of response recording—that is, the participant did not press a button at the time they are recorded doing so. These inaccuracies have obvious implications for behavioral research, especially studies using time-based measures such as reaction time (RT).

Several things might be driving these timing issues. First, in JavaScript programs, most processes within a single web-app or browser window pass through an event loop—a single thread that decides what parts of the JavaScript code to run, and when. This loop comprises different types of queues. Queues that are managed synchronously wait until one task is complete before moving on. One example of a synchronously managed queue is the event queue, which stores an ordered list of things waiting to be run. Queues that are managed asynchronously will start new tasks instead of waiting for the preceding tasks to finish, such as the queue that manages loading resources (e.g., images). Most presentation changes are processed through the event loop in an asynchronous queue. This could be an animation frame updating, an image being rendered, or an object being dragged around. Variance in where an experiment’s computations sit in the queue, due to competition with other code, can lead to inconsistent timing. When a synchronous call to the event loop requires a lot of time, it can “block” the loop—preventing everything else in the queue from passing through. For instance, you may try to present auditory and visual stimuli at the same time, but they could end up out of synchronization if blocking occurs—a common manifestation of this in web videos is unsynchronized audio and video.
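As a minimal sketch of the blocking problem (assuming a hypothetical page element with the id 'stimulus', and illustrative timings), the code below schedules a presentation change and then occupies the single thread with a long synchronous loop; the scheduled change cannot run until the loop finishes.

```javascript
const stimulus = document.getElementById('stimulus'); // hypothetical element

// Ask the browser to reveal the stimulus in ~500 ms...
setTimeout(() => {
  stimulus.style.visibility = 'visible';
  console.log('Requested at 500 ms, actually ran at', performance.now().toFixed(0), 'ms');
}, 500);

// ...but a heavy synchronous computation blocks the event loop,
// so the callback above is delayed until this loop completes.
const start = performance.now();
while (performance.now() - start < 800) {
  // busy-wait for 800 ms; timers and rendering are held up meanwhile
}
```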

Second, the computational load on the current browser window will slow the event loop down; variance in timing is, therefore, dependent on different computers, browsers, and computational loads (Jia, Guo, Wang, & Zhang, 2018). For a best-practices overview, see Garaizar and Reips (2018). Given the need for online research to make use of onsite computers, such as those in homes or schools, the potential variance mentioned above is an important issue. A laptop with a single processor, a small amount of memory, and an out-of-date web browser is likely to struggle to present stimuli to the same accuracy as a multicore desktop with the most recent version of Google Chrome installed. These differences can amount to variance of over 100 ms in presentation timing (Reimers & Stewart, 2016).

Third, by default, web browsers load external resources (such as images or videos) progressively as soon as the HTML elements that use them are added to the page. This results in the familiar effect of images “popping in” as the page loads incrementally. If each trial in an online task is treated as a normal web page, this “popping in” will lead to inaccurate timing. Clearly, such a variance in display times would be unsuitable for online research, but the effect can be mitigated by loading resources in advance. A direct solution is to simply load all the required resources, for all the trials, in advance of starting the task (Garaizar & Reips, 2018). This can be adequate for shorter tasks or tasks that use a small number of stimuli, but as the loading time increases, participants can become more likely to drop out, resulting in an increase in attrition.
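A minimal preloading sketch is shown below, with placeholder file names: all stimulus images are requested up front, and the task starts only once every image has loaded (the approach Gorilla refines by caching a few trials ahead, as described later).

```javascript
// Preload every stimulus image before the task begins (file names are placeholders).
const stimulusFiles = [
  'congruent_left.png', 'congruent_right.png',
  'incongruent_left.png', 'incongruent_right.png'
];

function preloadImages(urls) {
  return Promise.all(urls.map(url => new Promise((resolve, reject) => {
    const img = new Image();
    img.onload = () => resolve(img);                      // image is now in the browser cache
    img.onerror = () => reject(new Error('Failed to load ' + url));
    img.src = url;
  })));
}

preloadImages(stimulusFiles)
  .then(() => console.log('All stimuli cached; safe to start the first trial'))
  .catch(err => console.error(err));
```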

The same concerns (with the exception of connection speed) apply to the recording of RTs, which are dependent on a JavaScript system called the “event system.” When a participant presses a mouse or keyboard button, recording of these responses (often through a piece of code called an “Event Listener”) gets added to the event loop. To give a concrete example, two computers could record different times for an identical mouse response based on their individual processing loads. It must be noted that this issue is independent of the browser receiving an event (such as a mouse click being polled by the operating system), for which there is a relatively fixed delay, which has been shown to be equivalent in nonbrowser software (de Leeuw & Motz, 2016)—this receiving delay is discussed later in the article. Timing of event recording using the browser system clock (which some JavaScript functions do) is another source of variance, because different machines and operating systems will have different clock accuracies and update rates.
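The sketch below shows the kind of event-listener code being described, with invented variable names. The response is timestamped only when the handler runs, which is why load on the event loop can shift the recorded RT.

```javascript
let stimulusOnset = null;

function showStimulus() {
  // Drawing code omitted; record the onset with the high-resolution, monotonic clock.
  stimulusOnset = performance.now();
}

document.addEventListener('keydown', (event) => {
  if (stimulusOnset === null) return;          // ignore presses before the stimulus appears
  // In modern browsers event.timeStamp shares performance.now()'s time origin,
  // so the two can be subtracted directly.
  const rt = event.timeStamp - stimulusOnset;
  console.log('Key', event.key, 'RT:', rt.toFixed(1), 'ms');
  stimulusOnset = null;
});
```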

Current state of the art

Presently, the improved processing capabilities of common browsers and computers, in concert with improvements in web-language standards—such as HTML5 and ECMAScript 6—offer the potential to overcome some concerns about presentation and response timings (Garaizar, Vadillo, & López-de Ipiña, 2012, 2014; Reimers & Stewart, 2015, 2016; Schmidt, 2001). This is because, in addition to standardized libraries (which improve the consistency of any potential web experiment between devices), these technologies use much more efficient interpreters, which are the elements of the browser that execute the code and implement computations. An example of this is Google’s V8, which improves processing speed—and therefore the speed of the event loop—significantly (Severance, 2012). In fact, several researchers have provided evidence that response times are comparable between browser-based applications and local applications (Barnhoorn, Haasnoot, Bocanegra, & van Steenbergen, 2015), even in poorly standardized domestic environments—that is, at home (Miller, Schmidt, Kirschbaum, & Enge, 2018).

A secondary benefit of recent browser improvements is scalability. If behavioral research continues to take advantage of the capacity for big data provided by the internet, it needs to produce scalable methods of data collection. Browsers are becoming more and more consistent in the technology they adopt—meaning that code will be interpreted more consistently across your experimental participants. At the time of writing, the standards for browser-based web apps are HTML5 (the World Wide Web Consortium, 2019, provides the current web standards) and ECMAScript JavaScript (Zaytsev, 2019, shows that most browsers currently support ECMAScript 5 and above). ECMAScript (ES) is a set of standards that are implemented in JavaScript (but can also be implemented in other environments—e.g., ActionScript in Flash), and browsers currently support a number of versions of this standard (see Zaytsev, 2019, for details). The combination of ES and HTML5, in addition to having improved timing, is also the most scalable: it reaches the greatest number of users, with most browsers supporting these standards, in contrast with other technologies, such as Java plugins and Flash, which are becoming inconsistently supported—in fact, Flash support has recently begun to be withdrawn from all major browsers.

Access barriers

Often, to achieve accurate timing and presentation, you must have a good understanding of key browser technologies. As in any application of computer science, there are multiple methods for achieving the same goal, and these may vary in the quality and reliability of the data they produce. One of the key resources for tutorials on web-based apps—the web itself—may lead users to adopt out-of-date or unsupported methods; with the fast-changing and exponentially expanding browser ecosystem, this is a problem for the average behavioral researcher (Ferdman, Minkov, Bekkerman, & Gefen, 2017). This level of complexity imposes an access barrier to creating a reliable web experiment—the researcher must have an understanding of the web ecosystem they operate in and know how to navigate its problems with appropriate tools.

However, tools are available that lower these barriers in various ways. Libraries, such as jsPsych (de Leeuw, 2015), give a toolbox of JavaScript commands that are implemented at a higher level of abstraction—therefore relieving the user of some implementation-level JavaScript knowledge. Hosting tools such as “Just Another Tool for Online Studies” (JATOS) allow users to host JavaScript and HTML studies (Lange, Kühn, & Filevich, 2015) and present the studies to their participants—this enables a research-specific server to be set up. However, with JATOS you still need to know how to set up and manage your server, which requires a considerable level of technical knowledge. The user will also need to consider putting safeguards in place to manage unexpected server downtime caused by a whole range of issues. This may require setting up a back-up system or back-up server. A common issue is too many participants accessing the server at the same time, which can cause it to overload and is likely to prevent access for current users mid-experiment—which can lead to data loss (Schmidt, 2000).

The solutions above function as “packaged software,” in which the user is responsible for all levels of implementation (i.e., browser, networking, hosting, data processing, legal compliance, regulatory compliance, and insurance)—in the behavioral research use case, this requires multiple tools to be stitched together (e.g., jsPsych in the browser and JATOS for hosting). This itself presents another access barrier, as the user then must understand—to some extent—details of the web server (e.g., how many concurrent connections their hosted experiment will be able to take), the hosting (download/upload speeds), the database (where and how data will be stored, e.g., in JavaScript Object Notation format or in a relational database), and how the participants access their experiment and how they are connected (e.g., through Prolific.ac or Mechanical Turk).

One way to lower these barriers is to provide a platform that manages all of this for the user, commonly known as software as a service (SaaS; Turner, Budgen, & Brereton, 2003). All of the above can be set up, monitored, and updated for the experimenter, while also providing as consistent and reproducible an environment as possible—something that is often a concern for web research. One recent example is the online implementation of PsyToolkit (Stoet, 2017), through which users can create, host, and run experiments on a managed web server and interface; however, there is still a requirement to write out the experiment in code, which represents another access limitation.

Some other tools exist in the space between SaaS and packaged software. PsychoPy3 (Peirce & MacAskill, 2018) is an open-source local application offering a graphical task builder and a Python programming library. It offers the ability to export experiments built in the task builder (but currently not those built using the Python library) to JavaScript, and then to a closed-source web platform based on GitLab (a repository-based version control system) called Pavlovia.org, where users can host that particular task for data collection. Lab.js (Henninger, Mertens, Shevchenko, & Hilbig, 2017) is another task builder, which provides a web-based GUI in which users can build a task and download a package containing the HTML, CSS, and JavaScript needed to run a study. Users are then able to export this for hosting on their own or on third-party servers. Neither of these tools functions fully as SaaS, since they do not offer a fully integrated platform that allows you to build, host, distribute tasks for, and manage complex experimental designs (e.g., a multiday training study) without programming, in the same environment. A full comparison of packaged software, libraries, and hosting solutions can be found in Table 1.

The Gorilla Experiment Builder

Gorilla (www.gorilla.sc) is an online experiment builder whose aim is to lower the barrier to access, enabling all researchers and students to run online experiments (regardless of programming and networking knowledge). As well as giving greater access to web-based experiments, it reduces the risk of introducing higher noise in data (e.g., due to misuse of browser-based technology). By lowering the barrier, Gorilla aims to make online experiments available and transparent at all levels of ability. Currently, experiments have been conducted in Gorilla on a wide variety of topics, including cross-lingual priming (Poort & Rodd, 2017), the provision of lifestyle advice for cancer prevention (Usher-Smith et al., 2018), semantic variables and list memory (Pollock, 2018), narrative engagement (Richardson et al., 2018), trust and reputation in the sharing economy (Zloteanu, Harvey, Tuckett, & Livan, 2018), how individuals’ voice identities are formed (Lavan, Knight, & McGettigan, 2018), and auditory perception with degenerated music and speech (Jasmin, Dick, Holt, & Tierney, 2018). Also, several studies have preregistered reports, including explorations of object size and mental simulation of orientation (Chen, de Koning, & Zwaan, 2018) and the use of face regression models to study social perception (Jones, 2018). Gorilla has also been mentioned in an article on the gamification of cognitive tests (Lumsden, Skinner, Coyle, Lawrence, & Munafò, 2017). Gorilla was launched in September 2016, and as of January 2019 over 5,000 users had signed up to Gorilla, across more than 400 academic institutions. In the last three months of 2018, data were collected from over 28,000 participants—an average of around 300 participants per day.

One of the greatest differences between Gorilla and the other tools mentioned above (a comprehensive comparison of these can be found in Table 1) is that it is an experiment design tool, not just a task-building or questionnaire tool. At the core of this is the Experiment Builder, a graphical tool that allows you to creatively reconfigure tasks and questionnaires into a wide range of different experiment designs without having to code. The interface is built around dragging and dropping nodes (which represent what the participant sees at that point, or modifications to their path through the experiment) and connecting them together with arrow lines. This modular approach makes it much easier for labs to reuse elements that have been created before, by themselves or by others. For instance, this allows any user to construct complex, counterbalanced, randomized, between-subjects designs with multiday delays and email reminders, with absolutely no programming needed. Examples of this can be seen in Table 2.

Gorilla provides researchers with a managed environment in which to design, host, and run experiments. It is fully compliant with the EU General Data Protection Regulation and with NIHR and BPS guidelines, and it has backup communication methods for data in the event of server problems (to avoid data loss). A graphical user interface (GUI) is available for building questionnaires (called the “Questionnaire Builder”), experimental tasks (the “Task Builder”), and running the logic of experiments (“Experiment Builder”). For instance, a series of different attention and memory tasks could be constructed with the Task Builder, and their order of presentation would be controlled with the Experiment Builder. Both are fully implemented within a web browser and are illustrated in Fig. 1. This allows users with little or no programming experience to run online experiments, whilst controlling and monitoring presentation and response timing.

Example of the two main GUI elements of Gorilla. (A) The Task Builder, with a screen selected showing how a trial is laid out. (B) The Experiment Builder, showing a check for the participant, followed by a randomizer node that allocates the participant to one of two conditions, before sending them to a Finish node

At the Experiment Builder level (Fig. 1B), users can create the logic of the experiment through its nodes, which manage capabilities such as randomization, counterbalancing, branching, task switching, repeating, and delay functions. This range of functions makes it easy to create longitudinal studies with complex behavior. An example could be a four-week training study with email reminders, in which participants would receive different tasks based on prior performance; or the experiment tree could just as easily enable a one-shot, between-subjects experiment. Additionally, Gorilla includes a redirect node that allows users to redirect participants to another hosted service and then send them back again. This allows users to use the powerful Experiment Builder functionality (i.e., multiday testing) while using a different service (such as Qualtrics) at the task or questionnaire level. Table 2 provides a more detailed explanation of several example experiments made in the builder.

The Task Builder (Fig. 1A) provides functionality at the task level. Each experimental task is separated into “displays” that are made of sequences of “screens.” Each screen can be configured by the user to contain an element of a trial, be that text, images, videos, audio, buttons, sliders, keyboard responses, progress bars, feedback, or a wide range of other stimuli and response options. See the full list here: https://gorilla.sc/support/articles/features. The content of these areas either can be static (such as instructions text) or can change on a per-trial basis (when the content is set using a spreadsheet). The presentation order of these screens is dependent on sequences defined in this same spreadsheet, in which blocked or complete randomization can take place on the trial level. The Task Builder also has a “Script” tab, which allows the user to augment the functionality provided by Gorilla with JavaScript. This allows users to use the GUI and JavaScript side by side. There is also a separate “Code Editor,” which provides a development environment for making experiments purely in code. This allows users to include external libraries, such as jsPsych. The purpose of the Code Editor is to provide a secure and reliable service for hosting, data storage, and participant management for tasks written in code.

Users can extend the functionality of Gorilla through the scripting tools in the Code Editor, where custom JavaScript commands, HTML templates, and an application programming interface (API) are available—an API is a set of functions that gives access to the platform’s functionality in the Code Editor and allows users to integrate third-party libraries into their experiments (e.g., tasks programmed in jsPsych). Gorilla can therefore also function as a learning platform through which users progress on to programming—while providing an API that manages more complex issues (such as timing and data management) that might cause a beginner to make errors. The Code Editor allows the inclusion of any external libraries (e.g., pixi.js for animation, OpenCV.js for image processing, or WebGazer.js for eyetracking). A full list of features is available at www.gorilla.sc/tools, and a tutorial is included in the supplementary materials.

Timing control

A few techniques are utilized within Gorilla to control timing. To minimize any potential delays due to network speed (mentioned above), the resources for several trials are loaded in advance of presentation, a process called caching. Gorilla loads the assets required for the next few trials, begins the task, and then continues to load assets required for future trials while the participant completes the task. This strikes an optimal balance between ensuring that trials are ready to be displayed when they are reached and preventing a lengthy load at the beginning of the task. This means that fluctuations in connection speed will not lead to erroneous presentation times. The presentation of stimuli is achieved using the requestAnimationFrame() function, which allows the software to count frames and run code when the screen is about to be refreshed, ensuring that screen refreshing in the animation loop does not cause hugely inconsistent presentation. This method has previously been implemented to achieve accurate audio presentation (Reimers & Stewart, 2016) and accurate visual presentation (Yung, Cardoso-Leite, Dale, Bavelier, & Green, 2015). Rather than assuming that each frame is going to be presented for 16.667 ms, and presenting a stimulus for the nearest number of frames (something that commonly happens), Gorilla times each frame’s actual duration—using requestAnimationFrame(). The number of frames a stimulus is presented for can, therefore, be adjusted depending on the duration of each frame—so that most of the time a longer frame refresh (due to lag) will not lead to a longer stimulus duration. This method was used in the (now defunct) QRTEngine (Barnhoorn et al., 2015), and to our knowledge is not used in other experiment builders (for a detailed discussion of this particular issue, see the following GitHub issue, www.github.com/jspsych/jsPsych/issues/75, and the following blog post on the QRTEngine’s website, www.qrtengine.com/comparing-qrtengine-and-jspsych/).
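The snippet below is a simplified sketch of this frame-counting idea, not Gorilla's actual code, and it assumes a hypothetical 'stimulus' element. Each requestAnimationFrame() callback measures the real duration of the preceding frame, and the stimulus is hidden on whichever frame brings the accumulated, measured time closest to the target duration.

```javascript
const stimulus = document.getElementById('stimulus'); // hypothetical element

function presentFor(targetMs) {
  let onset = null;
  let lastTimestamp = null;
  stimulus.style.visibility = 'visible';

  function onFrame(timestamp) {
    if (onset === null) {
      onset = lastTimestamp = timestamp;             // first frame after the stimulus appears
      requestAnimationFrame(onFrame);
      return;
    }
    const frameDuration = timestamp - lastTimestamp; // measured, not assumed to be 16.667 ms
    lastTimestamp = timestamp;
    const elapsed = timestamp - onset;
    // Hide now if keeping the stimulus for one more measured-length frame
    // would overshoot the target by more than hiding now undershoots it.
    if (elapsed + frameDuration / 2 >= targetMs) {
      stimulus.style.visibility = 'hidden';
      console.log('Presented for roughly', elapsed.toFixed(1), 'ms (target', targetMs, 'ms)');
    } else {
      requestAnimationFrame(onFrame);                // keep counting real frame durations
    }
  }
  requestAnimationFrame(onFrame);
}

presentFor(200);  // e.g. a 200 ms display
```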

RT is measured and presentation time is recorded using the performance.now() function, which is independent of the browser’s system clock and therefore not affected by changes to that clock over time. This is the same method used by QRTEngine, validated using a photodiode (Barnhoorn et al., 2015). Although performance.now() and its associated high-resolution timestamps offer the greatest accuracy, their resolution has been reduced intentionally by all major browsers in order to mitigate certain security threats (Kocher et al., 2018; Schwarz, Maurice, Gruss, & Mangard, 2017). In most browsers, the adjusted resolution is rounded to the nearest 1–5 ms, with 1 ms being the most common value (Mozilla, 2019). This is unlikely to be a permanent change, and will be improved when the vulnerabilities are better understood (Mozilla, 2019; Ritter & Mozilla, 2018).

Additionally, to maximize data quality, the user can restrict, through the GUI, which devices, browsers, and connection speeds participants are allowed to use, and all these data are then recorded. This allows the participant’s environment to be restricted so that only modern browser/device combinations are permitted and the above techniques—and timing accuracy—are enforced. The user is able to make their own call in the trade-off between the potential population of participants and the restrictions placed on them to promote accurate timing, depending on the particulars of the task or study.

Case study

As a case study, a flanker experiment was chosen to illustrate the platform’s capability for accurate presentation and response timing. To demonstrate Gorilla’s ability to work within varied setups, different participant groups (primary school children and adults in both the UK and France), settings (without supervision, at home, and under supervision, in schools and in public engagement events), equipment (own computers, computer supplied by researcher), and connection types (personal internet connection, mobile phone 3G/4G) were selected.

We ran a simplified flanker task taken from the attentional network task (ANT; Fan, McCandliss, Sommer, Raz, & Posner, 2002; Rueda, Posner, & Rothbart, 2004). This task measures attentional skills, following attentional network theory. In the original ANT studies, three attentional networks were characterized: alerting (a global increase in attention, delimited in time but not in space), orienting (the capacity to spatially shift attention to an external cue), and executive control (the resolution of conflicts between different stimuli). For the purpose of this article, and for the sake of simplicity, we will focus on the executive control component. This contrast was chosen because MacLeod et al. (2010) found that it was highly powered and reliable, relative to the other conditions in the ANT. Participants responded as quickly as possible to a central stimulus that was pointing either in the same direction as identical flanking stimuli or in the opposite direction. Thus, there were both congruent (same direction) and incongruent (opposite direction) trials.

Research with this paradigm has robustly shown that RTs to congruent trials are faster than those to incongruent trials—Rueda et al. (2004) have termed this the “conflict network.” This RT difference, although significant, is often less than 100 ms, so very accurately timed visual presentation and accurate recording of responses are necessary. Crump, McDonnell, and Gureckis (2013) successfully replicated the results of a similar flanker task online, using Amazon Mechanical Turk, with letters as the targets and flankers, so we know this can be an RT-sensitive task that works online. Crump et al. coded this task in JavaScript and HTML and managed the hosting and data storage themselves; however, the present versions of the experiment were created and run entirely using Gorilla’s GUI. We hypothesized that the previously recorded conflict RT difference would be replicated on this platform.


PsiTurk: An open-source framework for conducting replicable behavioral experiments online

Online data collection has begun to revolutionize the behavioral sciences. However, conducting carefully controlled behavioral experiments online introduces a number of new technical and scientific challenges. The project described in this paper, psiTurk, is an open-source platform that helps researchers develop experiment designs that can be conducted over the Internet. The tool primarily interfaces with Amazon’s Mechanical Turk, a popular crowd-sourcing labor market. This paper describes the basic architecture of the system and introduces new users to its overall goals. psiTurk aims to reduce the technical hurdles for researchers developing online experiments while improving the transparency and collaborative nature of the behavioral sciences.


Across Subjects Counterbalancing

When each participant receives one sequence of treatments, but many sequences are used across participants, it’s called across-subjects counterbalancing. Different participants receive the treatments in different orders, so that every order is used overall. This guards against order effects (the possibility that the position of a treatment in the order of treatments matters) and sequence effects (the possibility that a treatment will be affected by the treatment preceding it). For example, let’s say your study for depression had two treatments: counseling and meditation. You would split your treatment group into two, giving one group counseling, then meditation. The second group would receive meditation first, then counseling.
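A minimal sketch of this scheme for the two-treatment example above (with invented participant labels) simply enumerates every order of the treatments and cycles participants through those orders, so that each sequence is used equally often.

```javascript
// Enumerate all orderings of the treatments, then assign them round-robin.
function permutations(items) {
  if (items.length <= 1) return [items];
  return items.flatMap((item, i) =>
    permutations([...items.slice(0, i), ...items.slice(i + 1)])
      .map(rest => [item, ...rest]));
}

const treatments = ['counseling', 'meditation'];
const orders = permutations(treatments); // [["counseling","meditation"], ["meditation","counseling"]]

const participants = ['P1', 'P2', 'P3', 'P4', 'P5', 'P6']; // invented labels
participants.forEach((p, i) => {
  const order = orders[i % orders.length];   // equal numbers end up in each sequence
  console.log(p + ' receives: ' + order.join(' then '));
});
```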


5. Getting social feedback leads to a greater sense of belonging

It turns out the idea of community on social media isn’t just a catchphrase–it’s real.

A study by Dr Stephanie Tobin from The University of Queensland’s School of Psychology found that active participation on social media sites gave users a greater sense of connectedness.

In the study, researchers took a group of Facebook users who post frequently and told half to remain active, while the other half was instructed to simply observe their friends who were still active on the site.

At the end of the study, those who had not posted on Facebook for two days said the experience had a negative effect on their personal well-being.

“Social networking sites such as Facebook, which has more than a billion users a month, give people immediate reminders of their social relationships and allow them to communicate with others whenever they want,” Tobin said.

Another study had participants post to social media but made sure that they received no responses or feedback—those participants, too, felt negative effects on their self-esteem and well-being.

Marketing takeaway: Social media users crave feedback and responses. Consider repurposing some of the time you spend promoting your own content to joining relevant conversations where you can add value, opinions or fun.


Experimental Psychology

Question #3: Develop and state your own research hypothesis and its corresponding two statistical hypotheses [i.e., the alternative hypothesis (H1) and the null hypothesis (H0)]. Describe the relationship between the two statistical hypotheses and the relationship between the alternative hypothesis and the research hypothesis, and state the two possible results of hypothesis testing. How do Type I and Type II errors relate to the alternative and null hypotheses?

Question #4: Ivan adopted a 3 x 4 mixed factorial design to study the effects of A and B on a dependent variable. Factor A (IV #1) is a between-subjects variable. Factor B (IV #2) is a within-subjects (repeated) variable. In order to control for possible order effects, Ivan decided to use complete counterbalancing. Please answer the following questions and justify your answer.

(a) How many groups of participants are required in Ivan’s experiment?
(b) How many conditions need to be counterbalanced?
(c) How many sequences need to be enumerated? Why?
(d) If Ivan wanted to include five participants for each sequence, then how many participants are required in his experiment?

Question #5: Educational psychologists were interested in the impact of the “Just Say No!” program and contracts on drunk driving among teens. This was a pilot program. The investigators identified gender as a participant characteristic highly related to alcohol use among teens that would require a matching strategy and statistical analysis. With the cooperation of school officials, 16-year-old students were matched and randomly assigned, with equal numbers of males and females in each group. Group A participated in a “Just Say No!” program, which required a one-hour information session instead of P.E. for six weeks. Students were presented with written factual information, motivational lectures, guidance films, and assertiveness training. Students were also encouraged to sign a personal responsibility contract stipulating that they would not drink and drive. Group B participated in regular P.E. classes for the six-week experimental period. A two-factor factorial analysis was used to analyze the data. Please answer the following questions and justify your answer.

(a) Identify the experimental design.
(b) What is the independent variable? What is the dependent variable?
(c) Diagram this experimental design.
(d) What are the potential confounds?
(e) How many main effects, interaction effects, simple main effects of A, and simple main effects of B are there?


METHODOLOGICAL ISSUES

To test the feasibility of online developmental research across a variety of methods and age groups, we conducted three studies: a looking-time study with infants (11–18 months) based on Téglás, Girotto, Gonzalez, and Bonatti (2007), a preferential looking time study with toddlers (24–36 months) based on Yuan and Fisher (2009), and a forced choice study with preschoolers (ages 3 and 4) based on Pasquini, Corriveau, Koenig, and Harris (2007). These allowed us to assess how online testing affected coding and reliability, children’s attentiveness, and parental interference. For details on the specific studies, see Scott et al. (2017).

Coding and Reliability of Looking Time Measures

Each session of the looking time study was coded using VCode (Hagedorn, Hailpern, & Karahalios, 2008) by two coders blind to condition. Looking time for each of eight trials per session was computed based on the time from the first look to the screen until the start of the first continuous one-second lookaway, or until the end of the trial if no valid lookaway occurred. Differences of 1 s or greater, and differences in whether a valid lookaway was detected, were flagged and those trials recoded. Agreement between coders was excellent; coders agreed on whether children were looking at the screen on average 94.6% of the time (N = 63 children; SD = 5.6%). The mean absolute difference in looking time computed by two coders was 0.77 s (SD = 0.94 s).
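As a sketch of the looking-time rule just described, the function below uses an invented coding format in which 'looks' lists the intervals (in seconds) when the child was looking at the screen; it is illustrative, not the authors' coding software.

```javascript
// Time from the first look until the start of the first lookaway of >= 1 s,
// or until the end of the trial if no such lookaway occurs.
function lookingTime(looks, trialEnd, lookawayThreshold = 1.0) {
  if (looks.length === 0) return 0;
  const firstLook = looks[0].start;

  for (let i = 0; i < looks.length; i++) {
    const lookawayStart = looks[i].end;
    const lookawayEnd = (i + 1 < looks.length) ? looks[i + 1].start : trialEnd;
    if (lookawayEnd - lookawayStart >= lookawayThreshold) {
      return lookawayStart - firstLook;   // measurement ends at the start of the valid lookaway
    }
  }
  return trialEnd - firstLook;            // no valid lookaway: measure to the end of the trial
}

// Example: looks at 0.5-3.0 s and 3.4-6.0 s in a 20 s trial.
// The 0.4 s gap is too short to count, but the lookaway starting at 6.0 s is not,
// so looking time = 6.0 - 0.5 = 5.5 s.
console.log(lookingTime([{ start: 0.5, end: 3.0 }, { start: 3.4, end: 6.0 }], 20));
```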

Measuring until the first continuous lookaway of a given duration introduces a thresholding effect in addition to the small amount of noise induced by a reduced framerate. The magnitude of this effect depends on the dynamics of children’s looks to and away from the screen. We examined a sample of 1,796 looking times, measured until the first one-second lookaway, from 252 children (M = 13.9 months, SD = 2.6 months) tested in our lab with video recorded at 30 Hz. Reassuringly, in 68% of measurements, the lookaway that ended the measurement was over 1.5 s. We also simulated coding of these videos at framerates ranging from 0.5 to 30 Hz; the median absolute difference between looking times calculated from our minimum required framerate of 2 Hz vs. the original video was only .16 s (interquartile range = 0.07–0.29 s; see Figure S1; Scott, Chu, & Schulz, 2017).

Coding and Reliability of Preferential Looking Measures

Each session of the preferential looking study was coded using VCode (Hagedorn et al., 2008) by two coders blind to the placement of test videos. Looks to the left and right are generally clear; for examples, see Figure 1. Three calibration trials were included in which an animated attention getter was shown on one side and then the other. During calibration videos, all 138 coded participants looked, on average, more to the side with the attention getter. For each of nine preferential looking trials, we computed fractional right/left looking times (the fraction of total looking time spent looking to the right/left). Substantial differences (a fractional looking time difference greater than .15, when that difference constituted at least 500 ms) were flagged and those clips recoded. A disagreement score was defined as the average of the coders’ absolute disagreement in fractional left looking time and fractional right looking time, as a fraction of trial length. The mean disagreement score across the 138 coded participants was 4.44% (SD = 2.00%, range 1.75–13.44%).

Cropped examples of webcam video frames of children looking to the reader’s left (left column) and right (right column).


Coding Child Attentiveness and Parental Interference

Two natural concerns about online testing are that the home environment might be more distracting than the laboratory or that parents might be more likely to interfere with study protocols. In laboratory-based developmental studies, 14% of infants and children on average are excluded due to fussiness, while only 2% of studies give an operational definition of fussiness (Slaughter & Suddendorf, 2007). Looking times from crying children are unlikely to be meaningful, but subjective exclusion criteria reduce the generalizability of results. Similar issues arise with operationalizing parental interference.

To address these issues, we established criteria for fussiness (defined as crying or attempting to leave the parent’s lap), distraction (whether any lookaway was caused by an external event), and various parental actions, including peeking at the video during trials where their eyes should be closed. (See Table S1 and Coding Manual for details.) Exclusion criteria were then based on the number of clips where there was parental interference or where the child was determined to be fussy or distracted. In the two studies using looking measures, two blind coders recorded which actions occurred during individual clips. The first author arbitrated disagreements. This constitutes one of the first direct studies of intercoder agreement on these measures. Coders agreed on fussiness and distraction in at least 85% of clips, with Cohen’s kappa ranging from .37 to .55 (see Table 1).

Intercoder reliability for qualitative coding of clips from the looking time study (Study 1, shaded rows, N = 112) and preferential looking study (Study 2, white rows, N = 138) of Scott et al. (2017).

Note. In Study 1, fussiness and distraction were coded for all 8 clips per participant, and parent interaction measures were coded for the last 6 clips; clips were about 20 s long. In Study 2, fussiness and distraction were coded for all 13 clips per participant (ranging from 8 to 50 s), and parent interaction measures for 6 clips (about 8 s each).

Parents were largely compliant with testing protocols, including requests that they refrain from talking or close their eyes during parts of the studies to avoid inadvertently biasing the child. However, compliance was far from perfect: 9% of parents had their eyes mostly open on at least one of 3–4 test trials, and 26% briefly peeked on at least one test trial (see Table S2). In the forced-choice study with preschoolers, parents interfered in 8% of trials by repeating the two options or answering the question themselves before the child’s final answer, generally in cases where the child was reluctant to answer. Practice trials before the test trials mitigated data loss due to parent interference; additional recommendations based on our experience are covered in the Discussion and Recommendations section.

Counterbalancing

Condition assignment and counterbalancing were initially achieved simply by assigning each participant to whichever condition had the fewest sessions already in the database. Because many sessions could not be included in analysis, we later manually updated lists of conditions needed to achieve more even counterbalancing and condition assignment. Condition assignment was still not as balanced as in the lab, so we used analysis techniques robust to this variation. Future versions of the platform will allow researchers to continually update the number of included children per condition as coding proceeds.
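A minimal sketch of that initial assignment rule is shown below, with an invented object standing in for the database query; as the passage notes, counting only sessions that pass inclusion criteria (rather than all started sessions) is what keeps the final samples balanced.

```javascript
// Assign each new participant to the condition with the fewest sessions so far.
// 'countsByCondition' is an invented stand-in for a database query; counting only
// sessions that passed inclusion criteria avoids the imbalance described above.
function assignCondition(countsByCondition) {
  let chosen = null;
  for (const [condition, count] of Object.entries(countsByCondition)) {
    if (chosen === null || count < countsByCondition[chosen]) {
      chosen = condition;
    }
  }
  countsByCondition[chosen] += 1;   // record the newly started session
  return chosen;
}

const counts = { A: 12, B: 9, C: 11, D: 9 };
console.log(assignCondition(counts));   // 'B' (first of the conditions tied for fewest)
```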


Simulating the college experience

Though undergraduate and graduate students may be more adept at engaging with online learning tools, college students face many of the same challenges as K–12 students, says Viji Sathy, PhD, a professor of psychology at the University of North Carolina at Chapel Hill. They also face disparities in access to technology and struggle to find social connection amid restrictions on many campuses or in remote learning environments. “It’s a lot harder to create community in this format, when people feel so isolated,” Sathy says.

Yet there have been silver linings in the move to online learning, says Sathy. More professors are now engaging in pedagogical discussions instead of assuming they can simply translate in-person lessons to an online platform, she says. “There’s a new willingness to admit they need guidance and more efforts to access the resources that can help them.”

Many of the efforts to optimize instruction will outlast the pandemic, Sathy adds. “Once they develop those resources, they can take them back to the face-to-face environment.”

This experience may also have raised the profile of online learning, says Francine Conway, PhD, dean of the Graduate School of Applied and Professional Psychology at Rutgers University and past president of the National Council of Schools and Programs of Professional Psychology. “There’s a stigma regarding online learning, especially in doctoral training. It’s often perceived as lower quality, and there’s the perception that advisers can’t adequately train and supervise students using online platforms,” Conway says. “That perception hasn’t kept pace with the reality.”

Virtual education can often be just as effective as in-person learning, she says, thanks to new digital learning platforms, increasing student ease with technology, and a large and growing research literature around online learning. Yet there are challenges, especially in the area of hands-on research and disruptions to internships and practicum training. And some face-to-face interaction is necessary to achieve the competencies required to be a psychologist. Nevertheless, this year of online learning has underscored that there are benefits to going remote, at least in part.

“While there is a need to further support faculty around delivering online content, there are best practices out there. It does a disservice to faculty to assume they won’t adapt to this new environment,” Conway says. “Online learning is here whether we like it or not, and it’s time our profession embraces it.”


METHODOLOGICAL ISSUES

To test the feasibility of online developmental research across a variety of methods and age groups, we conducted three studies: a looking-time study with infants (11–18 months) based on Téglás, Girotto, Gonzalez, and Bonatti (2007), a preferential looking time study with toddlers (24–36 months) based on Yuan and Fisher (2009), and a forced choice study with preschoolers (ages 3 and 4) based on Pasquini, Corriveau, Koenig, and Harris (2007). These allowed us to assess how online testing affected coding and reliability, children’s attentiveness, and parental interference. For details on the specific studies, see Scott et al. (2017).

Coding and Reliability of Looking Time Measures

Each session of the looking time study was coded using VCode (Hagedorn, Hailpern, & Karahalios, 2008) by two coders blind to condition. Looking time for each of eight trials per session was computed based on the time from the first look to the screen until the start of the first continuous one-second lookaway, or until the end of the trial if no valid lookaway occurred. Differences of 1 s or greater, and differences in whether a valid lookaway was detected, were flagged and those trials recoded. Agreement between coders was excellent: coders agreed on whether children were looking at the screen on average 94.6% of the time (N = 63 children, SD = 5.6%). The mean absolute difference in looking time computed by the two coders was 0.77 s (SD = 0.94 s).

Measuring until the first continuous lookaway of a given duration introduces a thresholding effect in addition to the small amount of noise induced by a reduced framerate. The magnitude of this effect depends on the dynamics of children’s looks to and away from the screen. We examined a sample of 1,796 looking times, measured until the first one-second lookaway, from 252 children (M = 13.9 months, SD = 2.6 months) tested in our lab with video recorded at 30 Hz. Reassuringly, in 68% of measurements, the lookaway that ended the measurement was over 1.5 s. We also simulated coding of these videos at framerates ranging from 0.5 to 30 Hz; the median absolute difference between looking times calculated at our minimum required framerate of 2 Hz and from the original video was only 0.16 s (interquartile range = 0.07–0.29 s; see Figure S1 of Scott, Chu, & Schulz, 2017).
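
To make the looking-time rule and the framerate simulation concrete, here is a minimal JavaScript sketch; it is not the authors’ coding software, and the per-frame input format is an assumption.

    // Sketch (not the authors' code): given per-frame looking judgments
    // (true = looking at the screen) and a frame rate, measure from the first
    // look until the start of the first continuous 1-s lookaway, or to the
    // end of the trial if no qualifying lookaway occurs.
    function lookingTime(frames, frameRateHz, lookawayThresholdS = 1) {
      const frameS = 1 / frameRateHz;
      const thresholdFrames = Math.ceil(lookawayThresholdS / frameS);
      const firstLook = frames.indexOf(true);
      if (firstLook === -1) return 0;

      let awayRun = 0;
      for (let i = firstLook; i < frames.length; i += 1) {
        if (frames[i]) {
          awayRun = 0;
        } else {
          awayRun += 1;
          if (awayRun === thresholdFrames) {
            // Measurement ends at the start of the qualifying lookaway.
            return (i - awayRun + 1 - firstLook) * frameS;
          }
        }
      }
      return (frames.length - firstLook) * frameS; // no valid lookaway: end of trial
    }

    // Re-coding the same session at a lower frame rate (e.g., 2 Hz instead of
    // 30 Hz) simply means passing a downsampled array with frameRateHz = 2.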

Coding and Reliability of Preferential Looking Measures

Each session of the preferential looking study was coded using VCode (Hagedorn et al., 2008) by two coders blind to the placement of test videos. Looks to the left and right are generally clear; for examples, see Figure 1. Three calibration trials were included in which an animated attention getter was shown on one side and then the other. During calibration videos, all 138 coded participants looked on average more to the side with the attention getter. For each of nine preferential looking trials, we computed fractional right/left looking times (the fraction of total looking time spent looking to the right/left). Substantial differences (fractional looking time difference greater than .15, when that difference constituted at least 500 ms) were flagged and those clips recoded. A disagreement score was defined as the average of the coders’ absolute disagreement in fractional left looking time and fractional right looking time, as a fraction of trial length. The mean disagreement score across the 138 coded participants was 4.44% (SD = 2.00%, range 1.75–13.44%).

Cropped examples of webcam video frames of children looking to the reader’s left (left column) and right (right column).
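
For concreteness, here is a minimal JavaScript sketch of one reading of the disagreement score defined above (absolute differences taken as a fraction of trial length); it is not the authors’ software, and the example numbers are illustrative.

    // Sketch: average of the two coders' absolute disagreements in left and
    // right looking time, each expressed as a fraction of trial length.
    function disagreementScore(coderA, coderB, trialLengthS) {
      // coderA / coderB: { left: seconds looking left, right: seconds looking right }
      const leftDiff = Math.abs(coderA.left - coderB.left) / trialLengthS;
      const rightDiff = Math.abs(coderA.right - coderB.right) / trialLengthS;
      return (leftDiff + rightDiff) / 2;
    }

    // Example: coders differ by 0.4 s (left) and 0.2 s (right) on a 20-s trial.
    disagreementScore({ left: 8.0, right: 6.0 }, { left: 8.4, right: 5.8 }, 20); // 0.015, i.e. 1.5%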

Coding Child Attentiveness and Parental Interference

Two natural concerns about online testing are that the home environment might be more distracting than the laboratory or that parents might be more likely to interfere with study protocols. In laboratory-based developmental studies, 14% of infants and children on average are excluded due to fussiness, while only 2% of studies give an operational definition of fussiness (Slaughter & Suddendorf, 2007). Looking times from crying children are unlikely to be meaningful, but subjective exclusion criteria reduce the generalizability of results. Similar issues arise with operationalizing parental interference.

To address these issues, we established criteria for fussiness (defined as crying or attempting to leave the parent’s lap), distraction (whether any lookaway was caused by an external event), and various parental actions, including peeking at the video during trials where their eyes should be closed. (See Table S1 and Coding Manual for details.) Exclusion criteria were then based on the number of clips where there was parental interference or where the child was determined to be fussy or distracted. In the two studies using looking measures, two blind coders recorded which actions occurred during individual clips. The first author arbitrated disagreements. This constitutes one of the first direct studies of intercoder agreement on these measures. Coders agreed on fussiness and distraction in at least 85% of clips, with Cohen’s kappa ranging from .37 to .55 (see Table 1).

Table 1. Intercoder reliability for qualitative coding of clips from the looking time study (Study 1, shaded rows, N = 112) and preferential looking study (Study 2, white rows, N = 138) of Scott et al. (2017).

Note. In Study 1, fussiness and distraction were coded for all 8 clips per participant, and parent interaction measures were coded for the last 6 clips; clips were about 20 s long. In Study 2, fussiness and distraction were coded for all 13 clips per participant (ranging from 8 to 50 s), and parent interaction measures for 6 clips (about 8 s each).

Parents were largely compliant with testing protocols, including requests that they refrain from talking or close their eyes during parts of the studies to avoid inadvertently biasing the child. However, compliance was far from perfect: 9% of parents had their eyes mostly open on at least one of 3–4 test trials, and 26% briefly peeked on at least one test trial (see Table S2). In the forced-choice study with preschoolers, parents interfered in 8% of trials by repeating the two options or answering the question themselves before the child’s final answer, generally in cases where the child was reluctant to answer. Practice trials before the test trials mitigated data loss due to parent interference; additional recommendations based on our experience are covered in the Discussion and Recommendations section.

Counterbalancing

Condition assignment and counterbalancing were initially achieved simply by assigning each participant to whichever condition had the fewest sessions already in the database. Because many sessions could not be included in analysis, we later manually updated lists of conditions needed to achieve more even counterbalancing and condition assignment. Condition assignment was still not as balanced as in the lab, so we used analysis techniques robust to this variation. Future versions of the platform will allow researchers to continually update the number of included children per condition as coding proceeds.
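
As a concrete illustration of this assignment rule (a minimal JavaScript sketch, not the platform’s actual implementation), a new participant can simply be routed to whichever condition currently has the fewest included sessions; the condition names and counts below are hypothetical.

    // Sketch: assign the next participant to the condition with the fewest
    // *included* sessions so far. `includedCounts` is assumed to be refreshed
    // from the session database as coding proceeds.
    function assignCondition(includedCounts) {
      let best = null;
      for (const [condition, count] of Object.entries(includedCounts)) {
        if (best === null || count < best.count) {
          best = { condition, count };
        }
      }
      return best.condition;
    }

    // With four conditions, the next child is placed in the least-filled one.
    assignCondition({ A: 12, B: 10, C: 11, D: 10 }); // "B"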


5. Getting social feedback leads to a greater sense of belonging

It turns out the idea of community on social media isn’t just a catchphrase–it’s real.

A study by Dr Stephanie Tobin from The University of Queensland’s School of Psychology found that active participation on social media sites gave users a greater sense of connectedness.

In the study, researchers took a group of Facebook users who post frequently and told half to remain active, while the other half was instructed to simply observe their friends who were still active on the site.

At the end of the study, those who had not posted on Facebook for two days said the experience had a negative effect on their personal well-being.

“Social networking sites such as Facebook, which has more than a billion users a month, give people immediate reminders of their social relationships and allow them to communicate with others whenever they want,” Tobin said.

Another study had participants post to social media but made sure that they received no responses or feedback—those participants, too, felt negative effects on their self-esteem and well-being.

Marketing takeaway: Social media users crave feedback and responses. Consider repurposing some of the time you spend promoting your own content to joining relevant conversations where you can add value, opinions or fun.


Simulating the college experience

Though undergraduate and graduate students may be more adept at engaging with online learning tools, college students face many of the same challenges as K–12 students, says Viji Sathy, PhD, a professor of psychology at the University of North Carolina at Chapel Hill. They also face disparities in access to technology and struggle to find social connection amid restrictions on many campuses or in remote learning environments. “It’s a lot harder to create community in this format, when people feel so isolated,” Sathy says.

Yet there have been silver linings in the move to online learning, says Sathy. More professors are now engaging in pedagogical discussions instead of assuming they can simply translate in-person lessons to an online platform, she says. “There’s a new willingness to admit they need guidance and more efforts to access the resources that can help them.”

Many of the efforts to optimize instruction will outlast the pandemic, Sathy adds. “Once they develop those resources, they can take them back to the face-to-face environment.”

This experience may also have raised the profile of online learning, says Francine Conway, PhD, dean of the Graduate School of Applied and Professional Psychology at Rutgers University and past president of the National Council of Schools and Programs of Professional Psychology. “There’s a stigma regarding online learning, especially in doctoral training. It’s often perceived as lower quality, and there’s the perception that advisers can’t adequately train and supervise students using online platforms,” Conway says. “That perception hasn’t kept pace with the reality.”

Virtual education can often be just as effective as in-person learning, she says, thanks to new digital learning platforms, increasing student ease with technology, and a large and growing research literature around online learning. Yet there are challenges, especially in the area of hands-on research and disruptions to internships and practicum training. And some face-to-face interaction is necessary to achieve the competencies required to be a psychologist. Nevertheless, this year of online learning has underscored that there are benefits to going remote, at least in part.

“While there is a need to further support faculty around delivering online content, there are best practices out there. It does a disservice to faculty to assume they won’t adapt to this new environment,” Conway says. “Online learning is here whether we like it or not, and it’s time our profession embraces it.”


Many libraries and museums have taken their special collections, such as rare books, manuscripts, photographs, pamphlets, news clippings, musical scores and more, and have digitized them to create collections of digital assets that can be displayed online through a digital exhibition. Digital exhibits such as these offer unprecedented access to organizational treasures that might otherwise never be seen except by those with local physical access to the museum or library. A new breed of open-source and free software tools has recently emerged, making it possible to catalog and manage digital collections and create robust narratives and layouts for display online.

Below are the main software applications used by libraries and museums to create digital exhibits and manage digital assets. The industry leader in this space is a proprietary application called CONTENTdm (http://www.contentdm.org/), created by OCLC. CONTENTdm is digital collection management software that allows for the upload, description, management and access of digital collections. This application offers robust cataloging features and an easy-to-use interface but is cost-prohibitive for many non-profit organizations. Entry-level CONTENTdm options start at $4,300 annually, while mid-size licenses start at a $10,000 one-time fee with ongoing annual maintenance starting at $2,000.

A CONTENTdm digital collection

Free and Open Source Tools

However, there are many free and open source alternatives to Contentdm for creating online interactive digital exhibits.

Omeka
http://omeka.org/
Omeka is a free, open source web publishing system for online digital archives. Its main focus/strength is producing websites and online exhibitions. Both the Web interface and back-end cataloging system are one unified application. Users can build attractive websites and exhibits using templates and page layouts, without having to adjust code, although more robust displays can be created by customizing the CSS and HTML files, and moving around some PHP snippets. Omeka has a plugin available for OAI support to make collections harvestable by major search engines. Although Omeka is a bit more limited than some other applications such as Collective Access (see below) in terms of cataloging & metadata capabilities, it offers fast, easy creation of online exhibits through a Web interface, a low learning curve, many plugins with added functionality, and a large developer community.

Metadata Supported: Omeka uses Dublin Core and MODS metadata, and offers customizable item type cataloging. There are many templates and plugins which offer added functionality, such as displaying items on Google Maps and providing LCSH for cataloging.

Hosted Version and/or Downloadable Code Available? Omeka offers both a hosted, Web-based version and a downloadable application which can be installed and hosted on-site by the organization.

Recommended for: Libraries, Museums

Collective Access
http://collectiveaccess.org/
Collective Access is a free, open source cataloging tool and web-based application for museums, archives and digital collections. Its main focus/strength is on cataloging and metadata. You can create very robust cataloging records, create relationships between items, create profiles of creators and subjects of items and link them to objects, etc. Collective Access offers multiple metadata schemas. The Web component, called Pawtucket, is a separate installation, and necessitates editing PHP files in order to build/adjust websites. A front-end PHP programmer would be necessary with this solution, and quite possibly one to set up the back-end templates as well.

Hosted Version and/or Downloadable Code Available? The application is downloadable and must be hosted by the organization; no hosted version is available.

Metadata Supported: Dublin Core, VRA, CDWA/CCO, MARC (planned), and others, plus the ability to create in-house standards and to customize existing standards. Ability to access external data sources and services such as LCSH, the Getty Art & Architecture Thesaurus, and Google Maps, Google Earth or GeoNames for geospatial cataloguing.

Recommended for: Libraries, Museums

CollectionSpace
http://www.collectionspace.org/
CollectionSpace is a free, open-source collections management application for museums, libraries, historical societies, and other organizations with special collections. The application is administered by the Museum of the Moving Image, in a joint partnership with the division of Information Services and Technology at the University of California, Berkeley and the Centre for Applied Research in Educational Technologies at the University of Cambridge. The software is made up of a suite of modules and services for managing your collections of digital assets; however, it doesn't have any native ability to create digital exhibits. Instead, it enables users to connect with other open-source applications already in use by the cultural sector for online exhibition creation. The application allows for the creation of a customized controlled vocabulary for describing collections.

Hosted Version and/or Downloadable Code Available? The application is downloadable and must be hosted by the organization; no hosted version is available.

Metadata Supported: CollectionSpace supports multiple metadata schemas, including Dublin Core and customized schemas.

Recommended for: Libraries, Museums

Open Exhibits
http://openexhibits.org/
Open Exhibits is a multitouch, multi-user tool kit that allows you to create custom interactive exhibits. The strength of this application lies less in cataloging collections of digital assets than in developing online, interactive exhibits with digital objects. The multi-touch piece comes into play with the ability to specify that certain types of user behaviors will result in various outcomes, e.g. if a user drags a certain section of an image, the entire image will move and readjust along with the movement. Users without technical expertise can work with pre-existing templates and modules, while developers can create their own with the SDK. The application uses a combination of its own markup languages, Creative Markup Language (CML) and Gesture Markup Language (GML), along with CSS libraries.

Hosted Version and/or Downloadable Code Available? The application is downloadable and must be hosted by the organization; no hosted version is available.

Metadata Supported: Not applicable.

Recommended for: Museums

Pachyderm
http://pachyderm.nmc.org
Pachyderm is a free, open-source and easy-to-use multimedia authoring tool created by the New Media Consortium (NMC). It's been designed for people with little technology or multimedia experience and involves little more than filling out a web form. Authors place their digital assets (images, audio clips, and short video segments) into pre-designed templates, which can play video and audio, link to other templates, zoom in on images, and more. Completed templates result in interactive, Flash-based presentations that can include images, sounds, video, and text; these presentations can be downloaded and displayed on websites or kept on the Pachyderm server and linked to directly from there.

Hosted Version and/or Downloadable Code Available? The NMC has stated that they are no longer offering hosted accounts at this time, so the application must be downloaded and hosted by the organization or individual.


Across Subjects Counterbalancing

When every participant receives the same set of treatments but different participants receive them in different orders, it's called across subjects counterbalancing. Collectively, your participants cover the possible orderings of the treatments. This guards against order effects (the possibility that the position of a treatment in the sequence matters) and sequence effects (the possibility that a treatment will be affected by the treatment preceding it). For example, let's say your study for depression had two treatments: counseling and meditation. You would split your treatment group into two, giving one group counseling first, then meditation. The second group would receive meditation first, then counseling.
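
As an illustrative sketch (not part of the original text), the JavaScript below enumerates every possible order of a treatment set and assigns participants to those sequences in rotation:

    // Complete across-subjects counterbalancing: list every sequence (order)
    // of the treatments, then rotate participants through the sequences.
    function permutations(items) {
      if (items.length <= 1) return [items];
      return items.flatMap((item, i) =>
        permutations([...items.slice(0, i), ...items.slice(i + 1)])
          .map(rest => [item, ...rest])
      );
    }

    const treatments = ['counseling', 'meditation'];
    const sequences = permutations(treatments);
    // [['counseling', 'meditation'], ['meditation', 'counseling']]

    // Participant n (0-indexed) receives sequence n modulo the number of sequences.
    function sequenceFor(participantIndex) {
      return sequences[participantIndex % sequences.length];
    }

    sequenceFor(0); // ['counseling', 'meditation']
    sequenceFor(1); // ['meditation', 'counseling']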



If you have Firefox with the Firebug extension installed, you can view the code for any experiment online. Looking at how others programmed their experiments can be a great way to get a feel for how to do it yourself, although it can be tricky trying to read others’ scripts. Firebug is also great for troubleshooting when you write your own scripts.


PsiTurk: An open-source framework for conducting replicable behavioral experiments online

Online data collection has begun to revolutionize the behavioral sciences. However, conducting carefully controlled behavioral experiments online introduces a number of new technical and scientific challenges. The project described in this paper, psiTurk, is an open-source platform which helps researchers develop experiment designs which can be conducted over the Internet. The tool primarily interfaces with Amazon’s Mechanical Turk, a popular crowd-sourcing labor market. This paper describes the basic architecture of the system and introduces new users to the overall goals. psiTurk aims to reduce the technical hurdles for researchers developing online experiments while improving the transparency and collaborative nature of the behavioral sciences.


Introduction

Behavioral research and experimental psychology are increasing their use of web browsers and the internet to reach larger (Adjerid & Kelley, 2018) and more diverse (Casler, Bickel, & Hackett, 2013) populations than has previously been feasible with lab-based methods. However, unique variables are introduced when working within an online environment. The experience of the user is the result of a large number of connected technologies, including the server (which hosts the experiment), the internet service provider (which delivers the data), the browser (which presents the experiment to the participant and measures their responses), and the content itself—which is determined by a mixture of media (e.g., audio/pictures/video) and code in different programming languages (e.g., JavaScript, HTML, CSS, PHP, Java). Linking these technologies is technically difficult, time-consuming, and costly. Consequently, until recently, online research was generally carried out—and scrutinized—by those with the resources to overcome these barriers.

The purpose of this article is threefold: first, to explore the problems inherent to running behavioral experiments online with web programming languages, the issues this can create for timing accuracy, and recent improvements that can mitigate these issues; second, to introduce Gorilla, an online experiment builder that uses best practices to overcome these timing issues and makes reliable online experimentation accessible and transparent to the majority of researchers; and third, to demonstrate the timing accuracy and reliability provided by Gorilla. We achieved this last goal using data from a flanker task—which requires high timing fidelity—collected from a wide range of participants, settings, equipment, and internet connection types.

JavaScript

The primary consideration for online experimenters at present is JavaScript, the language most commonly used to generate dynamic content on the web (such as an experiment). Its quirks (which are discussed later) can lead to problems with presentation timing, and understanding them forms a large part of the access barrier.

JavaScript is at the more dynamic end of the programming language spectrum. It is weakly typed and allows core functionality to be easily modified. Weak typing means that variables do not have declared types; the user simply declares a variable and then uses it in their code. This is in contrast to strongly typed languages, in which the user must specify whether a variable they declare should be an integer, a string, or some other structure. This can lead to unnoticed idiosyncrasies—if a user writes code that attempts to divide a string by a number, or assign a number to a variable that was previously assigned to an array, JavaScript allows this to proceed. Similarly, JavaScript allows users to call functions without providing all the arguments to that function. This dynamic nature gives more flexibility, but at the cost of allowing mistakes or unintended consequences to creep in. By contrast, in a strongly typed language, incorrect assignments or missing function arguments would be marked as errors that the user should correct. This results in a more brittle, but safer, editing environment. JavaScript also allows a rare degree of modification of core structures—even the most fundamental building blocks (such as arrays) can have extra methods added to them. This can prove useful in some cases, but can easily create confusion as to which parts of the code are built-in and which parts are user defined. Together, these various factors create a programming environment that is very flexible, but one in which mistakes are easy to make and their consequences can go undetected by the designer (Richards, Lebresne, Burg, & Vitek, 2010). This is clearly not ideal for new users attempting to create controlled scientific experiments. Below we discuss two significant hurdles when building web experiments: inaccuracies in the timing of various experiment components in the browser, and the technical complexities involved in implementing an online study, including JavaScript’s contributions. These complexities present an access barrier to controlled online experiments for the average behavioral researcher.
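
A few lines of JavaScript make these permissive behaviours concrete; none of them raise an error, which is exactly what makes mistakes hard to spot in experiment code:

    "abc" / 2;            // NaN: dividing a string by a number is allowed

    let trials = [1, 2, 3];
    trials = 42;          // reassigning an array variable to a number is allowed

    function showStimulus(stimulus, durationMs) {
      // Called with a missing argument, durationMs is undefined and the
      // arithmetic silently yields NaN rather than an error.
      return durationMs * 2;
    }
    showStimulus("fixation"); // NaN

    // Core structures can be modified at runtime, too:
    Array.prototype.shuffleInPlace = function () { /* user-defined addition */ };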

History of timing concerns

Timing concerns have been expressed regarding online studies (for an overview, see Woods, Velasco, Levitan, Wan, & Spence, 2015), and although many of these concerns are now historic for informed users—because solutions exist—they are still an issue for new users who may not be aware of them. These concerns can be divided into the timing of stimuli—that is, an image or sound is not presented for the duration you want—and the timing of response recording—that is, the participant did not press a button at the time they are recorded doing so. These inaccuracies have obvious implications for behavioral research, especially for studies using time-based measures such as reaction time (RT).

Several things might be driving these timing issues: First, in JavaScript programs, most processes within a single web-app or browser window pass through an event loop—a single thread that decides what parts of the JavaScript code to run, and when. This loop comprises different types of queues. Queues that are managed synchronously wait until one task is complete before moving on. One example of a synchronously managed queue is the event queue, which stores an ordered list of things waiting to be run. Queues that are managed asynchronously will start new tasks instead of waiting for the preceding tasks to finish, such as the queue that manages loading resources (e.g., images). Most presentation changes are processed through the event loop in an asynchronous queue. This could be an animation frame updating, an image being rendered, or an object being dragged around. Variance in the order in which computations are in the queue, due to any experiment’s code competing with other code, can lead to inconsistent timing. When a synchronous call to the event loop requires a lot of time, it can “block” the loop—preventing everything else in the queue from passing through. For instance, you may try to present auditory and visual stimuli at the same time, but they could end up out of synchronization if blocking occurs—a common manifestation of this in web videos is unsynchronized audio and video.
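
The blocking problem can be seen in a short, self-contained sketch; the timings are illustrative and will vary by machine:

    // A presentation change is scheduled for ~50 ms, but a long synchronous
    // computation occupies the single thread first, so the callback runs late.
    const start = performance.now();

    setTimeout(() => {
      // Logs well over 50 ms, because the loop below blocked the event loop.
      console.log(`stimulus shown after ${performance.now() - start} ms`);
    }, 50);

    let x = 0;
    for (let i = 0; i < 1e9; i += 1) { x += i; } // synchronous work that "blocks"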

Second, the computational load on the current browser window will slow the event loop down; variance in timing is, therefore, dependent on different computers, browsers, and computational loads (Jia, Guo, Wang, & Zhang, 2018). For a best-practices overview, see Garaizar and Reips (2018). Given the need for online research to make use of onsite computers such as those in homes or schools, the potential variance mentioned above is an important issue. A laptop with a single processor, a small amount of memory, and an out-of-date web browser is likely to struggle to present stimuli to the same accuracy as a multicore desktop with the most recent version of Google Chrome installed. These differences can amount to over 100 ms of variance in presentation timing (Reimers & Stewart, 2016).

Third, by default, web browsers load external resources (such as images or videos) progressively as soon as the HTML elements that use them are added to the page. This results in the familiar effect of images “popping in” as the page loads incrementally. If each trial in an online task is treated as a normal web page, this “popping in” will lead to inaccurate timing. Clearly, such variance in display times would be unsuitable for online research, but the effect can be mitigated by loading resources in advance. A direct solution is to simply load all the required resources, for all the trials, in advance of starting the task (Garaizar & Reips, 2018). This can be adequate for shorter tasks or tasks that use a small number of stimuli, but as the loading time increases, participants become more likely to drop out, increasing attrition.
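
A minimal sketch of preloading is shown below; the stimulus URLs and the startTask() function are hypothetical placeholders, and builders such as Gorilla (described later) cache a few trials ahead rather than loading everything up front.

    // Preload images before the task starts, so trial onsets are not delayed
    // by network fetches.
    function preloadImages(urls) {
      return Promise.all(urls.map(url => new Promise((resolve, reject) => {
        const img = new Image();
        img.onload = () => resolve(img);
        img.onerror = () => reject(new Error(`Failed to load ${url}`));
        img.src = url;
      })));
    }

    // Hypothetical stimulus list; startTask() is assumed to be defined elsewhere.
    preloadImages(['stim/congruent.png', 'stim/incongruent.png'])
      .then(() => startTask())
      .catch(err => console.error(err));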

The same concerns (with the exception of connection speed) can be applied to the recording of RTs, which are dependent on a JavaScript system called the “event system.” When a participant presses a mouse or keyboard button, recording of these responses (often through a piece of code called an “Event Listener”) gets added to the event loop. To give a concrete example, two computers could record different times for an identical mouse response based on their individual processing loads. It must be noted that this issue is independent of the browser receiving an event (such as a mouse click being polled by the operating system), for which there is a relatively fixed delay, which has been shown to be equivalent in nonbrowser software (de Leeuw & Motz, 2016)—this receiving delay is discussed later in the article. Timing of event recording using the browser system clock (which some JavaScript functions do) is another source of variance, because different machines and operating systems will have different clock accuracies and update rates.
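
To make the event-listener route concrete, here is a hedged sketch of recording an RT from a keypress, timestamped with performance.now() rather than the system clock; presentStimulus() is a hypothetical helper.

    let stimulusOnset = null;

    function presentStimulus(element) {
      element.style.visibility = 'visible';
      stimulusOnset = performance.now(); // onset recorded with the monotonic clock
    }

    document.addEventListener('keydown', (event) => {
      if (stimulusOnset === null) return; // ignore presses before stimulus onset
      const rt = performance.now() - stimulusOnset;
      console.log(`key ${event.key} pressed, RT = ${rt.toFixed(1)} ms`);
      stimulusOnset = null;
    });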

Current state of the art

Presently, the improved processing capabilities in common browsers and computers, in concert with improvements in web-language standards—such as HTML5 and ECMAScript 6—offer the potential to overcome some concerns about presentation and response timings (Garaizar, Vadillo, & López-de Ipiña, 2012, 2014; Reimers & Stewart, 2015, 2016; Schmidt, 2001). This is because, in addition to standardized libraries (which improve the consistency of any potential web experiment between devices), these technologies use much more efficient interpreters, which are the elements of the browser that execute the code and implement computations. An example of this is Google’s V8, which improves processing speed—and therefore the speed of the event loop—significantly (Severance, 2012). In fact, several researchers have provided evidence that response times are comparable between browser-based applications and local applications (Barnhoorn, Haasnoot, Bocanegra, & van Steenbergen, 2015), even in poorly standardized domestic environments—that is, at home (Miller, Schmidt, Kirschbaum, & Enge, 2018).

A secondary benefit of recent browser improvements is scalability. If behavioral research continues to take advantage of the capacity for big data provided by the internet, it needs to produce scalable methods of data collection. Browsers are becoming more and more consistent in the technology they adopt—meaning that code will be interpreted more consistently across your experimental participants. At the time of writing, the standard for browser-based web apps is HTML5 (the World Wide Web Consortium, 2019, provides the current web standards) and ECMAScript JavaScript (Zaytsev, 2019, shows that most browsers currently support ECMAScript 5 and above). ECMAScript (ES) is a set of standards that are implemented in JavaScript (but can also be implemented in other environments—e.g., ActionScript in Flash), and browsers currently support a number of versions of this standard (see Zaytsev, 2019, for details). The combination of ES and HTML5, in addition to having improved timing, is also the most scalable. They reach the greatest number of users, with most browsers supporting them, in contrast with other technologies, such as Java plugins and Flash, that are becoming inconsistently supported—in fact, Flash support has recently begun to be withdrawn from all major browsers.

Access barriers

Often, to gain accurate timing and presentation, you must have a good understanding of key browser technologies. As in any application in computer science, there are multiple methods for achieving the same goal, and these may vary in the quality and reliability of the data they produce. One of the key resources for tutorials on web-based apps—the web itself—may lead users to use out-of-date or unsupported methods; with the fast-changing and exponentially expanding browser ecosystem, this is a problem for the average behavioral researcher (Ferdman, Minkov, Bekkerman, & Gefen, 2017). This level of complexity imposes an access barrier to creating a reliable web experiment—the researcher must have an understanding of the web ecosystem they operate in and know how to navigate its problems with appropriate tools.

However, tools are available that lower these barriers in various ways. Libraries, such as jsPsych (de Leeuw, 2015), give a toolbox of JavaScript commands that are implemented at a higher level of abstraction—therefore relieving the user of some implementation-level JavaScript knowledge. Hosting tools such as “Just Another Tool for Online Studies” (JATOS) allow users to host JavaScript and HTML studies (Lange, Kühn, & Filevich, 2015) and present the studies to their participants—this enables a research-specific server to be set up. However, with JATOS you still need to know how to set it up and manage your server, which requires a considerable level of technical knowledge. The user will also need to consider putting safeguards in place to manage unexpected server downtime caused by a whole range of issues. This may require setting up a back-up system or back-up server. A common issue is too many participants accessing the server at the same time, which can cause it to overload and likely prevent access for current users mid-experiment, which can lead to data loss (Schmidt, 2000).
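
To give a sense of the level of abstraction such libraries offer, here is a minimal trial sketched against the jsPsych 6.x API that was current for the version cited above; later jsPsych releases changed the entry point, so treat this as illustrative rather than canonical.

    // One keyboard-response trial in jsPsych 6.x (plugin loaded via a script tag).
    const timeline = [{
      type: 'html-keyboard-response',
      stimulus: '<p>Press F or J as quickly as possible.</p>',
      choices: ['f', 'j'],
      trial_duration: 2000 // ms; the trial ends if no response is made
    }];

    jsPsych.init({
      timeline: timeline,
      on_finish: () => jsPsych.data.displayData() // dump collected data in-browser
    });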

The solutions above function as “packaged software,” in which the user is responsible for all levels of implementation (i.e., browser, networking, hosting, data processing, legal compliance, regulatory compliance and insurance)—in the behavioral research use-case, this requires multiple tools to be stitched together (e.g., jsPsych in the browser and JATOS for hosting). This itself presents another access barrier, as the user then must understand—to some extent—details of the web server (e.g., how many concurrent connections their hosted experiment will be able to take), hosting (the download/upload speeds), the database (where and how data will be stored, e.g., in JavaScript Object Notation format or in a relational database), and how the participants are accessing their experiment and how they are connected (e.g., through Prolific.ac or Mechanical Turk).

One way to lower these barriers is to provide a platform to manage all of this for the user, commonly known as software as a service (SaaS; Turner, Budgen, & Brereton, 2003). All of the above can be set up, monitored, and updated for the experimenter, while also providing as consistent and reproducible an environment as possible—something that is often a concern for web research. One recent example is the online implementation of PsyToolkit (Stoet, 2017), through which users can create, host, and run experiments on a managed web server and interface; however, there is still a requirement to write out the experiment in code, which represents another access limitation.

Some other tools exist in the space between SaaS and packaged software. PsychoPy3 (Peirce & MacAskill, 2018) is an open-source local application offering a graphical task builder and a Python programming library. It offers the ability to export experiments built in the task builder (but currently not those built using their Python library) to JavaScript, and then to a closed-source web platform based on GitLab (a repository-based version control system) called Pavlovia.org, where users can host that particular task for data collection. Lab.js (Henninger, Mertens, Shevchenko, & Hilbig, 2017) is another task builder, which provides a web-based GUI, in which users can build a task and download a package containing the HTML, CSS, and JavaScript needed to run a study. Users are then able to export this for hosting on their own or on third-party servers. Neither of these tools functions fully as SaaS, since they do not offer a fully integrated platform that allows you to build, host, distribute tasks for, and manage complex experimental designs (e.g., a multiday training study) without programming, in the same environment. A full comparison of packaged software, libraries, and hosting solutions can be found in Table 1.

The Gorilla Experiment Builder

Gorilla (www.gorilla.sc) is an online experiment builder whose aim is to lower the barrier to access, enabling all researchers and students to run online experiments (regardless of programming and networking knowledge). As well as giving greater access to web-based experiments, it reduces the risk of introducing higher noise in data (e.g., due to misuse of browser-based technology). By lowering the barrier, Gorilla aims to make online experiments available and transparent at all levels of ability. Currently, experiments have been conducted in Gorilla on a wide variety of topics, including cross-lingual priming (Poort & Rodd, 2017), the provision of lifestyle advice for cancer prevention (Usher-Smith et al., 2018), semantic variables and list memory (Pollock, 2018), narrative engagement (Richardson et al., 2018), trust and reputation in the sharing economy (Zloteanu, Harvey, Tuckett, & Livan, 2018), how individuals’ voice identities are formed (Lavan, Knight, & McGettigan, 2018), and auditory perception with degenerated music and speech (Jasmin, Dick, Holt, & Tierney, 2018). Several studies also have preregistered reports, including explorations of object size and mental simulation of orientation (Chen, de Koning, & Zwaan, 2018) and the use of face regression models to study social perception (Jones, 2018). Gorilla has also been mentioned in an article on the gamification of cognitive tests (Lumsden, Skinner, Coyle, Lawrence, & Munafò, 2017). Gorilla was launched in September 2016, and as of January 2019 over 5,000 users have signed up to Gorilla, across more than 400 academic institutions. In the last three months of 2018, data were collected from over 28,000 participants—an average of around 300 participants per day.

One of the greatest differences between Gorilla and the other tools mentioned above (a comprehensive comparison of these can be found in Table 1) is that it is an experiment design tool, not just a task-building or questionnaire tool. At the core of this is the Experiment Builder, a graphical tool that allows you to creatively reconfigure tasks and questionnaires into a wide number of different experiment designs without having to code. The interface is built around dragging and dropping nodes (which represent what the participant sees at that point, or modifications to their path through the experiment) and connecting them together with arrow lines. This modular approach makes it much easier for labs to reuse elements that have been created before, by themselves or by others. For instance, this allows any user to construct complex, counterbalanced, randomized, between-subjects designs with multiday delays and email reminders, with absolutely no programming needed. Examples of this can be seen in Table 2.

Gorilla provides researchers with a managed environment in which to design, host, and run experiments. It is fully compliant with the EU General Data Protection Regulation and with NIHR and BPS guidelines, and it has backup communication methods for data in the event of server problems (to avoid data loss). A graphical user interface (GUI) is available for building questionnaires (called the “Questionnaire Builder”), experimental tasks (the “Task Builder”), and running the logic of experiments (“Experiment Builder”). For instance, a series of different attention and memory tasks could be constructed with the Task Builder, and their order of presentation would be controlled with the Experiment Builder. Both are fully implemented within a web browser and are illustrated in Fig. 1. This allows users with little or no programming experience to run online experiments, whilst controlling and monitoring presentation and response timing.

Example of the two main GUI elements of Gorilla. (A) The Task Builder, with a screen selected showing how a trial is laid out. (B) The Experiment Builder, showing a check for the participant, followed by a randomizer node that allocates the participant to one of two conditions, before sending them to a Finish node

At the Experiment Builder level (Fig. 1B), users can create logic for the experiment through its nodes, which manage capabilities such as randomization, counterbalancing, branching, task switching, repeating, and delay functions. This range of functions makes it easy to create longitudinal studies with complex behavior. An example could be a four-week training study with email reminders, in which participants would receive different tasks based on prior performance, or the experiment tree could just as easily enable a one-shot, between-subjects experiment. Additionally, Gorilla includes a redirect node that allows users to redirect participants to another hosted service and then send them back again. This allows users to use the powerful Experiment Builder functionality (i.e., multiday testing) while using a different service (such as Qualtrics) at the task or questionnaire level. Table 2 provides a more detailed explanation of several example experiments made in the builder.

The Task Builder (Fig. 1A) provides functionality at the task level. Each experimental task is separated into “displays” that are made of sequences of “screens.” Each screen can be configured by the user to contain an element of a trial, be that text, images, videos, audio, buttons, sliders, keyboard responses, progress bars, feedback, or a wide range of other stimuli and response options. See the full list here: https://gorilla.sc/support/articles/features. The content of these areas either can be static (such as instructions text) or can change on a per-trial basis (when the content is set using a spreadsheet). The presentation order of these screens is dependent on sequences defined in this same spreadsheet, in which blocked or complete randomization can take place on the trial level. Additionally, the Task Builder also has a “Script” tab, which allows the user to augment the functionality provided by Gorilla with JavaScript. This allows users to use the GUI and JavaScript side by side. There is also a separate “Code Editor,” which provides a developmental environment to make experiments purely in code. This allows users to include external libraries, such as jsPsych. The purpose of the Code Editor is to provide a secure and reliable service for hosting, data storage, and participant management for tasks written in code.

Using tools like the Code Editor, users can extend the functionality of Gorilla through scripting, with custom JavaScript commands, HTML templates, and an application programming interface (API) available—an API is a set of functions that gives access to the platform’s functionality in the Code Editor, and also allows users to integrate third-party libraries into their experiments (e.g., tasks programmed in jsPsych). Therefore, Gorilla can also function as a learning platform through which users progress on to programming—while providing an API that manages more complex issues (such as timing and data management) that might cause a beginner to make errors. The Code Editor allows the inclusion of any external libraries (e.g., pixi.js for animation, OpenCV.js for image processing, or WebGazer.js for eyetracking). A full list of features is available at www.gorilla.sc/tools, and a tutorial is included in the supplementary materials.

Timing control

A few techniques are utilized within Gorilla to control timing. To minimize any potential delays due to network speed (mentioned above), the resources from several trials are loaded in advance of presentation, a process called caching. Gorilla loads the assets required for the next few trials, begins the task, and then continues to load assets required for future trials while the participant completes the task. This strikes an optimal balance between ensuring that trials are ready to be displayed when they are reached and avoiding a lengthy load at the beginning of the task. This means that fluctuations in connection speed will not lead to erroneous presentation times. The presentation of stimuli is achieved using the requestAnimationFrame() function, which allows the software to count frames and run code when the screen is about to be refreshed, ensuring that screen-refreshing in the animation loop does not cause hugely inconsistent presentation. This method has previously been implemented to achieve accurate audio presentation (Reimers & Stewart, 2016) and accurate visual presentation (Yung, Cardoso-Leite, Dale, Bavelier, & Green, 2015). Rather than assuming that each frame is going to be presented for 16.667 ms, and presenting a stimulus for the nearest number of frames (something that commonly happens), Gorilla times each frame’s actual duration—using requestAnimationFrame(). The number of frames a stimulus is presented for can, therefore, be adjusted depending on the duration of each frame—so that most of the time a longer frame refresh (due to lag) will not lead to a longer stimulus duration. This method was used in the (now defunct) QRTEngine (Barnhoorn et al., 2015), and to our knowledge is not used in other experiment builders (for a detailed discussion of this particular issue, see the following GitHub issue, www.github.com/jspsych/jsPsych/issues/75, and the following blog post on the QRTEngine’s website, www.qrtengine.com/comparing-qrtengine-and-jspsych/).
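
The frame-timing idea can be sketched as follows; this is an illustration in the spirit of the description above, not Gorilla's source code.

    // Present an element for approximately `durationMs`, deciding when to stop
    // from each frame's *measured* timestamp rather than assuming 16.667 ms
    // per frame.
    function presentFor(element, durationMs, onDone) {
      element.style.visibility = 'visible';
      let startTime = null;

      function onFrame(timestamp) {
        if (startTime === null) startTime = timestamp;
        if (timestamp - startTime >= durationMs) {
          element.style.visibility = 'hidden';
          onDone(timestamp - startTime); // report the actual presented duration
        } else {
          requestAnimationFrame(onFrame);
        }
      }
      requestAnimationFrame(onFrame);
    }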

RT is measured and presentation time is recorded using the performance.now() function, which is independent of the browser’s system clock, and therefore not impacted by changes to this over time. This is the same method used by QRTEngine, validated using a photodiode (Barnhoorn et al., 2015). Although performance.now() and its associated high-resolution timestamps offer the greatest accuracy, resolution has been reduced intentionally by all major browsers, in order to mitigate certain security threats (Kocher et al., 2018; Schwarz, Maurice, Gruss, & Mangard, 2017). In most browsers, the adjusted resolution is rounded to the nearest 1–5 ms, with 1 ms being the most common value (Mozilla, 2019). This is unlikely to be a permanent change, and will be improved when the vulnerabilities are better understood (Mozilla, 2019; Ritter & Mozilla, 2018).

Additionally, to maximize data quality, the user can restrict through the GUI which devices, browsers, and connection speeds participants will be allowed to have, and all these data are then recorded. This method allows for restriction of the participant’s environment, where only modern browser/device combinations are permitted, so that the above techniques—and timing accuracy—are enforced. The user is able to make their own call, in a trade-off between potential populations of participants and restrictions on them to promote accurate timing, dependent on the particulars of the task or study.

Case study

As a case study, a flanker experiment was chosen to illustrate the platform’s capability for accurate presentation and response timing. To demonstrate Gorilla’s ability to work within varied setups, different participant groups (primary school children and adults in both the UK and France), settings (without supervision, at home, and under supervision, in schools and in public engagement events), equipment (own computers, computer supplied by researcher), and connection types (personal internet connection, mobile phone 3G/4G) were selected.

We ran a simplified flanker task taken from the attentional network task (ANT; Fan, McCandliss, Sommer, Raz, & Posner, 2002; Rueda, Posner, & Rothbart, 2004). This task measures attentional skills, following attentional network theory. In the original ANT studies, three attentional networks were characterized: alerting (a global increase in attention, delimited in time but not in space), orienting (the capacity to spatially shift attention to an external cue), and executive control (the resolution of conflicts between different stimuli). For the purpose of this article, and for the sake of simplicity, we will focus on the executive control component. This contrast was chosen because MacLeod et al. (2010) found that it was highly powered and reliable, relative to the other conditions in the ANT. Participants responded as quickly as possible to a central stimulus that was pointing either in the same direction as identical flanking stimuli or in the opposite direction. Thus, there were both congruent (same direction) and incongruent (opposite direction) trials.
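
For illustration only (the exact stimuli and response keys used in the study are not specified here), a flanker trial list with congruent and incongruent conditions might be generated like this:

    // Hypothetical flanker stimuli: respond to the direction of the central arrow.
    function makeFlankerTrials(nPerCondition) {
      const conditions = [
        { name: 'congruent',   stimulus: '<<<<<', correctKey: 'f' },
        { name: 'congruent',   stimulus: '>>>>>', correctKey: 'j' },
        { name: 'incongruent', stimulus: '<<><<', correctKey: 'j' },
        { name: 'incongruent', stimulus: '>><>>', correctKey: 'f' }
      ];
      const trials = [];
      for (const condition of conditions) {
        for (let i = 0; i < nPerCondition; i += 1) trials.push({ ...condition });
      }
      return trials.sort(() => Math.random() - 0.5); // crude shuffle, for illustration
    }

    // The conflict effect is then mean incongruent RT minus mean congruent RT.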

Research with this paradigm has robustly shown that RTs to congruent trials are faster than those to incongruent trials—Rueda et al. (2004) have termed this the “conflict network.” This RT difference, although significant, is often less than 100 ms, and thus very accurately timed visual presentation and accurate recording of responses are necessary. Crump, McDonnell, and Gureckis (2013) successfully replicated the results of a similar flanker task online, using Amazon Mechanical Turk, with letters as the targets and flankers, so we know this can be an RT-sensitive task that works online. Crump et al. coded this task in JavaScript and HTML and managed the hosting and data storage themselves; however, the present versions of the experiment were created and run entirely using Gorilla’s GUI. We hypothesized that the previously recorded conflict RT difference would be replicated on this platform.


Experimental Psychology

Question #3: Develop and state your own research hypothesis and its two corresponding statistical hypotheses [i.e., the alternative hypothesis (H1) and the null hypothesis (H0)]. Describe the relationship between the two statistical hypotheses and the relationship between the alternative hypothesis and the research hypothesis, and state the two possible results after hypothesis testing. How do Type I and Type II errors relate to the alternative and null hypotheses?

Question #4: Ivan adopted a 3 x 4 mixed factorial design to study the effects of A and B on a dependent variable. Factor A (IV #1) is a between-subjects variable. Factor B (IV #2) is a within-subjects (repeated) variable. In order to control for possible order effects, Ivan decided to use complete counterbalancing. Please answer the following questions and justify your answer.

(a) How many groups of participants are required in Ivan’s experiment?
(b) How many conditions need to be counterbalanced?
(c) How many sequences need to be enumerated? Why?
(d) If Ivan wanted to include five participants for each sequence, then how many participants are required in his experiment?

Question #5: Educational psychologists were interested in the impact of the “Just Say No!” program and contracts on drunk driving among teens. This program was a pilot program. The investigators identified gender as a participant characteristic highly related to alcohol use among teens that would require a matching strategy and statistical analysis. With the cooperation of school officials, 16-year-old students were matched and randomly assigned with equal numbers of males and females in each group. Group A participated in a “Just Say No!” program, which required a one-hour information session instead of P.E. for six weeks. Students were presented with written factual information, motivational lectures, guidance films, and assertiveness training. Students were also encouraged to sign a personal responsibility contract that stipulated that they would not drink and drive. Group B participated in regular P.E. classes for the six-week experimental period. A two-factor factorial analysis was used to analyze the data. Please answer the following questions and justify your answer.

(a) Identify the experimental design.
(b) What is the independent variable? What is the dependent variable?
(c) Diagram this experimental design.
(d) What are the potential confounds?
(e) How many main effects, interaction effects, simple main effects of A, and simple main effects

