BarCamp // 26 & 27.06.14 @ ENSCI

Welcome to the program page for the seminar’s BarCamp. Here you will find details on the BarCamp, the cases and datasets we plan to work with, and how to take part in the two-day event.

If you want to sign up right away, see section VI (Registration); otherwise, read on.

I. BarCamp

A BarCamp is a participatory workshop geared toward either the development of web applications or the exploration of datasets using existing software tools and ad hoc code. Our BarCamp focuses on exploring a set of pre-selected, pre-processed datasets. Our goals are twofold: 1) to give seminar participants hands-on experience crafting stories and analyses with digital data while working in collaborative teams of developers, designers and social science researchers; 2) to advance the visual treatment, or interface design, of datasets connected to existing research projects or civil society groups.

II. Subjects and Datasets

Our seminar began with a number of sessions that looked at how digital methods evolved within different research communities (scientometrics, history, literary studies, sociology and media studies) and how each community adapted different tools to handle different types of data. We would like to take a similar approach in the BarCamp by working with a range of “data types”, each with slightly different “digital” qualities. To that end, we have identified three cases that will guide this exploration of “data types”:

Observing community formation and online collaboration on Github
Visualizing the financial impacts of climate risk in SEC 10-K filings
Pesticides online: Debates, polemics and the greening of agriculture

Across the cases, we will work with various kinds of datasets, including: dynamic, user-generated data from a social networking platform (Github); highly formatted textual data from the world of financial accounting (SEC 10-K filings); heterogeneous textual data from a wide variety of media sources (press coverage of pesticides, collated through Factiva); classic scientometric data on scientific publications (studies of pesticide impacts, from WoS); and web cartography data (online discussion of pesticides).

We hope that, by offering a variety of datasets, each BarCamp participant will find a subject or “data type” that matches their own research interests and will thus be able to draw analogies between the BarCamp cases and their own research. As for outcomes, ideally, after two days there will be results to share with the research and/or civil society groups connected to the datasets, results they may mobilize for further research, analysis or communication within their own networks.

A more detailed description of each case and some initial research questions are included below.

III. Organization and Logistics

The BarCamp will run for two days, June 26-27. It will be hosted by ENSCI (Les Ateliers: École nationale supérieure de création industrielle) at 48 rue Saint Sabin, in the 11th arrondissement (Métro Chemin Vert / Métro Bréguet-Sabin); contact and location information is available from ENSCI. The schedule is as follows:

Thursday, June 26

9h-9h30: Coffee
9h30-10h: Introduction
10h-10h45: Presentation of the cases [Github, Climate risk, Pesticides]
10h45-11h: Group formation
11h-13h: Group work [questions and tasks]
13h-14h: Lunch
14h-18h30: Group work

Friday, June 27

9h-9h30: Coffee
9h30-12h30: Group work
12h30-13h30: Lunch
13h30-16h30: Finalizing the work
16h30-17h: Preparing presentations
17h-18h30: Presentation of results and discussion of possible next steps

The datasets for each case will be prepared in advance by an “animator” who will carry out an initial cleaning of the data to reduce grunt work during the BarCamp. At the start of the BarCamp, the animators will pitch the cases and their datasets to all participants, who will then choose which group they would like to work with. The animator’s role is to help coordinate the group’s work, but each group should also collectively discuss and define the research questions that will drive the two days of exploration, as well as its own way of working during the BarCamp.

Both the medialab and IFRIS’s CorText platform will ensure the participation of a mix of their own members with skills in research, design or development. The organizers also plan to invite a limited number of external guests (issue experts, coders and designers) to ensure plenty of outside perspectives.

If you would like to attend, please sign up so that we can plan catering accordingly. We also ask participants to commit to attending both full days, since this kind of concentrated collaboration depends on continuity.

If you have any questions, feel free to send them to the organisers at contact@digitalmethods-seminar.org.

IV. The Cases

Case 1: A Github Observatory

We invite participants to join us for an exciting study of Github, a social networking and workflow platform for open source software development. Launched in 2008, Github hosts over 11.7 million active coding projects, making it the largest code repository in the world. We use Github data, queried through its API, to assess the dynamics of project formation and the porosity of the open source communities present on the platform. The API gives us access to an unprecedented level of fine-grained detail on the actions of the site’s participants, from project managers to mere observers of projects. It offers access to various stable entities, from projects and their portfolios of contributors to contributors and the portfolios of projects they have joined over time. We treat Github as a large-scale natural experiment with millions of traces that can be queried.

Some initial questions that we will be exploring through the BarCamp include:

How do these communities form? Are all contributors defined at once or do they join gradually? Can projects be defined according to particular profiles? What is the distribution of these profiles across the platform? What can we say about the economics of different types of projects?
We will use digital markers collected on Github to navigate the various questions that such a trace-rich platform makes possible. We may look at this data from the angle of the specific question that animates a current project tracking the community of Russian computer scientists (RCS) active on Github. The RCS project maps Russian coders across the world and documents for the first time the relations between domestic and foreign computer scientists. Can we detect specific patterns of community dynamics and trust among Russian contributors? Can we detect the influence of mobility on these two elements, community dynamics and trust? In addition, how can we visualize these relationships, capturing the flow of data dynamically and in real time?
Ultimately, we hope to describe the morphology of Github communities (who does what, when and with whom) and the types of code produced by these communities. Beyond the motto of “open source”, what else is being shared on Github?
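To make the question “are all contributors defined at once or do they join gradually?” concrete, here is a minimal sketch of one possible measure. It assumes the Github data has already been reduced to hypothetical records of the form (project, contributor, date of first activity); that record shape, the function name `founding_share` and the 30-day “founding window” are all illustrative assumptions, not part of the RCS project or of the Github API.

```python
from collections import defaultdict
from datetime import date

# Hypothetical record shape (an assumption): (project, contributor, first activity date).
Record = tuple[str, str, date]


def founding_share(records: list[Record], window_days: int = 30) -> dict[str, float]:
    """For each project, return the share of contributors whose first activity
    falls within `window_days` of the project's earliest recorded activity.
    A share near 1.0 suggests a team defined "at once"; a low share suggests
    contributors joining gradually. The window size is an illustrative choice."""
    first_seen: dict[str, dict[str, date]] = defaultdict(dict)
    for project, contributor, day in records:
        previous = first_seen[project].get(contributor)
        # Keep only each contributor's earliest date on a given project.
        if previous is None or day < previous:
            first_seen[project][contributor] = day

    shares: dict[str, float] = {}
    for project, joins in first_seen.items():
        start = min(joins.values())
        early = sum(1 for d in joins.values() if (d - start).days <= window_days)
        shares[project] = early / len(joins)
    return shares


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    sample = [
        ("alpha", "ana", date(2013, 1, 1)),
        ("alpha", "boris", date(2013, 1, 10)),
        ("alpha", "chen", date(2013, 6, 1)),
        ("beta", "dmitri", date(2014, 2, 1)),
        ("beta", "elena", date(2014, 2, 5)),
    ]
    print(founding_share(sample))  # alpha ≈ 0.67 (one late joiner), beta = 1.0
```

Comparing the distribution of this share across many projects is one simple way to start sorting projects into “profiles”; the window size would need to be discussed by the group.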

Case 2: Interpreting climate risk in SEC disclosure reports

All companies listed on U.S. stock exchanges are required to disclose information that might help investors evaluate the value of a company’s stock. These disclosures, called 10-K filings, include information about how much debt a company holds, whether it has any pending lawsuits, and what other kinds of business risk it may face (competitive, physical, reputational or regulatory). In 2010, the Securities and Exchange Commission (the federal agency that oversees these disclosures) issued interpretive guidance, applicable to all U.S.-traded companies, on which disclosures are relevant to assessing the risks and opportunities they face from climate change. However, since companies exercise their own judgment as to what to include and where to place it in the report (which can run to 100 pages or more), textual disclosures relevant to climate risk are often scattered across many sections of the report and vary widely from company to company in quality and volume. As a result, while a few groups have begun benchmarking industry sectors on the quality of their disclosures, it is very difficult to get a holistic picture of the state of disclosure or to compare disclosures effectively within and across industry sectors, which might allow investors to begin incorporating climate risk more robustly into their valuation models.

Working with a network of U.S. institutional investors, CookESG Research has built a series of algorithms that identify the sections of the filings related to climate risk and perform a first-pass classification of their content into four basic categories. The results of this analysis are available to the public via a web interface that allows filtered searches of the database. Despite its richness, this data has yet to be analyzed with network analysis or text mining software.

This case will involve looking for patterns across various components of this highly structured dataset, including:

Comparison of reporting standards between companies within specific industry sectors, such as the oil and gas sector, or the agricultural sector.
Disclosure quality varies widely; Hess, for instance, provides very detailed disclosure, while companies such as Chevron have abysmal disclosure.
Many companies analyze only some components of risk while leaving others unspoken; one task would be to see whether the reports, taken together, could be collated into a complete industry “risk profile”.
Network analysis of the similarity of language in disclosures between industry groups. In financial filings, boilerplate text is often drafted by a small number of law firms, which then reuse it across many clients. The task would be to identify this boilerplate, its distribution and its authors.
Tracing mentions of specific U.S. climate legislation in disclosures, and corporate stances on these bills.
Linking this analysis to energy companies’ political contributions to specific candidates (http://dirtyenergymoney.com/) and reviewing those politicians’ voting records would provide a way of comparing the positions these companies take in their disclosure statements with the stances of the candidates they fund on these bills.
Developing an ontology of regulatory disclosures. For instance, take the case of electric utilities: where do “stationary source emission limits” fit into their analysis and definition of climate risk?
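For the boilerplate task above, one common starting point is to compare filings by their shared word sequences. The sketch below uses word shingles and Jaccard similarity to build a weighted edge list that a network tool could then cluster. This is a swapped-in, generic technique, not CookESG’s method; the function names, the shingle length `k` and the `threshold` are hypothetical choices for illustration.

```python
from itertools import combinations


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences (shingles) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_edges(
    filings: dict[str, str], k: int = 5, threshold: float = 0.3
) -> list[tuple[str, str, float]]:
    """Link every pair of filings whose shingle sets overlap at or above
    `threshold`. The weighted edge list can be loaded into a network tool to
    look for clusters of shared (possibly boilerplate) language."""
    sets = {name: shingles(text, k) for name, text in filings.items()}
    edges = []
    for (n1, s1), (n2, s2) in combinations(sorted(sets.items()), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            edges.append((n1, n2, round(score, 3)))
    return edges


if __name__ == "__main__":
    # Made-up snippets standing in for climate-risk sections of three filings.
    filings = {
        "A": "changes in climate regulation could adversely affect our results",
        "B": "changes in climate regulation could adversely affect our operations",
        "C": "we have no material exposure to weather events",
    }
    print(similarity_edges(filings, k=3))  # [('A', 'B', 0.75)]
```

Edges with high weights between companies in different sectors would be candidates for shared boilerplate; tracing them back to the drafting law firms would still require outside information.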

Case 3: Pesticide in circulation – expertise, controversy and social media

The use of pesticides in agricultural production is an ongoing source of epistemic tension over how to balance the use of chemicals for crop protection against the protection of human and environmental health from those same chemicals. Many strands of life-science research deal with pesticides, from the production of molecules that target specific pests and crops to the bioaccumulation of pesticide residues in food, in the environment and even in the human body. The well-known controversies over DDT and persistent organic pollutants (POPs) have shown the performativity of controversies about pesticides in the policy and regulatory space. This continues with the recent European directive on pesticides, which requires member states to adopt national plans to reduce the use of particular pesticides. Alongside this regulatory regime, pesticide use remains a very active issue in the public sphere and appears to serve as a lever for promoting forms of alternative agriculture that are gaining more and more recognition (organic farming, biodynamic farming, intensive agro-ecology). The debate about pesticides is thus expanding, reframing public understanding of, and adherence to, the industrial food chain.

This case will make use of a number of datasets already collected for an ongoing research project directed by Marc Barbier at INRA SenS on “Pesticides online: debates, polemics, controversies and the greening of agriculture.” These datasets include online debates gathered by crawling various web forums on agriculture and chemicals, scientific citation data (from WoS, CAB and Medline), and a corpus of traditional press articles on the topic (Factiva).

During this BarCamp we expect to:

Trace the extension of the “use of pesticides” as both a challenge and a controversial space. An initial approach will be to follow the evolution of scientific studies of pesticides (both production and impact) and to produce a phylomemy, a map of how research topics emerge, split and merge over time, of the “regime of proof” of this science.
We also expect to map debates about pesticide use in the regional press (using Factiva), which offers a good level of analysis for following debates across the various regions of France and possibly locating them on a map.
Finally, it would be particularly interesting to use crawlers to map websites and blogs in order to identify the main actors involved and the content of their discussions and disputes. The legacy and effects of documentaries and films might be a good entry point for this purpose.

V. Institutional support

The BarCamp is organized with the support of the following institutions.

VI. Registration

Online registration is now closed. If you have any questions, feel free to send them to the organisers at contact@digitalmethods-seminar.org.