March 18, 2017

Almost time for GSoC Applications!

Your chance to join the DevoWorm group is almost upon us. If you are a student, the Google Summer of Code (GSoC) is a good opportunity to gain programming experience. Applications are being accepted from March 20 to April 3. If selected, you will join the DevoWorm group, and also have the chance to network with people from the OpenWorm Foundation and the INCF.

The best approach to a successful application is to discuss your skills, provide an outline of what you plan to do (which should resemble the project description), and then discuss your approach to solving the problems at hand. We are particularly interested in a demonstration of your problem-solving abilities, since many people will apply with a similar level of skill. You can find an application template in outline form here.

You can apply to work on one of two DevoWorm projects: "Physics-based Modeling of the Mosaic Embryo in CompuCell3D" or "Image processing with ImageJ (segmentation of high-resolution images)". If you have any questions, comment in the discussions or contact me directly.

March 15, 2017

A Tree of Deeper Experiences -- the Authorship Tree

One of the most difficult aspects of academic publishing with multiple authors is determining the order of authorship. In many fields, authorship order is the key to job promotion. Unfortunately, these conventions vary by field, while the criteria for authorship slots often vary by research group. Since a responsible accounting of contributions is key to determining authorship and authorship order [1], it is worth considering multiple possibilities for conveying this information.

Example of an Authorship list (with affiliations)

A mathematics or computer science researcher might also see the problem as one of choosing the proper representational data structure. The authorship order, no matter how determined, is a 1-dimensional queue (ordered list). Even though some publishers (such as PLoS) allow for footnotes (an inventory of author contributions), there is still little room for nuance.

Example from "The Academic Family Tree"

But is there a better way? Academic genealogies provide one potential answer. A typical genealogy can be thought of as a 1-dimensional order, from mentor to student. In reality, however, an academic has multiple mentors and is influenced by a number of predecessors. The construction of academic family trees [2] is one step in this direction, turning the 1-dimensional graph into a 2-dimensional one.

Picture of the Authorship tree cover. COURTESY: "The Giving Tree" by Shel Silverstein

This is why Orthogonal Lab has just published a hybrid infographic/paper called The Authorship Tree [3]. This is a working document, so suggestions are welcome. The idea is to not only determine the relative scope of each contribution, but also to graphically represent the interrelationships between authors, ideas, and scope of the contributions.

As we can see from the example below, this includes not only our authors, but also people from the acknowledgements, reviewers, authors of important papers/methods, and funders. While the ordering of branches along the stem suggests an authorship order, the branches are actually ranked according to their degree of contribution [4]. To this end, there can be equivalent amounts of contribution, as well as inclusion of minor contributors not normally included in an authorship list.

Example of an authorship tree (derived from original 1-D author list).
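To make the data-structure point concrete, here is a minimal Python sketch of how branches might be ranked by degree of contribution, with ties sharing a tier. The `Contributor` class, the role labels, and the point values are all hypothetical and chosen for illustration; they are not taken from The Authorship Tree itself.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Contributor:
    name: str
    role: str    # hypothetical labels: "author", "reviewer", "funder", ...
    points: int  # hypothetical contribution score (point-system style)


def rank_branches(contributors):
    """Group contributors into tiers by descending points; ties share a tier."""
    ordered = sorted(contributors, key=lambda c: (-c.points, c.name))
    return [
        [c.name for c in tier]
        for _, tier in groupby(ordered, key=lambda c: c.points)
    ]


# Illustrative data only.
contributors = [
    Contributor("Author A", "author", 40),
    Contributor("Author B", "author", 25),
    Contributor("Author C", "author", 25),
    Contributor("Reviewer 1", "reviewer", 5),
    Contributor("Funder X", "funder", 5),
]

# The tree view: tiers of equivalent contribution, including non-authors.
tiers = rank_branches(contributors)

# The conventional view: a flat, 1-dimensional author list.
flat_author_list = [c.name for c in contributors if c.role == "author"]
```

The flat list forces Author B ahead of Author C and omits the reviewer and funder entirely, while the tiered structure keeps equal contributions equal and keeps minor contributors visible.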

[1] Cozzarelli, N.R. (2004). Responsible authorship of papers in PNAS. PNAS, 101(29), 10495.

[2] David, S.V. and Hayden, B.Y. (2012). Neurotree: A Collaborative, Graphical Database of the Academic Genealogy of Neuroscience. PLoS One, 7(10), e46608. doi:10.1371/journal.pone.0046608.

[3] Orthogonal Lab (2017). The Authorship Tree. Figshare, doi:10.6084/m9.figshare.4731913.

[4] For more on the point system convention, please see: Venkatraman, V. (2010). Conventions of Scientific Authorship. Science Issues and Perspectives, doi:10.1126/science.caredit.a1000039.

March 4, 2017

Open Data Day Activities

Today is International Open Data Day, which was first proposed in 2010. To do my part, I will discuss a few open data-related items. Namely, what can you do to make this day a success?

Logo of the Open Knowledge Foundation (based in London), which offers a host of Open Data Day activities.

1) You can host some of your unpublished data (whether they are linked to publications or not) at an open data repository. You can do this through a general repository such as Dryad or Figshare, or a specialized repository such as Open fMRI [1].

* Another part of publishing data is the need for annotation and other metadata. This is a barrier to opening up datasets, but the benefits of doing so may outweigh the initial investments [2].

2) You can join an open access community, such as a new social media network that allows people to share datasets of all types and sizes.

3) You can commit to creating more systematic descriptions of your research methods (e.g. the things you do to create data). This can be done by creating a set of digital notes or protocol descriptions [3], and making them open through Jupyterhub and [4], respectively.

4) You can host your own virtual Hackathon. Unsure as to how you might do this? Then you can earn any (or all) of a series of three badges (Hackathon I, Hackathon II, Hackathon III) created in conjunction with the OpenWorm Foundation.

5) You can petition or get involved with municipal and state/provincial governments to ensure their commitment to open public data.

Of course, there are other things you can do, and more innovation is needed in this area. Have some ideas, or planning an event of your own? Let me know, and I will invite you to the Orthogonal Lab's new Slack channel on Open Science.

[1] This choice, of course, depends on the field in which you are working. I used this example because fMRI data seems to have good community support for data sharing. Consult the Open Access Directory to learn more about the specifics for various disciplines.

For more information about data sharing in the field of neuroimaging, please see: Iyengar, S. (2016). Case for fMRI Data Repositories. PNAS, 113(28), 7699-7700.

[2] Based on a paper recently posted to the bioRxiv, and based on some material from a recent talk. For more information, please see: Alicea, B. (2016). Data Reuse as a Prisoner's Dilemma: the social capital of open science. bioRxiv, doi:10.1101/093518.

[3] Olson, R. (2012). A short demo on how to use IPython Notebook as a research notebook. Randal S. Olson blog, May 12.

[4] For examples of writing better and more accessible protocols, please see: (2017). How to make your protocol more reproducible, discoverable, and user-friendly. February 25.

Daudi, A. How to Write an Easily Reproducible Protocol. American Journal Experts, http://www.aje.com/en/arc/how-to-write-an-easily-reproducible-protocol/, Accessed February 27, 2017.

February 21, 2017

New Orthogonal Laboratory Methods

Lately I have been incorporating two new tools into my research program's [1] infrastructure. One is a software tool with community support, and the other is a development of my own. 

The first addition is the Jupyter Notebook (sometimes called the IPython Notebook, as it is based on this platform). The Jupyter Notebook allows us to build repositories of methods, notes, code, and data analyses in an integrated manner. Jupyter Notebooks can be rendered in GitHub, making them freely accessible and distributable. For example, the DevoWorm project already has several notebooks hosted at GitHub. The long-term goal is to create notebooks for typical research activities, and to use them for a host of purposes, from a Wiki-like instructional manual to supplemental materials for publications [2].

Jupyter Notebooks (example)

The other is a pipeline for project management with the goal of increasing participation and success in research. The idea is one that I have been bouncing around in my head based on my involvement with the OpenWorm Foundation community committee and personal experience. This could be a way to encourage more underrepresented and "high-risk" researchers to advance their work [3]. It is based on two exceedingly obvious principles: failure is not a breaking point for any research trajectory, and projects themselves should be defined in a bottom-up fashion (building on previous successes and experiences) [4]. Hopefully, this pipeline works well in implementation.

Building a Research Group Philosophy

UPDATE (2/22): I failed to include a snapshot of the Orthogonal Laboratory Slack team (currently with an n of 1). Slack is fast becoming a popular tool for laboratory management [5, 6], particularly for labs that are partially or fully virtual.

[1] I am in the process of turning Orthogonal Research into Orthogonal Laboratory. Currently it is a group of one (and a few collaborators). I am currently looking for an academic home, so putting in place the tools needed to scale up is worth the investment in time. More on this initiative later.

[2] Brown, C.T. (2017). Topics and concepts I'm excited about (Paper of the Future). Living in an Ivory Basement blog, January 9.

[3] The very notion of "high-risk research" is biased toward a fear of failure. Considering what is usually thrown into that bucket, "high-risk research" is a statement of cultural values more than an inherent risk. Removing the industrial, one-size-fits-all aspect of research might be a way to mitigate risk.

[4] Sometimes you get lucky and get to define a project right out of the box. But in doing so, projects often end up exhibiting a hodgepodge quality that makes them seem unfocused.

[5] Perkel, J.M. (2017). How Scientists Use Slack. Nature, 541, 123-124.

[6] Washietl, S. (2016). 6 Ways to Streamline Communication in Your Research Group Using Slack. Paperpile blog, April 12.

February 16, 2017

A Peripheral Darwin Day post, but Centrality in his Collaboration Graph

Happy Darwin Day-ish! COURTESY: Kapil Bhagat

Having not decided in advance what to post for Darwin Day 2017 (and thus being late to the party with my annual post), I will be taking a rather circular approach to this post. I recently read a blog post on a TEDMED talk by Artem Kaznatcheev [1] on how theorists offer opportunities for collaboration across multiple research domains and existing research communities. The most extreme case is that of Paul Erdos, for whom the term "Erdos number" was coined [2]. The Erdos number defines a degree of association on a collaboration graph between a given author and Erdos as defined through co-authorship [3]. The role of theorists in such collaboration graphs is intriguing, and involves their role as hubs (highly connected nodes) in a scale-free network topology [4]. In terms of the scientific community, such hubs often serve as connectors between disciplinary groups and sub-communities.
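As a rough illustration of how such a number can be computed, the sketch below treats the collaboration graph as an adjacency mapping and uses breadth-first search to find the shortest co-authorship path between two authors. The function name `collaboration_distance` and the toy graph (authors "A" through "E") are made up for this example; real collaboration graphs are built from publication records.

```python
from collections import deque


def collaboration_distance(coauthors, source, target):
    """Shortest co-authorship path length from source to target.

    coauthors maps each author to the set of people they have published with.
    Returns None when no chain of co-authorships connects the two.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        author, dist = queue.popleft()
        for neighbor in coauthors.get(author, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical toy graph, symmetric because co-authorship is mutual.
coauthors = {
    "Erdos": {"A", "B"},
    "A": {"Erdos", "C"},
    "B": {"Erdos"},
    "C": {"A", "D"},
    "D": {"C"},
    "E": set(),  # an author with no collaborative papers
}

erdos_number_of_d = collaboration_distance(coauthors, "D", "Erdos")
erdos_number_of_e = collaboration_distance(coauthors, "E", "Erdos")
```

In this toy graph, author "D" reaches Erdos through "C" and "A", while "E" has no co-authors and therefore no finite number.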

Darwin at the center (a hub with a high degree of centrality) of a hypothetical collaboration graph. Image is actually of an evolutionary amplifier, a computational structure from Evolutionary Graph Theory. Image of Darwin is from Dinochick's blog.

As Kaznatcheev [1] points out, sometimes one need not be as prolific as Paul Erdos to serve as a connector. Henri Poincare was a bit less prolific, and certainly did not live out of a suitcase, but serves as a scientific connector nonetheless. In fact, all theorists are at an advantage in this regard. This makes me wonder what a collaboration graph centered on Charles Darwin would look like. While I do not have the necessary data, I would imagine it would be quite different from Erdos' graph. This is because Darwin (as far as I know) did not publish collaborative papers. However, a citation network [5] in which documents rather than scientists serve as the nodes might better capture Darwin's role as an influencer, and thus partially recapitulate the topology of an Erdos-based collaboration graph.

Lately, I have been doing some unfocused research on polymathy [6]. One of the things that has fascinated me is the distinction between "domain-specific" knowledge and "general" knowledge, particularly as it relates to specialization. One criticism of modern science is that it suffers from hyper-specialization. The trend towards hyper-specialization has been constant over historical time, and now contrasts sharply with the scientists of the 16th and 17th centuries. This trend has been countered in a number of ways, particularly through interdisciplinary initiatives. Yet all too often, interdisciplinarity is reduced to groups of specialists gathered in the same room talking past one another.

A "T" shape skillset in terms of employment skills and educating talent. COURTESY: T-Summit 2014.

I am interested in taking a landscape model approach to modeling polymathy as a function of expertise and semantic specialization. In getting there, we have to understand the relationships (various dimensions) of specialization and generalized knowledge. According to the education and tech literature, the traditional polymath can be modeled as a "T". In fact, the T-shaped skillset is back in vogue in some fields (e.g. design, project management). It is somewhat difficult to make the leap from abstract skills to specific facts and other knowledge that facilitates (or constrains) scientific collaboration. 

Components of the "I" shape in terms of academic influence and expertise. More information on this work to come.

To help this along, I have worked out an ontological and semantic model of scientific expertise. In the figure above, I show the bivariate model as a shape representing the relative "depth" and "breadth" of a particular style of scientific practice. While there are "Is" (specialists) and "Ts" (generalists with a single specialty), there are also "combs" (generalists with multiple specialties) and "dashes" (pure generalists). "Combs" are most analogous to the traditional polymath, but it is interesting to ask where Charles Darwin (and other theorists) would fit into this type of scheme.
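One way to picture the four shapes is as a simple rule over a per-field depth profile. The sketch below is only an illustration and not the ontological model itself: the function `classify_shape`, the 0-to-1 depth scale, and both thresholds (`deep` and `min_breadth`) are assumptions invented for this example.

```python
def classify_shape(depth_by_field, deep=0.7, min_breadth=3):
    """Label an expertise profile as "I", "T", "comb", or "dash".

    depth_by_field maps a field name to a depth score in [0, 1].
    deep and min_breadth are hypothetical cutoffs, not measured values.
    """
    n_deep = sum(1 for d in depth_by_field.values() if d >= deep)
    broad = len(depth_by_field) >= min_breadth
    if not broad:
        # Narrow profile: a specialist if any field is deep.
        return "I" if n_deep >= 1 else "unclassified"
    if n_deep == 0:
        return "dash"  # pure generalist
    if n_deep == 1:
        return "T"     # generalist with a single specialty
    return "comb"      # generalist with multiple specialties


# Hypothetical profiles for illustration.
specialist = classify_shape({"genetics": 0.9})
t_shaped = classify_shape({"genetics": 0.9, "statistics": 0.3, "design": 0.2})
comb = classify_shape({"geology": 0.9, "zoology": 0.8, "botany": 0.3, "economics": 0.2})
dash = classify_shape({"geology": 0.3, "zoology": 0.4, "botany": 0.2})
```

A hard threshold is the crudest possible choice here; a landscape model would presumably treat depth and breadth as continuous dimensions rather than bins.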

While Darwin has shaped the thinking of many scholars in multiple fields (both traditional and upstart) over the past 150 years, he was also influenced by a variety of thinkers. Even more than a scientific specialist, the essence of theorists can be captured by multiple-input, multiple-output (MIMO) graphs of major ideas. This can extend even beyond the lifetime of the scientist. The following graphical example of influencers and the influenced (inputs and outputs) is from Semantic Scholar, and shows Charles Darwin's position in a semi-directed citation network within the Computer Science community.

Charles Darwin's academic influence as MIMO graph. See profile for details on how graph is computed.

[1] Kaznatcheev, A. (2012). Theorists as Connectors: from Poincare to mathematical medicine. Theory, Evolution, and Games Group blog, November 4.

[2] Newman, M.E.J. (2004). Who Is the Best Connected Scientist? A Study of Scientific Coauthorship Networks. Lecture Notes in Physics, 650, 337–370.

[3] Alicea, B. (2011). Academic Connectivity and the Future of Scientific Ideas. Synthetic Daisies blog, September 9.

[4] Newman, M.E.J. (2001). The structure of scientific collaboration networks. PNAS, 98(2), 404-409.

[5] More information about citation networks and their usefulness to the practice of science can be found in: Editorial (2010). On citing well. Nature Chemical Biology, 6, 79.

[6] A few popular readings on polymathy: Arbesman, S. (2013). Let's Bring the Polymath -- and the Dabblers -- Back. Wired, December 13 AND Mazie, S. How to be a Polymath. Big Think blog.