Firstly, what are we talking about here? What has been done, what are the challenges, and what can be done to overcome them?
It helps to set the scene by asking what FAIR data is or, more importantly, what purpose this concept and its implementation fulfil.
We need to step back a little and think about the purpose of analytics. In the context of scientific and commercial research and development, analytics is all about supporting the assessment of hypotheses through digital means, as opposed to, or rather in support of, what you might call bench research.
The reason this support is needed is that science is complicated, and there is an understatement if ever there was one. We are at the point where no individual can understand enough of a broad range of subjects to undertake research on their own. It is all about collaborative endeavour, which adds an additional overhead to the work.
We have seen a productivity crisis in research: more effort is expended for less benefit as time goes on. The low-hanging fruit has been gathered, and the journey, or the harvesting (choose your metaphor), gets more difficult as we continue our work.
As they say, we stand on the shoulders of giants. We can read previous research and learn from it. In essence we need to look back into prior data generated from research and use that to inform our work.
At the same time, we have the opportunity and the obligation to work with computational tools that can support us in this work. Computational tools can, in essence, automate the kinds of processes that require human thought. Various kinds of thought processes can be automated: logical reasoning, association, pattern recognition, prediction from past experience, and so on.
The good news is that we have powerful tools. The bad news is that making use of these tools is not as easy as it first appears. This is in part due to the paradox that what people find hard, computation can often accomplish easily, but what people think of as easy is hard for computation. Or, looked at another way, this is the challenge of representation. People are great at making representations of the world around them. So good that they often don't see that they are looking at a representation and not the world. Take maps as an example: they are a simplification of, or an analogy for, the world. Computation can only work on representations, not on the world directly.
You could spend years reading over the philosophy of cognition and representation, but one key insight is to think of a representation as one view of many. A perspective, if you like. A key term used when people think about representation is the word 'ontology'. Ontology was originally used to describe just this: a perspective. What is your ontology for this, and what is their ontology for it? They will be different, because you each see things slightly differently.
Why did we digress into this discussion? Well, the point is that computation can be used to automate thought processes over representations, and the representations matter because they define a view of the world, likely one about which you have a hypothesis, since a theory or model has to be expressed in a representation.
Now what if you would like to gather other information with which to test your hypothesis? This information may have been gathered by other researchers who have a slightly different (but similar) world view. A similar, but slightly different ontology. Now you might need to translate between them.
First, however, you might wish to find all relevant data. That is the first letter of our acronym FAIR: F for Findable.
Now you can do this, as we say, manually. But the power of the digital age is that you might wish to set a computational tool to automate this cognitive activity. So, ideally, you want all information to be findable, usually based on relevant terms from your ontology, and perhaps on other terms mapped to other ontologies as synonyms, so that your computational agent can find those too.
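The search step above can be sketched in a few lines. This is a minimal illustration, not a real discovery service: the synonym table, term labels, and catalogue entries are all invented for the example, and a real agent would query a terminology server or a dataset catalogue instead.

```python
# Hypothetical synonym table mapping our ontology's terms to labels
# that other datasets might use instead.
SYNONYMS = {
    "myocardial infarction": ["heart attack", "MI"],
    "hypertension": ["high blood pressure"],
}

def expand_query(term):
    """Return the term plus any known synonyms, lower-cased."""
    return [term.lower()] + [s.lower() for s in SYNONYMS.get(term.lower(), [])]

def find_datasets(term, catalogue):
    """Return ids of catalogue entries whose keywords match the term or a synonym."""
    wanted = set(expand_query(term))
    return [entry["id"] for entry in catalogue
            if wanted & {k.lower() for k in entry["keywords"]}]

# Toy catalogue standing in for a real metadata index.
catalogue = [
    {"id": "ds-001", "keywords": ["Heart attack", "cohort study"]},
    {"id": "ds-002", "keywords": ["diabetes"]},
]
print(find_datasets("myocardial infarction", catalogue))  # ['ds-001']
```

The point of the synonym expansion is that a dataset labelled "heart attack" is still found by an agent searching for "myocardial infarction", even though the strings never match directly.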
Ok, so your computational agent has found some data. You might wish to get hold of it. To access it. This may be straightforward, or it may be complicated. Ideally, you would like your computational agent to be able to gain access automatically. This is the second letter of our acronym FAIR: A for Accessible.
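Automated access usually means retrieving data over a standard, open protocol such as HTTP, presenting credentials when the resource is protected. The sketch below assumes a placeholder URL and a bearer token; the authentication scheme a real repository uses will vary.

```python
import urllib.request

def build_request(url, token=None):
    """Build an HTTP request, attaching a bearer token if one is supplied."""
    req = urllib.request.Request(url)
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req

def fetch(url, token=None):
    """Retrieve a resource, optionally authenticating. Placeholder URL assumed."""
    with urllib.request.urlopen(build_request(url, token)) as resp:
        return resp.read()
```

Note that "accessible" does not mean "open": the protocol and the authentication step are well defined, so the agent knows exactly how to ask, even if permission is sometimes refused.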
Now, once you have the data, you might find that it did indeed use a similar, but different, ontology as its representation. This could be a problem: you might not be able to align the new information with your existing information, and so not be able to test your hypothesis. So you need to identify the ontology used to encode this information, get hold of a copy of that ontology and perhaps a mapping between it and the one you are using, then run the mapping and normalise the data so that it can be analysed as one homogeneous data set. That is the third letter of our acronym FAIR: I for Interoperable.
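The mapping-and-normalising step can be sketched as a simple term rewrite. The term identifiers and the mapping table here are invented for illustration; in practice the mapping would come from a published crosswalk between the two ontologies.

```python
# Hypothetical mapping from terms in the "other" ontology to terms in ours.
MAPPING = {
    "OTHER:0001": "OURS:heart_rate",
    "OTHER:0002": "OURS:blood_pressure",
}

def normalise(records, mapping):
    """Rewrite each record's term into our ontology; drop unmappable records."""
    out = []
    for rec in records:
        target = mapping.get(rec["term"])
        if target is not None:
            out.append({"term": target, "value": rec["value"]})
    return out

external = [
    {"term": "OTHER:0001", "value": 72},
    {"term": "OTHER:9999", "value": 5},   # no mapping available -> dropped
]
print(normalise(external, MAPPING))  # [{'term': 'OURS:heart_rate', 'value': 72}]
```

A real pipeline would also record which records were dropped and why, since an incomplete mapping silently biases the combined data set.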
Assuming you are successful, you might wish to record your analysis: the method, results and conclusions, along with the data you used, and perhaps a copy of the algorithm you used in your method.
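At its simplest, such a record is a small, structured metadata file kept alongside the outputs. The field names below are illustrative, not a formal standard; real projects would use an established metadata schema.

```python
import json

# Hypothetical record bundling hypothesis, method, inputs, code, and results.
record = {
    "hypothesis": "Compound X lowers marker Y",
    "method": "two-sample t-test",
    "datasets": ["ds-001", "ds-002"],        # identifiers of the inputs used
    "algorithm": "analysis.py @ v1.2",       # pointer to the exact code version
    "results": {"p_value": 0.03},
    "conclusion": "effect observed at p < 0.05",
}

with open("analysis_record.json", "w") as f:
    json.dump(record, f, indent=2)
```

Because the record is machine-readable, a future agent can find the analysis, pull the same datasets, and rerun the same code, which is exactly the reuse described next.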
In future, someone else might wish to revisit your work and extend it. And here is the last letter of our acronym: R for Reusable. If you have captured all of the above in such a way that the data can be found, accessed, and interoperated with, using automated methods, then it can be reused, and the virtuous cycle of productive research can continue.
So, that covers what FAIR data is and what purpose this concept and its implementation fulfil. Read more tomorrow.