Detecting data errors is often done using non-specific numerical and statistical tools; for example, by excluding all outliers, defined as values over two standard deviations from the mean.
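The two-standard-deviation rule mentioned above can be sketched in a few lines. This is only an illustration: the function name `exclude_outliers`, the `n_sigma` parameter, and the temperature readings are hypothetical and do not come from our dataset.

```python
import statistics


def exclude_outliers(values, n_sigma=2.0):
    """Keep only values within n_sigma sample standard deviations of the mean."""
    values = list(values)
    if len(values) < 2:
        # A standard deviation is undefined for fewer than two points.
        return values
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # All values are identical, so nothing can be an outlier.
        return values
    return [v for v in values if abs(v - mean) <= n_sigma * sd]


# Hypothetical sea-surface temperature readings (degrees Celsius), one suspect value.
temperatures = [12.1, 12.4, 11.9, 12.3, 12.0, 12.2, 35.0]
kept = exclude_outliers(temperatures)
print(kept)
```

Such a filter knows nothing about the measured quantity: an extreme value inflates the very mean and standard deviation used to judge it, and a physically valid but unusual observation is discarded just as readily as an instrument fault.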
An important tool in evaluating the validity and relevance of results is the analysis of coverage and bias. Data are collected in different geographical regions, at different depths, in different seasons, and with different instruments. When presenting results, one must either correct them for inherent biases, exclude under-represented partitions, or provide a list of caveats and analyses regarding the coverage and bias with respect to the general distribution over each dimension.
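As a rough sketch of such a coverage analysis, one can compute the share of records falling in each partition of a dimension and flag the partitions that fall below a chosen threshold. The record layout, the helper names `coverage_by` and `under_represented`, the threshold, and the sample records are all assumptions made for illustration.

```python
from collections import Counter


def coverage_by(records, key):
    """Return the fraction of records in each partition of one dimension."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {partition: n / total for partition, n in counts.items()}


def under_represented(shares, min_share):
    """List partitions whose share of the records is below min_share."""
    return sorted(p for p, share in shares.items() if share < min_share)


# Hypothetical records: each one is a single observation tagged by region and season.
records = (
    [{"region": "Mediterranean", "season": "summer"}] * 6
    + [{"region": "North Atlantic", "season": "summer"}] * 3
    + [{"region": "Southern Ocean", "season": "winter"}] * 1
)

shares = coverage_by(records, "region")
flagged = under_represented(shares, min_share=0.2)
print(shares)
print(flagged)
```

The same two functions can be applied to any other dimension, such as `"season"`, depth bin, or instrument type. Whether a flagged partition is corrected for, excluded, or reported as a caveat remains a decision for the analyst.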
Oceanographic research relies heavily on the collection, analysis, and interpretation of data. Many papers in the various domains comprising oceanography begin with a statement such as “the amount of data available is steadily increasing”. However, to the best of our knowledge, no one has taken a longitudinal approach to reviewing the availability, coverage, and amount of research data collected and published. This poster describes our ongoing work, which aims to systematically collect and analyze mentions and records of data collection throughout the history of oceanographic science. We wish to answer quantitatively questions such as the following. How much of the data collected is still available for analysis? In which disciplines are data available over different regions of the ocean, and in which are there shortages? We intend to make public a web-based data analysis tool, allowing faceted exploration of questions such as these over the results of our work.