Glossary of Terms
Common Terminology
-
Dataset:
- A collection of related data organized in a structured format, often as one or more tables made up of rows and columns.
-
Metadata:
- Information that describes other data. Metadata, such as a dataset's creation date, author, and data format, helps users understand the dataset's context, quality, and structure.
-
API (Application Programming Interface):
- A set of rules and protocols that allows different software applications to communicate with each other. APIs define the methods and data formats that applications use to exchange information.
-
Endpoint:
- A specific URL at which an API exposes a resource, and which client applications call to perform operations such as retrieving or sending data.
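As an illustration, an endpoint URL is often assembled from a base URL, a resource path, and optional query parameters. The sketch below uses only the Python standard library. The base URL `https://api.example.org/v1/`, the `datasets` resource, and the helper `endpoint_url` are hypothetical placeholders, not actual Open South endpoints.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical base URL; not a real Open South endpoint.
BASE_URL = "https://api.example.org/v1/"


def endpoint_url(resource: str, **params: str) -> str:
    """Join a resource path onto the base URL and append any query parameters.

    The resource must not start with "/", otherwise urljoin would discard
    the "/v1/" prefix of the base URL.
    """
    url = urljoin(BASE_URL, resource)
    if params:
        url += "?" + urlencode(params)
    return url


print(endpoint_url("datasets", q="rainfall", page="2"))
```

With the values above, this prints `https://api.example.org/v1/datasets?q=rainfall&page=2`.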
-
Authentication:
- The process of verifying the identity of a user or application. Once an identity is confirmed, the system can restrict certain resources or actions to authorized users or applications (a separate check known as authorization).
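Many HTTP APIs expect the client to prove its identity by sending a credential, such as a bearer token, in the `Authorization` header of each request. The following is a minimal sketch that assumes this scheme. The URL, the token value, and the helper `authenticated_request` are hypothetical, and the Open South API may use a different mechanism. The request is built but never sent.

```python
import urllib.request


def authenticated_request(url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries a bearer token."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


# In practice, load the token from a secure store or environment variable
# rather than hard-coding it in source code.
req = authenticated_request("https://api.example.org/v1/datasets", "my-secret-token")
print(req.get_header("Authorization"))
```

To send the request, pass it to `urllib.request.urlopen(req)`. That step is left out here because the endpoint is only a placeholder.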
-
Spatial Coverage:
- The geographic area that the data represents. It includes information on the locations, regions, or countries covered by the dataset.
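One simple way to summarize the spatial coverage of point data is a bounding box: the smallest longitude/latitude rectangle that contains every record. The sketch below assumes coordinates in decimal degrees, given as `(longitude, latitude)` pairs. The names `BoundingBox` and `bounding_box` are illustrative, and the approach does not handle datasets that cross the antimeridian (±180° longitude).

```python
from typing import Iterable, NamedTuple


class BoundingBox(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def bounding_box(points: Iterable[tuple[float, float]]) -> BoundingBox:
    """Return the smallest lon/lat box containing every (lon, lat) point."""
    pts = list(points)
    if not pts:
        raise ValueError("no points supplied")
    lons = [lon for lon, _ in pts]
    lats = [lat for _, lat in pts]
    return BoundingBox(min(lons), min(lats), max(lons), max(lats))


# Hypothetical sample points.
print(bounding_box([(18.0, -34.0), (28.0, -26.0), (31.0, -30.0)]))
```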
-
Temporal Coverage:
- The time period that the data covers, usually specified by a start date and an end date.
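When a dataset's records carry dates, the start and end dates can be derived from the earliest and latest values. This is a minimal sketch with a hypothetical helper, `temporal_coverage`. It reports only the range spanned by the records actually present, which may differ from the coverage stated in the dataset's metadata. The result is formatted as an ISO 8601 `start/end` interval.

```python
from datetime import date
from typing import Iterable


def temporal_coverage(dates: Iterable[date]) -> tuple[date, date]:
    """Return (start, end): the earliest and latest dates among the records."""
    ds = list(dates)
    if not ds:
        raise ValueError("no dates supplied")
    return min(ds), max(ds)


start, end = temporal_coverage([date(2021, 3, 1), date(2019, 7, 15), date(2023, 1, 31)])
print(f"{start.isoformat()}/{end.isoformat()}")  # ISO 8601 interval notation
```

For the sample dates above, this prints `2019-07-15/2023-01-31`.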
Technical Jargon
-
Schema Validation:
- The process of checking that data conforms to a predefined schema, which specifies rules such as field types, required fields, and data formats. Validation helps maintain data integrity and consistency.
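Production systems usually rely on a schema language such as JSON Schema. The hand-rolled sketch below only illustrates the idea. It checks required fields, field types, and unexpected fields against a small schema. The schema contents and the `validate` helper are hypothetical, and format rules (such as date formats) are not covered.

```python
from typing import Any

# Hypothetical schema: field name -> (expected type, required?)
SCHEMA: dict[str, tuple[type, bool]] = {
    "id": (int, True),
    "name": (str, True),
    "population": (int, False),
}


def validate(record: dict[str, Any], schema: dict[str, tuple[type, bool]] = SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    for field, (expected, required) in schema.items():
        if field not in record or record[field] is None:
            if required:
                errors.append(f"missing required field '{field}'")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(f"'{field}' should be {expected.__name__}, got {type(value).__name__}")
    for field in sorted(record.keys() - schema.keys()):
        errors.append(f"unexpected field '{field}'")
    return errors


print(validate({"id": 1, "name": "Region A"}))
print(validate({"id": "1", "extra": True}))
```

The first call returns an empty list. The second reports a wrongly typed `id`, a missing `name`, and an unexpected `extra` field.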
-
Data Cleaning:
- The process of detecting and correcting (or removing) corrupt or inaccurate records from a dataset. Data cleaning ensures that the data is accurate, consistent, and usable for analysis.
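The sketch below applies three typical cleaning steps with pandas:

- removing exact duplicate rows;
- converting unparseable values to missing values and dropping them;
- filtering out implausible readings.

The sample data is hypothetical, and the -60 to 60 °C plausibility range is an assumption chosen for illustration.

```python
import pandas as pd

# Hypothetical raw readings with a duplicate, an unparseable value, and an outlier.
raw = pd.DataFrame(
    {
        "station": ["A", "A", "B", "C", "D"],
        "temp_c": ["21.5", "21.5", "n/a", "19.0", "250"],
    }
)

clean = (
    raw.drop_duplicates()  # remove exact duplicate rows
    .assign(temp_c=lambda df: pd.to_numeric(df["temp_c"], errors="coerce"))  # "n/a" becomes NaN
    .dropna(subset=["temp_c"])  # drop rows whose value could not be parsed
)
# Remove physically implausible readings (the range is an assumption).
clean = clean[clean["temp_c"].between(-60, 60)]
print(clean)
```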
-
Data Wrangling:
- The process of transforming and mapping data from one "raw" format into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes, such as analytics.
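A common wrangling step reshapes a "wide" table (one column per year) into a "long" layout (one row per observation), which is easier to filter, group, and plot. This is a minimal pandas sketch with hypothetical data. It uses `DataFrame.melt` for the reshape and then runs a simple downstream aggregation.

```python
import pandas as pd

# Hypothetical "raw" layout: one row per region, one column per year.
wide = pd.DataFrame(
    {"region": ["North", "South"], "2021": [120, 95], "2022": [130, 101]}
)

# Reshape to a long layout: one row per (region, year) observation.
tidy = wide.melt(id_vars="region", var_name="year", value_name="count")
tidy["year"] = tidy["year"].astype(int)

# Downstream analysis is now a one-liner, e.g. totals per year.
totals = tidy.groupby("year")["count"].sum()
print(tidy)
print(totals)
```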
-
Referential Integrity:
- A property of data stating that all its references are valid. In databases, it ensures that relationships between tables remain consistent, such as ensuring that a foreign key value always points to an existing record in another table.
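Relational databases usually enforce referential integrity with foreign key constraints. Data exported to files, however, may need an after-the-fact check. The sketch below finds "orphaned" child rows whose foreign key matches no parent key. The tables and the `orphaned_rows` helper are hypothetical.

```python
from typing import Any

# Hypothetical tables: regions (parent) and observations referencing region_id (foreign key).
regions = [{"region_id": 1, "name": "North"}, {"region_id": 2, "name": "South"}]
observations = [
    {"obs_id": 10, "region_id": 1, "value": 3.2},
    {"obs_id": 11, "region_id": 2, "value": 4.8},
    {"obs_id": 12, "region_id": 7, "value": 1.1},  # region 7 does not exist
]


def orphaned_rows(
    children: list[dict[str, Any]], parents: list[dict[str, Any]], fk: str, pk: str
) -> list[dict[str, Any]]:
    """Return child rows whose foreign key does not match any parent primary key."""
    valid_keys = {p[pk] for p in parents}
    return [c for c in children if c[fk] not in valid_keys]


orphans = orphaned_rows(observations, regions, fk="region_id", pk="region_id")
print([o["obs_id"] for o in orphans])
```

Here only observation 12 is reported, because it references a region that does not exist.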
-
Quality Metrics:
- Measures used to assess the quality of data. Common quality dimensions include accuracy, completeness, consistency, timeliness, validity, uniqueness, and integrity.
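Several of these dimensions can be expressed as simple ratios over a column. The sketch below uses one common formulation of each; definitions vary between frameworks.

- **Completeness**: the share of values that are not missing.
- **Uniqueness**: the share of non-missing values that are distinct.
- **Validity**: the share of non-missing values that pass a rule.

Missing values are represented as `None`, and the sample data and rule are hypothetical.

```python
from typing import Any, Callable, List, Sequence


def _present(values: Sequence[Any]) -> List[Any]:
    """Non-missing values; raises if there are none, since the ratios would be undefined."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no non-missing values")
    return present


def completeness(values: Sequence[Any]) -> float:
    """Share of all values that are present (not None)."""
    if not values:
        raise ValueError("no values")
    return sum(v is not None for v in values) / len(values)


def uniqueness(values: Sequence[Any]) -> float:
    """Share of non-missing values that are distinct."""
    present = _present(values)
    return len(set(present)) / len(present)


def validity(values: Sequence[Any], rule: Callable[[Any], bool]) -> float:
    """Share of non-missing values that satisfy a validation rule."""
    present = _present(values)
    return sum(rule(v) for v in present) / len(present)


ids = [1, 2, 2, 3, None]
print(completeness(ids), uniqueness(ids), validity(ids, lambda v: v > 0))
```

For `ids` above, completeness is 4/5, uniqueness is 3/4 (the value 2 appears twice), and validity is 1.0 because every present value is positive.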
-
Two-Factor Authentication (2FA):
- An additional layer of security that confirms people trying to access an online account are who they claim to be. After entering a username and password, the user must also provide a second, independent piece of evidence, such as a one-time code from an authenticator app or a text message, before access is granted.
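A widely used second factor is a time-based one-time password (TOTP, RFC 6238), the six-digit code shown by authenticator apps. This sketch implements TOTP with its usual defaults: HMAC-SHA1, 30-second steps, and 6 digits. It is built on the HOTP algorithm from RFC 4226. The source does not say that Open South uses TOTP; this only illustrates how such codes are computed. Authenticator apps typically share the secret key as a Base32 string, which `base64.b32decode` converts to bytes.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low 4 bits pick a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


def totp(key: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time window."""
    now = time.time() if at is None else at
    return hotp(key, int(now // step), digits)


# Test key from the RFC 6238 appendix.
key = b"12345678901234567890"
print(totp(key, at=59))
```

With the RFC test key at time 59, this prints `287082`, the 6-digit form of the RFC's `94287082`. A server verifying a submitted code would normally compare the two codes with `hmac.compare_digest`. It may also accept the adjacent time window to tolerate clock drift.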
-
OpenRefine:
- An open-source tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data.
-
Pandas (Python Library):
- An open-source data analysis and manipulation library for the Python programming language. It offers data structures and operations for manipulating numerical tables and time series.
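The short pandas sketch below touches both uses named in the definition: a table operation and a time-series operation. It adds a derived column with vectorized arithmetic, then downsamples daily rows to weekly means with `resample`. The data is hypothetical. The `"W"` rule groups days into weeks ending on Sunday.

```python
import pandas as pd

# Hypothetical daily readings for two weeks (2024-01-01 was a Monday).
df = pd.DataFrame(
    {"value": range(14)},
    index=pd.date_range("2024-01-01", periods=14, freq="D"),
)

# Table operation: add a derived column with vectorized arithmetic.
df["value_x10"] = df["value"] * 10

# Time-series operation: downsample daily rows to weekly (Sunday-ending) means.
weekly = df.resample("W").mean()
print(weekly)
```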
This glossary provides definitions for key terms and technical jargon that users might encounter while using the Open South platform. It helps ensure that all users have a common understanding of the terminology used within the platform and its documentation.