WeSenseIt Scalable Architecture and Infrastructure
The WeSenseIt Platform enables flexible integration of all elements: physical and social sensors, models, e-collaboration applications, and Decision Support tools, as well as integration with external systems (GEOSS, external data sources, etc.).
The first version of the WeSenseIt Platform was released at the end of the first year of the project and included the core WeSenseIt API and the Sensor Integration Layer. The Platform integrates all physical and social sensors in the Sensor Integration Layer and exposes API calls for retrieving sensor measurements and for integrating models, e-collaboration applications, Decision Support tools, and external systems.
In the second year of the project we focused on implementing horizontal scalability and high availability in the Platform. The WeSenseIt architecture is now prepared to meet the Big Data challenges posed by a growing number of sensors and large volumes of data. The capacity of the Platform can be increased on the fly by connecting additional servers in the cloud, so that it can handle petabytes of data if necessary. The aim was to prepare the Platform for evaluation at a larger scale: adding new physical and social sensors, integrating data from new sources, providing a reliable backend for new and extended mobile applications, and involving citizens at large scale.
The WeSenseIt Data Storage Layer is implemented on Apache HBase (the Hadoop database), an open source, distributed, scalable big data store. HBase originated as a sub-project of Apache Hadoop, runs on top of the Hadoop Distributed File System (HDFS), and provides real-time read and write access to big data. Its primary objective is to host very large tables (billions of rows by millions of columns) on clusters of commodity hardware.
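The table schema used in the Data Storage Layer is not described here. As a hedged illustration of why a store such as HBase suits sensor time series, the minimal Python sketch below builds a composite row key from a sensor identifier and a reversed timestamp. HBase keeps rows sorted by the bytes of the row key, so with this layout all readings of one sensor are stored contiguously and the newest reading sorts first. The key layout, the function names `make_row_key` and `parse_row_key`, and the sensor identifier are assumptions made for illustration, not the WeSenseIt schema.

```python
import struct

# Largest signed 64-bit value; subtracting from it reverses timestamp order.
MAX_TS = 2**63 - 1


def make_row_key(sensor_id: str, timestamp_ms: int) -> bytes:
    """Build a hypothetical row key: <sensor_id> 0x00 <reversed timestamp>."""
    if "\x00" in sensor_id:
        raise ValueError("sensor_id must not contain a NUL character")
    if not 0 <= timestamp_ms <= MAX_TS:
        raise ValueError("timestamp_ms out of range")
    reversed_ts = MAX_TS - timestamp_ms
    # Big-endian packing makes byte order agree with numeric order.
    return sensor_id.encode("utf-8") + b"\x00" + struct.pack(">Q", reversed_ts)


def parse_row_key(key: bytes):
    """Recover (sensor_id, timestamp_ms) from a key built by make_row_key."""
    # The sensor id contains no NUL, so the first NUL is the separator.
    sensor_bytes, _, ts_bytes = key.partition(b"\x00")
    (reversed_ts,) = struct.unpack(">Q", ts_bytes)
    return sensor_bytes.decode("utf-8"), MAX_TS - reversed_ts


if __name__ == "__main__":
    keys = [make_row_key("delft-001", t) for t in (1000, 3000, 2000)]
    # Sorting the keys as raw bytes mimics the order in which rows are stored.
    for key in sorted(keys):
        print(parse_row_key(key))
```

With keys of this shape, a "last sensor reading" lookup becomes a short scan that starts at the sensor's key prefix and stops after the first row.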
To increase the performance and scalability of the Platform, we also improved the management of geospatial data. For this purpose, the following tools were used in the WeSenseIt Sensor Integration Layer in combination with HBase:
- PostGIS – a spatial database extension for the PostgreSQL object-relational database. It adds support for geographic/geospatial objects, allowing location queries to be run in SQL. PostGIS uses spatial data types, spatial functions, and R-tree-based indexes to manage spatial information efficiently.
- Geohash – a latitude/longitude geocode system. It is a hierarchical spatial data structure that subdivides space into grid-shaped buckets. When used in a database, geohashed data has two advantages. First, data indexed by geohash keeps all points of a given rectangular area in contiguous slices. This is especially useful in database systems where queries on a single index are much easier or faster than multiple-index queries. Second, the index structure can be used for a quick-and-dirty proximity search: the closest points are often among the closest geohashes.
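The geohash idea described above can be sketched in a few lines of Python. The encoder below follows the standard public geohash algorithm: it alternately halves the longitude and latitude intervals, emits one bit per step, and maps every five bits to a base-32 character. The function names `geohash_encode` and `common_prefix_len` are illustrative and are not taken from the WeSenseIt code base.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # bits collected for the current character
    bit_count = 0
    use_lon = True    # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared prefix; a longer prefix means a smaller shared cell."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


if __name__ == "__main__":
    print(geohash_encode(42.6, -5.6, precision=5))
```

Because a shorter geohash is always a prefix of a longer one for the same point, all points inside one cell can be fetched with a single prefix (range) scan on a sorted index. The proximity search stays approximate: two points that lie close together but on opposite sides of a cell boundary may share only a short prefix.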
Software Mind prepared and configured a cloud infrastructure, based on Apache CloudStack, for online services, evaluation, and scalability/performance tests. CloudStack is open source software designed to deploy and manage large networks of virtual machines as a highly available, highly scalable Infrastructure as a Service (IaaS) cloud computing platform. It is used by a number of service providers to offer public cloud services, and by many companies to provide an on-premises (private) cloud or as part of a hybrid cloud solution. In other words, we configured an in-house counterpart of the Amazon EC2 cloud services (CloudStack provides an API that is compatible with AWS EC2 and S3). This allows us to reuse existing hardware and computational resources efficiently and flexibly: for research, tests, and evaluation, as well as for supporting online production services when demand for processing power temporarily increases.
To support deploying, managing, and using the WeSenseIt API, we provided an interactive API documentation website based on Swagger, a framework for describing, producing, consuming, and visualizing web services. The interactive documentation UI is generated automatically from the up-to-date API definition. The Swagger UI framework allows both developers and non-developers to interact with the API in a sandbox UI that gives clear insight into how the API responds to parameters and options. The documentation of methods, parameters, and models is tightly integrated into the server code, so the documentation and the API always stay in sync. The current version of the API for retrieving sensor measurements includes the following calls:
- List of all sensors
  - in a given region name (Delft, Doncaster, AAWA)
  - in a given area / bounding rectangle
- Last sensor reading
  - last reading, time of reading, unit of measurement, sensor longitude and latitude
- All historical measurements made by a given sensor
- Readings made by a sensor in a given time period
  - time between start date and end date
  - by hydrological variable
  - by mobility (mobile or fixed)
- List and management of entities and valid parameters used in API calls:
  - regions (Delft, Doncaster, AAWA)
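The filtering behind the calls listed above can be illustrated with a minimal in-memory Python sketch. It is not the WeSenseIt API: the endpoint paths and response formats are not given here, so the `Sensor` and `Reading` types, the function names, and the sample values are all hypothetical and only show the kind of selection each call performs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Sensor:
    sensor_id: str
    region: str      # e.g. "Delft", "Doncaster", "AAWA"
    lat: float
    lon: float
    variable: str    # hydrological variable measured by the sensor
    mobile: bool     # True for mobile sensors, False for fixed ones


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    time: datetime
    value: float
    unit: str


def sensors_in_region(sensors: Iterable[Sensor], region: str) -> List[Sensor]:
    """Sensors in a given region name."""
    return [s for s in sensors if s.region == region]


def sensors_in_bbox(sensors: Iterable[Sensor], min_lat: float, min_lon: float,
                    max_lat: float, max_lon: float) -> List[Sensor]:
    """Sensors inside a bounding rectangle (assumes it does not cross 180 degrees)."""
    return [s for s in sensors
            if min_lat <= s.lat <= max_lat and min_lon <= s.lon <= max_lon]


def last_reading(readings: Iterable[Reading], sensor_id: str) -> Optional[Reading]:
    """Most recent reading of one sensor, or None if it has no readings."""
    own = (r for r in readings if r.sensor_id == sensor_id)
    return max(own, key=lambda r: r.time, default=None)


def readings_in_period(readings: Iterable[Reading], sensor_id: str,
                       start: datetime, end: datetime) -> List[Reading]:
    """Readings of one sensor with start <= time <= end, oldest first."""
    own = [r for r in readings
           if r.sensor_id == sensor_id and start <= r.time <= end]
    return sorted(own, key=lambda r: r.time)


# Made-up sample data, for illustration only.
SENSORS = [
    Sensor("s1", "Delft", 52.01, 4.36, "water_level", False),
    Sensor("s2", "Doncaster", 53.52, -1.13, "rainfall", True),
]
READINGS = [
    Reading("s1", datetime(2014, 1, 1, 12, 0), 1.2, "m"),
    Reading("s1", datetime(2014, 1, 2, 12, 0), 1.5, "m"),
    Reading("s2", datetime(2014, 1, 1, 8, 0), 3.0, "mm"),
]

if __name__ == "__main__":
    print(sensors_in_bbox(SENSORS, 51.9, 4.2, 52.1, 4.5))
    print(last_reading(READINGS, "s1"))
```

In a deployed system these selections would be pushed down to the storage layer rather than performed over in-memory lists; the sketch only fixes the semantics of each filter.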
Software Mind provided the development infrastructure of the project: SVN (software versioning and revision control system), Nexus (repository manager), Jenkins (continuous integration tool), and JIRA (issue tracker). It also maintained the production server for the Sensor Integration Layer, which is dedicated to receiving and sharing data from sensors in the case studies for evaluation. In parallel, a test environment is available for experimental and test versions of models and applications.