Today’s digital business is more data-driven than ever. As AI and Machine Learning are increasingly used to make business-critical decisions, organisations need to focus on the veracity and context of the data fuelling their insights, not just on how much data they can collect.

Embracing big data in a manner that yields better business results within a practical timeframe remains a challenge for many businesses. To help organisations deliver impactful results, DiUS’ approach is to avoid over-engineering a solution by starting out with a data lake to store everything.

Our next-generation analytics services range from developing data strategies to data engineering, right through to developing custom algorithms to deliver actionable insights. We help businesses with:

  • Developing a data strategy and building a culture of rapid ideation and experimentation.
  • Cleaning and structuring disparate data sources as well as applying feature engineering.
  • Developing cloud-based, scalable models that efficiently process huge datasets to deliver faster time to insights.
  • Developing custom algorithms that provide trend analysis and predictions from large data sets.
  • Applying a UX lens to map customer journeys and provide actionable insights for product development.
  • Implementing big data or streaming technologies such as Apache Spark, Apache Kafka or Amazon Kinesis.

We know that success is more likely when an organisation starts small: pick a problem where the data is already available, run a proof of value, and deliver learnings to the organisation quickly. Once an organisation has started, it’s easier to work with them to move the solution into a core application and then tackle the next pain point.

As data propels the modern business forward, we increasingly find that our next-generation analytics services are key aspects not just of our Machine Learning and Internet of Things projects, but also of our custom application development projects.

Toolbox

Apache Spark
We are familiar with Apache Spark, the open-source, general-purpose distributed computing engine used for processing and analysing large amounts of data.

Apache Kafka
We’ve used Apache Kafka with Apache Storm to perform high-speed matching, aggregating data pipelines from distributed applications into a centralised, real-time data feed.

AWS
It’s our standard approach to leverage the AWS Platform to build a DevOps, CI/CD and automated infrastructure capability as part of building an end-to-end digital solution or creating a proof of value and then building it out into a scalable digital product. Specifically, for next-generation analytics we use:
  • Amazon S3 for storage, potentially instead of a data warehouse, before a data warehouse is introduced, or in addition to one.
  • Amazon Athena to query files and data directly from S3.
  • AWS Glue crawlers to crawl and catalogue data in S3 and other sources, creating ‘tables’ that Athena can query.
  • AWS Lambda using Node.js for serverless data processing.
  • Amazon DynamoDB to model a dataset and ingest unstructured data to support key-value lookups.
  • Amazon Kinesis to collect, process and analyse real-time streaming data.
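
To make the S3, Glue and Athena combination above a little more concrete, here is a minimal sketch of one common way to lay out object keys in S3 so that a Glue crawler can pick up partitions and Athena can filter on them. It is an illustration only, not a description of any particular project: the `partitioned_key` helper, the `clickstream` dataset name and the file name are all hypothetical, and the Hive-style `year=/month=/day=` layout is assumed here as one reasonable convention among several.

```python
from datetime import datetime, timezone


def partitioned_key(dataset: str, event_time: datetime, filename: str) -> str:
    """Build an S3 object key with Hive-style year=/month=/day= partitions.

    Hypothetical helper for illustration. The timestamp is converted to UTC
    so that every writer agrees on which partition an event belongs to.
    Note: a naive datetime would be interpreted as local time by astimezone().
    """
    utc = event_time.astimezone(timezone.utc)
    return (
        f"{dataset}/"
        f"year={utc:%Y}/month={utc:%m}/day={utc:%d}/"
        f"{filename}"
    )


# Example: the dataset and file names below are made up.
key = partitioned_key(
    "clickstream",
    datetime(2024, 3, 7, 23, 30, tzinfo=timezone.utc),
    "events-0001.json",
)
print(key)  # clickstream/year=2024/month=03/day=07/events-0001.json
```

With keys shaped like this, a crawler can typically expose `year`, `month` and `day` as partition columns, so a query that filters on them only needs to scan the matching prefixes rather than the whole bucket.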