Google Sheets is one of the most popular online spreadsheet tools. It’s free to use and accessible through a browser, making it a great option for working on data collaboratively.
In this blog, we’ll be showing you how to easily import data using the AI & Analytics Engine’s Google Sheets integration, so you can perform exploratory data analysis, use the data wrangling feature and build predictive ML models.
The example data comes from the Titanic survival prediction exercise, one of the most popular introductory data science problems and the most popular competition on Kaggle.
We’ve recently done a full walkthrough of the Titanic problem, which includes building ML models and downloading predictions, so have a read if you’re interested.
If your data is stored as a CSV file and you want it in Google Sheets, begin by importing it. In Google Sheets, click File → Import → Upload and select your file.
Once you're in the AI & Analytics Engine, start by creating a project and then an ML app, which will take you to the App builder pipeline.
The first stage is Prepare data, where you’ll be able to import your data from Google Sheets:
Click Add dataset
Click Import dataset
Click on the Apps tab
Click on Google Sheets
Paste the Google Sheets link into the box, ensuring that the sheet’s access control setting is set to "Anyone with the link"
After you do this, you’ll have a preview of your dataset, where you can define the data type of each column in the dataset.
And that’s it: importing data from Google Sheets using the Engine’s integration is really that simple.
With your data imported into the Engine, there are several options for how you can proceed.
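If the import fails, the usual culprit is the sharing setting on the sheet. As a rough check outside the Engine, the Python sketch below turns a share link into Google Sheets' CSV export URL and tests whether it can be fetched without signing in. The helper names `sheet_csv_url` and `looks_public` are made up for this example, and the content-type test is only a heuristic; it says nothing about how the Engine itself reads the sheet.

```python
import re
import urllib.request

# The document ID sits between "/spreadsheets/d/" and the next slash.
_SHEET_ID = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")
# A share link may name a specific tab as "#gid=123" or "?gid=123".
_GID = re.compile(r"[#&?]gid=(\d+)")


def sheet_csv_url(share_link: str) -> str:
    """Turn a Google Sheets share link into its CSV export URL."""
    match = _SHEET_ID.search(share_link)
    if match is None:
        raise ValueError("not a Google Sheets link: " + share_link)
    gid = _GID.search(share_link)
    # Assumption: fall back to gid 0, the default tab of a new spreadsheet.
    tab = gid.group(1) if gid else "0"
    return (
        "https://docs.google.com/spreadsheets/d/"
        f"{match.group(1)}/export?format=csv&gid={tab}"
    )


def looks_public(share_link: str, timeout: float = 10.0) -> bool:
    """Heuristic: True if the export URL returns CSV without a login."""
    try:
        url = sheet_csv_url(share_link)
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # A restricted sheet typically answers with an HTML sign-in page.
            return resp.headers.get_content_type() == "text/csv"
    except OSError:
        # Covers network failures and HTTP error responses.
        return False
```

Calling `looks_public(link)` with your own share link before pasting it into the Engine gives a quick indication of whether "Anyone with the link" is actually in effect.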
The first option is to perform exploratory data analysis by clicking the View analysis icon on your dataset. This will open a new tab where you can see distributions and summary statistics for each column, and see scatterplot relationships between pairs of columns.
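To give a feel for what per-column summary statistics involve, here is a minimal Python sketch using only the standard library. It is not the Engine's implementation: the `summarize` helper is hypothetical, and the sample rows are hand-typed for illustration with Titanic-style column names rather than taken from the real dataset.

```python
import csv
import io
import statistics

# Hand-typed illustrative rows; note the blank Age in the fourth row.
SAMPLE_CSV = """Survived,Pclass,Age,Fare
0,3,22,7.25
1,1,38,71.28
1,3,26,7.92
1,1,,53.10
0,3,35,8.05
"""


def summarize(rows, column):
    """Return count, mean, min and max for a numeric column, skipping blanks."""
    values = [float(row[column]) for row in rows if row[column].strip()]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
age_summary = summarize(rows, "Age")
print(age_summary)
```

Missing values are skipped rather than treated as zero, which is why the Age column reports four values for five rows.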
The second option is to transform your data to get it ready for machine learning. Do this by selecting your dataset and then clicking Prepare data. Here you can create repeatable recipes of data transformation actions.
The final option is to continue in the App builder pipeline by selecting your dataset, and clicking Use as training dataset, where you will be able to build machine learning models and generate predictions.