Create Your Own Kotlin Playground (and Get a Data Science Head Start) with Jupyter Notebook
Learn the basics of Jupyter Notebook and how to turn it into an interactive interpreter for Kotlin. You’ll also learn about Data Frames, an important data structure for data science applications. By Joey deVilla.
Contents
- Kotlin? For Data Science?
- Introducing Jupyter Notebook
- Getting Started
- Creating Your First Notebook
- Understanding Code Cells
- Working With Markdown Cells
- Initializing krangl
- Diving into Data Frames
- Introducing Data Frames
- Creating a Data Frame from Scratch
- Getting the Data Frame’s Schema
- Getting the Data Frame’s Dimensions and Column Names
- Examining the Data Frame’s Columns
- Examining the Data Frame’s Rows
- Accessing Data Frame “Cells” by Column and Row
- Where to Go From Here?
Working With Markdown Cells
It’s time to look at Markdown cells, which contain content specified in Markdown.
Select the newest cell, which should be at the bottom of the notebook. In the toolbar near the top of the page, you’ll see a drop-down menu that displays its current selection as Code. Change that selection to Markdown:
You’ve designated the cell as a Markdown cell. This means it expects to have Markdown entered into it and that running the cell will cause its Markdown to be rendered.
Try it out. Enter the following into the Markdown cell:
# Welcome to *Jupyter Notebook!*
## This notebook runs a *Kotlin* kernel.
This means the notebook will allow you to:
* Enter **content** using [Markdown,](https://www.markdownguide.org/getting-started/) and
* Enter **code** using [Kotlin.](https://kotlinlang.org/)
Run the cell. It now looks like this:
Double-click the cell. It switches from its fully rendered form back into Markdown, which you can edit further.
If you’re not familiar with it, Markdown is a way of formatting text as paragraphs, headers, hyperlinks, lists and so on, but without drowning you in the complexity that comes with working with HTML. Instead of tags, Markdown uses a limited set of characters to format text, making it easier to read and write than HTML.
In the Markdown above, you used the following Markdown formatting characters:
- Headings: Lines that start with at least one # are headings. # denotes a level 1 heading, ## denotes a level 2 heading, ### denotes a level 3 heading and so on.
- Bold and italic: You can mark text as bold or italic using * characters in the following manner: *italic*, **bold** and ***bold italic***.
- Unordered list: An unordered list is a block of text where every line begins with *.
Markdown has many more features that are beyond the scope of this article. To learn more about them, consult the Markdown Guide’s Getting Started page.
Initializing krangl
Programming languages generally don’t have data science functionality built in; instead, they get it from libraries. In Python, pandas is one of the most popular data science libraries, while in R, the preferred one is dplyr.
Kotlin has the krangl library, which takes its name from “Kotlin library for data wrangling.” Its design borrows heavily from two R libraries: dplyr and purrr. You’ll find krangl provides a subset of classes, methods and properties with the same or similar names as those you’ll find in these libraries. This will come in handy because there’s far more documentation and literature on those libraries than on krangl, at least for now.
It’s time to make krangl and all its features available to your notebook.
Create a new Kotlin notebook by opening Jupyter Notebook’s File menu and selecting New Notebook → Kotlin.
Click the notebook’s title, which is located just to the right of the Jupyter logo at the top-left corner of the page (the title will probably be “Untitled2”). This allows you to rename the notebook: Enter My First Data Frame into the pop-up and click Rename to change the notebook’s name. The new title will replace the old one:
Changing a notebook’s name also changes its filename. If you look at the URL bar in your browser, you’ll see the notebook’s filename is now My First Data Frame.ipynb (the .ipynb filename extension comes from Jupyter Notebook’s old name, IPython Notebook).
Enter the following into a new code cell and run it:
%use krangl
The cell should look like this for a few seconds …
And then it will look like this:
When the code in a cell is executing, the square brackets to the left of the cell contain an asterisk (*). In many cases, the code executes so quickly you don’t even see the asterisk.
The %use krangl code you just ran isn’t Kotlin but a “magic” (short for “magic command”). Magics are commands that instruct the notebook’s kernel to perform a specific task. The %use magic tells the Kotlin kernel to use one of its built-in libraries. It takes a few seconds to initialize, which is why you see the asterisk when running it.
With krangl initialized, it’s time to start working with data. The rest of this article will focus on data frames, which are the primary data structure in data science applications.
Diving into Data Frames
Introducing Data Frames
A data frame represents a table of data that’s organized into rows and columns. Each row represents a record or observation of some thing or happening, and each column represents a particular piece of data or property for a given row. Although you could use a two-dimensional array to store a data table, data frames have data-science-specific functionality.
The diagram below shows a small data frame containing data about different types of instant ramen:
In this data frame, each row represents a type of instant ramen. Each column represents a property of ramen, such as its brand, type or rating.
Although it’s possible to represent a table of data using arrays — either a two-dimensional array or an array of arrays — data frames are designed with data analysis in mind and come with methods and properties you would otherwise have to write yourself. They’re more like spreadsheets than two-dimensional arrays. Using data frames allows you to focus on analyzing and exploring data rather than on programming.
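To get a feel for the work a data frame saves you, here is a minimal sketch in plain Kotlin, with no krangl involved, that stores a small table as a list of rows. The names header, rows, column and rowsWhere are hypothetical helpers invented for this illustration, and the values are placeholder data rather than the contents of the diagram above.

```kotlin
// A table stored as plain Kotlin lists: one header row plus a list of data rows.
// All names and values here are made up for illustration.
val header = listOf("brand", "style", "rating")
val rows: List<List<Any>> = listOf(
    listOf("Brand A", "Cup", 3.5),
    listOf("Brand B", "Pack", 4.25),
    listOf("Brand C", "Bowl", 2.0)
)

// Hand-written helper: pull out every value in the named column.
fun column(name: String): List<Any> {
    val index = header.indexOf(name)
    require(index >= 0) { "No column named $name" }
    return rows.map { it[index] }
}

// Hand-written helper: keep only the rows whose value in the named column passes a test.
fun rowsWhere(name: String, test: (Any) -> Boolean): List<List<Any>> {
    val index = header.indexOf(name)
    require(index >= 0) { "No column named $name" }
    return rows.filter { test(it[index]) }
}

fun main() {
    println(column("brand"))                               // [Brand A, Brand B, Brand C]
    println(rowsWhere("rating") { (it as Double) >= 3.0 }) // [[Brand A, Cup, 3.5], [Brand B, Pack, 4.25]]
}
```

Even this small example needs index lookups, error checks and casts from Any. That is the sort of bookkeeping a data frame library is meant to take off your hands.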
Creating a Data Frame from Scratch
krangl provides a class for data frames called DataFrame. This class provides several ways to create a data frame, one of which is by defining it directly.
Enter the following into a new code cell and run it:
val df: DataFrame = dataFrameOf(
    "language", "developer", "year_first_appeared", "preferred")(
    "Kotlin", "JetBrains", 2011, true,
    "Java", "James Gosling", 1995, false,
    "Swift", "Chris Lattner et al.", 2014, true,
    "Objective-C", "Tom Love and Brad Cox", 1984, false,
    "Dart", "Lars Bak and Kasper Lund", 2011, true
)
This creates df, an instance of DataFrame that defines a table of programming languages used for mobile development. As you continue to read data science code and articles, you’ll see the variable df over and over. It’s often used as a variable name for a data frame, just as i is often used as a loop index variable.
Take a look at df’s contents. There are a couple of ways to do this. The pure Kotlin way is to use DataFrame’s toString() method, which returns a string containing the DataFrame’s row and column dimensions and its first 10 rows.
You’ll use toString() indirectly via the print() function. Run the following in a new code cell:
print(df)
This will produce the following output:
A DataFrame: 5 x 4
       language                  developer   year_first_appeared   preferred
1        Kotlin                  JetBrains                  2011        true
2          Java              James Gosling                  1995       false
3         Swift       Chris Lattner et al.                  2014        true
4   Objective-C      Tom Love and Brad Cox                  1984       false
5          Dart   Lars Bak and Kasper Lund                  2011        true
DataFrame also has a print() method that has the same effect.
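If it isn’t obvious why printing an object ends up calling its toString() method, the following plain-Kotlin sketch may help. TinyTable is a hypothetical stand-in written for this illustration. It is not krangl’s DataFrame, and its output format only loosely imitates the one shown above.

```kotlin
// A tiny stand-in for a table type; it is NOT krangl's DataFrame.
class TinyTable(private val names: List<String>, private val rows: List<List<Any>>) {
    // Summarize the table as text: dimensions first, then the column names, then one line per row.
    override fun toString(): String {
        val lines = mutableListOf("A TinyTable: ${rows.size} x ${names.size}")
        lines += names.joinToString(" ")
        rows.forEach { row -> lines += row.joinToString(" ") }
        return lines.joinToString("\n")
    }
}

val table = TinyTable(
    listOf("language", "year_first_appeared"),
    listOf(listOf("Kotlin", 2011), listOf("Java", 1995))
)

fun main() {
    // print() and println() call toString() on whatever object they are given,
    // so this line prints the text built by the override above.
    println(table)
}
```

Because print() relies on toString(), any type that overrides toString() controls how it appears in printed output.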
If you want output formatted even more nicely, let the notebook do the work. Enter the following into a new code cell:
df
As with anything else that returns or evaluates to a value, Jupyter Notebook will display df’s value. The interesting twist with types like krangl’s DataFrame is that they take advantage of “hooks” provided by Jupyter Notebook. The end result is that when the notebook displays a DataFrame’s contents, it does so in the form of a nicely formatted table:
This sort of feature is helpful when you’re submitting a research paper as a Jupyter Notebook and want it to have readable tables.
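Jupyter Notebook can display HTML as cell output, so one way to picture a rendering hook is as a function that turns a table into HTML markup. The sketch below is only an illustration of that idea. The function toHtmlTable is hypothetical, and this is not how krangl or the Kotlin kernel is implemented.

```kotlin
// Build an HTML table from column names and rows.
// A simplified illustration only: it does no HTML escaping and is not krangl's renderer.
fun toHtmlTable(names: List<String>, rows: List<List<Any>>): String {
    // One <th> cell per column name.
    val headerCells = names.joinToString("") { "<th>$it</th>" }
    // One <tr> per row, with one <td> cell per value.
    val bodyRows = rows.joinToString("") { row ->
        "<tr>" + row.joinToString("") { "<td>$it</td>" } + "</tr>"
    }
    return "<table><tr>$headerCells</tr>$bodyRows</table>"
}

fun main() {
    val html = toHtmlTable(
        listOf("language", "preferred"),
        listOf(listOf("Kotlin", true), listOf("Java", false))
    )
    println(html)
}
```

A notebook that receives markup like this can draw it as a table instead of showing it as plain text.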