Azure Cosmos DB Part 3: Gremlin API

Jonathan
Apr 19, 2021

Here comes the third part of the Azure Cosmos DB learning journey. In this part, we take a look at the Gremlin API, which is essentially the graph API. What is the most distinctive characteristic of the Gremlin/graph API? Instead of joining two or more tables on a common column to find the right record, as a relational database does, a graph database stores the relationships between records (including one-to-many ones) directly and follows them to reach the right record.

photo credit: Graph database vs Relational database | by Tarun Manrai | Dev Genius | Medium

Some good use cases for a graph database:

Dynamic systems where the data topology is difficult to predict

Dynamic requirements that evolve with the business

Problems where the relationships in data contribute meaning and value

— Quoted from here

Some scenarios that commonly use a graph database:

Social networks/Customer 365: By combining data about your customers and their interactions with other people, you can develop personalized experiences, predict customer behavior, or connect people with others with similar interests. Azure Cosmos DB can be used to manage social networks and track customer preferences and data.

Recommendation engines: This scenario is commonly used in the retail industry. By combining information about products, users, and user interactions, like purchasing, browsing, or rating an item, you can build customized recommendations. The low latency, elastic scale, and native graph support of Azure Cosmos DB is ideal for these scenarios.

Geospatial: Many applications in telecommunications, logistics, and travel planning need to find a location of interest within an area or locate the shortest/optimal route between two locations. Azure Cosmos DB is a natural fit for these problems.

Internet of Things: With the network and connections between IoT devices modeled as a graph, you can build a better understanding of the state of your devices and assets. You also can learn how changes in one part of the network can potentially affect another part.

— Quoted from here

Graph databases are composed of vertices (another word for the nodes of a graph) and edges (the connections between two nodes). Both carry key-value properties. Just like in the first two parts of this series, we will list the common actions for each of the common ways of interacting with the service.
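To make the vertex/edge idea concrete, here is a minimal in-memory sketch in Python. It is not how Cosmos DB stores data; the class names, field names, and sample values (including the `since` property) are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    """A node in the graph: an id, a label, and key-value properties."""
    id: str
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A directed connection between two vertices, also with properties."""
    label: str
    out_v: str  # id of the vertex the edge starts from
    in_v: str   # id of the vertex the edge points to
    properties: Dict[str, Any] = field(default_factory=dict)


# Hypothetical sample data: two people and one "knows" relationship.
thomas = Vertex("1", "person", {"firstName": "Thomas", "age": 44})
mary = Vertex("2", "person", {"firstName": "Mary Kay", "age": 39})
knows = Edge("knows", out_v=thomas.id, in_v=mary.id, properties={"since": 2015})

vertices: List[Vertex] = [thomas, mary]
edges: List[Edge] = [knows]

print(len(vertices), "vertices,", len(edges), "edge")
print(knows.out_v, f"-{knows.label}->", knows.in_v, knows.properties)
```

The point of the sketch is that the relationship is a stored object of its own, with a direction and properties, rather than something recomputed from a shared column at query time.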

Common Ways of Interactions

  • Built-in Data Explorer
  • Gremlin Console
  • Application in Python

Common Actions

  • Create Database
  • Update Database
  • Create Graph
  • Update Graph
  • Insert Data into Graph
  • Query Data from Graph

Built-in Data Explorer

Create Database and Create Graph

In the Gremlin API a collection is called a graph, but the overall structure is the same as in the MongoDB API and SQL API.

  • Head over to “Data Explorer” on the left of the blade
  • Click on “New Graph” → “New Graph”
  • Enter the desired names for the new database and graph. As with the MongoDB and SQL APIs, you also need to choose between autoscale and manual throughput for the request units (RU).


Update Database and Update Graph

The database throughput (RU) can be adjusted, but the graph (collection) cannot.

Insert Data into Graph

With the commands provided by the official documentation, we can easily insert vertices and edges.

  • vertices
g.addV('person').property('firstName', 'Thomas').property('lastName', 'Andersen').property('age', 44).property('userid', 1).property('pk', 'pk')
  • edges (the target vertex, here 'Mary Kay', must already exist)
g.V().hasLabel('person').has('firstName', 'Thomas').addE('knows').to(g.V().hasLabel('person').has('firstName', 'Mary Kay'))
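When there are many vertices to insert, the statements above could be generated from data rather than typed by hand. The following is a minimal sketch under stated assumptions: the helper names (`gremlin_literal`, `add_vertex_query`, `add_edge_query`) are hypothetical, and the backslash escaping of quotes follows Groovy-style string rules, which should be verified against the service before relying on it.

```python
from typing import Any, Dict


def gremlin_literal(value: Any) -> str:
    """Render a Python value as a Gremlin literal (strings are single-quoted)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape backslashes and single quotes so the value cannot end the string early.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def add_vertex_query(label: str, properties: Dict[str, Any]) -> str:
    """Build a g.addV(...) statement with one .property(...) step per entry."""
    steps = "".join(
        f".property({gremlin_literal(key)}, {gremlin_literal(value)})"
        for key, value in properties.items()
    )
    return f"g.addV({gremlin_literal(label)}){steps}"


def add_edge_query(vertex_label: str, key: str, source: Any,
                   edge_label: str, target: Any) -> str:
    """Build an addE(...) statement between two vertices matched by one property."""
    def match(value: Any) -> str:
        return (f"g.V().hasLabel({gremlin_literal(vertex_label)})"
                f".has({gremlin_literal(key)}, {gremlin_literal(value)})")
    return f"{match(source)}.addE({gremlin_literal(edge_label)}).to({match(target)})"


thomas = {"firstName": "Thomas", "lastName": "Andersen", "age": 44,
          "userid": 1, "pk": "pk"}
print(add_vertex_query("person", thomas))
print(add_edge_query("person", "firstName", "Thomas", "knows", "Mary Kay"))
```

The generated strings can be pasted into the Graph tab or submitted from any Gremlin client.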

We can also update a vertex.

g.V().hasLabel('person').has('firstName', 'Thomas').property('age', 45)

Traverse the graph

g.V().hasLabel('person').has('firstName', 'Thomas').outE('knows').inV().hasLabel('person')

Execute the commands inside the Graph tab, and the result is shown right below.
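To see what each step of the traversal above does, here is a plain-Python emulation on a tiny in-memory graph. It only illustrates the semantics and is not how the service executes the query; the sample data (including the 'Robin' vertex) and the function name `friends_of` are assumptions.

```python
# In-memory stand-ins for the vertices and edges (hypothetical sample data).
vertices = {
    "1": {"label": "person", "firstName": "Thomas"},
    "2": {"label": "person", "firstName": "Mary Kay"},
    "3": {"label": "person", "firstName": "Robin"},
}
edges = [
    {"label": "knows", "outV": "1", "inV": "2"},
    {"label": "knows", "outV": "2", "inV": "3"},
]


def friends_of(first_name):
    """Mimic g.V().hasLabel('person').has('firstName', first_name)
    .outE('knows').inV().hasLabel('person') on the data above."""
    # g.V().hasLabel('person').has('firstName', ...): pick the starting vertices.
    start_ids = [
        vid for vid, v in vertices.items()
        if v["label"] == "person" and v["firstName"] == first_name
    ]
    # .outE('knows'): follow outgoing edges labelled 'knows'.
    out_edges = [e for e in edges
                 if e["label"] == "knows" and e["outV"] in start_ids]
    # .inV(): move to the vertex each edge points to.
    targets = [vertices[e["inV"]] for e in out_edges]
    # .hasLabel('person'): keep only person vertices.
    return [v["firstName"] for v in targets if v["label"] == "person"]


print(friends_of("Thomas"))
```

Each Gremlin step narrows or moves the current set of elements, which is why the query reads left to right like a pipeline.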

Query Data from Graph

Within the same “Graph” tab, you can also execute queries to get the data you need.

g.V()

Gremlin Console

Installation

  • Download and install Java 8 from here
  • Download and install Gremlin Console from here

Connection

  • Head over to the path “<file path>\apache-tinkerpop-gremlin-console-3.4.10-bin\apache-tinkerpop-gremlin-console-3.4.10\conf”
  • There should be a file named “remote-secure.yaml”. Create a copy of the file and name it “remote-secure.yaml.bak”
  • Go back to the original file “remote-secure.yaml”. Within the file, replace the last few lines (everything after the comment header) with the settings shown below
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
##############################################################
# This configuration is meant to have Gremlin Server return
# text serialized objects. The server will toString()
# results giving a view into how scripts are executing.
#
# This file will work with:
# - gremlin-server-secure.yaml
##############################################################
hosts: [<azure cosmos db name>.gremlin.cosmos.azure.com]
port: 443
username: /dbs/<database name>/colls/<graph name>
password: your_primary_key
connectionPool: {
  enableSsl: true
}
serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV2d0, config: { serializeResultToString: true }}
  • Head over to the path “<file path>\apache-tinkerpop-gremlin-console-3.4.10-bin\apache-tinkerpop-gremlin-console-3.4.10\bin”
  • For Windows OS, launch the console with “gremlin.bat”
  • Execute the following in the console
#To read the remote server configuration
- :remote connect tinkerpop.server conf/remote-secure.yaml
#To use the remote server console
- :remote console
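If you manage several accounts, the settings portion of remote-secure.yaml could be generated rather than edited by hand. This is only a sketch: the function name `render_remote_config` and the sample account, database, and graph names are made up for illustration, while the host suffix, port, and serializer line are copied from the listing above.

```python
def render_remote_config(account: str, database: str, graph: str,
                         primary_key: str) -> str:
    """Return the connection settings block for remote-secure.yaml as text."""
    lines = [
        f"hosts: [{account}.gremlin.cosmos.azure.com]",
        "port: 443",
        f"username: /dbs/{database}/colls/{graph}",
        f"password: {primary_key}",
        "connectionPool: {",
        "  enableSsl: true",
        "}",
        "serializer: { className: org.apache.tinkerpop.gremlin.driver.ser."
        "GraphSONMessageSerializerV2d0, config: { serializeResultToString: true }}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values; replace them with your own account details.
config_text = render_remote_config("my-account", "my-db", "my-graph",
                                   "<primary key>")
print(config_text)
```

The output can be pasted below the comment header of remote-secure.yaml. Avoid committing the generated file to source control, since it contains the primary key.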
  • Create Database and Update Database

Not applicable.

  • Create Graph and Update Graph

Not applicable.

  • Insert Data into Graph

Same commands as in the built-in Data Explorer.

  • Query Data from Graph

Same queries as in the built-in Data Explorer.

Application in Python

Prerequisite Actions:

  • Install Python 3.6+ from here
  • Install pip and confirm that it is available with “python3 -m pip --version”
#Check the version with
- python3 --version
- pip3 --version

** If you are working in Windows Subsystem for Linux 2 (WSL2) and encounter errors, chances are that the WSL2 clock is not synchronized with the local machine. Execute “sudo ntpdate -b time.google.com” to temporarily mitigate the situation.

After that, follow the official documentation to download the GitHub project. After changing to the right directory, open the file “connect.py” and update the connection details.

#Change to the right directory
- cd "azure-cosmos-..."
#Edit the file
- nano connect.py
#Search for the string "client.Client"
- ctrl + w
- type in "client.Client"
- "alt + w" to keep on finding the next match
#Update the information within the file
...
client = client.Client('wss://<azure cosmos DB account name>.gremlin.cosmosdb.azure.com:443/', 'g',
    username="/dbs/<database name>/colls/<graph name>",
    password="<primary key>")
...
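Rather than hard-coding the key in connect.py, the connection values could be assembled in one place and the key read from the environment. The sketch below rests on several assumptions: the helper names and the `COSMOS_KEY` environment variable are hypothetical, and passing `GraphSONSerializersV2d0` as the message serializer is a common choice with Cosmos DB that is not shown in the snippet above. The driver is imported lazily so the first helper works even before gremlinpython is installed.

```python
import os


def connection_settings(account, database, graph, primary_key):
    """Collect the values that client.Client(...) needs, as shown above."""
    return {
        "url": f"wss://{account}.gremlin.cosmosdb.azure.com:443/",
        "traversal_source": "g",
        "username": f"/dbs/{database}/colls/{graph}",
        "password": primary_key,
    }


def make_client(settings):
    """Create a gremlinpython client from the settings dictionary."""
    # Third-party import kept local: requires `pip3 install gremlinpython`.
    from gremlin_python.driver import client, serializer

    return client.Client(
        settings["url"],
        settings["traversal_source"],
        username=settings["username"],
        password=settings["password"],
        # Assumption: GraphSON v2 serializer, not taken from the snippet above.
        message_serializer=serializer.GraphSONSerializersV2d0(),
    )


# Hypothetical names; COSMOS_KEY is an assumed environment variable.
settings = connection_settings(
    "my-account", "my-db", "my-graph",
    os.environ.get("COSMOS_KEY", "<primary key>"),
)
print(settings["url"])
print(settings["username"])
# gremlin_client = make_client(settings)  # uncomment once the driver is installed
```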

Install the requirements with PIP3 and execute the code.

#Install the requirements
- pip3 install -r requirements.txt
#Execute the code
- python3 connect.py
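Once the sample runs, your own queries can be sent through the same client. Below is a minimal sketch of a helper that submits a statement and collects the results. It uses the driver's `submit(...).all().result()` call chain; the helper name `run_query` is made up, and `FakeClient` is a hypothetical stand-in with canned answers so the sketch can run offline without an Azure account.

```python
from concurrent.futures import Future


def run_query(gremlin_client, query):
    """Submit one Gremlin statement and return all of its results as a list."""
    return gremlin_client.submit(query).all().result()


class FakeResultSet:
    """Mimics the part of the driver's result set that run_query relies on."""

    def __init__(self, items):
        self._items = items

    def all(self):
        future = Future()
        future.set_result(self._items)
        return future


class FakeClient:
    """Offline stand-in for the real client; returns canned results per query."""

    def __init__(self, canned):
        self._canned = canned

    def submit(self, query):
        return FakeResultSet(self._canned.get(query, []))


# Canned answers are invented for the demonstration.
demo_client = FakeClient({
    "g.V().count()": [2],
    "g.V().hasLabel('person').values('firstName')": ["Thomas", "Mary Kay"],
})

for statement in ("g.V().count()",
                  "g.V().hasLabel('person').values('firstName')"):
    print(statement, "->", run_query(demo_client, statement))
```

With a real connection, pass the client created in connect.py to `run_query` instead of `demo_client`.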

That covers the three main ways to interact with the Azure Cosmos DB Gremlin API. Of course, there are many more details that would need to be tested, but they are all covered in the official documentation. One way to learn a new technology is to read other people’s technical blogs, and Medium is a great platform for finding great resources. Happy learning!
