# Databases
Warpgrapher supports several database back-ends for graph data:
- Apache Tinkerpop
- AWS Neptune (Gremlin variant)
- Azure Cosmos DB (Gremlin variant)
- Neo4J
Each database requires selecting the appropriate crate feature and setting environment variables that allow Warpgrapher to connect to it.
Regardless of database, export an environment variable to control the size of the database connection pool:

```shell
export WG_POOL_SIZE=8
```
If the WG_POOL_SIZE variable is not set, Warpgrapher defaults to a pool the same size as the
number of CPUs detected. If the number of CPUs cannot be detected, Warpgrapher defaults to a pool
of 8 connections.
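The resolution order described above can be sketched as follows. Note that `pool_size` is a hypothetical helper written for illustration, not Warpgrapher's actual implementation:

```rust
use std::env;
use std::thread;

/// Resolve the connection pool size: WG_POOL_SIZE if set and valid,
/// otherwise the detected CPU count, otherwise 8.
fn pool_size() -> usize {
    env::var("WG_POOL_SIZE")
        .ok()
        .and_then(|s| s.parse::<usize>().ok())
        .or_else(|| thread::available_parallelism().ok().map(|n| n.get()))
        .unwrap_or(8)
}

fn main() {
    env::set_var("WG_POOL_SIZE", "8");
    println!("pool size: {}", pool_size()); // prints "pool size: 8"
}
```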
## Gremlin-Based Databases
For all Gremlin-based databases -- Apache Tinkerpop, AWS Neptune, and Azure Cosmos DB -- the following environment variables control Warpgrapher behavior:
- `WG_GREMLIN_HOST` is the host name of the database to which to connect.
- `WG_GREMLIN_READ_REPLICAS` provides a separate host name for read-only replica nodes, if they are being used for additional scalability. If not set, the read pool connects to the same host as the read/write connection pool.
- `WG_GREMLIN_PORT` provides the port to which Warpgrapher should connect.
- `WG_GREMLIN_USER` is the username to use to authenticate to the database, if required.
- `WG_GREMLIN_PASS` is the password to use to authenticate to the database, if required.
- `WG_GREMLIN_USE_TLS` is set to `true` if Warpgrapher should connect to the database over a TLS connection, and `false` if not using TLS. Defaults to `true`.
- `WG_GREMLIN_VALIDATE_CERTS` is set to `true` if Warpgrapher should validate the certificate used for a TLS connection, and `false` if not. Defaults to `true`. It should only be set to `false` in non-production environments.
- `WG_GREMLIN_BINDINGS` is set to `true` if Warpgrapher should use Gremlin bindings to send values in queries (effectively query parameterization), and `false` if values should be sanitized and sent inline in the query string itself. Defaults to `true`.
- `WG_GREMLIN_LONG_IDS` is set to `true` if Warpgrapher should use long integers for vertex and edge identifiers. If `false`, Warpgrapher uses strings. Defaults to `false`.
- `WG_GREMLIN_PARTITIONS` is set to `true` if Warpgrapher should require a partition ID, and `false` if Warpgrapher should ignore or omit partition IDs. Defaults to `false`.
- `WG_GREMLIN_SESSIONS` is set to `true` if Warpgrapher mutations should be conducted within a single Gremlin session, which in some databases provides transactional semantics, and `false` if sessions should not be used. Defaults to `false`.
- `WG_GREMLIN_VERSION` may be set to `1`, `2`, or `3` to indicate the version of GraphSON serialization to use when communicating with the database. Defaults to `3`.
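To illustrate how the host, port, and TLS variables combine, the sketch below builds the websocket URL a Gremlin client would connect to. `gremlin_url` is a hypothetical helper for illustration only, not part of Warpgrapher's API:

```rust
use std::env;

/// Combine WG_GREMLIN_HOST, WG_GREMLIN_PORT, and WG_GREMLIN_USE_TLS into a
/// Gremlin websocket URL, applying the documented defaults when unset.
fn gremlin_url() -> String {
    let host = env::var("WG_GREMLIN_HOST").unwrap_or_else(|_| "localhost".to_string());
    let port = env::var("WG_GREMLIN_PORT").unwrap_or_else(|_| "8182".to_string());
    // WG_GREMLIN_USE_TLS defaults to true.
    let tls = env::var("WG_GREMLIN_USE_TLS").map(|v| v == "true").unwrap_or(true);
    let scheme = if tls { "wss" } else { "ws" };
    format!("{}://{}:{}/gremlin", scheme, host, port)
}

fn main() {
    env::set_var("WG_GREMLIN_HOST", "localhost");
    env::set_var("WG_GREMLIN_PORT", "8182");
    env::set_var("WG_GREMLIN_USE_TLS", "false");
    println!("{}", gremlin_url()); // prints "ws://localhost:8182/gremlin"
}
```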
Example configurations for the supported databases are shown below. Environment variables are omitted where the default values are correct.
### Apache Tinkerpop
Add Warpgrapher to your project config:
`Cargo.toml`

```toml
[dependencies]
warpgrapher = { version = "0.9.0", features = ["gremlin"] }
```
Then set up environment variables to contact your Gremlin-based DB:
```shell
export WG_GREMLIN_HOST=localhost
export WG_GREMLIN_PORT=8182
export WG_GREMLIN_USER=username
export WG_GREMLIN_PASS=password
export WG_GREMLIN_USE_TLS=true
export WG_GREMLIN_VALIDATE_CERTS=true
export WG_GREMLIN_LONG_IDS=true
```
Setting the WG_GREMLIN_VALIDATE_CERTS variable to `false` causes Warpgrapher to ignore the validity of TLS certificates. This may be necessary in a development or test environment, but certificates should always be validated in production.
If you do not already have a Gremlin-based database running, you can run one using Docker:
```shell
docker run -it --rm -p 8182:8182 tinkerpop/gremlin-server:latest
```
To use an interactive Gremlin console to inspect test instances manually, run:

```shell
docker build -t gremlin-console -f tests/fixtures/gremlin-console/Dockerfile tests/fixtures/gremlin-console
docker run -i --net=host --rm gremlin-console:latest
```
In the console, connect to the remote graph:
```
:remote connect tinkerpop.server conf/remote.yaml
:remote console
```
### AWS Neptune
Add Warpgrapher to your project config:
`Cargo.toml`

```toml
[dependencies]
warpgrapher = { version = "0.9.0", features = ["gremlin"] }
```
Then set up environment variables to contact your Neptune DB:
```shell
export WG_GREMLIN_HOST=[neptune-rw-hostname].[region].neptune.amazonaws.com
export WG_GREMLIN_READ_REPLICAS=[neptune-ro-hostname].[region].neptune.amazonaws.com
export WG_GREMLIN_PORT=443
export WG_GREMLIN_USE_TLS=true
export WG_GREMLIN_VALIDATE_CERTS=true
export WG_GREMLIN_BINDINGS=false
export WG_GREMLIN_SESSIONS=true
```
Setting the WG_GREMLIN_VALIDATE_CERTS variable to `false` causes Warpgrapher to ignore the validity of TLS certificates. This may be necessary in a development or test environment, but certificates should always be validated in production.
### Azure Cosmos DB
Add Warpgrapher to your project config:
`Cargo.toml`

```toml
[dependencies]
warpgrapher = { version = "0.9.0", features = ["gremlin"] }
```
Then set up environment variables to contact your Cosmos DB:
```shell
export WG_GREMLIN_HOST=*MY-COSMOS-DB*.gremlin.cosmos.azure.com
export WG_GREMLIN_PORT=443
export WG_GREMLIN_USER=/dbs/*MY-COSMOS-DB*/colls/*MY-COSMOS-COLLECTION*
export WG_GREMLIN_PASS=*MY-COSMOS-KEY*
export WG_GREMLIN_USE_TLS=true
export WG_GREMLIN_VALIDATE_CERTS=true
export WG_GREMLIN_PARTITIONS=true
export WG_GREMLIN_VERSION=1
```
Note that when setting up your Cosmos database, you must configure it to offer a Gremlin graph API. Note also that you must name your partition key `partitionKey`.
Be advised that Gremlin traversals are not executed atomically within Cosmos DB. A traversal may fail partway through if, for example, it reaches the read unit capacity limit. See this article for details. The workaround proposed in the article helps, but even idempotent queries do not guarantee atomicity. Warpgrapher does not use idempotent queries with automated retries to work around this shortcoming, so be aware that when using Cosmos DB, a failed query may leave partially applied results behind.
### Neo4J
Add Warpgrapher to your project config:
```toml
[dependencies]
warpgrapher = { version = "0.9.0", features = ["neo4j"] }
```
Then set up environment variables to contact your Neo4J DB:
```shell
export WG_NEO4J_HOST=127.0.0.1
export WG_NEO4J_READ_REPLICAS=127.0.0.1
export WG_NEO4J_PORT=7687
export WG_NEO4J_USER=neo4j
export WG_NEO4J_PASS=*MY-DB-PASSWORD*
```
Note that the WG_NEO4J_READ_REPLICAS variable is optional. It is used for Neo4J cluster
configurations in which there are both read/write nodes and read-only replicas. If the
WG_NEO4J_READ_REPLICAS variable is set, read-only queries will be directed to the read replicas,
whereas mutations will be sent to the instance(s) at WG_NEO4J_HOST.
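The routing rule just described can be sketched as a small function. `route` is a hypothetical helper for illustration, not part of Warpgrapher's API:

```rust
/// Pick the target host for a query: read-only queries go to the replica
/// host when one is configured, and all other queries go to the read/write host.
fn route(is_read: bool, host: &str, replicas: Option<&str>) -> String {
    match (is_read, replicas) {
        (true, Some(r)) => r.to_string(),
        _ => host.to_string(),
    }
}

fn main() {
    let host = "rw.example.com";
    let replicas = Some("ro.example.com");
    println!("{}", route(true, host, replicas));  // prints "ro.example.com"
    println!("{}", route(false, host, replicas)); // prints "rw.example.com"
}
```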
If you do not already have a Neo4J database running, you can run one using Docker:
```shell
docker run -e NEO4J_AUTH="${WG_NEO4J_USER}:${WG_NEO4J_PASS}" -p 7687:7687 neo4j:4.1
```