Getting Started with Helm Charts (ScalarDB Analytics with PostgreSQL)
This guide explains how to get started with ScalarDB Analytics with PostgreSQL by using a Helm Chart in a Kubernetes cluster as a test environment. In addition, the contents of this guide assume that you already have a Mac or Linux environment set up for testing. Although minikube is mentioned, the steps described should work in any Kubernetes cluster.
What you will create
You will deploy the following components in a Kubernetes cluster:
+-------------------------------------------------------------------------------------------------------------------------------------------+
| [Kubernetes cluster] |
| |
| [Pod] [Pod] [Pod] |
| |
| +------------------------------------+ |
| +---> | ScalarDB Analytics with PostgreSQL | ---+ +-----------------------------+ |
| | +------------------------------------+ | +---> | MySQL ("customer" schema) | <---+ |
| | | | +-----------------------------+ | |
| +-------------+ +---------+ | +------------------------------------+ | | | |
| | OLAP client | ---> | Service | ---+---> | ScalarDB Analytics with PostgreSQL | ---+---+ +---+ |
| +-------------+ +---------+ | +------------------------------------+ | | | | |
| | | | +-----------------------------+ | | |
| | +------------------------------------+ | +---> | PostgreSQL ("order" schema) | <---+ | |
| +---> | ScalarDB Analytics with PostgreSQL | ---+ +-----------------------------+ | |
| +------------------------------------+ | |
| | |
| +-------------+ | |
| | OLTP client | ---(Load sample data with a test OLTP workload)-----------------------------------------------------------------------+ |
| +-------------+ |
| |
+-------------------------------------------------------------------------------------------------------------------------------------------+
Step 1. Start a Kubernetes cluster
First, you need to prepare a Kubernetes cluster. If you're using a minikube environment, please refer to the Getting Started with Scalar Helm Charts. If you have already started a Kubernetes cluster, you can skip this step.
Step 2. Start MySQL and PostgreSQL pods
ScalarDB including ScalarDB Analytics with PostgreSQL can use several types of database systems as a backend database. In this guide, you will use MySQL and PostgreSQL.
You can deploy MySQL and PostgreSQL on the Kubernetes cluster as follows:
-
Add the Bitnami helm repository.
helm repo add bitnami https://charts.bitnami.com/bitnami
-
Update the helm repository.
helm repo update bitnami
-
Deploy MySQL.
helm install mysql-scalardb bitnami/mysql \
--set auth.rootPassword=mysql \
--set primary.persistence.enabled=false -
Deploy PostgreSQL.
helm install postgresql-scalardb bitnami/postgresql \
--set auth.postgresPassword=postgres \
--set primary.persistence.enabled=false -
Check if the MySQL and PostgreSQL pods are running.
kubectl get pod
You should see the following output:
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
mysql-scalardb-0 1/1 Running 0 3m17s
postgresql-scalardb-0 1/1 Running 0 3m12s
Step 3. Create a working directory
Since you'll be creating some configuration files locally, create a working directory for those files.
mkdir -p ~/scalardb-analytics-postgresql-test/
Step 4. Set the versions of ScalarDB, ScalarDB Analytics with PostgreSQL, and the chart
Set the following three environment variables. If you want to use another version of ScalarDB and ScalarDB Analytics with PostgreSQL, be sure to set them to the versions that you want to use.
You must use the same minor versions (for example, 3.10.x) of ScalarDB Analytics with PostgreSQL as ScalarDB, but you don't need to make the patch versions match. For example, you can use ScalarDB 3.10.1 and ScalarDB Analytics with PostgreSQL 3.10.3 together.
SCALARDB_VERSION=3.10.1
SCALARDB_ANALYTICS_WITH_POSTGRESQL_VERSION=3.10.3
CHART_VERSION=$(helm search repo scalar-labs/scalardb-analytics-postgresql -l | grep -e ${SCALARDB_ANALYTICS_WITH_POSTGRESQL_VERSION} | awk '{print $2}' | sort --version-sort -r | head -n 1)
Step 5. Run OLTP transactions to load sample data to MySQL and PostgreSQL
Before deploying ScalarDB Analytics with PostgreSQL, run the OLTP transactions to create sample data.
-
Start an OLTP client pod in the Kubernetes cluster.
kubectl run oltp-client --image eclipse-temurin:8-jdk-jammy --env SCALARDB_VERSION=${SCALARDB_VERSION} -- sleep inf
-
Check if the OLTP client pod is running.
kubectl get pod oltp-client
You should see the following output:
$ kubectl get pod oltp-client
NAME READY STATUS RESTARTS AGE
oltp-client 1/1 Running 0 17s -
Run bash in the OLTP client pod.
kubectl exec -it oltp-client -- bash
After this step, run each command in the OLTP client pod.
-
Install the git and curl commands in the OLTP client pod.
apt update && apt install -y curl git
-
Clone the ScalarDB samples repository.
git clone https://github.com/scalar-labs/scalardb-samples.git
-
Go to the directory
scalardb-samples/multi-storage-transaction-sample/
.cd scalardb-samples/multi-storage-transaction-sample/
pwd
You should see the following output:
# pwd
/scalardb-samples/multi-storage-transaction-sample -
Create a configuration file (
database.properties
) to access MySQL and PostgreSQL in the Kubernetes cluster.cat << 'EOF' > database.properties
scalar.db.storage=multi-storage
scalar.db.multi_storage.storages=storage0,storage1
# Storage 0
scalar.db.multi_storage.storages.storage0.storage=jdbc
scalar.db.multi_storage.storages.storage0.contact_points=jdbc:mysql://mysql-scalardb.default.svc.cluster.local:3306/
scalar.db.multi_storage.storages.storage0.username=root
scalar.db.multi_storage.storages.storage0.password=mysql
# Storage 1
scalar.db.multi_storage.storages.storage1.storage=jdbc
scalar.db.multi_storage.storages.storage1.contact_points=jdbc:postgresql://postgresql-scalardb.default.svc.cluster.local:5432/postgres
scalar.db.multi_storage.storages.storage1.username=postgres
scalar.db.multi_storage.storages.storage1.password=postgres
scalar.db.multi_storage.namespace_mapping=customer:storage0,order:storage1
scalar.db.multi_storage.default_storage=storage1
EOF -
Download Schema Loader from ScalarDB Releases.
curl -OL https://github.com/scalar-labs/scalardb/releases/download/v${SCALARDB_VERSION}/scalardb-schema-loader-${SCALARDB_VERSION}.jar
-
Run Schema Loader to create sample tables.
java -jar scalardb-schema-loader-${SCALARDB_VERSION}.jar --config database.properties --schema-file schema.json --coordinator
-
Load initial data for the sample workload.
./gradlew run --args="LoadInitialData"
-
Run the sample workload of OLTP transactions. Running these commands will create several
order
entries as sample data../gradlew run --args="PlaceOrder 1 1:3,2:2"
./gradlew run --args="PlaceOrder 1 5:1"
./gradlew run --args="PlaceOrder 2 3:1,4:1"
./gradlew run --args="PlaceOrder 2 2:1"
./gradlew run --args="PlaceOrder 3 1:1"
./gradlew run --args="PlaceOrder 3 2:1"
./gradlew run --args="PlaceOrder 3 3:1"
./gradlew run --args="PlaceOrder 3 5:1"
-
Exit from OLTP client.
exit
Step 6. Deploy ScalarDB Analytics with PostgreSQL
After creating sample data via ScalarDB in the backend databases, deploy ScalarDB Analytics with PostgreSQL.
-
Create a custom values file for ScalarDB Analytics with PostgreSQL (
scalardb-analytics-postgresql-custom-values.yaml
).cat << 'EOF' > ~/scalardb-analytics-postgresql-test/scalardb-analytics-postgresql-custom-values.yaml
scalardbAnalyticsPostgreSQL:
databaseProperties: |
scalar.db.storage=multi-storage
scalar.db.multi_storage.storages=storage0,storage1
# Storage 0
scalar.db.multi_storage.storages.storage0.storage=jdbc
scalar.db.multi_storage.storages.storage0.contact_points=jdbc:mysql://mysql-scalardb.default.svc.cluster.local:3306/
scalar.db.multi_storage.storages.storage0.username=root
scalar.db.multi_storage.storages.storage0.password=mysql
# Storage 1
scalar.db.multi_storage.storages.storage1.storage=jdbc
scalar.db.multi_storage.storages.storage1.contact_points=jdbc:postgresql://postgresql-scalardb.default.svc.cluster.local:5432/postgres
scalar.db.multi_storage.storages.storage1.username=postgres
scalar.db.multi_storage.storages.storage1.password=postgres
scalar.db.multi_storage.namespace_mapping=customer:storage0,order:storage1
scalar.db.multi_storage.default_storage=storage1
schemaImporter:
namespaces:
- customer
- order
EOF -
Create a secret resource to set a superuser password for PostgreSQL.
kubectl create secret generic scalardb-analytics-postgresql-superuser-password --from-literal=superuser-password=scalardb-analytics
-
Deploy ScalarDB Analytics with PostgreSQL.
helm install scalardb-analytics-postgresql scalar-labs/scalardb-analytics-postgresql -n default -f ~/scalardb-analytics-postgresql-test/scalardb-analytics-postgresql-custom-values.yaml --version ${CHART_VERSION}
Step 7. Run an OLAP client pod
To run some queries via ScalarDB Analytics with PostgreSQL, run an OLAP client pod.
-
Start an OLAP client pod in the Kubernetes cluster.
kubectl run olap-client --image postgres:latest -- sleep inf
-
Check if the OLAP client pod is running.
kubectl get pod olap-client
You should see the following output:
$ kubectl get pod olap-client
NAME READY STATUS RESTARTS AGE
olap-client 1/1 Running 0 10s
Step 8. Run sample queries via ScalarDB Analytics with PostgreSQL
After running the OLAP client pod, you can run some queries via ScalarDB Analytics with PostgreSQL.
-
Run bash in the OLAP client pod.
kubectl exec -it olap-client -- bash
After this step, run each command in the OLAP client pod.
-
Run the psql command to access ScalarDB Analytics with PostgreSQL.
psql -h scalardb-analytics-postgresql -p 5432 -U postgres -d scalardb
The password is
scalardb-analytics
. -
Read sample data in the
customer.customers
table.SELECT * FROM customer.customers;
You should see the following output:
customer_id | name | credit_limit | credit_total
-------------+---------------+--------------+--------------
1 | Yamada Taro | 10000 | 10000
2 | Yamada Hanako | 10000 | 9500
3 | Suzuki Ichiro | 10000 | 8500
(3 rows) -
Read sample data in the
order.orders
table.SELECT * FROM "order".orders;
You should see the following output:
scalardb=# SELECT * FROM "order".orders;
customer_id | timestamp | order_id
-------------+---------------+--------------------------------------
1 | 1700124015601 | 5ae2a41b-990d-4a16-9700-39355e29adf8
1 | 1700124021273 | f3f23d93-3862-48be-8a57-8368b7c8689e
2 | 1700124028182 | 696a895a-8998-4c3b-b112-4d5763bfcfd8
2 | 1700124036158 | 9215d63a-a9a2-4471-a990-45897f091ca5
3 | 1700124043744 | 9be70cd4-4f93-4753-9d89-68e250b2ac51
3 | 1700124051162 | 4e8ce2d2-488c-40d6-aa52-d9ecabfc68a8
3 | 1700124058096 | 658b6682-2819-41f2-91ee-2802a1f02857
3 | 1700124071240 | 4e2f94f4-53ec-4570-af98-7c648d8ed80f
(8 rows) -
Read sample data in the
order.statements
table.SELECT * FROM "order".statements;
You should see the following output:
scalardb=# SELECT * FROM "order".statements;
order_id | item_id | count
--------------------------------------+---------+-------
5ae2a41b-990d-4a16-9700-39355e29adf8 | 2 | 2
5ae2a41b-990d-4a16-9700-39355e29adf8 | 1 | 3
f3f23d93-3862-48be-8a57-8368b7c8689e | 5 | 1
696a895a-8998-4c3b-b112-4d5763bfcfd8 | 4 | 1
696a895a-8998-4c3b-b112-4d5763bfcfd8 | 3 | 1
9215d63a-a9a2-4471-a990-45897f091ca5 | 2 | 1
9be70cd4-4f93-4753-9d89-68e250b2ac51 | 1 | 1
4e8ce2d2-488c-40d6-aa52-d9ecabfc68a8 | 2 | 1
658b6682-2819-41f2-91ee-2802a1f02857 | 3 | 1
4e2f94f4-53ec-4570-af98-7c648d8ed80f | 5 | 1
(10 rows) -
Read sample data in the
order.items
table.SELECT * FROM "order".items;
You should see the following output:
scalardb=# SELECT * FROM "order".items;
item_id | name | price
---------+--------+-------
5 | Melon | 3000
2 | Orange | 2000
4 | Mango | 5000
1 | Apple | 1000
3 | Grape | 2500
(5 rows) -
Run the
JOIN
query. For example, you can see the credit remaining information of each user as follows.SELECT * FROM (
SELECT c.name, c.credit_limit - c.credit_total AS remaining, array_agg(i.name) OVER (PARTITION BY c.name) AS items
FROM "order".orders o
JOIN customer.customers c ON o.customer_id = c.customer_id
JOIN "order".statements s ON o.order_id = s.order_id
JOIN "order".items i ON s.item_id = i.item_id
) AS remaining_info GROUP BY name, remaining, items;You should see the following output:
scalardb=# SELECT * FROM (
scalardb(# SELECT c.name, c.credit_limit - c.credit_total AS remaining, array_agg(i.name) OVER (PARTITION BY c.name) AS items
scalardb(# FROM "order".orders o
scalardb(# JOIN customer.customers c ON o.customer_id = c.customer_id
scalardb(# JOIN "order".statements s ON o.order_id = s.order_id
scalardb(# JOIN "order".items i ON s.item_id = i.item_id
scalardb(# ) AS remaining_info GROUP BY name, remaining, items;
name | remaining | items
---------------+-----------+----------------------------
Suzuki Ichiro | 1500 | {Grape,Orange,Apple,Melon}
Yamada Hanako | 500 | {Orange,Grape,Mango}
Yamada Taro | 0 | {Orange,Melon,Apple}
(3 rows) -
Exit from the psql command.
\q
-
Exit from the OLAP client pod.
exit
Step 9. Delete all resources
After completing the ScalarDB Analytics with PostgreSQL tests on the Kubernetes cluster, remove all resources.
-
Uninstall MySQL, PostgreSQL, and ScalarDB Analytics with PostgreSQL.
helm uninstall mysql-scalardb postgresql-scalardb scalardb-analytics-postgresql
-
Remove the client pods.
kubectl delete pod oltp-client olap-client --grace-period 0
-
Remove the secret resource.
kubectl delete secrets scalardb-analytics-postgresql-superuser-password
-
Remove the working directory and sample files.
cd ~
rm -rf ~/scalardb-analytics-postgresql-test/