Testing Scalability of 'Assets' and 'Issuances' Tables

I’ve simulated inserting rows into the assets and issuances tables and measured the performance.

The initial DB is a real, recent Counterparty DB.
The added rows are random values of the same format as real values.

Assets
Milliseconds to look up a name:
[image: asset name lookup time (ms) at 0, 100k, and 1M rows added]

The performance is worse than O(n) :scream:

Issuances
Counting the number of rows for given names.

[image: issuance row-count query time (ms) at 0, 100k, and 1M rows added]

Even worse…


Python scripts used for testing.

Assets:

db_file = 'db_copy.db' #latest Counterparty DB

import os
dir_path = os.path.dirname(os.path.realpath(__file__))
os.chdir(dir_path)

import sqlite3
con = sqlite3.connect(db_file)
cur = con.cursor()

import random
import datetime

assets = ['FEELGOODMAN', 'JPJA', 'WOLVERINE', 'LTBCOIN']
 
for row in cur.execute('SELECT COUNT(*) FROM assets;'): 
  row_count = row[0]

i = 0
while True:
  # Random dummy values in the same format as real asset rows
  asset_id = str(random.randint(int(1e14), int(9e14)))
  asset = 'A' + asset_id
  block = str(random.randint(int(1e6), int(9e6)))
  
  cur.execute('INSERT INTO assets VALUES("'+asset_id+'", "'+asset+'", "'+block+'", NULL);')
  if i == 0 or i == 1e5 or i == 1e6:
    print('Table rows: ' + f'{row_count+i:,}' + '   (' + f'{i:,}' + ' rows added)')
    for asset in assets:
      t0 = datetime.datetime.now()
      for row in cur.execute('SELECT * FROM assets WHERE asset_name LIKE "'+asset+'";'):
        t1 = datetime.datetime.now()
        delta = t1 - t0
        print(row[1] + '  ' + str(int(delta.total_seconds() * 1000)) + 'ms')
    print()
    
  if i == 1e6:  
    break
  i += 1

con.close()

Issuances:

db_file = 'db_copy.db' #latest Counterparty DB

import os
dir_path = os.path.dirname(os.path.realpath(__file__))
os.chdir(dir_path)

import sqlite3
con = sqlite3.connect(db_file)
cur = con.cursor()

import random
import datetime

assets = ['FEELGOODMAN', 'JPJA', 'WOLVERINE', 'LTBCOIN']
 
for row in cur.execute('SELECT COUNT(*) FROM issuances;'): 
  row_count = row[0]

i = 0
while True:
  # Random dummy values in the same format as real issuance rows
  tx_index = str(int(1e7) + i)
  tx_hash = str(random.randint(int(1e64), int(9e64)))
  block = str(random.randint(int(1e6), int(9e6)))
  asset_id = str(random.randint(int(1e14), int(9e14)))
  asset = 'A' + asset_id
  source = str(random.randint(int(1e34), int(9e34)))
  descr = str(random.randint(int(1e15), int(9e15)))
  
  cur.execute('INSERT INTO issuances VALUES("'+tx_index+'", "'+tx_hash+'", "0", "'+block+'", "'+asset+'", "500", "0", "'+source+'", "'+source+'", "0", "0", "0", "0.0", "'+descr+'", "1000000", "0", "valid", NULL, "0");')
  if i == 0 or i == 1e5 or i == 1e6:
    print('Table rows: ' + f'{row_count+i:,}' + '   (' + f'{i:,}' + ' rows added)')
    for asset in assets:
      t0 = datetime.datetime.now()
      for row in cur.execute('SELECT COUNT(*) FROM issuances WHERE asset LIKE "'+asset+'";'):
        t1 = datetime.datetime.now()
        delta = t1 - t0
        print(asset + ' ' + str(row[0]) + '  ' + str(int(delta.total_seconds() * 1000)) + 'ms')
    print()
    
  if i == 1e6:  
    break
  i += 1

con.close()

Hmmm, I discovered an embarrassing mistake, BUT THIS IS GOOD NEWS.

I used the LIKE operator instead of =.

The findings after rerunning with = are better.

The time to look up an asset in the assets table is negligible – even after 1M records added:
SELECT * FROM assets WHERE asset_name = 'X'
[image: asset lookup time with = stays negligible as rows are added]
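
For what it's worth, EXPLAIN QUERY PLAN shows why the two operators behave so differently: SQLite's LIKE is case-insensitive by default and cannot use an ordinary index, while = can use an index on asset_name (the flat timings suggest one exists, though I have not checked the schema). A minimal check against the same db_copy.db, with 'JPJA' as an example name:

import sqlite3

con = sqlite3.connect('db_copy.db')
cur = con.cursor()

# SEARCH in the plan detail means an index is used; SCAN means a full table read.
for query in ("SELECT * FROM assets WHERE asset_name LIKE 'JPJA'",
              "SELECT * FROM assets WHERE asset_name = 'JPJA'"):
  print(query)
  for row in cur.execute('EXPLAIN QUERY PLAN ' + query):
    print('  ' + str(row))
  print()

con.close()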

On the issuances table, the increase is dramatic. Growing the table from <200k to 1.2M rows (6x) makes the query ~13x slower (from 15ms to 190ms). Worse than O(n) performance!

SELECT COUNT(*) FROM issuances WHERE asset = 'X'
[image: issuances COUNT(*) query time with = grows steeply as rows are added]
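
As a quick follow-up (just a sketch on the test copy, not a proposed change to counterparty-lib): if the asset column in issuances is not indexed, the count has to scan the whole table, and adding an index should make it near-constant again. The index name below is made up.

import sqlite3, datetime

con = sqlite3.connect('db_copy.db')
cur = con.cursor()

def time_count(asset):
  t0 = datetime.datetime.now()
  count = cur.execute('SELECT COUNT(*) FROM issuances WHERE asset = ?', (asset,)).fetchone()[0]
  ms = (datetime.datetime.now() - t0).total_seconds() * 1000
  print(asset + ' ' + str(count) + '  ' + str(int(ms)) + 'ms')

time_count('JPJA')   # before adding the index: full table scan
cur.execute('CREATE INDEX IF NOT EXISTS idx_issuances_asset ON issuances (asset)')
time_count('JPJA')   # after: index lookup

con.close()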

Of course, there may be more that I’ve overlooked. Are there other, more relevant queries, for example?

Thank you for taking the time to write these tools and do this quick analysis.

Performance can be further improved by:

  • Indexing every unique address, asset, and tx_hash into separate index tables
  • Updating all CP tables to reference address/asset/tx_hash by index id (integer)
  • Creating indexes on all CP tables for the address/asset/tx_hash id fields
  • Migrating from SQLite to MySQL/MariaDB
  • Migrating tables to MyISAM (faster reads)

This is what I have done with counterparty2mysql/XChain, and I get much better performance out of the database than from the native SQLite CP database. A rough sketch of the index-table pattern is below.
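
For illustration, here is a hypothetical sketch of that index-table pattern, written in SQLite syntax for brevity even though counterparty2mysql targets MySQL/MariaDB. The table and column names are made up for the example and are not the actual XChain schema:

import sqlite3

con = sqlite3.connect(':memory:')
cur = con.cursor()

# One row per unique asset name; other tables store only the integer id.
cur.execute('CREATE TABLE index_assets (id INTEGER PRIMARY KEY, asset TEXT UNIQUE)')
cur.execute('CREATE TABLE issuances_n (tx_index INTEGER, asset_id INTEGER REFERENCES index_assets(id), quantity INTEGER)')
cur.execute('CREATE INDEX idx_issuances_n_asset_id ON issuances_n (asset_id)')

# Resolve the asset name to an integer id once, then reference the id everywhere.
cur.execute('INSERT OR IGNORE INTO index_assets (asset) VALUES (?)', ('JPJA',))
asset_id = cur.execute('SELECT id FROM index_assets WHERE asset = ?', ('JPJA',)).fetchone()[0]
cur.execute('INSERT INTO issuances_n VALUES (?, ?, ?)', (1, asset_id, 500))

# Counting by integer id compares small integers via a compact index instead of strings.
print(cur.execute('SELECT COUNT(*) FROM issuances_n WHERE asset_id = ?', (asset_id,)).fetchone()[0])

con.close()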

IMO database growth and performance are valid concerns that need monitoring and tweaking going forward.