2017-09-06

Cumulated card transaction ratio monitoring by BigObject Tree Analysis

by Mark Hsia
2017/9/6

BigObject-Performance with 100x Speedup


A tree object has a hierarchical structure that offers a more efficient way of computation than using a table object in some data analytics situations. 
BigObject provides a transformation mechanism called trans-join (transformative join) to construct a tree object from a table object.

In this example, we are trying to monitor personal credit card transactions on different shops to check whether any person who used many cards on different shops has exceeded a certain ratio threshold.

Assume each person has many cards, so there is a table t1 which keeps card data with owner’s id:
"CREATE TABLE t1 ('person' string(16), 'card' string(16),key(card) )"

Another table t2 keeps transaction data including card numbers, shop names , product names and transaction amounts:
CREATE TABLE t2 ('card' string(16),'shop' string(16),'product' string(16),'amt' double)”

First, we consolidate all cards into personal id by the command:
CREATE TABLE t3 as (select shop,product,person,sum(amt) as amt from (select t2.shop as shop,t2.product as product,t1.person as person,t2.amt as amt from t2 left join t1 on t1.card=t2.card) group by shop, product, person)”

And then create a tree tree1 from the result:
CREATE TREE tree1 (v1 DOUBLE, SUM(amt)) FROM t3 GROUP BY shop,product,person”

Then we can extract the result from different layers of the tree:
"CREATE TABLE t3 as (select shop,SUM(amt) as a from (GET '/*',  SUM(amt) FROM tree1))  key(shop)"

"CREATE TABLE t4 as (select shop,product,SUM(amt) as a from (GET '/*/*',  SUM(amt) FROM tree1))  key(shop,product)"

"CREATE TABLE t5 as (select shop,product,person,SUM(amt) as a from (GET '/*/*/*',  SUM(amt) FROM tree1))"

Up to this stage, we are ready to monitor the data ratio of different layers  by running the following ratio monitoring commands continuously:

"SELECT shop,product,person,t3.a as a3,t5.a as a5,t5.a/t3.a as ratio from t3 left join t5 on shop=t5.shop where t5.a/t3.a>=0.03 order by ratio desc,shop,product,person"

"SELECT shop,product,person,t4.a as a4,t5.a as a5,t5.a/t4.a as ratio from t4 left join t5 on shop=t5.shop  and product=t5.product where t5.a/t4.a>=0.03 order by ratio desc,shop,product,person"



Source Code:

import datetime
import time
import mysql.connector
import random
import pickle

cnx = mysql.connector.connect(user='scott', password='tiger',host='192.168.1.163')
cursor = cnx.cursor(buffered=True)

cursor.execute( "CREATE TABLE t1 ('person' STRING(16), 'card' STRING(16),key(card) )" )
cursor.execute( "CREATE TABLE t2 ('card' STRING(16),'shop' STRING(16),'product' STRING(16),'amt' double)" )

cnt=0
jmax=1000
imax=10000

persons=[]
add_sql=''
for j in range(jmax):
    add_sql = "INSERT INTO t1 (person, card) VALUES "
    for i in range(imax):
       cnt=cnt+1
       add_sql += "('person"+str(cnt%100)+"','card"+str(cnt%1000)+"'),"
       persons.append(cnt)
    cursor.execute(add_sql)

add_sql=''
cnt=0
jmax=1000
imax=10000
print 't2:'
for j in range(jmax):
    add_sql = "INSERT INTO t2 (card,shop,product,amt) VALUES "
    for i in range(imax):
       cnt=cnt+1
       add_sql += "('card" +str(cnt%1000) +"','shop" +str(cnt%100) +"','" +str(1001+cnt%1000) + "A'," + str(cnt%113)+"),"
       persons.append(cnt)
    cursor.execute(add_sql)

"""
imax=16
add_sql = "INSERT INTO t2 (card,shop,product,amt) VALUES "
add_sql = add_sql +"('card0','shop0','1101A',100),"
add_sql = add_sql +"('card0','shop0','1102A',200),"
add_sql = add_sql +"('card0','shop1','1101A',300),"
add_sql = add_sql +"('card0','shop1','1102A',400),"
add_sql = add_sql +"('card1','shop0','1101A',500),"
add_sql = add_sql +"('card1','shop0','1102A',600),"
add_sql = add_sql +"('card1','shop1','1101A',700),"
add_sql = add_sql +"('card1','shop1','1102A',800),"
add_sql = add_sql +"('card0','shop0','1101A',110),"
add_sql = add_sql +"('card0','shop0','1102A',210),"
add_sql = add_sql +"('card0','shop1','1101A',310),"
add_sql = add_sql +"('card0','shop1','1102A',410),"
add_sql = add_sql +"('card1','shop0','1101A',510),"
add_sql = add_sql +"('card1','shop0','1102A',610),"
add_sql = add_sql +"('card1','shop1','1101A',710),"
add_sql = add_sql +"('card1','shop1','1102A',810),"

cursor.execute(add_sql)
"""

sql = "CREATE TABLE t3 as (select shop,product,person,sum(amt) as amt from (select t2.shop as shop,t2.product as product,t1.person as person,t2.amt as amt from t2 left join t1 on t1.card=t2.card) group by shop,product,person)"
cursor.execute(sql)

# create tree
sql = "CREATE TREE tree1 (v1 DOUBLE, SUM(amt)) FROM t3 GROUP BY shop,product,person"
cursor.execute(sql)

sql="CREATE TABLE t3 as (select shop,SUM(amt) as a from (GET '/*',  SUM(amt) FROM tree1))  key(shop)"
cursor.execute(sql)

sql="CREATE TABLE t4 as (select shop,product,SUM(amt) as a from (GET '/*/*',  SUM(amt) FROM tree1))  key(shop,product)"
cursor.execute(sql)

sql="CREATE TABLE t5 as (select shop,product,person,SUM(amt) as a from (GET '/*/*/*',  SUM(amt) FROM tree1))"
cursor.execute(sql)

sql = "select shop,product,person,t3.a as a3,t5.a as a5,t5.a/t3.a as ratio from t3 left join t5 on shop=t5.shop where t5.a/t3.a>=0.0001 order by ratio desc,shop,product,person"
cursor.execute(sql)

sql = "select shop,product,person,t4.a as a4,t5.a as a5,t5.a/t4.a as ratio from t4 left join t5 on shop=t5.shop  and product=t5.product where t5.a/t4.a>=0.0001 order by ratio desc,shop,product,person"
cursor.execute(sql)

# Make sure data is committed to the database
cnx.commit()

cursor.close()
cnx.close()




No comments:

Post a Comment