发布日期:2015-11-05 14:45 来源: 标签: 数据库 oracle教程 oracle函数 oracle数据挖掘
本章我们主要学习oracle数据库实际应用中常用数据挖掘功能函数有哪些?下面我们就做一下具体讲解,希望大家多多支持中国站长网络学院。
注:数据挖掘功能的操作模式,已建成使用dbms_data_mining包或Oracle数据挖掘Java API。
1)CLUSTER_ID:返回预测对mining_attribute_clause指定预测集概率最高的簇群标识符
CLUSTER_ID(<schame.model> <mining_attribute_clause>) 
eg:
SELECT CLUSTER_ID(km_sh_clus_sample USING *) AS clus, COUNT(*) AS cnt
FROM km_sh_sample_apply_prepared
GROUP BY CLUSTER_ID(km_sh_clus_sample USING *)
ORDER BY cnt DESC;  

2)CLUSTER_PROBABILITY:返回一个与指定模型相关联的一个输入行成员的隶属度的度量。
CLUSTER_PROBABILITY(<schema.model>, <cluster_id> <mining_attribute_clause>)  
eg:
SELECT *
FROM (
  SELECT cust_id, CLUSTER_PROBABILITY(km_sh_clus_sample, 2 USING *)
  prob
  FROM km_sh_sample_apply_prepared
  ORDER BY prob DESC)
WHERE ROWNUM < 11;  

3)CLUSTER_SET:返回一个动态数组对象包含所有可能的集群,一个给定的行属于。在动态数组的每个对象是一对标量值包含群集ID和集群的概率。对象域命名cluster_id和概率,都是Oracle数
CLUSTER_SET(<schema.model>, <top N>, <cutoff>
<mining_attribute_clause>)
eg:
WITH clus_tab AS (SELECT id, A.attribute_name aname,
A.conditional_operator op, NVL(A.attribute_str_value,
ROUND(DECODE(A.attribute_name, N.col,
A.attribute_num_value * N.scale + N.shift,
A.attribute_num_value),4)) val,
A.attribute_support support,
A.attribute_confidence confidence
FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_KM('km_sh_clus_sample')) T,
TABLE(T.rule.antecedent) A, km_sh_sample_norm N
WHERE A.attribute_name = N.col(+) AND A.attribute_confidence > 0.55),
clust AS (
  SELECT id, CAST(COLLECT(Cattr(aname, op, TO_CHAR(val), support,
  confidence)) AS Cattrs) cl_attrs
  FROM clus_tab
  GROUP BY id), custclus AS (
    SELECT T.cust_id, S.cluster_id, S.probability
    FROM (
      SELECT cust_id, CLUSTER_SET(km_sh_clus_sample, NULL, 0.2 USING *)
      pset
      FROM km_sh_sample_apply_prepared
      WHERE cust_id = 101362) T,
      TABLE(T.pset) S)
    SELECT A.probability prob, A.cluster_id cl_id, B.attr, B.op, B.val,
    B.supp, B.conf
    FROM custclus A, (
      SELECT T.id, C.*
      FROM clust T, TABLE(T.cl_attrs) C) B
      WHERE A.cluster_id = B.id
      ORDER BY prob DESC, cl_id ASC, conf DESC, attr ASC, val ASC;

4)FEATURE_ID:返回一个具有最高值系数值的特征的标识号
FEATURE_ID(<schame.model> <mining_attribute_clause>) 
eg:
SELECT FEATURE_ID(nmf_sh_sample USING *) AS feat, COUNT(*) AS cnt
FROM nmf_sh_sample_apply_prepared
GROUP BY FEATURE_ID(nmf_sh_sample USING *)
ORDER BY cnt DESC; 

5)FEATURE_SET:返回一个包含所有可能的功能对象的动态数组。在动态数组的每个对象是一对标量值包含特征标识和特征值。对象域的命名特征璤身份和价值,都是Oracle数。
FEATURE_SET(<schema.model>, <top N>, <cutoff>
<mining_attribute_clause>)
eg:
WITH
feat_tab AS (
SELECT F.feature_id fid,
       A.attribute_name attr,
       TO_CHAR(A.attribute_value) val,
       A.coefficient coeff
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NMF('nmf_sh_sample')) F,
       TABLE(F.attribute_set) A
 WHERE A.coefficient > 0.25), feat AS (
SELECT fid,
       CAST(COLLECT(Featattr(attr, val, coeff)) AS Featattrs) f_attrs
  FROM feat_tab
GROUP BY fid
),
cust_10_features AS (
SELECT T.cust_id, S.feature_id, S.value
  FROM (SELECT cust_id, FEATURE_SET(nmf_sh_sample, 10 USING *) pset
        FROM nmf_sh_sample_apply_prepared
        WHERE cust_id = 100002) T, TABLE(T.pset) S)
SELECT A.value, A.feature_id fid,
       B.attr, B.val, B.coeff
  FROM cust_10_features A,
       (SELECT T.fid, F.*
        FROM feat T, TABLE(T.f_attrs) F) B
 WHERE A.feature_id = B.fid
ORDER BY A.value DESC, A.feature_id ASC, coeff DESC, attr ASC, val ASC;

6)FEATURE_VALUE:返回给定的特征值。如果你忽略feature_id参数,则函数返回的最大特征值。你可以与feature_id函数结合使用这种形式来获得最大的特征值的组合。
FEATURE_VALUE(<schema.model>,<feature_id><mining_attribute_clause>) 
eg:
SELECT *
FROM (
  SELECT cust_id, FEATURE_VALUE(nmf_sh_sample,3 USING *) match_qual
  FROM nmf_sh_sample_apply_prepared
  ORDER BY match_quality DESC)
WHERE ROWNUM < 11; 

7)PREDICTION:返回模型的最佳预测。返回的数据类型取决于模型的建立过程中使用的目标值类型。对于回归模型,此函数返回期望值。
PREDICTION(<schema.model>, <cost_matrix_clause>
<mining_attribute_clause>) 
eg:
SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age
FROM mining_data_apply_v
WHERE PREDICTION(DT_SH_Clas_sample COST MODEL
USING cust_marital_status, education, household_size) = 1
GROUP BY cust_gender
ORDER BY cust_gender; 

8)PREDICTION_COST:返回一个给定的预测值作为一个给定的预测数
PREDICTION_COST(<schema.model>, <class> <cost_matrix_clause>
<mining_attribute_clause>) 
eg:
WITH cust_italy AS (
  SELECT cust_id
  FROM mining_data_apply_v
  WHERE country_name = 'Italy'
  ORDER BY PREDICTION_COST(DT_SH_Clas_sample,1 COST MODEL USING *) ASC,1)
SELECT cust_id
FROM cust_italy
WHERE rownum < 11; 

9)PREDICTION_DETAILS:返回一个包含模型相关信息的字符串,该字符串包含输入行的评分
PREDICTION_DETAILS(<schema.model> <mining_attribute_clause>) 
eg:
SELECT cust_id, education,
PREDICTION_DETAILS(DT_SH_Clas_sample using *) treenode
FROM mining_data_apply_v
WHERE occupation = 'TechSup' AND age < 25
ORDER BY cust_id;  

10)PREDICTION_PROBABILITY:返回给定预测的概率作为一个甲骨文号
PREDICTION_PROBABILITY(<schema.model> <class>
<mining_attribute_clause>) 
eg:
SELECT cust_id
FROM (
  SELECT cust_id
  FROM mining_data_apply_v
  WHERE country_name = 'Italy'
  ORDER BY PREDICTION_PROBABILITY(DT_SH_Clas_sample, 1 USING *)
  DESC, cust_id)
WHERE rownum < 11; 

11)PREDICTION_SET:返回一个动态数组在多类分类方案包含所有类的对象
PREDICTION_SET(<schema.model>, <best N>, <cutoff>
<cost_matrix_clause> <mining_attribute_clause>) 
eg:
SELECT T.cust_id, S.prediction, S.probability, S.cost
FROM (
  SELECT cust_id,
 PREDICTION_SET(dt_sh_clas_sample COST MODEL USING *) pset
  FROM mining_data_apply_v
  WHERE cust_id < 100011) T,
TABLE(T.pset) S
ORDER BY cust_id, S.prediction; 



相关评论

专题信息
    ORACLE数据库系统是美国ORACLE公司(甲骨文)提供的以分布式数据库为核心的一组软件产品,是目前最流行的客户/服务器(CLIENT/SERVER)或B/S体系结构的数据库之一。比如SilverStream就是基于数据库的一种中间件。ORACLE数据库是目前世界上使用最为广泛的数据库管理系统,作为一个通用的数据库系统,它具有完整的数据管理功能;本教程主要是针对oracle应用过程中我们常用的函数进行深入讲解,让大家能够通过代码实例更快、更有效的掌握oracle函数,希望对大家有所帮助。