首页技术总结正文内容

实验分析

技术总结

更新时间：2024-12-22 22:28:57 52

admin 管理员组

文章数量: 887019

2024年3月12日发(作者：青少年机器人编程培训机构排名)

5.1 实验配置

实验环境配置为：Intel(R) Core(TM) i7-3370 CPU 3.40GHz，16G，64 位。

算法用C++实现，编译器为VS2010。操作系统为win7。Fp-Growth算法的代码

来自/article/195770，同样用C++实现。实验使用了5

个真实数据集，分别是retail,T10I4D100K,T40I10D100K,pumsb,kosarak,数据集来

自/data/，其具体参数见表2.

All the experiments are performed on a 3.40 GHz Intel(R)Core(TM) i7-3370 PC

machine with 16 G main memory, running on Microsoft Windows 7 the

programs are written in Microsoft/Visual studio implementation of

Fp-growth was download from /article/195770. we used

three real datasets and two synthetic datasets to download from

/data/ in our experiments. These datasets were often used in

previous study of frequent itemsets mining. The detailed descriptions of all datasets

are showed as follows.

表2 数据集参数

Database #Items #Trans

(a)T10I4D100K 10 1000 100000

(b)T40I10D100K 40 1000 100000

(c)pumsb 74 7117 49046

(d)retail 10 16470 88162

(e)kosarak

表中表示事务T的平均长度，#Items表示项的编号个数，#Trans表示

数据库DB中事务的总数。其中(a)(b)(d)(e)是稀疏数据，而pumsb是密集型数据。

Table 2 shows the characteristics of the real and synthetic datasets used in our

experiments,which show the average transaction length (denoted by ),the

number of items(denoted by #Items) and the number of transactions(denoted by

#Trans) in each dataset (a)(b)(d)(e) is sparse, while the dataset (c) is

dense.

图 5 total time and run time of (a)T10I4D100K,(b)T40I10D100K,(c)pumsb,(d)retail,

(e)kosarak.

从图5中我们可以看出，对于稀疏数据库来说，Fp-Search的时间复杂度明显优

于Fp-growth，如(a)(b)(d)(e),原因很简单，那就是对于稀疏数据库而言，Fp-tree

中有过多的非频繁模式，对应于Np_Graph和MP-tree中就有较多的频非频繁路

径，而Fp-growth没有能够较好地对这些非频繁模式带来的冗余计算进行剪枝，

而Fp-Search却能够很好的进行剪枝。从(c)可以看出，当Fp-tree比较密集时，

Fp-tree中几乎所有的模式都是频繁的，需要剪枝的模式很少很少，所有模式的

遍历都是必须的，那么此时Fp-Search的剪枝带来的额外时间消耗和剪枝所带来

的性能优化相当，此时两个算法的时间几乎完全一样，随着支持度减小，频繁集

的个数增加，同时Fp-tree的密集程度也不断增加，导致剪枝和构建MP-tree所

造成的时间消耗超过剪枝带来的性能优化，此时Fp-Search的性能略逊于

Fp-growth，多出的时间为构造MP-tree的时间，复杂度为树中节点的个数，因为

扩展时插入的时间复杂度为

O(1)

这个代价是极小的。

Figure 5 shows that when the dataset is sparse like (a)(b)(d)(e),the Fp-Search’s

performance will be significantly higher than is because when the

dataset is sparse, Fp-tree contains many patterns which are not frequent,and it lead to

a lot of redundant computation to construct conditional pattern base. In this case,the

corresponding Np_Graph and MP-tree also contain a lot of paths which are not

frequent. However,Fp-growth can’t prune those redundant computation, while

MP-tree has good pruning performance. From (c) this, we can see that when the

Fp-tree is intensive, almost all pattern in Fp-tree are frequent,there is a little redundant

path need to be pruned,So all traverse is necessary,thus the performance optimization

of pruning operation is not obvious,and it also need some time to prune,So it’s

running is almost equal to Fp-growth’s running time.

下面的这个实验表明了算法的线性程度，我们把tid-code的求交运算作为原

子操作，那么把所有tid-code的求交运算分为两类，一类是Necessary：求交后

判断为频繁路径时的运算；一类是Non-essential：求交后判断为非频繁路径的时

的运算。下面给出了retail和pumsb的实验结果：

图 6 算法冗余度的比较

根据图6，不难发现，对于密集型数据(c),冗余计算是极小的，这从一个侧面

反映了Fp-tree是密集型的。而对于稀疏数据(d)来讲，算法的冗余计算和必要的

计算几乎完全一样，基于篇幅，实际笔者做了大量实验发



=Necessary/Non-essential是极小的，一般维持在（0,4）之间，这说明完全

Np_Graph剪枝策略的性能很好。

6 总结

本文提出了一个新的频繁项集挖掘算法Fp-Search，它利用Np_Graph实现路

径扩展，Tid-tree实现tid-set压缩，MP-tree实现路径剪枝来高效的挖掘所有频繁

项集。它主要有以下几个优点：（1）Np_Graph能够快速的进行路径扩展；（2）

Tid-tree大大压缩tid-set；（3）MP-tree能实现高效的剪枝。实验表明算法有着很

好的性能，但仍然存在不足之处，即构造MP-tree和遍历MP-tree需要消耗一定

的时间，尽管向MP-tree中插入一个节点的时间复杂度为

O(1)

，但是当频繁集的

个数非常大时，这个消耗会成为算法的瓶颈，下一步的工作就是如何寻找一种更

好的剪枝方案来提高算法的效率。

In this paper，we propose a novel algorithm, Fp-Search, which is based on

Np_Graph for the fast path extension, Tid-tree for compressing the tid-set,and

MP-tree for pruning the advantages are shown as follows:(1)Np_Graph can

efficiently extend the path;(2)Tid-tree can greatly compress the tid-set;(3)MP-tree can

implement efficient experimental results show that the Fp-Search

algorithm has a good ,every coin has two sides,it still has

shortcomings that when constructing the MP-tree and traversing the MP-tree,some

extra time is needed,althou-

gh add a node to MP-tree need time

O(1)

,but when the number of frequent

itemsets is very large,the consumption will become the bottleneck of

finding a more efficient pruning method will be the main work in the future study.

本文标签：剪枝算法实验

版权声明：本文标题：实验分析内容由网友自发贡献，该文观点仅代表作者本人，转载请联系作者并注明出处：http://www.freenas.com.cn/jishu/1710229144h564972.html，本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容，一经查实，本站将立刻删除。

发表评论

全部评论 0

暂无评论

技术交流 – FreeNAS中文网

实验分析

更多相关文章

操作系统 | 银行家算法举例

【操作系统】银行家算法

Java实现操作系统银行家算法模拟程序+GUI图形化

操作系统-银行家算法上机实验

【操作系统】银行家算法：算法分析 + 运行结果 + 完整代码

操作系统——银行家算法 python实现

计算机操作系统-银行家算法习题

【计算机操作系统】银行家算法的模拟实现

【操作系统】避免死锁——银行家算法

避免死锁之银行家算法

【银行家算法】超清晰代码

操作系统实习：银行家算法（C++面向对象的思想实现）

银行家算法

【操作系统】银行家算法，当其中一个进程发出请求时，需要判断系统是否能够将资源分配给它

银行家算法的实验报告

数据挖掘十大经典算法 整理

MATLAB算法实战应用案例精讲-【自动驾驶】自动驾驶中的自动泊车功能

改变世界的十大算法

科学界十大伟大算法

php 辗转相除法,算法,

发表评论

推荐文章

Android处理android.app.RemoteServiceException: Bad notification for startForeground: java.lang.Ru

Oozie原理篇

日用化妆品批发企业订单管理系统解决方案

DOS命令下硬盘坏道检测和修复工具

VMware虚拟机Windows 10安装使用教程（非常详细）

热门文章

windows系统一键恢复初始状态

Windows如何设置Tomcat开机启动？

网络基础知识笔记

Android路框架

windows下搭建sftp服务端

ChatGPT.nvim: 在Neovim中集成OpenAI强大的ChatGPT功能

windowsphp7+nginx1.10 环境搭建

windows下python3.13.0的下载与安装教程

ChatGPT国内中文版镜像网站整理合集（20241208）更新

Windows中BeautifulSoup的安装方法

最新文章

Raid技术

LSI_阵列卡操作手册

破解Centos7_root用户密码

Redhat重置Root用户密码方法

远程批量修改linux服务器密码的脚本

最新Windows 11教育版下载：专为教育设计的系统！

Win7系统下搭建NFS服务器

零基础使用UltraISO制作并安装纯净Win10系统指南

苹果电脑windows系统换苹果系统

Win11系统崩溃错误修复指南：三种实用方法详解

数据挖掘十大经典算法整理