Posted on February 29, 2024 (author: 制表格初入门)
Data Min Knowl Disc
DOI 10.1007/s10618-014-0360-3

Detecting anomaly collections using extreme feature ranks

Hanbo Dai · Feida Zhu · Ee-Peng Lim · HweeHwa Pang

Received: 24 February 2013 / Accepted: 4 June 2014
© The Author(s) 2014

Abstract Detecting anomaly collections is an important task with many applications, […]omaly collection, entitie[…]ult, […] members of an anomaly collection are e[…]oduce a novel anomaly defi[…]ose a new measure of anomalo[…]e can be a large number of ERACs of various sizes, for simplicity, we first investigate the ERAC detection problem of finding top-K ERACs of a predefi[…] tackle the follow-up ERAC expansion problem of uncovering the superset[…]thms are proposed for both ERAC detection and expansion problems, […]fically, in synthetic datasets, both ERAC det[…]

[…]sible editor: […]
(B) The School of Computer Science and Information Engineering, Hubei University, Wuhan, China
e-mail: daihanbo@[…]
[…] School of Information Systems, Singapore Management University, Singapore, Singapore
e-mail: fdzhu@[…]
e-mail: eplim@[…]
e-mail: hhpang@[…]
[…] spam dataset, both ERAC detection and expansion algorith[…]DB dataset, both ERAC detection and expansion algorithms identify unusual actor collections that are not easily identifi[…] Chinese online forum dataset, our ERAC detection algorithm identifies suspicious “water army” […]pansion algorithm successfully […]

Keywords Anomaly collection · Extreme feature rank · Anomaly cluster · Outlier group · Spam detection · Spam cluster

1 Introduction

According to Barnett and Lewis (1994), an anomaly or outlier is a data instance or subset of data insta[…]ral, an anomaly can be classifi[…] anomaly usually lies in a sparse region or is far away from normal ones, whereas an anomaly collection is formed by similar entities, […]tice, this inconsistency often implies different agendas[…] paper, we detect anomaly collections by their extreme feature ranks, based on the observation that members in an anomaly coll[…]rted by Fetterly et al. (2004), Castillo et al. (2007) and Gyöngyi et al. (2004), […]mple, they stuff the pag[…]so generate pages from similar templates on the fly in order to perform “link spam”. As a result, when measured by those characteristics or features, spammer hosts consistently demonstrate very extreme traits and form an identifiable anomalous collection, […]strate, Fig. 1 shows 30 web hosts {e0, ..., e29} with three host features {f0, f1, f2}, reflecting the aforementioned spamming strategies: f0 represents the average number of popular keywords, f1 is the variance of the word count, […]h feature, […]hen identify {e5, e7, e12} […]

Fig. 1 An example of ERAC. 30 entities {e0, ..., e29} are ranked according to each of 3 features {f0, f1, f2}. In this example, {e5, e7, e12} is an ERAC
f0 and f2, […]t that e5, e7 and e12 collectively display extreme traits across th[…]her example, groups of fraudulent […]ulent user would create sufficient low-price transactions with other accomplice accounts in a short time in order to gain credibility, before performing fraud transactions involving large sums of money according to Chua and Wareham (2004) and Pandit et al. (2007). Consequently, they are likely to rank at extreme positions with respe[…]er study this kind of anomaly collections, we propose a novel definition, […] is an entity subset clustered toward the top or bottom ranks, […]annot be easily detected by existing anomaly detection approaches, because they either focus on single point anomalies, or […]t a set of single point anomalies does not always form an ERAC, […]mple, in Fig. 1, e12 is not very extreme by itself although it is part of an ERAC {e5, e7, e12}. In contrast, e8 is very anomalous as a single entity, since it appears at extreme positions on all three features, […]herefore cannot be discover[…]ct ERACs, Dai et al. (2012) […]malousness of an ERAC is quantifi[…]he large number of ERACs of various sizes, Dai et al. (2012) tackle the ERAC detection problem of discovering top-K ERACs with a predefined size limit, which is set to small values for effi[…]heless, after being offered with the top ERACs of a predefined size, users may want to […]mple, in the web spam case, users may find the detected ERAC {e5, e7, e12} of interest as they have the common spamming strategy of using lots of popular keywords, with very little variance on the word count, […]tural to ask, can we detect the superset of this ERAC that are even more anomalous with similar sets of spamming strategies? Therefore, in this paper, we not only explore the ERAC detection problem, but also propose the ERAC expansion problem to uncover the superset[…] the ERAC detection problem, ERAC expansion is done without predefi[…]arize our contributions as follows:

– We are the fi[…]ure the anomalousness of a collection by how extremely ranked it is with respect to any feature set.
– We develop both exact and heuristic algorithms to find the top-K anomalous collections of a predefined size limit […]on different pruning strategies under the feature […]
– […] provide algorith[…]
– […]ose the follow-up problem […] design efficient greedy algorit[…]
– […]ose an exploratory scheme for searching ERACs, making use o[…]
– […]y our ERAC detection approach on synthetic datasets with injected ERACs and on three real datasets including a web host graph, […]hetic datasets, our proposed heuristic algorithm scales well wi[…] web spam dataset with labeled true spammers, our approach discovers spammer collections that are more anomalous while achieving higher precisions, co[…] movie dataset, we detect unusua[…]aluation shows our approach successfully fi[…]ults demonstrate that in the synthetic datasets, the injected ERACs are retrieved with high success rate; in the web host dataset, the algorithm achieves higher precision than existing methods; in the movie dataset, the expansion reveals larger anomalous actor collections that cannot be discovered by the clustering-based approach; in the Chinese online forum dataset, the expan[…]

[…]troducing the problem formulation in Sects. 3 and 4 presents our ER[…]y, Sect. 8 concludes the paper and discusses limitations and future work.

2 Related work

According to a survey by Chandola et al. (2009), […]ul of approaches are proposed such as a classification-based one by Castillo et al. (2007), a distance-based one by Knorr and Ng (1998), a density-based one by Breunig et al. (2000) and clustering-based ones by Ester et al. (1996) and Guha et al. (1999). Since these approaches assume that anomalies appear in sparse regions or are far away from the normal entities, anomalous collect[…] al. (2009), He et al. (2003) and Loureiro et al. (2004) use clustering-based approaches for anomalous collection detection, assuming that normal entities belong to large and dense clusters, […]ing to Duan et al. (2009) and He et al. (2003), anomalous clusters are the smaller ones that
together constitute less than 10% […]ro et al. (2004) […]r, the assumption th[…]rmore, […]l. (2010) […]k assumes that after data points are projected to some hyperplane, the anomalous points follow distribut[…]r, these assumptions do not always hold, […]-Castro et al. (2011) use a statistical model to detect in a graph an anomaly collection in whic[…]rk takes the assumption that only one anomalous cluster exists in the whole graph, […]ing anomalous collections is also related to the task of subgroup discovery proposed by Klösgen (1996) and Wrobel (1997). A survey done by Herrera et al. (2011) summarizes the task as to discover the subgroups of the entity population that are statistically “most interesting” […]groups are induced by rules, […]r, the ano[…] because, (i) not all members of an anomaly collection satisfy rules; (ii) […]er, the anomalousness we use to measure an anomaly collection does not involve the class labels, whereas the interestingness measure used for subgroup discovery usually does.

3 Extreme rank anomalous collection

Dai et al. (2012) have shown that the anomalousness of a collection is better measured directly at collection level, instead of measuring indiv[…] adopt that defi[…]note the universal entity set, […](e). […]1, rank(e8) = 1 and rank(e7) = […]emity index r refers to an extreme region, and S_f(r) […]e[…]f, […]

Table 1 Notations: E, S, Rank_f(e), p_f(S, r), r_f(S) ([…]presentative r of S on f), F, r, S_f(r)
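The rank notation above (Rank_f(e) for the rank of entity e on feature f, and S_f(r) for an extreme region indexed by r) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the feature values are invented toy data, ranking the largest value first is an arbitrary convention, only the top end of the ranking is considered, and the hypergeometric tail probability is a generic way to ask how surprising an overlap is. It is not necessarily the p_f(S, r) measure of Dai et al. (2012), whose exact form cannot be recovered from this excerpt.

```python
from math import comb


def feature_ranks(values):
    """Return 1-based ranks for one feature; the largest value gets rank 1 (assumed convention)."""
    ordered = sorted(values, key=lambda entity: values[entity], reverse=True)
    return {entity: position + 1 for position, entity in enumerate(ordered)}


def top_region(ranks, r):
    """Entities whose rank is at most r, i.e. a top-r extreme region (illustrative S_f(r))."""
    return {entity for entity, rank in ranks.items() if rank <= r}


def overlap_tail_probability(n, r, s, k):
    """Chance that at least k of s entities drawn uniformly without replacement
    from n entities fall inside a top-r region (hypergeometric upper tail)."""
    favourable = sum(comb(r, i) * comb(n - r, s - i) for i in range(k, min(r, s) + 1))
    return favourable / comb(n, s)


# Toy feature values for ten hypothetical entities (invented, not from the paper).
f0 = {"e0": 3, "e1": 1, "e2": 4, "e3": 2, "e4": 5,
      "e5": 9, "e6": 6, "e7": 10, "e8": 7, "e9": 8}

ranks = feature_ranks(f0)
region = top_region(ranks, r=3)
candidate = {"e5", "e7", "e9"}          # hypothetical collection S
hits = len(candidate & region)          # members of S inside the top-3 region
p = overlap_tail_probability(len(f0), 3, len(candidate), hits)
print(sorted(region), hits, p)
```

In this toy case all three members of the candidate collection sit in the top-3 region, and a random 3-entity subset of the ten would do so with probability 1/120. A collection that is extreme on several features at once would repeat this check per feature; how those per-feature scores are combined is left open here.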
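Copyright notice: this article, titled "Detecting anomaly collections using extreme feature ranks-DMKD", was contributed by a site user, and the views expressed are the author's alone. For reprints, contact the author and cite the source: http://www.freenas.com.cn/jishu/1709210521h540400.html. This site only provides information storage space, holds no ownership, and assumes no related legal liability. If any content on this site is found to involve plagiarism, infringement, or violations of laws or regulations, it will be removed promptly once verified.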