Getting Started with the Python REfO Module


Blog posts about using REfO are hard to find in China, so this is a repost of a usage example from an overseas site.
Link: Regular Expressions for Objects.
For work I recently needed to do something that is very similar to regexes, but with a twist: it should operate on lists of objects, not only on strings. Luckily, Python came to the rescue with REfO, a library for doing just this.

My use case was selecting phrases from Part-of-Speech (POS) annotated text. The text was lemmatized and tagged using spaCy, which resulted in lists of [lemma, tag] pairs of the following form:

s = [['i', 'PRON'], ['look', 'VERB'], ['around', 'ADP'], ['me', 'PRON'], ['and', 'CCONJ'], ['see', 'VERB'], ['that', 'ADP'], ['everyone', 'NOUN'], ['be', 'VERB'], ['run', 'VERB'], ['around', 'ADV'], ['in', 'ADP'], ['a', 'DET'], ['hurry', 'NOUN']]

From these sentences we want to extract human action phrases and noun phrases, which are defined as follows, using regex-like notation:

human_action = ("he"|"she"|"i"|"they"|"we") ([VERB] [ADP])+ 
noun_phrase = [DET]? ([ADJ] [NOUN])+

Translated to English this means that human actions are defined as 1st and 3rd person, singular and plural pronouns followed by repeated groups of verbs and adpositions (in, to, during). Noun phrases are composed of an optional determiner (a, an, the) followed by repeated groups of adjectives and nouns.

Most standard regex libraries won’t help you with this, because they work only on strings. But this problem is still perfectly well described by regular grammars, so after a bit of Googling I found REfO. It is very simple to use, although you have to read the source code because it has little documentation.

REfO is a bit more verbose than normal regular expressions, but it stays close to the usual regex notions. Zero-or-more repetition (*) is done with the refo.Star operator, while one-or-more repetition (+) is refo.Plus. The only new operator is refo.Predicate, which takes a one-argument function and matches if that function returns true when called with the element at that position. Using this we will build the functions we need:

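import refo

def pos(tag):
    # Match any [lemma, tag] pair whose POS tag equals `tag`.
    return refo.Predicate(lambda x: x[1] == tag)

def humanpron():
    # Match a pronoun whose lemma is one of the listed personal pronouns.
    return refo.Predicate(lambda x: x[1] == 'PRON' and x[0] in {'i', 'he', 'she', 'we', 'they'})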

For matching POS, we use a helper to create a function that will match the given tag. For matching human pronouns, we also check the words, not just the POS tag.
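To make the semantics of a predicate-based pattern concrete, here is a minimal pure-Python sketch of what matching a human action at a given position amounts to. It does not use REfO and is not how REfO is implemented internally; the names `is_pos`, `is_human_pron`, and `match_human_action` are hypothetical helpers made up for this illustration.

```python
def is_pos(tag):
    """Build a predicate that is true for a [lemma, tag] pair with the given tag."""
    return lambda token: token[1] == tag


def is_human_pron(token):
    """True for a pronoun whose lemma is one of the listed personal pronouns."""
    return token[1] == 'PRON' and token[0] in {'i', 'he', 'she', 'we', 'they'}


def match_human_action(tokens, start):
    """Return the end index of a human action starting at `start`, or None.

    Hand-written equivalent of: pronoun ([VERB] [ADP])+, consuming as many
    (VERB, ADP) pairs as possible. Hypothetical helper, for illustration only.
    """
    is_verb, is_adp = is_pos('VERB'), is_pos('ADP')
    if start >= len(tokens) or not is_human_pron(tokens[start]):
        return None
    end = start + 1
    # Consume (VERB, ADP) pairs while both elements of the pair are present.
    while end + 1 < len(tokens) and is_verb(tokens[end]) and is_adp(tokens[end + 1]):
        end += 2
    # The "+" requires at least one pair after the pronoun.
    return end if end > start + 1 else None


tokens = [['i', 'PRON'], ['look', 'VERB'], ['around', 'ADP'],
          ['me', 'PRON'], ['and', 'CCONJ']]
end = match_human_action(tokens, 0)
print(tokens[0:end])  # [['i', 'PRON'], ['look', 'VERB'], ['around', 'ADP']]
```

Writing this by hand for every pattern gets tedious quickly, which is the gap a pattern library fills: the predicates stay the same, and the repetition and sequencing logic is expressed declaratively.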

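# Unlike the notation above, the adjective is optional here, so a bare noun also counts as a noun phrase.
np = refo.Question(pos('DET')) + refo.Plus(refo.Question(pos('ADJ')) + pos('NOUN'))
humanAction = humanpron() + refo.Plus(pos('VERB') + pos('ADP'))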

We then compose these helpers, concatenating them with +, and we have the patterns we wanted. Using them is simple: call refo.search, which finds the first match, or refo.finditer, which returns an iterable over all matches.

for match in refo.finditer(humanAction, s):
    start = match.start()
    end = match.end()
    print(s[start:end])

[[u'i', u'PRON'], [u'look', u'VERB'], [u'around', u'ADP']]

So it’s always good to Google around for a solution, because my first instinct, to whip up a parser in Parsec, would have led to a much more complicated solution. This one is nice, elegant, short and efficient.
