Python Word Tokenization

Author: -- Published: 2019-11-20

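Word tokenization is the process of splitting a large sample of text into individual words. It is a prerequisite for natural language processing tasks in which each word must be captured and analyzed further, for example to classify words by sentiment or to count them. The Natural Language Toolkit (nltk) is a library that serves this purpose. Install nltk before running the Python tokenization programs below.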

conda install -c anaconda nltk
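After installation, a quick check can confirm that the package is importable from the active environment. The sketch below uses only the standard library; the helper name `is_installed` is made up for this illustration and is not part of nltk.

```python
import importlib.util

def is_installed(package_name):
    """Return True if the named top-level package can be located (hypothetical helper)."""
    return importlib.util.find_spec(package_name) is not None

# Prints True once nltk is available in the current environment.
print(is_installed("nltk"))
```

If this prints False, the install command was most likely run in a different environment from the one executing the script.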

Next, use the word_tokenize method to split a paragraph into individual words.

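import nltk

# word_tokenize relies on the Punkt tokenizer models; download them once if missing.
# (Recent NLTK releases name this resource 'punkt_tab' instead of 'punkt'.)
nltk.download('punkt')

word_data = "it originated from the idea that there are readers who prefer learning new skills from the comforts of their drawing rooms"
# Split the text into a list of word tokens.
nltk_tokens = nltk.word_tokenize(word_data)
print(nltk_tokens)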

Running the code above produces the following result.

['it', 'originated', 'from', 'the', 'idea', 'that', 'there', 'are', 'readers', 
'who', 'prefer', 'learning', 'new', 'skills', 'from', 'the',
'comforts', 'of', 'their', 'drawing', 'rooms']
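As an illustration of the kind of further analysis mentioned earlier (counting words), the token list can be passed to `collections.Counter` from the standard library. This is only a minimal sketch: the list is copied by hand from the output above so that the example runs without nltk, and the variable names are chosen for this illustration.

```python
from collections import Counter

# Token list copied from the word_tokenize output shown above.
tokens = ['it', 'originated', 'from', 'the', 'idea', 'that', 'there', 'are',
          'readers', 'who', 'prefer', 'learning', 'new', 'skills', 'from',
          'the', 'comforts', 'of', 'their', 'drawing', 'rooms']

# Count how many times each token occurs.
token_counts = Counter(tokens)

# The two most frequent tokens together with their counts.
top_two = token_counts.most_common(2)
print(top_two)  # [('from', 2), ('the', 2)]

# Total number of tokens versus number of distinct tokens.
print(len(tokens), len(token_counts))  # 21 19
```

In a real pipeline the hand-copied list would simply be replaced by the list returned from the tokenizer.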

Tokenizing Sentences

Sentences in a paragraph can be tokenized just as words are. Use the sent_tokenize method for this. Here is an example.

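import nltk

# sent_tokenize also uses the Punkt models (see nltk.download('punkt')).
sentence_data = "sun rises in the east. sun sets in the west."
# Split the text into a list of sentences.
nltk_tokens = nltk.sent_tokenize(sentence_data)
print(nltk_tokens)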

Running the code above produces the following result.

['sun rises in the east.', 'sun sets in the west.']
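To give a rough feel for what sentence and word splitting involve, here is a simplified standard-library sketch. It is not how nltk works internally (nltk's sent_tokenize uses the trained Punkt model); the functions `naive_sent_split` and `naive_word_split` and their regular expressions are assumptions made up for this illustration.

```python
import re

def naive_sent_split(text):
    """Split after '.', '!' or '?' when followed by whitespace (a simplification)."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def naive_word_split(sentence):
    """Return runs of word characters and single punctuation marks as tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

sentence_data = "sun rises in the east. sun sets in the west."

# First split into sentences, then split each sentence into word tokens.
sentences = naive_sent_split(sentence_data)
print(sentences)

words_per_sentence = [naive_word_split(s) for s in sentences]
print(words_per_sentence)

# The naive rule wrongly breaks after an abbreviation such as "Dr."
abbreviation_case = naive_sent_split("Dr. Smith arrived. He sat down.")
print(abbreviation_case)
```

The last call shows why a fixed punctuation rule is fragile: "Dr." is treated as a sentence of its own, which is the kind of case a trained sentence tokenizer is designed to handle.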
