Python: an example of extracting words from text and counting word frequencies

I use these text operations often, so here is a summary. More will be added over time.

Operations:

strip_html(cls, text): remove HTML tags

separate_words(cls, text, min_lenth=3): extract words from text

get_words_frequency(cls, words_list): get word frequencies
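The three operations above can also be sketched with standard-library tools. This is an alternative, not the author's code: it swaps the character scan for `html.parser.HTMLParser`, the `re.split` call for `re.findall`, and the hand-built dict for `collections.Counter`. The helper class `_TextCollector` and the sample HTML are made up for illustration. One behavioral difference: `HTMLParser` decodes character references such as `&amp;` by default, which the character scan does not.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects only the text between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text outside tags (character references already decoded).
        self.parts.append(data)


def strip_html(text):
    """Remove HTML tags, joining the remaining text fragments with spaces."""
    parser = _TextCollector()
    parser.feed(text)
    parser.close()
    return " ".join(parser.parts)


def separate_words(text, min_length=3):
    """Lower-cased words strictly longer than min_length (same rule as the original)."""
    return [w.lower() for w in re.findall(r"\w+", text) if len(w) > min_length]


def get_words_frequency(words):
    """Count occurrences of each word; Counter is a dict subclass."""
    return Counter(words)


html = "<p>Python makes text processing easy.</p><p>Python is fun!</p>"
words = separate_words(strip_html(html))
print(words)
print(get_words_frequency(words).most_common(2))
```

`Counter.most_common(n)` also gives the top-n words directly, which the plain dict does not.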

Source code:


```python
import re


class DocProcess(object):
    """Small helpers for common text-processing tasks."""

    @classmethod
    def strip_html(cls, text):
        """
        Delete html tags in text.
        text is a string; each closing ">" becomes a space so that
        words on either side of a tag stay separated.
        """
        chars = []
        is_html = False
        for character in text:
            if character == "<":
                is_html = True          # entering a tag: skip its characters
            elif character == ">":
                is_html = False         # leaving a tag
                chars.append(" ")
            elif not is_html:
                chars.append(character)
        return "".join(chars)

    @classmethod
    def separate_words(cls, text, min_lenth=3):
        """
        Separate text into a list of lower-cased words.
        Only words strictly longer than min_lenth are kept.
        """
        splitter = re.compile(r"\W+")   # split on runs of non-word characters
        return [s.lower() for s in splitter.split(text) if len(s) > min_lenth]

    @classmethod
    def get_words_frequency(cls, words_list):
        """
        Get frequency of words in words_list.
        Return a dict mapping each word to its count.
        """
        num_words = {}
        for word in words_list:
            num_words[word] = num_words.get(word, 0) + 1
        return num_words
```
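The dict returned by get_words_frequency keeps words in first-seen order rather than by count, so a common next step is listing the most frequent words. A minimal sketch follows; `top_words` is a hypothetical helper, not part of the original class, and the sample dict is made-up data shaped like get_words_frequency's output.

```python
def top_words(freq, n=10):
    """Return the n most frequent (word, count) pairs.

    Sorted by descending count; ties are broken alphabetically so the
    output is deterministic.
    """
    return sorted(freq.items(), key=lambda item: (-item[1], item[0]))[:n]


# Sample data for illustration, in the word -> count form get_words_frequency returns.
freq = {"python": 3, "text": 2, "word": 2, "frequency": 1}
print(top_words(freq, 3))  # [('python', 3), ('text', 2), ('word', 2)]
```

Negating the count in the sort key sorts counts in descending order while keeping the alphabetical tie-break ascending, all in a single `sorted` call.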

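That is all for this example of extracting words from text and counting word frequencies in Python. I hope it serves as a useful reference; thank you for supporting 脚本之家 (Script House).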
