A text pre-processing package to aid NLP package development in Python 3. It lets you apply text-cleaning functions in the order you choose, rather than relying on the fixed order of an arbitrary NLP package.
pip install preprocessing
You can also download the source distribution from PyPI.
You can then run:
pip install <path_to_tar_file>
on the tar file, or
python setup.py install
from inside the extracted package directory to install preprocessing.
Once the package is installed, using it from Python 3 takes the following form:
import preprocessing.text as ptext
from preprocessing.text import keyword_tokenize, remove_unbound_punct, remove_urls

text_string = "important string at: http://example.com"
clean_string = ptext.preprocess_text(text_string, [
    remove_urls,
    remove_unbound_punct,
    keyword_tokenize,
])
>>> print(clean_string)
"important string"
Should the functions be performed in a different order (i.e. keyword_tokenize -> remove_urls -> remove_unbound_punct), the result changes:
>>> print(clean_string)
"important string http example.com"
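To illustrate why the order of the cleaning functions matters, here is a minimal, self-contained sketch of an ordered function pipeline. It is not the package's implementation: `apply_pipeline`, `strip_urls`, `strip_punct`, and `squeeze_spaces` are hypothetical stand-ins written only for this example, and their behaviour is simpler than that of the real `preprocessing.text` functions.

```python
import re
from typing import Callable, Sequence

# Hypothetical stand-ins for illustration only; NOT the package's functions.

def strip_urls(text: str) -> str:
    """Replace http(s) URLs with a space."""
    return re.sub(r"https?://\S+", " ", text)

def strip_punct(text: str) -> str:
    """Replace every punctuation character with a space."""
    return re.sub(r"[^\w\s]", " ", text)

def squeeze_spaces(text: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(text.split())

def apply_pipeline(text: str, steps: Sequence[Callable[[str], str]]) -> str:
    """Apply each step to the text, in the order given."""
    for step in steps:
        text = step(text)
    return text

text_string = "important string at: http://example.com"

# URLs are removed while they are still intact.
urls_first = apply_pipeline(text_string, [strip_urls, strip_punct, squeeze_spaces])
print(urls_first)   # important string at

# Punctuation is removed first, so the URL is broken up before strip_urls runs.
punct_first = apply_pipeline(text_string, [strip_punct, strip_urls, squeeze_spaces])
print(punct_first)  # important string at http example com
```

In the second ordering the URL pattern no longer matches once `://` and `.` are gone, so fragments of the URL survive, which mirrors the difference shown above.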
This package currently consists of a single module, with no subpackages planned. It depends on NLTK for tokenizers and stopwords. Apart from NLTK, it uses only modules built into Python 3.
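If you want to confirm that the NLTK dependency is importable before using the package, a small check along these lines may help. This is only a sketch using the standard library; `missing_modules` is a hypothetical helper name, and the check covers only whether the `nltk` module can be found, not whether any NLTK data sets are installed.

```python
from importlib.util import find_spec
from typing import Iterable, List

def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]

# Hypothetical usage: check the one third-party dependency named above.
missing = missing_modules(["nltk"])
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All dependencies found.")
```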
If you feel like contributing: