slowrake() is written mostly in R and is therefore slower than the Java-based implementation of RAKE found in the rapidraker R package. You can speed up slowrake() by telling it not to use each word's part of speech (POS) when creating candidate keywords (i.e., by setting stop_pos = NULL).
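The comparison below is a rough sketch, assuming the slowraker package is installed; the example text is made up, and timings will vary by machine and by how much text you pass in. It times the default call, which POS-tags the text before building candidate keywords, against a call with stop_pos = NULL, which skips tagging.

```r
# Assumes the slowraker package is installed; the example text is made up.
if (requireNamespace("slowraker", quietly = TRUE)) {
  txt <- "A dog leash keeps the dog close. Choose a leash length that suits the dog."

  # Default call: words are POS-tagged before candidate keywords are built
  with_pos <- system.time(res_pos <- slowraker::slowrake(txt))

  # Skip POS tagging entirely by setting stop_pos = NULL
  without_pos <- system.time(res_nopos <- slowraker::slowrake(txt, stop_pos = NULL))

  cat(sprintf("elapsed with POS: %.2fs, without POS: %.2fs\n",
              with_pos[["elapsed"]], without_pos[["elapsed"]]))
}
```

Note that the two calls can return different keywords, since words that would have been dropped based on their POS are kept when stop_pos = NULL.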
slowrake() is failing with a memory error (OutOfMemoryError). How do I fix this?
slowrake() relies on Java for POS tagging. Your Java virtual machine (JVM) may run out of memory during this process, resulting in an OutOfMemoryError. To fix this, try giving Java more memory:
options(java.parameters = "-Xmx1024m")
To quote from XLConnect:
[java.parameters] are evaluated exactly once per R session when the JVM is initialized - this is usually once you load the first package that uses Java support, so you should do this as early as possible.
Also note that:
The upper limit of the Xmx parameter is system dependent - most prominently, 32bit Windows will fail to work with anything much larger than 1500m, and it is usually a bad idea to set Xmx larger than your physical memory size because garbage collection and virtual memory do not play well together.
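To confirm that the setting actually took effect, you can ask the running JVM for its maximum heap size. This sketch assumes the rJava package (which Java-backed packages such as openNLP build on) is installed: rJava::.jinit() starts the JVM using getOption("java.parameters") if it is not already running, and Java's Runtime.maxMemory() reports the heap limit in bytes. The reported value may be somewhat lower than the -Xmx value you asked for.

```r
# Set the JVM heap size before any Java-backed package (rJava, openNLP,
# slowraker) is loaded in this R session; later changes are ignored.
options(java.parameters = "-Xmx1024m")

# Optional check (assumes rJava is installed): ask the JVM how much heap
# it is actually allowed to use.
if (requireNamespace("rJava", quietly = TRUE)) {
  rJava::.jinit()
  runtime <- rJava::.jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")
  max_mb <- rJava::.jcall(runtime, "J", "maxMemory") / 1024^2
  cat(sprintf("JVM max heap: %.0f MB\n", max_mb))
}
```

If the reported limit is far below what you requested, the JVM was most likely started earlier in the session, so restart R and set the option first.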
If setting java.parameters doesn't help, you can always tell slowrake() to skip POS tagging by setting stop_pos = NULL.
Each keyword's score is the sum of the scores of its member words. For example, the score for the keyword "dog leash" is the score for the word "dog" plus the score for the word "leash." This means that longer keywords will usually have higher scores than shorter ones.
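The summing rule can be illustrated with a small base-R sketch. The candidate keywords are made up, and the word score used here (degree divided by frequency, the word score from the original RAKE paper) is an assumption for illustration; this is not slowraker's internal code.

```r
# Hypothetical candidate keywords, already split into words (made-up example)
candidates <- list(c("dog", "leash"), "dog", c("leash", "length"), "collar")

# Word frequency: how often each word appears across all candidates
freq <- table(unlist(candidates))

# Word degree: for every occurrence of a word, add the length of the
# candidate it occurs in (the word itself plus its co-occurring words)
deg <- vapply(names(freq), function(w) {
  sum(vapply(candidates, function(k) sum(k == w) * length(k), numeric(1)))
}, numeric(1))

# Word score = degree / frequency (assumed scoring rule, as in the RAKE paper)
word_score <- deg / as.numeric(freq[names(deg)])

# Keyword score = sum of its member words' scores
keyword_score <- vapply(candidates, function(k) sum(word_score[k]), numeric(1))
names(keyword_score) <- vapply(candidates, paste, character(1), collapse = " ")
print(keyword_score)
```

Here "dog" scores 1.5 and "leash" scores 2, so "dog leash" scores 3.5, more than the single-word keyword "dog". Because every word score is positive, adding a word to a keyword can only raise its total.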
The POS tagging done by slowrake() appears to be incorrect. What can I do about this?
First, confirm that the tagging function used by slowrake() (get_pos_tags()) is indeed giving the wrong tags. To do that, try something like:
slowraker:::get_pos_tags("some text whose POS I want", word_token_annotator = openNLP::Maxent_Word_Token_Annotator(), pos_annotator = openNLP::Maxent_POS_Tag_Annotator())
If the returned tags are indeed incorrect, you could try a different function for POS tagging. Note that get_pos_tags() is basically a wrapper around the POS tagging methods found in the NLP packages, so you'll want to look outside those libraries for a different tagger.