Python: parameters of wordcloud.WordCloud() explained

Parameters of wordcloud.WordCloud() explained

Found at: wordcloud.wordcloud

class WordCloud(object):
    """Word cloud object for generating and drawing.

    Parameters
    ----------
    font_path : string
        Font path to the font that will be used (OTF or TTF).
        Defaults to DroidSansMono path on a Linux machine. If you are on
        another OS or don't have this font, you need to adjust this path.

    width : int (default=400)
        Width of the canvas.

    height : int (default=200)
        Height of the canvas.

    prefer_horizontal : float (default=0.90)
        The ratio of times to try horizontal fitting as opposed to vertical.
        If prefer_horizontal < 1, the algorithm will try rotating the word
        if it doesn't fit. (There is currently no built-in way to get only
        vertical words.)

    mask : nd-array or None (default=None)
        If not None, gives a binary mask on where to draw words. If mask is
        not None, width and height will be ignored and the shape of mask
        will be used instead. All white (#FF or #FFFFFF) entries will be
        considered "masked out" while other entries will be free to draw on.
        [This changed in the most recent version!]

    scale : float (default=1)
        Scaling between computation and drawing. For large word-cloud
        images, using scale instead of larger canvas size is significantly
        faster, but might lead to a coarser fit for the words.

    min_font_size : int (default=4)
        Smallest font size to use. Will stop when there is no more room in
        this size.

    font_step : int (default=1)
        Step size for the font. font_step > 1 might speed up computation but
        give a worse fit.

    max_words : number (default=200)
        The maximum number of words.

    stopwords : set of strings or None
        The words that will be eliminated. If None, the built-in STOPWORDS
        list will be used.

    background_color : color value (default="black")
        Background color for the word cloud image.

    max_font_size : int or None (default=None)
        Maximum font size for the largest word. If None, height of the
        image is used.

    mode : string (default="RGB")
        Transparent background will be generated when mode is "RGBA" and
        background_color is None.

    relative_scaling : float (default=.5)
        Importance of relative word frequencies for font-size. With
        relative_scaling=0, only word-ranks are considered. With
        relative_scaling=1, a word that is twice as frequent will have twice
        the size. If you want to consider the word frequencies and not only
        their rank, relative_scaling around .5 often looks good.

        .. versionchanged:: 2.0
            Default is now 0.5.

    color_func : callable, default=None
        Callable with parameters word, font_size, position, orientation,
        font_path, random_state that returns a PIL color for each word.
        Overwrites "colormap". See colormap for specifying a matplotlib
        colormap instead.

    regexp : string or None (optional)
        Regular expression to split the input text into tokens in
        process_text. If None is specified, ``r"\w[\w']+"`` is used.

    collocations : bool, default=True
        Whether to include collocations (bigrams) of two words.

        .. versionadded:: 2.0

    colormap : string or matplotlib colormap, default="viridis"
        Matplotlib colormap to randomly draw colors from for each word.
        Ignored if "color_func" is specified.

        .. versionadded:: 2.0

    normalize_plurals : bool, default=True
        Whether to remove trailing 's' from words. If True and a word
        appears with and without a trailing 's', the one with trailing 's'
        is removed and its counts are added to the version without
        trailing 's' -- unless the word ends with 'ss'.

    Attributes
    ----------
    ``words_`` : dict of string to float
        Word tokens with associated frequency.

        .. versionchanged:: 2.0
            ``words_`` is now a dictionary

    ``layout_`` : list of tuples ((string, float), int, (int, int), int or None, color)
        Encodes the fitted word cloud. Encodes for each word the (word,
        frequency) pair, font size, position, orientation and color.

    Notes
    -----
    Larger canvases make the code significantly slower. If you need a
    large word cloud, try a lower canvas size, and set the scale parameter.

    The algorithm might give more weight to the ranking of the words
    than their actual frequencies, depending on the ``max_font_size``
    and the scaling heuristic.
    """

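The `mask` rule in the docstring (pure-white entries are masked out) is applied later in `generate_from_frequencies`. There, a 2-D mask uses `mask == 255`, and a 3-D mask uses `np.all(mask[:, :, :3] == 255, axis=-1)`, so any alpha channel is ignored. The following is a minimal pure-Python sketch of that conversion. It assumes the mask is given as nested lists of integers from 0 to 255 instead of a NumPy array. The helper name `to_boolean_mask` is hypothetical and not part of the wordcloud API.

```python
def to_boolean_mask(mask):
    """Return a 2-D list of bools: True where the pixel is masked out (pure white).

    Hypothetical helper mirroring the mask handling in
    WordCloud.generate_from_frequencies. `mask` is a nested list of
    0..255 ints: rows of ints (grayscale) or rows of [R, G, B(, A)] lists.
    """
    boolean_mask = []
    for row in mask:
        out_row = []
        for pixel in row:
            if isinstance(pixel, int):
                # 2-D mask: only the value 255 is masked out
                out_row.append(pixel == 255)
            elif isinstance(pixel, (list, tuple)):
                # 3-D mask: masked out only if the first three channels are all 255
                out_row.append(all(c == 255 for c in pixel[:3]))
            else:
                raise ValueError("Got mask of invalid shape")
        boolean_mask.append(out_row)
    return boolean_mask


print(to_boolean_mask([[255, 0], [128, 255]]))
# [[True, False], [False, True]]
print(to_boolean_mask([[[255, 255, 255, 0], [255, 0, 0, 255]]]))
# [[True, False]]
```

Note that a nearly-white value such as 254 still counts as drawable, which is why the library warns when it receives a float mask instead of unsigned bytes.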
    def __init__(self, font_path=None, width=400, height=200, margin=2,
                 ranks_only=None, prefer_horizontal=.9, mask=None, scale=1,
                 color_func=None, max_words=200, min_font_size=4,
                 stopwords=None, random_state=None, background_color='black',
                 max_font_size=None, font_step=1, mode="RGB",
                 relative_scaling=.5, regexp=None, collocations=True,
                 colormap=None, normalize_plurals=True):
        if font_path is None:
            font_path = FONT_PATH
        if color_func is None and colormap is None:
            # we need a color map
            import matplotlib
            version = matplotlib.__version__
            if version[0] < "2" and version[2] < "5":
                colormap = "hsv"
            else:
                colormap = "viridis"
        self.colormap = colormap
        self.collocations = collocations
        self.font_path = font_path
        self.width = width
        self.height = height
        self.margin = margin
        self.prefer_horizontal = prefer_horizontal
        self.mask = mask
        self.scale = scale
        self.color_func = color_func or colormap_color_func(colormap)
        self.max_words = max_words
        self.stopwords = stopwords if stopwords is not None else STOPWORDS
        self.min_font_size = min_font_size
        self.font_step = font_step
        self.regexp = regexp
        if isinstance(random_state, int):
            random_state = Random(random_state)
        self.random_state = random_state
        self.background_color = background_color
        self.max_font_size = max_font_size
        self.mode = mode
        if relative_scaling < 0 or relative_scaling > 1:
            raise ValueError(
                "relative_scaling needs to be "
                "between 0 and 1, got %f." % 
                relative_scaling)
        self.relative_scaling = relative_scaling
        if ranks_only is not None:
            warnings.warn("ranks_only is deprecated and will be removed as"
                          " it had no effect. Look into relative_scaling.",
                          DeprecationWarning)
        self.normalize_plurals = normalize_plurals
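`__init__` turns an integer `random_state` into a `random.Random`. Later, `generate_from_frequencies` keeps a word horizontal when `random_state.random() < self.prefer_horizontal`. The small sketch below shows that initial choice and why a fixed seed makes it reproducible. It assumes a plain string stands in for PIL's `Image.ROTATE_90`, and `pick_orientations` is a hypothetical name. It also leaves out the retry-with-rotation step that runs when a word does not fit.

```python
from random import Random

# Stand-in for PIL's Image.ROTATE_90 (assumption: any non-None marker works here)
ROTATE_90 = "ROTATE_90"


def pick_orientations(n_words, prefer_horizontal=0.9, random_state=None):
    """Initial orientation per word: None (horizontal) or ROTATE_90 (vertical)."""
    if isinstance(random_state, int):
        # Same convention as WordCloud.__init__: an int seeds a random.Random
        random_state = Random(random_state)
    elif random_state is None:
        random_state = Random()
    return [None if random_state.random() < prefer_horizontal else ROTATE_90
            for _ in range(n_words)]


# The same seed gives the same orientations, so the layout is reproducible
print(pick_orientations(5, random_state=42) == pick_orientations(5, random_state=42))
# True
# random() is always < 1.0, so prefer_horizontal=1.0 never starts a word rotated
print(pick_orientations(5, prefer_horizontal=1.0, random_state=0))
# [None, None, None, None, None]
```

With `prefer_horizontal=0` every word starts vertical. However, the fitting loop in the listing resets `orientation = None` whenever it shrinks the font, which fits the docstring's remark that there is no built-in way to get only vertical words.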
    
    def fit_words(self, frequencies):
        """Create a word_cloud from words and frequencies.

        Alias to generate_from_frequencies.

        Parameters
        ----------
        frequencies : dict from string to float
            A dict containing words and their associated frequencies.

        Returns
        -------
        self
        """
        return self.generate_from_frequencies(frequencies)
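`fit_words` only forwards to `generate_from_frequencies`. That method first sorts and normalizes the frequencies, then derives each font size from the previous one using `relative_scaling`. The sketch below reproduces only that arithmetic, under two stated assumptions. First, it ignores collisions, so there is no `font_step` shrinking. Second, the two sizes passed to `initial_font_size` stand in for the sizes the real code gets from its two-word trial layout when `max_font_size` is None. All function names here are hypothetical.

```python
from operator import itemgetter


def normalize_frequencies(frequencies, max_words=200):
    """Sort descending, keep the top max_words, and scale so the largest is 1.0."""
    items = sorted(frequencies.items(), key=itemgetter(1), reverse=True)
    if not items:
        raise ValueError("We need at least 1 word to plot a word cloud, got 0.")
    items = items[:max_words]
    max_frequency = float(items[0][1])
    return [(word, freq / max_frequency) for word, freq in items]


def initial_font_size(first_size, second_size):
    """Harmonic mean of two trial sizes, as used when max_font_size is None."""
    return int(2 * first_size * second_size / (first_size + second_size))


def font_sizes(normalized, start_size, relative_scaling=0.5):
    """Font size per word from the relative_scaling rule, ignoring collisions."""
    if relative_scaling < 0 or relative_scaling > 1:
        raise ValueError("relative_scaling needs to be between 0 and 1")
    sizes, size, last_freq = [], start_size, 1.0
    for word, freq in normalized:
        if relative_scaling != 0:
            size = int(round((relative_scaling * (freq / last_freq)
                              + (1 - relative_scaling)) * size))
        sizes.append((word, size))
        last_freq = freq
    return sizes


freqs = {"python": 8, "cloud": 4, "word": 4, "the": 2}
norm = normalize_frequencies(freqs)
print(norm)
# [('python', 1.0), ('cloud', 0.5), ('word', 0.5), ('the', 0.25)]
print(font_sizes(norm, 100, relative_scaling=0.5))
# [('python', 100), ('cloud', 75), ('word', 75), ('the', 56)]
print(font_sizes(norm, 100, relative_scaling=1.0))
# [('python', 100), ('cloud', 50), ('word', 50), ('the', 25)]
```

With `relative_scaling=1` the size halves exactly when the frequency halves, matching the docstring. With `relative_scaling=0` the size stays constant here, and in the real loop only the fitting step would shrink it.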
    
    def generate_from_frequencies(self, frequencies, max_font_size=None):
        """Create a word_cloud from words and frequencies.

        Parameters
        ----------
        frequencies : dict from string to float
            A dict containing words and their associated frequencies.

        max_font_size : int
            Use this font-size instead of self.max_font_size

        Returns
        -------
        self

        """
        # make sure frequencies are sorted and normalized
        frequencies = sorted(frequencies.items(), key=itemgetter(1),
                             reverse=True)
        if len(frequencies) <= 0:
            raise ValueError("We need at least 1 word to plot a word cloud, "
                             "got %d." % len(frequencies))
        frequencies = frequencies[:self.max_words]

        # largest entry will be 1
        max_frequency = float(frequencies[0][1])
        frequencies = [(word, freq / max_frequency) for 
            word, freq in frequencies]
        if self.random_state is not None:
            random_state = self.random_state
        else:
            random_state = Random()
        if self.mask is not None:
            mask = self.mask
            width = mask.shape[1]
            height = mask.shape[0]
            if mask.dtype.kind == 'f':
                warnings.warn("mask image should be unsigned byte between 0"
                              " and 255. Got a float array")
            if mask.ndim == 2:
                boolean_mask = mask == 255
            elif mask.ndim == 3:
                # if all channels are white, mask out
                boolean_mask = np.all(mask[:, :, :3] == 255, axis=-1)
            else:
                raise ValueError("Got mask of invalid shape: %s"
                                 % str(mask.shape))
        else:
            boolean_mask = None
            height, width = self.height, self.width
        occupancy = IntegralOccupancyMap(height, width, boolean_mask)
        # create image
        img_grey = Image.new("L", (width, height))
        draw = ImageDraw.Draw(img_grey)
        img_array = np.asarray(img_grey)
        font_sizes, positions, orientations, colors = [], [], [], []
        last_freq = 1.
        if max_font_size is None:
            # if not provided use default font_size
            max_font_size = self.max_font_size
        if max_font_size is None:
            # figure out a good font size by trying to draw with
            # just the first two words
            if len(frequencies) == 1:
                # we only have one word. We make it big!
                font_size = self.height
            else:
                self.generate_from_frequencies(dict(frequencies[:2]), 
                    max_font_size=self.height)
                # find font sizes
                sizes = [x[1] for x in self.layout_]
                try:
                    font_size = int(2 * sizes[0] * sizes[1] / 
                        (sizes[0] + sizes[1]))
                # quick fix for if self.layout_ contains less than 2 values
                # on very small images it can be empty
                except IndexError:
                    try:
                        font_size = sizes[0]
                    except IndexError:
                        raise ValueError('canvas size is too small')
        else:
            font_size = max_font_size
        # we set self.words_ here because we called generate_from_frequencies
        # above... hurray for good design?
        self.words_ = dict(frequencies)
        # start drawing grey image
        for word, freq in frequencies:
            # select the font size
            rs = self.relative_scaling
            if rs != 0:
                font_size = int(round((rs * (freq / float(last_freq)) + 
                            (1 - rs)) * font_size))
            if random_state.random() < self.prefer_horizontal:
                orientation = None
            else:
                orientation = Image.ROTATE_90
            tried_other_orientation = False
            while True:
                # try to find a position
                font = ImageFont.truetype(self.font_path, font_size)
                # transpose font optionally
                transposed_font = ImageFont.TransposedFont(
                    font, orientation=orientation)
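                # get size of resulting text
                # (ImageDraw.textsize was removed in Pillow 10; newer Pillow
                # versions need ImageDraw.textbbox instead)
                box_size = draw.textsize(word, font=transposed_font)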
                # find possible places using integral image:
                result = occupancy.sample_position(box_size[1] + self.margin,
                                                   box_size[0] + self.margin,
                                                   random_state)
                if result is not None or font_size < self.min_font_size:
                    # either we found a place or font-size went too small
                    break
                # if we didn't find a place, make font smaller
                # but first try to rotate!
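                if not tried_other_orientation and self.prefer_horizontal < 1:
                    # as written, both branches give ROTATE_90, so a word that
                    # started vertical is simply retried vertical
                    orientation = (Image.ROTATE_90 if orientation is None else
                                   Image.ROTATE_90)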
                    tried_other_orientation = True
                else:
                    font_size -= self.font_step
                    orientation = None
            
            if font_size < self.min_font_size:
                # we were unable to draw any more
                break
            x, y = np.array(result) + self.margin // 2
            # actually draw the text
            draw.text((y, x), word, fill="white", font=transposed_font)
            positions.append((x, y))
            orientations.append(orientation)
            font_sizes.append(font_size)
            colors.append(self.color_func(word, font_size=font_size, 
                    position=(x, y), 
                    orientation=orientation, 
                    random_state=random_state, 
                    font_path=self.font_path))
            # recompute integral image
            if self.mask is None:
                img_array = np.asarray(img_grey)
            else:
                img_array = np.asarray(img_grey) + boolean_mask
            # recompute bottom right
            # the order of the cumsum's is important for speed ?!
            occupancy.update(img_array, x, y)
            last_freq = freq
        
        self.layout_ = list(zip(frequencies, font_sizes, positions, 
                orientations, colors))
        return self
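The placement step relies on `IntegralOccupancyMap`, which is not shown in this excerpt. The idea behind it is a summed-area table (integral image). Once the table is built, checking whether a box is free takes four lookups instead of scanning every pixel. Below is a plain-Python sketch of that idea, not the library's implementation. The function names are hypothetical. Positions are (row, col), like the `result` unpacked into `x, y` above. The table is also rebuilt from scratch rather than refreshed in place through `occupancy.update`.

```python
from random import Random


def integral_image(grid):
    """Summed-area table with one zero row/column of padding.

    grid[i][j] is truthy where a pixel is already occupied (or masked out);
    table[i][j] is the number of occupied pixels in rows < i and cols < j.
    """
    h, w = len(grid), len(grid[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        row_sum = 0
        for j in range(w):
            row_sum += 1 if grid[i][j] else 0
            table[i + 1][j + 1] = table[i][j + 1] + row_sum
    return table


def free_positions(table, size_x, size_y):
    """All top-left (row, col) where a size_x-by-size_y box hits no occupied pixel."""
    h, w = len(table) - 1, len(table[0]) - 1
    hits = []
    for i in range(h - size_x + 1):
        for j in range(w - size_y + 1):
            area = (table[i + size_x][j + size_y] - table[i][j + size_y]
                    - table[i + size_x][j] + table[i][j])
            if area == 0:
                hits.append((i, j))
    return hits


def sample_position(grid, size_x, size_y, random_state):
    """A random free position, or None when the box fits nowhere."""
    hits = free_positions(integral_image(grid), size_x, size_y)
    return random_state.choice(hits) if hits else None


grid = [
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
print(free_positions(integral_image(grid), 2, 2))
# [(0, 2), (1, 0), (1, 1)]
print(sample_position(grid, 3, 4, Random(0)))
# None
```

Returning None when nothing fits corresponds to the `result is None` case in the listing. There, the loop first tries rotating and then shrinks the font by `font_step`.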
    
    def process_text(self, text):
        """Splits a long text into words, eliminates the stopwords.

        Parameters
        ----------
        text : string
            The text to be processed.

        Returns
        -------
        words : dict (string, int)
            Word tokens with associated frequency.

        .. versionchanged:: 1.2.2
            Changed return type from list of tuples to dict.

        Notes
        -----
        There are better ways to do word tokenization, but I don't want to
        include all those things.
        """
        stopwords = set([i.lower() for i in self.stopwords])
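        # Python 2 only: on Python 3 the condition short-circuits before
        # the name `unicode` is evaluated
        flags = (re.UNICODE if sys.version < '3' and type(text) is unicode
                 else 0)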
        regexp = self.regexp if self.regexp is not None else r"\w[\w']+"
        words = re.findall(regexp, text, flags)
        # remove stopwords
        words = [word for word in words if word.lower() not in stopwords]
        # remove 's
        words = [word[:-2] if word.lower().endswith("'s") else word for 
            word in words]
        # remove numbers
        words = [word for word in words if not word.isdigit()]
        if self.collocations:
            word_counts = unigrams_and_bigrams(words, self.normalize_plurals)
        else:
            word_counts, _ = process_tokens(words, self.normalize_plurals)
        return word_counts
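The filtering steps in `process_text` are easy to reproduce with the standard library. The sketch below approximates `process_text` with `collocations=False` and `normalize_plurals=False`. The final counting uses a plain `collections.Counter`, which is an assumption: the library delegates counting to `process_tokens` / `unigrams_and_bigrams`, which are not shown here. The function name `simple_process_text` is hypothetical. One side effect of the default regex `\w[\w']+` is that a token needs at least two characters, so one-letter words never appear.

```python
import re
from collections import Counter


def simple_process_text(text, stopwords, regexp=r"\w[\w']+"):
    """Approximate WordCloud.process_text without collocations or plural merging."""
    stopwords = {s.lower() for s in stopwords}
    words = re.findall(regexp, text)
    # remove stopwords, case-insensitively
    words = [w for w in words if w.lower() not in stopwords]
    # strip a trailing possessive 's
    words = [w[:-2] if w.lower().endswith("'s") else w for w in words]
    # drop tokens made only of digits
    words = [w for w in words if not w.isdigit()]
    # plain, case-sensitive counting (simplification, see lead-in)
    return dict(Counter(words))


print(simple_process_text("The cloud's words: 2024 words and a cloud",
                          {"the", "and"}))
# {'cloud': 2, 'words': 2}
```

Here "The" and "and" are removed as stopwords, "cloud's" becomes "cloud", "2024" is dropped as a number, and "a" never matches the regex.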
    
    def generate_from_text(self, text):
        """Generate wordcloud from text.

        The input "text" is expected to be a natural text. If you pass a sorted
        list of words, words will appear in your output twice. To remove this
        duplication, set ``collocations=False``.

        Calls process_text and generate_from_frequencies.

        .. versionchanged:: 1.2.2
            Argument of generate_from_frequencies() is not return of
            process_text() any more.

        Returns
        -------
        self
        """
        words = self.process_text(text)
        self.generate_from_frequencies(words)
        return self
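`generate_from_text` feeds the tokens from `process_text` into the counting helpers. With `normalize_plurals=True`, the parameter list says a trailing "s" variant is folded into the singular when both forms occur, except for words ending in "ss". The following is a minimal sketch of that documented rule only. The helper name `merge_plurals` is hypothetical, and the library's own `process_tokens` may differ in details not described in the docstring, such as case handling.

```python
def merge_plurals(counts):
    """Fold 'words' into 'word' when both occur; never touch words ending in 'ss'."""
    merged = dict(counts)
    for word, count in counts.items():
        if word.endswith("s") and not word.endswith("ss"):
            singular = word[:-1]
            if singular in counts:
                # the plural is removed and its count added to the singular
                merged[singular] += count
                del merged[word]
    return merged


print(merge_plurals({"cloud": 3, "clouds": 2, "boss": 1, "bos": 4, "words": 4}))
# {'cloud': 5, 'boss': 1, 'bos': 4, 'words': 4}
```

"words" survives because "word" never occurs on its own, and "boss" is left alone because of the "ss" exception.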
    
    def generate(self, text):
        """Generate wordcloud from text.

        The input "text" is expected to be a natural text. If you pass a sorted
        list of words, words will appear in your output twice. To remove this
        duplication, set ``collocations=False``.

        Alias to generate_from_text.

        Calls process_text and generate_from_frequencies.

        Returns
        -------
        self
        """
        return self.generate_from_text(text)
    
    def _check_generated(self):
        """Check if ``layout_`` was computed, otherwise raise error."""
        if not hasattr(self, "layout_"):
            raise ValueError("WordCloud has not been calculated, call generate"
                             " first.")
    
    def to_image(self):
        self._check_generated()
        if self.mask is not None:
            width = self.mask.shape[1]
            height = self.mask.shape[0]
        else:
            height, width = self.height, self.width
        img = Image.new(self.mode, (int(width * self.scale), 
                int(height * self.scale)), 
            self.background_color)
        draw = ImageDraw.Draw(img)
        for (word, count), font_size, position, orientation, color in self.layout_:
            font = ImageFont.truetype(self.font_path, 
                int(font_size * self.scale))
            transposed_font = ImageFont.TransposedFont(
                font, orientation=orientation)
            pos = int(position[1] * self.scale), int(position[0] * self.scale)
            draw.text(pos, word, fill=color, font=transposed_font)
        
        return img
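`to_image` does not redo the layout at the larger size. It multiplies the stored font sizes and positions by `scale` at draw time, which is why the docstring recommends `scale` over a bigger canvas for speed, at the cost of a coarser fit. The following is a PIL-free sketch of that arithmetic. It assumes `layout` entries have the same shape as `layout_`, and `scale_layout` is a hypothetical name. Note the swap: positions are stored as (row, col), but `draw.text` receives (x, y) = (col, row).

```python
def scale_layout(layout, width, height, scale):
    """Canvas size and per-word draw arguments that to_image would compute.

    Each layout entry is ((word, freq), font_size, (row, col), orientation, color).
    """
    canvas = (int(width * scale), int(height * scale))
    draws = []
    for (word, _freq), font_size, position, orientation, color in layout:
        # stored position is (row, col); PIL's draw.text wants (x, y) = (col, row)
        pos = (int(position[1] * scale), int(position[0] * scale))
        draws.append((word, int(font_size * scale), pos, orientation, color))
    return canvas, draws


layout = [
    (("python", 1.0), 40, (10, 25), None, "red"),
    (("cloud", 0.5), 21, (50, 3), None, "blue"),
]
canvas, draws = scale_layout(layout, 400, 200, 2.5)
print(canvas)
# (1000, 500)
print(draws[1])
# ('cloud', 52, (7, 125), None, 'blue')
```

Because of the `int()` truncation, small fonts can lose part of a point after scaling (21 × 2.5 becomes 52), which is one source of the coarser fit.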
    
    def recolor(self, random_state=None, color_func=None, colormap=None):
        """Recolor existing layout.

        Applying a new coloring is much faster than generating the whole
        wordcloud.

        Parameters
        ----------
        random_state : RandomState, int, or None, default=None
            If not None, a fixed random state is used. If an int is given, this
            is used as seed for a random.Random state.

        color_func : function or None, default=None
            Function to generate new color from word count, font size, position
            and orientation.  If None, self.color_func is used.

        colormap : string or matplotlib colormap, default=None
            Use this colormap to generate new colors. Ignored if color_func
            is specified. If None, self.color_func (built from self.colormap
            when no color_func was given) is used.

        Returns
        -------
        self
        """
        if isinstance(random_state, int):
            random_state = Random(random_state)
        self._check_generated()
        if color_func is None:
            if colormap is None:
                color_func = self.color_func
            else:
                color_func = colormap_color_func(colormap)
        self.layout_ = [(word_freq, font_size, position, orientation, 
                color_func(word=word_freq[0], font_size=font_size, 
                    position=position, orientation=orientation, 
                    random_state=random_state, 
                    font_path=self.font_path)) for 
            word_freq, font_size, position, orientation, _ in 
            self.layout_]
        return self
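`recolor` keeps every layout entry and only calls `color_func` again with the keyword arguments listed in the docstring. The following is a sketch of a custom color function plus a stand-alone version of the recoloring comprehension. `grey_color_func` and `recolor_layout` are hypothetical names. The returned `"hsl(...)"` string is one of the color formats PIL's `ImageColor` accepts. The function also handles `random_state=None`, because `recolor` passes None through when no seed is given.

```python
from random import Random


def grey_color_func(word, font_size, position, orientation,
                    font_path=None, random_state=None):
    """Hypothetical color_func: a random light-grey shade as a PIL color string."""
    if random_state is None:
        random_state = Random()
    lightness = random_state.randint(60, 100)
    return "hsl(0, 0%%, %d%%)" % lightness


def recolor_layout(layout, color_func, random_state=None, font_path=None):
    """Recompute only the color of each layout_ entry, as recolor does."""
    if isinstance(random_state, int):
        random_state = Random(random_state)
    return [(word_freq, font_size, position, orientation,
             color_func(word=word_freq[0], font_size=font_size,
                        position=position, orientation=orientation,
                        random_state=random_state, font_path=font_path))
            for word_freq, font_size, position, orientation, _ in layout]


layout = [
    (("python", 1.0), 40, (10, 25), None, "red"),
    (("cloud", 0.5), 21, (50, 3), None, "blue"),
]
first = recolor_layout(layout, grey_color_func, random_state=3)
second = recolor_layout(layout, grey_color_func, random_state=3)
print(first == second)
# True
```

With the real class, the same function could be passed as `WordCloud(color_func=grey_color_func)` or, after `generate`, as `wc.recolor(color_func=grey_color_func, random_state=3)`.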
    
    def to_file(self, filename):
        """Export to image file.

        Parameters
        ----------
        filename : string
            Location to write to.

        Returns
        -------
        self
        """
        img = self.to_image()
        img.save(filename, optimize=True)
        return self
    
    def to_array(self):
        """Convert to numpy array.

        Returns
        -------
        image : nd-array size (height, width, 3)
            Word cloud image as numpy matrix.
        """
        return np.array(self.to_image())
    
    def __array__(self):
        """Convert to numpy array.

        Returns
        -------
        image : nd-array size (height, width, 3)
            Word cloud image as numpy matrix.
        """
        return self.to_array()
    
    def to_html(self):
        raise NotImplementedError("FIXME!!!")

 

Reposted from blog.csdn.net/qq_41185868/article/details/107703213