composeml.LabelTimes.bin

LabelTimes.bin(bins, quantiles=False, labels=None, right=True)

Bin labels into discrete intervals.

Parameters:
  • bins (int or array) – The criteria to bin by.

    • bins (int) : Number of bins, either equal-width or quantile-based.
      If quantiles is False, this is the number of equal-width bins. The range is extended by 0.1% on each side to include the minimum and maximum values. If quantiles is True, this is the number of quantiles (e.g. 10 for deciles, 4 for quartiles).
    • bins (array) : Bin edges, either user-defined or quantile-based.
      If quantiles is False, the array gives the bin edges, which allows non-uniform widths. The range is not extended. If quantiles is True, the array gives the quantiles to use as bin edges (e.g. [0, .25, .5, .75, 1.] for quartiles).
  • quantiles (bool) – Whether to use quantile-based discretization instead of equal-width or user-defined bins.
  • labels (array) – Labels for the returned bins. Must contain exactly one label per resulting bin.
  • right (bool) – Whether each bin includes its rightmost edge. Does not apply to quantile-based bins.
Returns:

A LabelTimes instance containing the binned labels.

Return type:

LabelTimes

Examples

Using bins of equal width:

>>> labels.bin(2).head(2).T
label_id                                0                    1
customer_id                             1                    1
cutoff_time           2014-01-01 00:45:00  2014-01-01 00:48:00
my_labeling_function      (157.5, 283.46]      (31.288, 157.5]
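
The equal-width case can be explored outside composeml with a minimal sketch. It assumes this method behaves like pandas.cut, an assumption based on the matching parameters rather than on anything stated here, and it uses hypothetical values instead of the data shown above:

```python
import pandas as pd

# Hypothetical label values, not the data from the example above.
target = pd.Series([0.0, 50.0, 100.0])

# Two equal-width bins spanning the observed range of the data.
binned = pd.cut(target, bins=2)

# The lowest edge sits slightly below the minimum, so the minimum
# itself falls inside the first (right-closed) bin.
lowest_edge = binned.cat.categories[0].left
print(binned.cat.codes.tolist())   # bin index assigned to each value
print(lowest_edge < target.min())  # whether the range was extended downward
```

The small extension is what keeps the minimum value from being dropped when the bins are open on the left.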

Using bins of custom widths:

>>> values = labels.bin([0, 200, 400])
>>> values.head(2).T
label_id                                0                    1
customer_id                             1                    1
cutoff_time           2014-01-01 00:45:00  2014-01-01 00:48:00
my_labeling_function           (200, 400]             (0, 200]

Using quantile-based bins:

>>> values = labels.bin(4, quantiles=True) # (i.e. quartiles)
>>> values.head(2).T
label_id                                0                    1
customer_id                             1                    1
cutoff_time           2014-01-01 00:45:00  2014-01-01 00:48:00
my_labeling_function    (137.44, 241.062]     (43.848, 137.44]
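
The integer and array forms of quantile-based bins can be compared with a short sketch. It assumes the quantile mode behaves like pandas.qcut (an assumption, not confirmed here) and uses hypothetical values:

```python
import pandas as pd

# Hypothetical label values, not the data from the example above.
target = pd.Series([10, 20, 30, 40, 50, 60, 70, 80])

# Integer form: request 4 quantile-based bins (quartiles).
by_count = pd.qcut(target, q=4)

# Array form: spell out the same quartiles as explicit quantiles.
by_edges = pd.qcut(target, q=[0, .25, .5, .75, 1.])

# Compare the bin index assigned to each value by the two forms.
print(by_count.cat.codes.tolist())
print(by_edges.cat.codes.tolist())
```

Under that assumption, passing 4 and passing [0, .25, .5, .75, 1.] describe the same quartiles, so each value lands in the same bin either way.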

Assigning labels to bins:

>>> values = labels.bin(3, labels=['low', 'medium', 'high'])
>>> values.head(2).T
label_id                                0                    1
customer_id                             1                    1
cutoff_time           2014-01-01 00:45:00  2014-01-01 00:48:00
my_labeling_function                 high                  low
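
The effect of right and the required number of labels can be sketched with user-defined edges. This again assumes the method behaves like pandas.cut (an assumption) and uses hypothetical values:

```python
import pandas as pd

# Hypothetical label values, not the data from the examples above.
target = pd.Series([0, 100, 200, 300, 400])
edges = [0, 200, 400]  # three edges define two bins

# right=True: bins are (0, 200] and (200, 400].
closed_right = pd.cut(target, bins=edges, right=True)

# right=False: bins are [0, 200) and [200, 400).
closed_left = pd.cut(target, bins=edges, right=False)

# One label per bin: two labels for two bins.
named = pd.cut(target, bins=edges, labels=['low', 'high'])

# A code of -1 marks a value that fell outside every bin.
print(closed_right.cat.codes.tolist())
print(closed_left.cat.codes.tolist())
print(named.tolist())
```

In this sketch, right=True leaves the value 0 outside every bin, because the lowest edge is open and user-defined edges are not extended. With right=False, the value 400 is excluded instead.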