Release Notes¶
v1.7.0 Mar 16, 2022¶
- Fixes
Updated the conda install commands to specify the channel (GH#1917)
- Changes
Update error message when DFS returns an empty list of features (GH#1919)
Remove list_variable_types and related directories (GH#1929)
Transition to use pyproject.toml and setup.cfg (moving away from setup.py) (GH#1941, GH#1950, GH#1952, GH#1954, GH#1957, GH#1964)
Replace Koalas with pandas API on Spark (GH#1949)
- Testing Changes
Update test cases to cover __main__.py file (GH#1927)
Add Python 3.9 linting, install complete, and docs build CI tests (GH#1934)
Add CI workflow to test with latest woodwork main branch (GH#1936)
Add lower bound for wheel for minimum dependency checker and limit lint CI tests to Python 3.10 (GH#1945)
Fix non-deterministic test in test_es.py (GH#1961)
Thanks to the following people for contributing to this release: @andriyor, @gsheni, @jeff-hernandez, @kushal-gopal, @mingdavidqi, @rwedge, @tamargrey, @thehomebrewnerd, @tvdboom
Breaking Changes¶
The deprecated utility list_variable_types has been removed from Featuretools.
v1.6.0 Feb 17, 2022¶
- Enhancements
Add IsFederalHoliday transform primitive (GH#1912)
- Fixes
Fix to catch new NotImplementedError raised by holidays library for unknown country (GH#1907)
- Changes
Remove outdated pandas workaround code (GH#1906)
- Documentation Changes
Add in-line tabs and copy-paste functionality to docs (GH#1905)
- Testing Changes
Fix URL deserialization file (GH#1909)
Thanks to the following people for contributing to this release: @jeff-hernandez, @rwedge, @thehomebrewnerd
v1.5.0 Feb 14, 2022¶
Warning
Featuretools may not support Python 3.7 in the next non-bugfix release.
- Fixes
Fix featuretools_primitives entry point (GH#1891)
- Changes
Allow only snake, camel, and title case for primitives (GH#1854)
Add autonormalize as an add-on library (GH#1840)
Add DateToHoliday Transform Primitive (GH#1848)
Add DistanceToHoliday Transform Primitive (GH#1853)
Temporarily restrict pandas and koalas max versions (GH#1863)
Add __setitem__ method to overload add_dataframe method on EntitySet (GH#1862)
Split Datetime and LatLong primitives into separate files (GH#1861)
Null values will not be included in index of normalized dataframe (GH#1897)
- Testing Changes
Add check for package conflicts with install workflow (GH#1843)
Change auto approve workflow to use assignee (GH#1843)
Update auto approve workflow to delete branch and change on trigger (GH#1852)
Upgrade tests to use compose version 0.8.0 (GH#1856)
Updated deep feature synthesis and feature serialization tests to use new primitive files (GH#1861)
Thanks to the following people for contributing to this release: @dvreed77, @gsheni, @jacobboney, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd, @tuethan1999
Breaking Changes¶
When using normalize_dataframe to create a new dataframe, the new dataframe’s index will not include a null value.
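As a rough illustration of the behavior described above, the index of the new dataframe can be thought of as the unique values of the chosen column with nulls skipped. This is a plain-Python sketch, not the Featuretools implementation; the helper name `unique_non_null` and the sample data are hypothetical.

```python
import math


def unique_non_null(values):
    """Return unique values in first-seen order, skipping None and NaN."""
    seen = set()
    result = []
    for value in values:
        is_nan = isinstance(value, float) and math.isnan(value)
        if value is None or is_nan:
            continue  # null values never become index entries
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Hypothetical column that would become the index of the new dataframe
device_column = ["desktop", None, "mobile", "desktop", float("nan"), "tablet"]
new_index = unique_non_null(device_column)
```

Rows whose value in that column is null simply have no matching entry in the new dataframe's index.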
v1.4.0 Jan 10, 2022¶
- Fixes
Fix bug where Woodwork initialization could fail on feature matrix if cutoff times caused null values to be introduced (GH#1810)
- Documentation Changes
Remove testing on conda forge in release.md (GH#1811)
- Testing Changes
Enable auto-merge for minimum and latest dependency merge requests (GH#1818, GH#1821, GH#1822)
Change auto approve workflow to use PR number and run every 30 minutes (GH#1827)
Add auto approve workflow to run when unit tests complete (GH#1837)
Test deserializing from S3 with mocked S3 fixtures only (GH#1825)
Remove fastparquet as a test requirement (GH#1833)
Thanks to the following people for contributing to this release: @davesque, @gsheni, @rwedge, @thehomebrewnerd
v1.3.0 Dec 2, 2021¶
- Enhancements
Add NumericLag transform primitive (GH#1797)
- Changes
Update pip to 21.3.1 for test requirements (GH#1789)
Thanks to the following people for contributing to this release: @gsheni, @HenryRocha, @tamargrey, @thehomebrewnerd
v1.2.0 Nov 15, 2021¶
- Enhancements
Add Rolling Transform primitives with integer parameters (GH#1770)
- Fixes
Handle new graphviz FORMATS import (GH#1770)
Thanks to the following people for contributing to this release: @gsheni, @tamargrey
v1.1.0 Nov 2, 2021¶
- Fixes
Check base_of_exclude attribute on primitive instead of feature class (GH#1749)
Pin upper bound for pyspark (GH#1748)
Fix get_unused_primitives only recognizes lowercase primitive strings (GH#1733)
Require newer versions of dask and distributed (GH#1762)
Fix bug with pass-through columns of cutoff_time df when n_jobs > 1 (GH#1765)
- Documentation Changes
Upgrade Sphinx and fix docs configuration error (GH#1760)
Thanks to the following people for contributing to this release: @bchen1116, @gsheni, @HenryRocha, @jeff-hernandez, @ridicolos, @rwedge
v1.0.0 Oct 12, 2021¶
- Enhancements
Add support for creating EntitySets from Woodwork DataTables (GH#1277)
Add EntitySet.__deepcopy__ that retains Woodwork typing information (GH#1465)
Add EntitySet.__getstate__ and EntitySet.__setstate__ to preserve typing when pickling (GH#1581)
Returned feature matrix has woodwork typing information (GH#1664)
- Fixes
Fix DFSTransformer Documentation for Featuretools 1.0 (GH#1605)
Fix calculate_feature_matrix time type check and encode_features for synthesis tests (GH#1580)
Revert reordering of categories in Equal and NotEqual primitives (GH#1640)
Fix bug in EntitySet.add_relationship that caused foreign_key tag to be lost (GH#1675)
Update DFS to not build features on last time index columns in dataframes (GH#1695)
- Changes
Remove add_interesting_values from Entity (GH#1269)
Move set_secondary_time_index method from Entity to EntitySet (GH#1280)
Refactor Relationship creation process (GH#1370)
Replaced Entity.update_data with EntitySet.update_dataframe (GH#1398)
Move validation check for uniform time index to EntitySet (GH#1400)
Replace Entity objects in EntitySet with Woodwork dataframes (GH#1405)
Refactor EntitySet.plot to work with Woodwork dataframes (GH#1468)
Move last_time_index to be a column on the DataFrame (GH#1456)
Update serialization/deserialization to work with Woodwork (GH#1452)
Refactor EntitySet.query_by_values to work with Woodwork dataframes (GH#1467)
Replace list_variable_types with list_logical_types (GH#1477)
Allow deep EntitySet equality check (GH#1480)
Update EntitySet.concat to work with Woodwork DataFrames (GH#1490)
Add function to list semantic tags (GH#1486)
Initialize Woodwork on feature matrix in remove_highly_correlated_features if necessary (GH#1618)
Remove categorical-encoding as an add-on library (will be added back later) (GH#1632)
Remove autonormalize as an add-on library (will be added back later) (GH#1636)
Remove tsfresh, nlp_primitives, sklearn_transformer as an add-on library (will be added back later) (GH#1638)
Update input and return types for CumCount primitive (GH#1651)
Standardize imports of Woodwork (GH#1526)
Rename target entity to target dataframe (GH#1506)
Replace entity_from_dataframe with add_dataframe (GH#1504)
Create features from Woodwork columns (GH#1582)
Move default variable description logic to generate_description (GH#1403)
Update Woodwork to version 0.4.0 with LogicalType.transform and LogicalType instances (GH#1451)
Update Woodwork to version 0.4.1 with Ordinal order values and whitespace serialization fix (GH#1478)
Use ColumnSchema for primitive input and return types (GH#1411)
Update features to use Woodwork and remove Entity and Variable classes (GH#1501)
Re-add make_index functionality to EntitySet (GH#1507)
Use ColumnSchema in DFS primitive matching (GH#1523)
Updates from Featuretools v0.26.0 (GH#1539)
Leverage Woodwork better in add_interesting_values (GH#1550)
Update calculate_feature_matrix to use Woodwork (GH#1533)
Update Woodwork to version 0.6.0 with changed categorical inference (GH#1597)
Update nlp-primitives requirement for Featuretools 1.0 (GH#1609)
Remove remaining references to Entity and Variable in code (GH#1612)
Update Woodwork to version 0.7.1 with changed initialization (GH#1648)
Removes outdated workaround code related to a since-resolved pandas issue (GH#1677)
Remove unused _dataframes_equal and camel_to_snake functions (GH#1683)
Update Woodwork to version 0.8.0 for improved performance (GH#1689)
Remove redundant typecasting in encode_features (GH#1694)
Speed up encode_features if not inplace, some space cost (GH#1699)
Clean up comments and commented out code (GH#1701)
Update Woodwork to version 0.8.1 for improved performance (GH#1702)
- Documentation Changes
Add a Woodwork Typing in Featuretools guide (GH#1589)
Add a resource guide for transitioning to Featuretools 1.0 (GH#1627)
Update using_entitysets page to use Woodwork (GH#1532)
Update FAQ page to use Woodwork integration (GH#1649)
Update DFS page to be Jupyter notebook and use Woodwork integration (GH#1557)
Update Feature Primitives page to be Jupyter notebook and use Woodwork integration (GH#1556)
Update Handling Time page to be Jupyter notebook and use Woodwork integration (GH#1552)
Update Advanced Custom Primitives page to be Jupyter notebook and use Woodwork integration (GH#1587)
Update Deployment page to use Woodwork integration (GH#1588)
Update Using Dask EntitySets page to be Jupyter notebook and use Woodwork integration (GH#1590)
Update Specifying Primitive Options page to be Jupyter notebook and use Woodwork integration (GH#1593)
Update API Reference to match Featuretools 1.0 API (GH#1600)
Update Index page to be Jupyter notebook and use Woodwork integration (GH#1602)
Update Feature Descriptions page to be Jupyter notebook and use Woodwork integration (GH#1603)
Update Using Koalas EntitySets page to be Jupyter notebook and use Woodwork integration (GH#1604)
Update Glossary to use Woodwork integration (GH#1608)
Update Tuning DFS page to be Jupyter notebook and use Woodwork integration (GH#1610)
Fix small formatting issues in Documentation (GH#1607)
Remove Variables page and more references to variables (GH#1629)
Update Feature Selection page to use Woodwork integration (GH#1618)
Update Improving Performance page to be Jupyter notebook and use Woodwork integration (GH#1591)
Fix typos in transition guide (GH#1672)
Update installation instructions for 1.0.0rc1 announcement in docs (GH#1707, GH#1708, GH#1713, GH#1716)
Fixed broken link for Demo notebook in README.md (GH#1728)
Update contributing.md to improve instructions for external contributors (GH#1723)
Manually revert changes made by GH#1677 and GH#1679. The related bug in pandas still exists. (GH#1731)
- Testing Changes
Remove entity tests (GH#1521)
Fix broken EntitySet tests (GH#1548)
Fix broken primitive tests (GH#1568)
Added Jupyter notebook cleaner to the linters (GH#1719)
Update reviewers for minimum and latest dependency checkers (GH#1715)
Full coverage for EntitySet.__eq__ method (GH#1725)
Add tests to verify all primitives can be initialized without parameter values (GH#1726)
Thanks to the following people for contributing to this release: @bchen1116, @gsheni, @HenryRocha, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd, @VaishnaviNandakumar
Breaking Changes¶
Entity.add_interesting_values has been removed. To add interesting values for a single entity, call EntitySet.add_interesting_values and pass the name of the dataframe for which to add interesting values in the dataframe_name parameter (GH#1405, GH#1370).
Entity.set_secondary_time_index has been removed and replaced by EntitySet.set_secondary_time_index with an added dataframe_name parameter to specify the dataframe on which to set the secondary time index (GH#1405, GH#1370).
Relationship initialization has been updated to accept four name values for the parent dataframe, parent column, child dataframe and child column instead of accepting two Variable objects (GH#1405, GH#1370).
EntitySet.add_relationship has been updated to accept dataframe and column name values or a Relationship object. Adding a relationship from a Relationship object now requires passing the relationship as a keyword argument (GH#1405, GH#1370).
Entity.update_data has been removed. To update the dataframe, call EntitySet.replace_dataframe and use the dataframe_name parameter (GH#1630, GH#1522).
The data in an EntitySet is no longer stored in Entity objects. Instead, dataframes with Woodwork typing information are used. Accordingly, most language referring to “entities” will now refer to “dataframes”, references to “variables” will now refer to “columns”, and “variable types” will use the Woodwork type system’s “logical types” and “semantic tags” (GH#1405).
The dictionary of tuples passed to EntitySet.__init__ has replaced the variable_types element with separate logical_types and semantic_tags dictionaries (GH#1405).
EntitySet.entity_from_dataframe no longer exists. To add new tables to an entityset, use EntitySet.add_dataframe (GH#1405).
EntitySet.normalize_entity has been renamed to EntitySet.normalize_dataframe (GH#1405).
Instead of raising an error at EntitySet.add_relationship when the dtypes of parent and child columns do not match, Featuretools will now check whether the Woodwork logical type of the parent and child columns match. If they do not match, there will now be a warning raised, and Featuretools will attempt to update the logical type of the child column to match the parent’s (GH#1405).
If no index is specified at EntitySet.add_dataframe, the first column will only be used as index if Woodwork has not been initialized on the DataFrame. When adding a dataframe that already has Woodwork initialized, if there is no index set, an error will be raised (GH#1405).
Featuretools will no longer re-order columns in DataFrames so that the index column is the first column of the DataFrame (GH#1405).
Type inference can now be performed on Dask and Koalas dataframes, though a warning will be issued indicating that this may be computationally intensive (GH#1405).
EntitySet.time_type is no longer stored as Variable objects. Instead, Woodwork typing is used, and a numeric time type will be indicated by the 'numeric' semantic tag string, and a datetime time type will be indicated by the Datetime logical type (GH#1405).
last_time_index, secondary_time_index, and interesting_values are no longer attributes of an entityset’s tables that can be accessed directly. Now they must be accessed through the metadata of the Woodwork DataFrame, which is a dictionary (GH#1405).
The helper function list_variable_types will be removed in a future release and replaced by list_logical_types. In the meantime, list_variable_types will return the same output as list_logical_types (GH#1447).
What’s New in this Release¶
Adding Interesting Values
To add interesting values for a single dataframe, call EntitySet.add_interesting_values, passing the name of the dataframe for which interesting values should be added.
>>> es.add_interesting_values(dataframe_name='log')
Setting a Secondary Time Index
To set a secondary time index for a specific dataframe, call EntitySet.set_secondary_time_index, passing the dataframe name for which to set the secondary time index along with the dictionary mapping the secondary time index column to the columns for which the secondary time index applies.
>>> customers_secondary_time_index = {'cancel_date': ['cancel_reason']}
>>> es.set_secondary_time_index(dataframe_name='customers', secondary_time_index=customers_secondary_time_index)
Creating a Relationship and Adding to an EntitySet
Relationships are now created by passing parameters identifying the entityset along with four string values specifying the parent dataframe, parent column, child dataframe and child column. Specifying parameter names is optional.
>>> new_relationship = Relationship(
...     entityset=es,
...     parent_dataframe_name='customers',
...     parent_column_name='id',
...     child_dataframe_name='sessions',
...     child_column_name='customer_id'
... )
Relationships can now be added to EntitySets in one of two ways. The first approach is to pass in name values for the parent dataframe, parent column, child dataframe and child column. Specifying parameter names is optional with this approach.
>>> es.add_relationship(
...     parent_dataframe_name='customers',
...     parent_column_name='id',
...     child_dataframe_name='sessions',
...     child_column_name='customer_id'
... )
Relationships can also be added by passing in a previously created Relationship object. When using this approach, the relationship parameter name must be included.
>>> es.add_relationship(relationship=new_relationship)
Replace DataFrame
To replace a dataframe in an EntitySet with a new dataframe, call EntitySet.replace_dataframe and pass in the name of the dataframe to replace along with the new data.
>>> es.replace_dataframe(dataframe_name='log', df=df)
List Logical Types and Semantic Tags
Logical types and semantic tags have replaced variable types to parse and interpret columns. You can list all the available logical types by calling featuretools.list_logical_types.
>>> ft.list_logical_types()
You can list all the available semantic tags by calling featuretools.list_semantic_tags.
>>> ft.list_semantic_tags()
v0.27.1 Sep 2, 2021¶
- Documentation Changes
Add banner to docs about upcoming Featuretools 1.0 release (GH#1669)
Thanks to the following people for contributing to this release: @thehomebrewnerd
v0.27.0 Aug 31, 2021¶
- Changes
Remove autonormalize, tsfresh, nlp_primitives, sklearn_transformer, categorical_encoding as add-on libraries (will be added back later) (GH#1644)
Emit a warning message when a featuretools_primitives entrypoint throws an exception (GH#1662)
Throw a RuntimeError when two primitives with the same name are encountered during featuretools_primitives entrypoint handling (GH#1662)
Prevent the featuretools_primitives entrypoint loader from loading non-class objects as well as the AggregationPrimitive and TransformPrimitive base classes (GH#1662)
Thanks to the following people for contributing to this release: @davesque, @gsheni, @jeff-hernandez, @rwedge
v0.26.2 Aug 17, 2021¶
v0.26.1 Jul 23, 2021¶
v0.26.0 Jul 15, 2021¶
- Fixes
include_entities correctly overrides exclude_entities in primitive_options (GH#1518)
- Documentation Changes
Prevent logging on build (GH#1498)
- Testing Changes
Test featuretools on pandas 1.3.0 release candidate and make fixes (GH#1492)
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @rwedge, @tamargrey, @thehomebrewnerd, @tuethan1999
v0.25.0 Jun 11, 2021¶
v0.24.1 May 26, 2021¶
- Documentation Changes
Update nbsphinx version to fix docs build issue (GH#1436)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge, @thehomebrewnerd
v0.24.0 Apr 30, 2021¶
- Documentation Changes
Improve formatting of release notes (GH#1396)
- Testing Changes
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd
v0.23.3 Mar 31, 2021¶
Warning
The next non-bugfix release of Featuretools will not support Python 3.6
Thanks to the following people for contributing to this release: @gsheni, @rwedge, @thehomebrewnerd
v0.23.2 Feb 26, 2021¶
Warning
The next non-bugfix release of Featuretools will not support Python 3.6
- Enhancements
The list_primitives function returns valid input types and the return type (GH#1341)
- Fixes
Restrict numpy version when installing koalas (GH#1329)
- Changes
Warn python 3.6 users support will be dropped in future release (GH#1344)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge
v0.23.1 Jan 29, 2021¶
- Documentation Changes
Update Twitter link to documentation toolbar (GH#1322)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge, @seriallazer, @thehomebrewnerd
v0.23.0 Dec 31, 2020¶
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge, @thehomebrewnerd
Breaking Changes¶
Entity.query_by_values has been removed and replaced by EntitySet.query_by_values with an added entity_id parameter to specify which entity in the entityset should be used for the query.
v0.22.0 Nov 30, 2020¶
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @jeff-hernandez, @kmax12, @rwedge, @thehomebrewnerd
v0.21.0 Oct 30, 2020¶
- Enhancements
Add describe_feature to generate an English language feature description for a given feature (GH#1201)
- Changes
Keep koalas requirements in separate file (GH#1195)
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd
v0.20.0 Sep 30, 2020¶
Warning
The Text variable type has been deprecated and replaced with the NaturalLanguage variable type. The Text variable type will be removed in a future release.
- Fixes
Allow FeatureOutputSlice features to be serialized (GH#1150)
Fix duplicate label column generation when labels are passed in cutoff times and approximate is being used (GH#1160)
Determine calculate_feature_matrix behavior with approximate and a cutoff df that is a subclass of a pandas DataFrame (GH#1166)
- Changes
Text variable type has been replaced with NaturalLanguage (GH#1159)
Thanks to the following people for contributing to this release: @gsheni, @rwedge, @tamargrey, @tuethan1999
v0.19.0 Sep 8, 2020¶
- Documentation Changes
Added return values to dfs and calculate_feature_matrix (GH#1125)
- Testing Changes
Better test case for normalizing from no time index to time index (GH#1113)
* When passing multiple instances of a primitive built with make_trans_primitive or make_agg_primitive, those instances must have the same relative order when passed to dfs to ensure a consistent ordering of features.
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @rwedge, @tamargrey, @thehomebrewnerd, @tuethan1999
Breaking Changes¶
ft.dfs will no longer build features from Transform primitives where one of the inputs is a Transform feature, a GroupByTransform feature, or a Direct Feature of a Transform / GroupByTransform feature. This will make some features that would previously be generated by ft.dfs only possible if explicitly specified in seed_features.
v0.18.1 Aug 12, 2020¶
- Fixes
Fix EntitySet.plot() when given a dask entityset (GH#1086)
- Changes
Use nlp-primitives[complete] install for nlp_primitives extra in setup.py (GH#1103)
- Documentation Changes
Fix broken downloads badge in README.md (GH#1107)
- Testing Changes
Use CircleCI matrix jobs in config to trigger multiple runs of same job with different parameters (GH#1105)
Thanks to the following people for contributing to this release: @gsheni, @systemshift, @thehomebrewnerd
v0.18.0 Jul 31, 2020¶
- Enhancements
Warn user if supplied primitives are not used during dfs (GH#1073)
- Fixes
Use more consistent and uniform warnings (GH#1040)
Fix issue with missing instance ids and categorical entity index (GH#1050)
Remove warnings.simplefilter in feature_set_calculator to un-silence warnings (GH#1053)
Fix feature visualization for features with ‘>’ or ‘<’ in name (GH#1055)
Fix boolean dtype mismatch between encode_features and dfs and calculate_feature_matrix (GH#1082)
Update primitive options to check reversed inputs if primitive is commutative (GH#1085)
Fix inconsistent ordering of features between kernel restarts (GH#1088)
- Changes
Make DFS match TimeSince primitive with all Datetime types (GH#1048)
Change default branch to main (GH#1038)
Raise TypeError if improper input is supplied to Entity.delete_variables() (GH#1064)
Updates for compatibility with pandas 1.1.0 (GH#1079, GH#1089)
Set pandas version to pandas>=0.24.1,<2.0.0. Filter pandas deprecation warning in Week primitive. (GH#1094)
- Testing Changes
Add fixture for ft.demo.load_mock_customer (GH#1036)
Refactor Dask test units (GH#1052)
Implement automated process for checking critical dependencies (GH#1045, GH#1054, GH#1081)
Don’t run changelog check for release PRs or automated dependency PRs (GH#1057)
Fix non-deterministic behavior in Dask test causing codecov issues (GH#1070)
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @monti-python, @rwedge, @systemshift, @tamargrey, @thehomebrewnerd, @wsankey
v0.17.0 Jun 30, 2020¶
- Testing Changes
Add required flag to CircleCI codecov upload command (GH#1035)
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @kmax12, @rwedge, @thehomebrewnerd, @tuethan1999
Breaking Changes¶
Removed Feature.get_names; Feature.get_feature_names should be used instead.
v0.16.0 Jun 5, 2020¶
- Testing Changes
Update tests for numpy v1.19.0 compatibility (GH#1016)
Thanks to the following people for contributing to this release: @Alex-Monahan, @frances-h, @gsheni, @rwedge, @thehomebrewnerd
v0.15.0 May 29, 2020¶
- Enhancements
Add get_default_aggregation_primitives and get_default_transform_primitives (GH#945)
Allow cutoff time dataframe columns to be in any order (GH#969, GH#995)
Add Age primitive, and make it a default transform primitive for DFS (GH#987)
Add include_cutoff_time arg - control whether data at cutoff times are included in feature calculations (GH#959)
Allow variable_types to be referenced by their type_string for the entity_from_dataframe function (GH#988)
- Fixes
Fix errors with Equals and NotEquals primitives when comparing categoricals or different dtypes (GH#968)
Normalized type_strings of Variable classes so that the find_variable_types function produces a dictionary with a clear key to name transition (GH#982, GH#996)
Remove pandas.datetime in test_calculate_feature_matrix due to deprecation (GH#998)
Thanks to the following people for contributing to this release: @ctduffy, @frances-h, @gsheni, @jeff-hernandez, @rightx2, @rwedge, @sebrahimi1988, @thehomebrewnerd, @tuethan1999
Breaking Changes¶
Calls to featuretools.dfs or featuretools.calculate_feature_matrix that use a cutoff time dataframe, but do not label the time column with either the target entity time index variable name or as time, will now result in an AttributeError. Previously, the time column was selected to be the first column that was not the instance id column. With this update, the position of the column in the dataframe is no longer used to determine the time column. Now, both instance id columns and time columns in a cutoff time dataframe can be in any order as long as they are named properly.
The type_string attributes of all Variable subclasses are now a snake case conversion of their class names. This changes the type_string of the Unknown, IPAddress, EmailAddress, SubRegionCode, FilePath, LatLong, and ZIPcode classes. Old saved entitysets that used these variables may load incorrectly.
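The name-based selection of the time column described above can be sketched in plain Python. This is only an illustration of the rule, not the Featuretools code: the helper `find_time_column`, its error message, and the choice to check the target time index name before `time` are all assumptions.

```python
def find_time_column(columns, target_time_index):
    """Pick the cutoff time column by name, never by position."""
    # Assumption: the target entity's time index name is checked before "time".
    for candidate in (target_time_index, "time"):
        if candidate in columns:
            return candidate
    raise AttributeError(
        "Cutoff time dataframe must contain a column named "
        f"'{target_time_index}' or 'time'"
    )


# Column order does not matter as long as the names are right
ordered = find_time_column(["time", "customer_id"], "join_date")
reordered = find_time_column(["customer_id", "label", "join_date"], "join_date")

# A time column with an unrecognized name is rejected
try:
    find_time_column(["customer_id", "cutoff"], "join_date")
    error_raised = False
except AttributeError:
    error_raised = True
```

In practice this means renaming the time column of an existing cutoff time dataframe to `time` (or to the target entity's time index name) before calling `dfs` or `calculate_feature_matrix`.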
v0.14.0 Apr 30, 2020¶
- Enhancements
ft.encode_features - use less memory for one-hot encoded columns (GH#876)
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @jeff-hernandez, @rwedge
Breaking Changes¶
Using training windows in feature calculations can result in different values than previous versions. This was done to prevent consecutive training windows from overlapping by excluding data at the oldest point in time. For example, if we use a cutoff time at the first minute of the hour with a one hour training window, the first minute of the previous hour will no longer be included in the feature calculation.
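The example in the paragraph above can be sketched as a half-open interval check in plain Python. This is a minimal illustration rather than the library's implementation; the helper name `in_training_window` is hypothetical, and it assumes that data at the cutoff time itself is included.

```python
from datetime import datetime, timedelta


def in_training_window(timestamp, cutoff_time, window):
    """True if timestamp falls in (cutoff_time - window, cutoff_time]."""
    # Assumption: the cutoff itself is included; only the oldest edge is excluded.
    return cutoff_time - window < timestamp <= cutoff_time


cutoff = datetime(2020, 4, 30, 10, 0)
window = timedelta(hours=1)
observations = [
    datetime(2020, 4, 30, 9, 0),   # oldest edge of the window: excluded
    datetime(2020, 4, 30, 9, 1),
    datetime(2020, 4, 30, 9, 30),
    datetime(2020, 4, 30, 10, 0),  # the cutoff time itself
]
kept = [t for t in observations if in_training_window(t, cutoff, window)]
```

Because the oldest edge is excluded, consecutive one-hour windows ending at 09:00 and 10:00 would not both count the observation at 09:00.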
v0.13.4 Mar 27, 2020¶
Warning
The next non-bugfix release of Featuretools will not support Python 3.5
- Fixes
Fix ft.show_info() not displaying in Jupyter notebooks (GH#863)
- Testing Changes
Miscellaneous changes (GH#861)
Thanks to the following people for contributing to this release: @frances-h, @FreshLeaf8865, @jeff-hernandez, @rwedge, @thehomebrewnerd
v0.13.3 Feb 28, 2020¶
- Fixes
Fix a connection closed error when using n_jobs (GH#853)
- Changes
Pin msgpack dependency for Python 3.5; remove dataframe from Dask dependency (GH#851)
- Documentation Changes
Update link to help documentation page in Github issue template (GH#855)
Thanks to the following people for contributing to this release: @frances-h, @rwedge
v0.13.2 Jan 31, 2020¶
- Enhancements
Support for Pandas 1.0.0 (GH#844)
- Changes
Remove dependency on s3fs library for anonymous downloads from S3 (GH#825)
- Testing Changes
Added GitHub Action to automatically run performance tests (GH#840)
Thanks to the following people for contributing to this release: @frances-h, @rwedge
v0.13.1 Dec 28, 2019¶
Thanks to the following people for contributing to this release: @jeffzi, @kmax12, @rwedge, @systemshift
v0.13.0 Nov 30, 2019¶
- Enhancements
Added GitHub Action to auto upload releases to PyPI (GH#816)
- Fixes
Fix issue where some primitive options would not be applied (GH#807)
Fix issue with converting to pickle or parquet after adding interesting features (GH#798, GH#823)
Diff primitive now calculates using all available data (GH#824)
Prevent DFS from creating Identity Features of globally ignored variables (GH#819)
Thanks to the following people for contributing to this release: @frances-h, @jeff-hernandez, @rwedge, @systemshift
Breaking Changes¶
The libraries used for downloading or uploading from S3 or URLs are now optional and will no longer be installed by default. To use this functionality they will need to be installed separately.
The fix to how the Diff primitive is calculated may slow down the overall calculation time of feature lists that use this primitive.
v0.12.0 Oct 31, 2019¶
Thanks to the following people for contributing to this release: @ablacke-ayx, @BoopBoopBeepBoop, @jeffzi, @kmax12, @rwedge, @thehomebrewnerd, @twdobson
v0.11.0 Sep 30, 2019¶
Warning
The next non-bugfix release of Featuretools will not support Python 2
- Enhancements
Improve how files are copied and written (GH#721)
Add number of rows to graph in entityset.plot (GH#727)
Added support for pandas DateOffsets in DFS and Timedelta (GH#732)
Enable feature-specific top_n value using a dictionary in encode_features (GH#735)
Added progress_callback parameter to dfs() and calculate_feature_matrix() (GH#739, GH#745)
Enable specifying primitives on a per column or per entity basis (GH#748)
- Fixes
Fixed entity set deserialization (GH#720)
Added error message when DateTimeIndex is a variable but not set as the time_index (GH#723)
Fixed CumCount and other group-by transform primitives that take ID as input (GH#733, GH#754)
Fix progress bar undercounting (GH#743)
Updated training_window error assertion to only check against observations (GH#728)
Don’t delete the whole destination folder while saving entityset (GH#717)
- Changes
Raise warning and not error on schema version mismatch (GH#718)
Change feature calculation to return in order of instance ids provided (GH#676)
Removed time remaining from displayed progress bar in dfs() and calculate_feature_matrix() (GH#739)
Raise warning in normalize_entity() when time_index of base_entity has an invalid type (GH#749)
Remove toolz as a direct dependency (GH#755)
Allow boolean variable types to be used in the Multiply primitive (GH#756)
- Documentation Changes
Updated URL for Compose (GH#716)
Thanks to the following people for contributing to this release: @angela97lin, @chidauri, @christopherbunn, @frances-h, @jeff-hernandez, @kmax12, @MarcoGorelli, @rwedge, @thehomebrewnerd
Breaking Changes¶
Feature calculations will return in the order of instance ids provided instead of the order of time points instances are calculated at.
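The ordering change above can be pictured with a small plain-Python sketch. It is not how Featuretools implements the change; the helper `order_by_instance_ids` and the sample rows are hypothetical, and instance ids are assumed to be unique.

```python
def order_by_instance_ids(rows, instance_ids):
    """Return feature rows in the order the instance ids were provided."""
    by_id = {row["id"]: row for row in rows}
    return [by_id[instance_id] for instance_id in instance_ids]


# Rows as they might come out when calculated in time-point order (hypothetical data)
calculated = [
    {"id": 3, "time": "2019-01-01", "COUNT(sessions)": 2},
    {"id": 1, "time": "2019-02-01", "COUNT(sessions)": 5},
    {"id": 2, "time": "2019-03-01", "COUNT(sessions)": 1},
]
requested_ids = [1, 2, 3]
result = order_by_instance_ids(calculated, requested_ids)
```

Code that relied on the previous time-based ordering would need to sort the returned feature matrix explicitly.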
v0.10.1 Aug 25, 2019¶
v0.10.0 Aug 19, 2019¶
Warning
The next non-bugfix release of Featuretools will not support Python 2
- Enhancements
Give more frequent progress bar updates and update chunk size behavior (GH#631, GH#696)
Added drop_first as param in encode_features (GH#647)
Added support for stacking multi-output primitives (GH#679)
Generate transform features of direct features (GH#623)
Added serializing and deserializing from S3 and deserializing from URLs (GH#685)
Added nlp_primitives as an add-on library (GH#704)
Added AutoNormalize to Featuretools plugins (GH#699)
Added functionality for relative units (month/year) in Timedelta (GH#692)
Added categorical-encoding as an add-on library (GH#700)
- Fixes
Fix performance regression in DFS (GH#637)
Fix deserialization of feature relationship path (GH#665)
Set index after adding ancestor relationship variables (GH#668)
Fix user-supplied variable_types modification in Entity init (GH#675)
Don’t calculate dependencies of unnecessary features (GH#667)
Prevent normalize entity’s new entity having same index as base entity (GH#681)
Update variable type inference to better check for string values (GH#683)
- Changes
Moved dask, distributed imports (GH#634)
Thanks to the following people for contributing to this release: @alexjwang, @allisonportis, @ayushpatidar, @CJStadler, @ctduffy, @gsheni, @jeff-hernandez, @jeremyliweishih, @kmax12, @rwedge, @zhxt95
v0.9.1 Jul 3, 2019¶
- Fixes
Select columns of dataframe using a list (GH#615)
Change type of features calculated on Index features to Categorical (GH#602)
Filter dataframes through forward relationships (GH#625)
Specify Dask version in requirements for python 2 (GH#627)
Keep dataframe sorted by time during feature calculation (GH#626)
Fix bug in encode_features that created duplicate columns of features with multiple outputs (GH#622)
Thanks to the following people for contributing to this release: @CJStadler, @kmax12, @rwedge, @gsheni, @kkleidal, @ctduffy
v0.9.0 Jun 19, 2019¶
- Enhancements
Add unit parameter to timesince primitives (GH#558)
Add ability to install optional add on libraries (GH#551)
Load and save features from open files and strings (GH#566)
Support custom variable types (GH#571)
Support entitysets which have multiple paths between two entities (GH#572, GH#544)
Added show_info function, more output information added to CLI featuretools info (GH#525)
- Fixes
Normalize_entity specifies error when ‘make_time_index’ is an invalid string (GH#550)
Schema version added for entityset serialization (GH#586)
Renamed features have names correctly serialized (GH#585)
Improved error message for index/time_index being the same column in normalize_entity and entity_from_dataframe (GH#583)
Removed unused variable in normalize entity (GH#589)
Change time since return type to numeric (GH#606)
Thanks to the following people for contributing to this release: @alexjwang, @allisonportis, @CJStadler, @ctduffy, @gsheni, @kmax12, @rwedge
v0.8.0 May 17, 2019¶
Rename NUnique to NumUnique (GH#510)
Serialize features as JSON (GH#532)
Drop all variables at once in normalize_entity (GH#533)
Remove unnecessary sorting from normalize_entity (GH#535)
Features cache their names (GH#536)
Only calculate features for instances before cutoff (GH#523)
Remove all relative imports (GH#530)
Added FullName Variable Type (GH#506)
Add error message when target entity does not exist (GH#520)
New demo links (GH#542)
Remove duplicate features check in DFS (GH#538)
featuretools_primitives entry point expects list of primitive classes (GH#529)
Update ALL_VARIABLE_TYPES list (GH#526)
More Informative N Jobs Prints and Warnings (GH#511)
Update sklearn version requirements (GH#541)
Update Makefile (GH#519)
Remove unused parameter in Entity._handle_time (GH#524)
Remove build_ext code from setup.py (GH#513)
Documentation updates (GH#512, GH#514, GH#515, GH#521, GH#522, GH#527, GH#545)
Thanks to the following people for contributing to this release: @bphi, @CharlesBradshaw, @CJStadler, @glentennis, @gsheni, @kmax12, @rwedge
Breaking Changes¶
NUnique has been renamed to NumUnique.
Previous behavior
from featuretools.primitives import NUnique
New behavior
from featuretools.primitives import NumUnique
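Code that has to run against versions on both sides of this rename could resolve the class by name at import time. The helper below is a minimal sketch under that assumption; first_available is a hypothetical name, not part of Featuretools, and the Featuretools call is left commented out because it needs the library installed.

```python
import importlib


def first_available(module_name, candidates):
    """Return the first attribute named in `candidates` that the module defines."""
    module = importlib.import_module(module_name)
    for name in candidates:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"none of {candidates} found in {module_name}")


# Intended use (assumes featuretools is installed):
# NumUnique = first_available("featuretools.primitives", ["NumUnique", "NUnique"])
```

Listing the new name first means the old name is only used as a fallback on older releases.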
v0.7.1 Apr 24, 2019¶
Automatically generate feature name for controllable primitives (GH#481)
Primitive docstring updates (GH#489, GH#492, GH#494, GH#495)
Change primitive functions that returned strings to return functions (GH#499)
CLI customizable via entrypoints (GH#493)
Improve calculation of aggregation features on grandchildren (GH#479)
Refactor entrypoints to use decorator (GH#483)
Include doctests in testing suite (GH#491)
Documentation updates (GH#490)
Update how standard primitives are imported internally (GH#482)
Thanks to the following people for contributing to this release: @bukosabino, @CharlesBradshaw, @glentennis, @gsheni, @jeff-hernandez, @kmax12, @minkvsky, @rwedge, @thehomebrewnerd
v0.7.0 Mar 29, 2019¶
Improve Entity Set Serialization (GH#361)
Support calling a primitive instance’s function directly (GH#461, GH#468)
Support other libraries extending featuretools functionality via entrypoints (GH#452)
Remove featuretools install command (GH#475)
Add commutative argument to SubtractNumeric and DivideNumeric primitives (GH#457)
Add FilePath variable_type (GH#470)
Add PhoneNumber, DateOfBirth, URL variable types (GH#447)
Generalize infer_variable_type, convert_variable_data and convert_all_variable_data methods (GH#423)
Thanks to the following people for contributing to this release: @bukosabino, @CharlesBradshaw, @ColCarroll, @glentennis, @grayskripko, @gsheni, @jeff-hernandez, @jrkinley, @kmax12, @RogerTangos, @rwedge
Breaking Changes¶
ft.dfs now has a groupby_trans_primitives parameter that DFS uses to automatically construct features that group by an ID column and then apply a transform primitive to each group. This change applies to the following primitives: CumSum, CumCount, CumMean, CumMin, and CumMax.
Previous behavior
ft.dfs(entityset=es, target_entity='customers', trans_primitives=["cum_mean"])
New behavior
ft.dfs(entityset=es, target_entity='customers', groupby_trans_primitives=["cum_mean"])
Related to the above change, cumulative transform features are now defined using a new feature class, GroupByTransformFeature.
Previous behavior
ft.Feature([base_feature, groupby_feature], primitive=CumulativePrimitive)
New behavior
ft.Feature(base_feature, groupby=groupby_feature, primitive=CumulativePrimitive)
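To make "group by an ID column and then apply a transform primitive" concrete, here is a small plain-Python sketch of a grouped cumulative mean. It is not how Featuretools implements CumMean; grouped_cum_mean and the sample data are hypothetical and serve only to show the per-group behavior.

```python
from collections import defaultdict


def grouped_cum_mean(values, groups):
    """Running mean of `values`, tracked separately for each group id."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    result = []
    for value, group in zip(values, groups):
        totals[group] += value
        counts[group] += 1
        result.append(totals[group] / counts[group])
    return result


# Made-up transaction amounts and the customer each one belongs to.
amounts = [10.0, 20.0, 30.0, 40.0]
customer_ids = ["a", "b", "a", "b"]
print(grouped_cum_mean(amounts, customer_ids))  # [10.0, 20.0, 20.0, 30.0]
```

Each output value depends only on earlier rows with the same group id, which is why the grouping column is passed separately from the base feature.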
v0.6.1 Feb 15, 2019¶
Cumulative primitives (GH#410)
Entity.query_by_values now preserves row order of underlying data (GH#428)
Implementing Country Code and Sub Region Codes as variable types (GH#430)
Added IPAddress and EmailAddress variable types (GH#426)
Install data and dependencies (GH#403)
Add TimeSinceFirst, fix TimeSinceLast (GH#388)
Allow user to pass in desired feature return types (GH#372)
Add new configuration object (GH#401)
Replace NUnique get_function (GH#434)
_calculate_identity_features now only returns the features asked for, instead of the entire entity (GH#429)
Primitive function name uniqueness (GH#424)
Update NumCharacters and NumWords primitives (GH#419)
Change to zipcode rep, str for pandas (GH#418)
Remove pandas version upper bound (GH#408)
Make S3 dependencies optional (GH#404)
Check that agg_primitives and trans_primitives are right primitive type (GH#397)
Mean primitive changes (GH#395)
Fix transform stacking on multi-output aggregation (GH#394)
Fix list_primitives (GH#391)
Documentation updates (GH#400, GH#409, GH#415, GH#417, GH#420, GH#421, GH#422, GH#431)
Thanks to the following people for contributing to this release: @CharlesBradshaw, @csala, @floscha, @gsheni, @jxwolstenholme, @kmax12, @RogerTangos, @rwedge
v0.6.0 Jan 30, 2019¶
Primitive refactor (GH#364)
Mean ignore NaNs (GH#379)
Plotting entitysets (GH#382)
Add seed features later in DFS process (GH#357)
Multiple output column features (GH#376)
Add ZipCode Variable Type (GH#367)
Add primitive.get_filepath and example of primitive loading data from external files (GH#380)
Transform primitives take series as input (GH#385)
Add modulo to override tests (GH#384)
Thanks to the following people for contributing to this release: @floscha, @gsheni, @kmax12, @RogerTangos, @rwedge
v0.5.1 Dec 17, 2018¶
v0.5.0 Dec 17, 2018¶
Add specific error for duplicate additional/copy_variables in normalize_entity (GH#348)
Removed EntitySet._import_from_dataframe (GH#346)
Removed time_index_reduce parameter (GH#344)
Allow installation of additional primitives (GH#326)
Fix DatetimeIndex variable conversion (GH#342)
Update Sklearn DFS Transformer (GH#343)
Clean up entity creation logic (GH#336)
Remove casting to list in transform feature calculation (GH#330)
Fix sklearn wrapper (GH#335)
Add readme to pypi
Update conda docs after move to conda-forge (GH#334)
Add wrapper for scikit-learn Pipelines (GH#323)
Remove parse_date_cols parameter from EntitySet._import_from_dataframe (GH#333)
Thanks to the following people for contributing to this release: @bukosabino, @georgewambold, @gsheni, @jeff-hernandez, @kmax12, and @rwedge.
v0.4.1 Nov 29, 2018¶
Resolve bug preventing using first column as index by default (GH#308)
Handle return type when creating features from Id variables (GH#318)
Make id an optional parameter of EntitySet constructor (GH#324)
Handle primitives with same function being applied to same column (GH#321)
Update requirements (GH#328)
Clean up DFS arguments (GH#319)
Clean up Pandas Backend (GH#302)
Update properties of cumulative transform primitives (GH#320)
Feature stability between versions documentation (GH#316)
Add download count to GitHub readme (GH#310)
Fixed #297 update tests to check error strings (GH#303)
Remove usage of fixtures in agg primitive tests (GH#325)
v0.4.0 Oct 31, 2018¶
Remove ft.utils.gen_utils.getsize and make pympler a test requirement (GH#299)
Update requirements.txt (GH#298)
Refactor EntitySet.find_path(…) (GH#295)
Clean up unused methods (GH#293)
Remove unused parents property of Entity (GH#283)
Removed relationships parameter (GH#284)
Improve time index validation (GH#285)
Encode features with “unknown” class in categorical (GH#287)
Allow where clauses on direct features in Deep Feature Synthesis (GH#279)
Change to fullargsspec (GH#288)
Parallel verbose fixes (GH#282)
Update tests for python 3.7 (GH#277)
Check duplicate rows cutoff times (GH#276)
Load retail demo data using compressed file (GH#271)
v0.3.1 Sep 28, 2018¶
Handling time rewrite (GH#245)
Update deep_feature_synthesis.py (GH#249)
Handling return type when creating features from DatetimeTimeIndex (GH#266)
Update retail.py (GH#259)
Improve Consistency of Transform Primitives (GH#236)
Update demo docstrings (GH#268)
Handle non-string column names (GH#255)
Clean up merging of aggregation primitives (GH#250)
Add tests for Entity methods (GH#262)
Handle no child data when calculating aggregation features with multiple arguments (GH#264)
Add is_string utils function (GH#260)
Update python versions to match docker container (GH#261)
Handle where clause when no child data (GH#258)
No longer cache demo csvs, remove config file (GH#257)
Avoid stacking “expanding” primitives (GH#238)
Use randomly generated names in retail csv (GH#233)
Update README.md (GH#243)
v0.3.0 Aug 27, 2018¶
Improve performance of all feature calculations (GH#224)
Update agg primitives to use more efficient functions (GH#215)
Optimize metadata calculation (GH#229)
More robust handling when no data at a cutoff time (GH#234)
Workaround categorical merge (GH#231)
Switch which CSV is associated with which variable (GH#228)
Remove unused kwargs from query_by_values, filter_and_sort (GH#225)
Remove convert_links_to_integers (GH#219)
Add example of using Dask to parallelize to docs (GH#221)
v0.2.2 Aug 20, 2018¶
Remove unnecessary check no related instances call and refactor (GH#209)
Improve memory usage through support for pandas categorical types (GH#196)
Bump minimum pandas version from 0.20.3 to 0.23.0 (GH#216)
Make primitive lookup case insensitive (GH#213)
Use capital name (GH#211)
Set class name for Min (GH#206)
Remove variable_types from normalize entity (GH#205)
Handle parquet serialization with last time index (GH#204)
Reset index of cutoff times in calculate feature matrix (GH#198)
Check argument types for .normalize_entity (GH#195)
Type checking ignore entities. (GH#193)
v0.2.1 Jul 2, 2018¶
v0.2.0 Jun 22, 2018¶
Multiprocessing (GH#170)
Handle unicode encoding in repr throughout Featuretools (GH#161)
Clean up EntitySet class (GH#145)
Add support for building and uploading conda package (GH#167)
Parquet serialization (GH#152)
Remove variable stats (GH#171)
Make sure index variable comes first (GH#168)
No last time index update on normalize (GH#169)
Remove list of times as an option for cutoff_time in calculate_feature_matrix (GH#165)
Config does error checking to see if it can write to disk (GH#162)
v0.1.21 May 30, 2018¶
No EntitySet required in loading/saving features (GH#141)
Use s3 demo csv with better column names (GH#139)
More reasonable start parameter (GH#149)
Add issue template (GH#133)
Update documentation after recent changes / removals (GH#157)
Rename demo retail csv file (GH#148)
Add names for binary (GH#142)
EntitySet repr to use get_name rather than id (GH#134)
Ensure config dir is writable (GH#135)
v0.1.20 Apr 13, 2018¶
Primitives as strings in DFS parameters (GH#129)
Integer time index bugfixes (GH#128)
Add make_temporal_cutoffs utility function (GH#126)
Show all entities, switch shape display to row/col (GH#124)
Improved chunking when calculating feature matrices (GH#121)
Num characters nan fix (GH#118)
Modify ignore_variables docstring (GH#117)
v0.1.19 Mar 21, 2018¶
More descriptive DFS progress bar (GH#69)
Convert text variable to string before NumWords (GH#106)
EntitySet.concat() reindexes relationships (GH#96)
Keep non-feature columns when encoding feature matrix (GH#111)
Uses full entity update for dependencies of uses_full_entity features (GH#110)
Update column names in retail demo (GH#104)
Handle Transform features that need access to all values of entity (GH#91)
v0.1.18 Feb 27, 2018¶
v0.1.17 Jan 18, 2018¶
LatLong type (GH#57)
Last time index fixes (GH#70)
Make median agg primitives ignore nans by default (GH#61)
Remove Python 3.4 support (GH#64)
Change normalize_entity to update secondary_time_index (GH#59)
Unpin requirements (GH#53)
associative -> commutative (GH#56)
Add Words and Chars primitives (GH#51)
v0.1.16 Dec 19, 2017¶
v0.1.15 Dec 18, 2017¶
v0.1.14 Nov 20, 2017¶
v0.1.13 Nov 1, 2017¶
Add MANIFEST.in (GH#26)
v0.1.11 Oct 31, 2017¶
Package linting (GH#7)
Custom primitive creation functions (GH#13)
Split requirements to separate files and pin to latest versions (GH#15)
Select low information features (GH#18)
Fix docs typos (GH#19)
Fixed Diff primitive for rare nan case (GH#21)
Added some missing doc strings (GH#23)
Trend fix (GH#22)
Remove as_dir=False option from EntitySet.to_pickle() (GH#20)
Entity Normalization Preserves Types of Copy & Additional Variables (GH#25)
v0.1.10 Oct 12, 2017¶
NumTrue primitive added and docstring of other primitives updated (GH#11)
Fixed hash issue with same base features (GH#8)
Head fix (GH#9)
Fix training window (GH#10)
Add associative attribute to primitives (GH#3)
Add status badges, fix license in setup.py (GH#1)
Fixed head printout and flight demo index (GH#2)
v0.1.9 Sep 8, 2017¶
Documentation improvements
New featuretools.demo.load_mock_customer function
v0.1.8 Sep 1, 2017¶
Bug fixes
Added Percentile transform primitive
v0.1.7 Aug 17, 2017¶
Performance improvements for approximate in calculate_feature_matrix and dfs
Added Week transform primitive
v0.1.6 Jul 26, 2017¶
Added load_features and save_features to persist and reload features
Added save_progress argument to calculate_feature_matrix
Added approximate parameter to calculate_feature_matrix and dfs
Added load_flight to ft.demo
v0.1.5 Jul 11, 2017¶
Windows support
v0.1.3 Jul 10, 2017¶
Renamed feature submodule to primitives
Renamed prediction_entity arguments to target_entity
Added training_window parameter to calculate_feature_matrix
v0.1.2 Jul 3, 2017¶
Initial release