NOTICE
The upcoming release of Featuretools 1.0.0 contains several breaking changes. Users are encouraged to test this version prior to release by installing from GitHub:
pip install https://github.com/alteryx/featuretools/archive/woodwork-integration.zip
For details on migrating to the new version, refer to Transitioning to Featuretools Version 1.0. Please report any issues in the Featuretools GitHub repo or by messaging in Alteryx Open Source Slack.
Enhancements
Add support for creating EntitySets from Woodwork DataTables (GH#1277)
Add EntitySet.__deepcopy__ that retains Woodwork typing information (GH#1465)
Add EntitySet.__getstate__ and EntitySet.__setstate__ to preserve typing when pickling (GH#1581)
Returned feature matrix has Woodwork typing information (GH#1664)
Fixes
Fix DFSTransformer Documentation for Featuretools 1.0 (GH#1605)
Fix calculate_feature_matrix time type check and encode_features for synthesis tests (GH#1580)
Revert reordering of categories in Equal and NotEqual primitives (GH#1640)
Fix bug in EntitySet.add_relationship that caused foreign_key tag to be lost (GH#1675)
Update DFS to not build features on last time index columns in dataframes (GH#1695)
Changes
Remove add_interesting_values from Entity (GH#1269)
Move set_secondary_time_index method from Entity to EntitySet (GH#1280)
Refactor Relationship creation process (GH#1370)
Replace Entity.update_data with EntitySet.update_dataframe (GH#1398)
Move validation check for uniform time index to EntitySet (GH#1400)
Replace Entity objects in EntitySet with Woodwork dataframes (GH#1405)
Refactor EntitySet.plot to work with Woodwork dataframes (GH#1468)
Move last_time_index to be a column on the DataFrame (GH#1456)
Update serialization/deserialization to work with Woodwork (GH#1452)
Refactor EntitySet.query_by_values to work with Woodwork dataframes (GH#1467)
Replace list_variable_types with list_logical_types (GH#1477)
Allow deep EntitySet equality check (GH#1480)
Update EntitySet.concat to work with Woodwork DataFrames (GH#1490)
Add function to list semantic tags (GH#1486)
Initialize Woodwork on feature matrix in remove_highly_correlated_features if necessary (GH#1618)
Remove categorical-encoding as an add-on library (will be added back later) (GH#1632)
Remove autonormalize as an add-on library (will be added back later) (GH#1636)
Remove tsfresh, nlp_primitives, sklearn_transformer as add-on libraries (will be added back later) (GH#1638)
Update input and return types for CumCount primitive (GH#1651)
Standardize imports of Woodwork (GH#1526)
Rename target entity to target dataframe (GH#1506)
Replace entity_from_dataframe with add_dataframe (GH#1504)
Create features from Woodwork columns (GH#1582)
Move default variable description logic to generate_description (GH#1403)
Update Woodwork to version 0.4.0 with LogicalType.transform and LogicalType instances (GH#1451)
Update Woodwork to version 0.4.1 with Ordinal order values and whitespace serialization fix (GH#1478)
Use ColumnSchema for primitive input and return types (GH#1411)
Update features to use Woodwork and remove Entity and Variable classes (GH#1501)
Re-add make_index functionality to EntitySet (GH#1507)
Use ColumnSchema in DFS primitive matching (GH#1523)
Updates from Featuretools v0.26.0 (GH#1539)
Leverage Woodwork better in add_interesting_values (GH#1550)
Update calculate_feature_matrix to use Woodwork (GH#1533)
Update Woodwork to version 0.6.0 with changed categorical inference (GH#1597)
Update nlp-primitives requirement for Featuretools 1.0 (GH#1609)
Remove remaining references to Entity and Variable in code (GH#1612)
Update Woodwork to version 0.7.1 with changed initialization (GH#1648)
Remove outdated workaround code related to a since-resolved pandas issue (GH#1677)
Remove unused _dataframes_equal and camel_to_snake functions (GH#1683)
Update Woodwork to version 0.8.0 for improved performance (GH#1689)
Remove redundant typecasting in encode_features (GH#1694)
Speed up encode_features if not inplace, at some space cost (GH#1699)
Clean up comments and commented out code (GH#1701)
Update Woodwork to version 0.8.1 for improved performance (GH#1702)
Documentation Changes
Add a Woodwork Typing in Featuretools guide (GH#1589)
Add a resource guide for transitioning to Featuretools 1.0 (GH#1627)
Update using_entitysets page to use Woodwork (GH#1532)
Update FAQ page to use Woodwork integration (GH#1649)
Update DFS page to be Jupyter notebook and use Woodwork integration (GH#1557)
Update Feature Primitives page to be Jupyter notebook and use Woodwork integration (GH#1556)
Update Handling Time page to be Jupyter notebook and use Woodwork integration (GH#1552)
Update Advanced Custom Primitives page to be Jupyter notebook and use Woodwork integration (GH#1587)
Update Deployment page to use Woodwork integration (GH#1588)
Update Using Dask EntitySets page to be Jupyter notebook and use Woodwork integration (GH#1590)
Update Specifying Primitive Options page to be Jupyter notebook and use Woodwork integration (GH#1593)
Update API Reference to match Featuretools 1.0 API (GH#1600)
Update Index page to be Jupyter notebook and use Woodwork integration (GH#1602)
Update Feature Descriptions page to be Jupyter notebook and use Woodwork integration (GH#1603)
Update Using Koalas EntitySets page to be Jupyter notebook and use Woodwork integration (GH#1604)
Update Glossary to use Woodwork integration (GH#1608)
Update Tuning DFS page to be Jupyter notebook and use Woodwork integration (GH#1610)
Fix small formatting issues in Documentation (GH#1607)
Remove Variables page and more references to variables (GH#1629)
Update Feature Selection page to use Woodwork integration (GH#1618)
Update Improving Performance page to be Jupyter notebook and use Woodwork integration (GH#1591)
Fix typos in transition guide (GH#1672)
Testing Changes
Remove entity tests (GH#1521)
Fix broken EntitySet tests (GH#1548)
Fix broken primitive tests (GH#1568)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd
Entity.add_interesting_values has been removed. To add interesting values for a single entity, call EntitySet.add_interesting_values and pass the name of the dataframe for which to add interesting values in the dataframe_name parameter (GH#1405, GH#1370).
Entity.set_secondary_time_index has been removed and replaced by EntitySet.set_secondary_time_index with an added dataframe_name parameter to specify the dataframe on which to set the secondary time index (GH#1405, GH#1370).
Relationship initialization has been updated to accept four name values for the parent dataframe, parent column, child dataframe and child column instead of accepting two Variable objects (GH#1405, GH#1370).
EntitySet.add_relationship has been updated to accept dataframe and column name values or a Relationship object. Adding a relationship from a Relationship object now requires passing the relationship as a keyword argument (GH#1405, GH#1370).
Entity.update_data has been removed. To update the dataframe, call EntitySet.replace_dataframe and use the dataframe_name parameter (GH#1630, GH#1522).
The data in an EntitySet is no longer stored in Entity objects. Instead, dataframes with Woodwork typing information are used. Accordingly, most language referring to “entities” will now refer to “dataframes”, references to “variables” will now refer to “columns”, and “variable types” will use the Woodwork type system’s “logical types” and “semantic tags” (GH#1405).
The dictionary of tuples passed to EntitySet.__init__ has replaced the variable_types element with separate logical_types and semantic_tags dictionaries (GH#1405).
EntitySet.entity_from_dataframe no longer exists. To add new tables to an entityset, use EntitySet.add_dataframe (GH#1405).
EntitySet.normalize_entity has been renamed to EntitySet.normalize_dataframe (GH#1405).
Instead of raising an error at EntitySet.add_relationship when the dtypes of parent and child columns do not match, Featuretools now checks whether the Woodwork logical types of the parent and child columns match. If they do not match, a warning is raised and Featuretools attempts to update the logical type of the child column to match the parent’s (GH#1405).
If no index is specified at EntitySet.add_dataframe, the first column will only be used as index if Woodwork has not been initialized on the DataFrame. When adding a dataframe that already has Woodwork initialized, if there is no index set, an error will be raised (GH#1405).
Featuretools will no longer re-order columns in DataFrames so that the index column is the first column of the DataFrame (GH#1405).
Type inference can now be performed on Dask and Koalas dataframes, though a warning will be issued indicating that this may be computationally intensive (GH#1405).
EntitySet.time_type is no longer stored as Variable objects. Instead, Woodwork typing is used, and a numeric time type will be indicated by the 'numeric' semantic tag string, and a datetime time type will be indicated by the Datetime logical type (GH#1405).
last_time_index, secondary_time_index, and interesting_values are no longer attributes of an entityset’s tables that can be accessed directly. Now they must be accessed through the metadata of the Woodwork DataFrame, which is a dictionary (GH#1405).
The helper function list_variable_types will be removed in a future release and replaced by list_logical_types. In the meantime, list_variable_types will return the same output as list_logical_types (GH#1447).
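The relationship type check described above can be pictured with a small standalone sketch. It does not use Featuretools or Woodwork: the `align_child_logical_type` helper and the dictionary of logical type names are hypothetical stand-ins, and unlike the real library (which only attempts the update) this sketch assumes the update always succeeds.

```python
import warnings


def align_child_logical_type(logical_types, parent, child):
    """Warn and copy the parent's logical type onto the child on a mismatch.

    `logical_types` maps (dataframe_name, column_name) to a logical type
    name. It is a hypothetical stand-in for Woodwork typing information.
    """
    parent_type = logical_types[parent]
    child_type = logical_types[child]
    if parent_type != child_type:
        # Mismatch: warn instead of raising, then align the child column.
        warnings.warn(
            f"Logical type {child_type} of child column {child[1]} does not "
            f"match logical type {parent_type} of parent column {parent[1]}; "
            "updating the child column to match."
        )
        logical_types[child] = parent_type
    return logical_types


types = {
    ("customers", "id"): "Integer",
    ("sessions", "customer_id"): "Categorical",
}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    align_child_logical_type(
        types, ("customers", "id"), ("sessions", "customer_id")
    )

print(types[("sessions", "customer_id")])
print(len(caught))
```

After the call, the child column carries the parent's logical type and exactly one warning has been recorded.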
Adding Interesting Values
To add interesting values for a single dataframe, call EntitySet.add_interesting_values, passing the name of the dataframe for which interesting values should be added in the dataframe_name parameter.
>>> es.add_interesting_values(dataframe_name='log')
Setting a Secondary Time Index
To set a secondary time index for a specific dataframe, call EntitySet.set_secondary_time_index, passing the name of the dataframe for which to set the secondary time index along with a dictionary mapping the secondary time index column to the columns to which the secondary time index applies.
>>> customers_secondary_time_index = {'cancel_date': ['cancel_reason']}
>>> es.set_secondary_time_index(
...     dataframe_name='customers',
...     secondary_time_index=customers_secondary_time_index
... )
Creating a Relationship and Adding to an EntitySet
Relationships are now created by passing parameters identifying the entityset along with four string values specifying the parent dataframe, parent column, child dataframe and child column. Specifying parameter names is optional.
>>> new_relationship = Relationship(
...     entityset=es,
...     parent_dataframe_name='customers',
...     parent_column_name='id',
...     child_dataframe_name='sessions',
...     child_column_name='customer_id'
... )
Relationships can now be added to EntitySets in one of two ways. The first approach is to pass in name values for the parent dataframe, parent column, child dataframe and child column. Specifying parameter names is optional with this approach.
>>> es.add_relationship(
...     parent_dataframe_name='customers',
...     parent_column_name='id',
...     child_dataframe_name='sessions',
...     child_column_name='customer_id'
... )
Relationships can also be added by passing in a previously created Relationship object. When using this approach the relationship parameter name must be included.
>>> es.add_relationship(relationship=new_relationship)
Replace DataFrame
To replace a dataframe in an EntitySet with a new dataframe, call EntitySet.replace_dataframe and pass in the name of the dataframe to replace along with the new data.
>>> es.replace_dataframe(dataframe_name='log', df=df)
List Logical Types and Semantic Tags
Logical types and semantic tags have replaced variable types to parse and interpret columns. You can list all the available logical types by calling featuretools.list_logical_types.
>>> ft.list_logical_types()
You can list all the available semantic tags by calling featuretools.list_semantic_tags.
>>> ft.list_semantic_tags()
Documentation Changes
Add banner to docs about upcoming Featuretools 1.0 release (GH#1669)
Thanks to the following people for contributing to this release: @thehomebrewnerd
Changes
Remove autonormalize, tsfresh, nlp_primitives, sklearn_transformer, categorical_encoding as add-on libraries (will be added back later) (GH#1644)
Emit a warning message when a featuretools_primitives entrypoint throws an exception (GH#1662)
Throw a RuntimeError when two primitives with the same name are encountered during featuretools_primitives entrypoint handling (GH#1662)
Prevent the featuretools_primitives entrypoint loader from loading non-class objects as well as the AggregationPrimitive and TransformPrimitive base classes (GH#1662)
Testing Changes
Update latest dependency checker with proper install command (GH#1652)
Update isort dependency (GH#1654)
Thanks to the following people for contributing to this release: @davesque, @gsheni, @jeff-hernandez, @rwedge
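The three entrypoint-handling changes in this release (GH#1662) can be illustrated with a minimal standalone sketch. This is not the Featuretools loader: `collect_primitives`, the two base classes, and the dictionary of loader callables are hypothetical stand-ins, and the subclass check is an added assumption that the notes do not mention.

```python
import warnings


class AggregationPrimitive:  # hypothetical stand-in for the real base class
    name = None


class TransformPrimitive:  # hypothetical stand-in for the real base class
    name = None


BASE_CLASSES = (AggregationPrimitive, TransformPrimitive)


def collect_primitives(entry_points):
    """Collect primitive classes from {entry point name: loader callable}.

    Each loader returns an iterable of objects exposed by a plugin.
    """
    found = {}
    for ep_name, load in entry_points.items():
        try:
            candidates = list(load())
        except Exception as exc:
            # A failing plugin is reported and skipped, not fatal.
            warnings.warn(f"entry point {ep_name!r} raised {exc!r}; skipping")
            continue
        for obj in candidates:
            # Skip non-class objects and the base classes themselves.
            if not isinstance(obj, type) or obj in BASE_CLASSES:
                continue
            # Assumption of this sketch: only primitive subclasses count.
            if not issubclass(obj, BASE_CLASSES):
                continue
            if obj.name in found:
                raise RuntimeError(
                    f"primitive name {obj.name!r} defined more than once"
                )
            found[obj.name] = obj
    return found


class Mean(AggregationPrimitive):
    name = "mean"


class Upper(TransformPrimitive):
    name = "upper"


def broken_plugin():
    raise ImportError("missing dependency")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    primitives = collect_primitives({
        "good": lambda: [Mean, Upper, AggregationPrimitive, "not a class"],
        "bad": broken_plugin,
    })

print(sorted(primitives))  # ['mean', 'upper']
```

The broken plugin produces one warning, the base class and the string are ignored, and registering a second class named "mean" would raise RuntimeError.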
Documentation Changes
Specify conda channel and Windows exe in graphviz installation instructions (GH#1611)
Remove GA token from the layout html (GH#1622)
Testing Changes
Add additional reviewers to minimum and latest dependency checkers (GH#1558, GH#1562, GH#1564, GH#1567)
Thanks to the following people for contributing to this release: @gsheni, @simha104
Fixes
Set name attribute for EmailAddressToDomain primitive (GH#1543)
Documentation Changes
Remove and ignore unnecessary graph files (GH#1544)
Thanks to the following people for contributing to this release: @davesque, @rwedge
Enhancements
Add replace_inf_values utility function for replacing inf values in a feature matrix (GH#1505)
Add URLToProtocol, URLToDomain, URLToTLD, EmailAddressToDomain, IsFreeEmailDomain as transform primitives (GH#1508, GH#1531)
Fixes
include_entities correctly overrides exclude_entities in primitive_options (GH#1518)
Documentation Changes
Prevent logging on build (GH#1498)
Testing Changes
Test featuretools on pandas 1.3.0 release candidate and make fixes (GH#1492)
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @rwedge, @tamargrey, @thehomebrewnerd, @tuethan1999
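As a rough illustration of the include_entities fix (GH#1518), the sketch below shows one reading of "overrides": when an include list is present, it alone decides which entities a primitive applies to. The `entity_is_used` helper is hypothetical and is not part of Featuretools; only the option names come from the notes.

```python
def entity_is_used(entity_id, options):
    """Return True if a primitive may be applied to `entity_id`.

    `options` mirrors a single primitive_options entry with optional
    'include_entities' and 'exclude_entities' lists.
    """
    include = options.get("include_entities")
    if include is not None:
        # An explicit include list wins; exclude_entities is ignored.
        return entity_id in include
    return entity_id not in options.get("exclude_entities", [])


options = {
    "include_entities": ["customers", "sessions"],
    "exclude_entities": ["sessions"],
}

print(entity_is_used("sessions", options))  # True
print(entity_is_used("products", options))  # False
```

Here "sessions" stays in scope even though it also appears in the exclude list, while "products" is dropped because it is missing from the include list.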
Enhancements
Add get_valid_primitives function (GH#1462)
Add EntitySet.dataframe_type attribute (GH#1473)
Changes
Upgrade minimum alteryx open source update checker to 2.0.0 (GH#1460)
Testing Changes
Upgrade minimum pip requirement for testing to 21.1.2 (GH#1475)
Thanks to the following people for contributing to this release: @gsheni, @rwedge
Fixes
Update minimum pyyaml requirement to 5.4 (GH#1433)
Update minimum psutil requirement to 5.6.6 (GH#1438)
Documentation Changes
Update nbsphinx version to fix docs build issue (GH#1436)
Testing Changes
Create separate workflows for each CI job (GH#1422)
Add minimum dependency checker to generate minimum requirement files (GH#1428)
Add unit tests against minimum dependencies for python 3.7 on PRs and main (GH#1432, GH#1445)
Update minimum urllib3 requirement to 1.26.5 (GH#1457)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge, @thehomebrewnerd
Changes
Add auto assign bot on GitHub (GH#1380)
Reduce DFS max_depth to 1 if single entity in entityset (GH#1412)
Drop Python 3.6 support (GH#1413)
Documentation Changes
Improve formatting of release notes (GH#1396)
Testing Changes
Update Dask/Koalas test fixtures (GH#1382)
Update Spark config in test fixtures and docs (GH#1387, GH#1389)
Don’t cancel other CI jobs if one fails (GH#1386)
Update boto3 and urllib3 version requirements (GH#1394)
Update token for dependency checker PR creation (GH#1402, GH#1407, GH#1409)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd
Warning
The next non-bugfix release of Featuretools will not support Python 3.6
Changes
Minor updates to work with Koalas version 1.7.0 (GH#1351)
Explicitly mention Python 3.8 support in setup.py classifiers (GH#1371)
Fix issue with smart-open version 5.0.0 (GH#1372, GH#1376)
Testing Changes
Make release notes updated check separate from unit tests (GH#1347)
Performance tests now specify which commit to check (GH#1354)
Thanks to the following people for contributing to this release: @gsheni, @rwedge, @thehomebrewnerd
Warning
The next non-bugfix release of Featuretools will not support Python 3.6
Enhancements
The list_primitives function returns valid input types and the return type (GH#1341)
Fixes
Restrict numpy version when installing koalas (GH#1329)
Changes
Warn python 3.6 users support will be dropped in future release (GH#1344)
Documentation Changes
Update docs for defining custom primitives (GH#1332)
Update featuretools release instructions (GH#1345)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge
Fixes
Calculate direct features uses default value if parent missing (GH#1312)
Fix bug and improve tests for EntitySet.__eq__ and Entity.__eq__ (GH#1323)
Documentation Changes
Update Twitter link to documentation toolbar (GH#1322)
Testing Changes
Unpin python-graphviz package on Windows (GH#1296)
Reorganize and clean up tests (GH#1294, GH#1303, GH#1306)
Trigger tests on pull request events (GH#1304, GH#1315)
Remove unnecessary test skips on Windows (GH#1320)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge, @seriallazer, @thehomebrewnerd
Fixes
Fix logic for inferring variable type from unusual dtype (GH#1273)
Allow passing entities without relationships to calculate_feature_matrix (GH#1290)
Changes
Move query_by_values method from Entity to EntitySet (GH#1251)
Move _handle_time method from Entity to EntitySet (GH#1276)
Remove usage of ravel to resolve unexpected warning with pandas 1.2.0 (GH#1286)
Documentation Changes
Fix installation command for Add-ons (GH#1279)
Fix various broken links in documentation (GH#1313)
Testing Changes
Use repository-scoped token for dependency check (GH#1245, GH#1248)
Fix install error during docs CI test (GH#1250)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge, @thehomebrewnerd
Entity.query_by_values has been removed and replaced by EntitySet.query_by_values with an added entity_id parameter to specify which entity in the entityset should be used for the query.
Enhancements
Allow variable descriptions to be set directly on variable (GH#1207)
Add ability to add feature description captions to feature lineage graphs (GH#1212)
Add support for local tar file in read_entityset (GH#1228)
Fixes
Updates to fix unit test errors from koalas 1.4 (GH#1230, GH#1232)
Documentation Changes
Removed link to unused feedback board (GH#1220)
Update footer with Alteryx Innovation Labs (GH#1221)
Update links to repo in documentation to use alteryx org url (GH#1224)
Testing Changes
Update release notes check to use new repo url (GH#1222)
Use new version of pull request Github Action (GH#1234)
Upgrade pip during featuretools[complete] test (GH#1236)
Migrated CI tests to github actions (GH#1226, GH#1237, GH#1239)
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @jeff-hernandez, @kmax12, @rwedge, @thehomebrewnerd
Enhancements
Add describe_feature to generate an English language feature description for a given feature (GH#1201)
Fixes
Update EntitySet.add_last_time_indexes to work with Koalas 1.3.0 (GH#1192, GH#1202)
Changes
Keep koalas requirements in separate file (GH#1195)
Documentation Changes
Added footer to the documentation (GH#1189)
Add guide for feature selection functions (GH#1184)
Fix README.md badge with correct link (GH#1200)
Testing Changes
Add pyspark and koalas to automated dependency checks (GH#1191)
Add DockerHub credentials to CI testing environment (GH#1204)
Update premium primitives job name on CI (GH#1205)
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd
Warning
The Text variable type has been deprecated and replaced with the NaturalLanguage variable type. The Text variable type will be removed in a future release.
Fixes
Allow FeatureOutputSlice features to be serialized (GH#1150)
Fix duplicate label column generation when labels are passed in cutoff times and approximate is being used (GH#1160)
Determine calculate_feature_matrix behavior with approximate and a cutoff df that is a subclass of a pandas DataFrame (GH#1166)
Changes
Text variable type has been replaced with NaturalLanguage (GH#1159)
Documentation Changes
Update release doc for clarity and to add Future Release template (GH#1151)
Use the PyData Sphinx theme (GH#1169)
Testing Changes
Stop requiring single-threaded dask scheduler in tests (GH#1163, GH#1170)
Thanks to the following people for contributing to this release: @gsheni, @rwedge, @tamargrey, @tuethan1999
Enhancements
Support use of Koalas DataFrames in entitysets (GH#1031)
Add feature selection functions for null, correlated, and single value features (GH#1126)
Fixes
Fix encode_features converting excluded feature columns to a numeric dtype (GH#1123)
Improve performance of unused primitive check in dfs (GH#1140)
Changes
Remove the ability to stack transform primitives (GH#1119, GH#1145)
Sort primitives passed to dfs to get consistent ordering of features* (GH#1119)
Documentation Changes
Added return values to dfs and calculate_feature_matrix (GH#1125)
Testing Changes
Better test case for normalizing from no time index to time index (GH#1113)
* When passing multiple instances of a primitive built with make_trans_primitive or make_agg_primitive, those instances must have the same relative order when passed to dfs to ensure a consistent ordering of features.
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @rwedge, @tamargrey, @thehomebrewnerd, @tuethan1999
ft.dfs will no longer build features from Transform primitives where one of the inputs is a Transform feature, a GroupByTransform feature, or a Direct Feature of a Transform / GroupByTransform feature. As a result, some features that ft.dfs previously generated can now be created only by specifying them explicitly in seed_features.
Fixes
Fix EntitySet.plot() when given a dask entityset (GH#1086)
Changes
Use nlp-primitives[complete] install for nlp_primitives extra in setup.py (GH#1103)
Documentation Changes
Fix broken downloads badge in README.md (GH#1107)
Testing Changes
Use CircleCI matrix jobs in config to trigger multiple runs of same job with different parameters (GH#1105)
Thanks to the following people for contributing to this release: @gsheni, @systemshift, @thehomebrewnerd
Enhancements
Warn user if supplied primitives are not used during dfs (GH#1073)
Fixes
Use more consistent and uniform warnings (GH#1040)
Fix issue with missing instance ids and categorical entity index (GH#1050)
Remove warnings.simplefilter in feature_set_calculator to un-silence warnings (GH#1053)
Fix feature visualization for features with ‘>’ or ‘<’ in name (GH#1055)
Fix boolean dtype mismatch between encode_features and dfs and calculate_feature_matrix (GH#1082)
Update primitive options to check reversed inputs if primitive is commutative (GH#1085)
Fix inconsistent ordering of features between kernel restarts (GH#1088)
Changes
Make DFS match TimeSince primitive with all Datetime types (GH#1048)
Change default branch to main (GH#1038)
Raise TypeError if improper input is supplied to Entity.delete_variables() (GH#1064)
Updates for compatibility with pandas 1.1.0 (GH#1079, GH#1089)
Set pandas version to pandas>=0.24.1,<2.0.0. Filter pandas deprecation warning in Week primitive. (GH#1094)
Documentation Changes
Remove benchmarks folder (GH#1049)
Add custom variables types section to variables page (GH#1066)
Testing Changes
Add fixture for ft.demo.load_mock_customer (GH#1036)
Refactor Dask test units (GH#1052)
Implement automated process for checking critical dependencies (GH#1045, GH#1054, GH#1081)
Don’t run changelog check for release PRs or automated dependency PRs (GH#1057)
Fix non-deterministic behavior in Dask test causing codecov issues (GH#1070)
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @monti-python, @rwedge, @systemshift, @tamargrey, @thehomebrewnerd, @wsankey
Enhancements
Add list_variable_types and graph_variable_types for Variable Types (GH#1013)
Add graph_feature to generate a feature lineage graph for a given feature (GH#1032)
Fixes
Improve warnings when using a Dask dataframe for cutoff times (GH#1026)
Error if attempting to add entityset relationship where child variable is also child index (GH#1034)
Changes
Remove Feature.get_names (GH#1021)
Remove unnecessary pd.Series and pd.DatetimeIndex calls from primitives (GH#1020, GH#1024)
Improve cutoff time handling when a single value or no value is passed (GH#1028)
Moved find_variable_types to Variable utils (GH#1013)
Documentation Changes
Add page on Variable Types to describe some Variable Types, and util functions (GH#1013)
Remove featuretools enterprise from documentation (GH#1022)
Add development install instructions to contributing.md (GH#1030)
Testing Changes
Add required flag to CircleCI codecov upload command (GH#1035)
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @kmax12, @rwedge, @thehomebrewnerd, @tuethan1999
Feature.get_names has been removed; use Feature.get_feature_names instead.
Enhancements
Support use of Dask DataFrames in entitysets (GH#783)
Add make_index when initializing an EntitySet by passing in an entities dictionary (GH#1010)
Add ability to use primitive classes and instances as keys in primitive_options dictionary (GH#993)
Fixes
Cleanly close tqdm instance (GH#1018)
Resolve issue with NaN values in LatLong columns (GH#1007)
Testing Changes
Update tests for numpy v1.19.0 compatibility (GH#1016)
Thanks to the following people for contributing to this release: @Alex-Monahan, @frances-h, @gsheni, @rwedge, @thehomebrewnerd
Enhancements
Add get_default_aggregation_primitives and get_default_transform_primitives (GH#945)
Allow cutoff time dataframe columns to be in any order (GH#969, GH#995)
Add Age primitive, and make it a default transform primitive for DFS (GH#987)
Add include_cutoff_time arg - control whether data at cutoff times are included in feature calculations (GH#959)
Allow variables_types to be referenced by their type_string for the entity_from_dataframe function (GH#988)
Fixes
Fix errors with Equals and NotEquals primitives when comparing categoricals or different dtypes (GH#968)
Normalized type_strings of Variable classes so that the find_variable_types function produces a dictionary with a clear key to name transition (GH#982, GH#996)
Remove pandas.datetime in test_calculate_feature_matrix due to deprecation (GH#998)
Documentation Changes
Add python 3.8 support for docs (GH#983)
Adds consistent Entityset Docstrings (GH#986)
Testing Changes
Add automated tests for python 3.8 environment (GH#847)
Update testing dependencies (GH#976)
Thanks to the following people for contributing to this release: @ctduffy, @frances-h, @gsheni, @jeff-hernandez, @rightx2, @rwedge, @sebrahimi1988, @thehomebrewnerd, @tuethan1999
Calls to featuretools.dfs or featuretools.calculate_feature_matrix that use a cutoff time dataframe, but do not label the time column with either the target entity time index variable name or as time, will now result in an AttributeError. Previously, the time column was selected to be the first column that was not the instance id column. With this update, the position of the column in the dataframe is no longer used to determine the time column. Now, both instance id columns and time columns in a cutoff time dataframe can be in any order as long as they are named properly.
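The name-based rule described above can be illustrated with a small, library-independent sketch. The helper find_time_column and the column names used here are hypothetical and are not part of Featuretools; the sketch only mirrors the stated behavior that the time column is located by name (the target entity's time index name or time) and never by position. Checking the time index name first is an arbitrary choice of this sketch.

```python
def find_time_column(columns, time_index_name):
    """Pick the cutoff-time column by name, never by position (hypothetical helper)."""
    for name in (time_index_name, "time"):
        if name in columns:
            return name
    # Mirrors the note above: an unlabeled time column is an error.
    raise AttributeError(
        "cutoff time columns must include %r or 'time'" % time_index_name
    )


# Column order is irrelevant; only the names matter.
first = find_time_column(["time", "customer_id"], "join_date")
second = find_time_column(["label", "join_date", "customer_id"], "join_date")

# A time column with any other name is rejected.
try:
    find_time_column(["customer_id", "when"], "join_date")
    unnamed_raises = False
except AttributeError:
    unnamed_raises = True
```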
The type_string attributes of all Variable subclasses are now a snake case conversion of their class names. This changes the type_string of the Unknown, IPAddress, EmailAddress, SubRegionCode, FilePath, LatLong, and ZIPcode classes. Old saved entitysets that used these variables may load incorrectly.
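To show what a snake case conversion of a class name looks like, here is a minimal sketch using a common regex-based approach. The helper name camel_to_snake is an assumption for illustration and is not claimed to be the library's implementation; the class names are taken from the note above.

```python
import re


def camel_to_snake(name):
    """Convert a CamelCase class name to snake_case (illustrative helper)."""
    # Split before a capitalized word that follows any character: "FilePath" -> "File_Path".
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Split a lowercase letter or digit followed by a capital: "RegionCode" -> "Region_Code".
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


class_names = ["Unknown", "IPAddress", "EmailAddress", "SubRegionCode", "FilePath", "LatLong"]
type_strings = {name: camel_to_snake(name) for name in class_names}
```

Under this conversion, a saved entityset that stored an older spelling of one of these strings would no longer match the new value, which is consistent with the loading caveat above.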
Enhancements
ft.encode_features - use less memory for one-hot encoded columns (GH#876)
Fixes
Use logger.warning to fix deprecated logger.warn (GH#871)
Add dtype to interesting_values to fix deprecated empty Series with no dtype (GH#933)
Remove overlap in training windows (GH#930)
Fix progress bar in notebook (GH#932)
Changes
Change premium primitives CI test to Python 3.6 (GH#916)
Remove Python 3.5 support (GH#917)
Documentation Changes
Fix README links to docs (GH#872)
Fix Github links with correct organizations (GH#908)
Fix hyperlinks in docs and docstrings with updated address (GH#910)
Remove unused script for uploading docs to AWS (GH#911)
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @jeff-hernandez, @rwedge
Using training windows in feature calculations can result in different values than previous versions. This was done to prevent consecutive training windows from overlapping by excluding data at the oldest point in time. For example, if we use a cutoff time at the first minute of the hour with a one hour training window, the first minute of the previous hour will no longer be included in the feature calculation.
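The example in the note above can be sketched with plain datetime arithmetic. This is only an illustration of the interval logic, not Featuretools code: the helper in_training_window is hypothetical, and treating the cutoff time itself as included is an assumption of the sketch.

```python
from datetime import datetime, timedelta


def in_training_window(timestamp, cutoff, window):
    """True if timestamp falls in (cutoff - window, cutoff] (hypothetical helper)."""
    # The oldest instant, cutoff - window, is excluded so that
    # consecutive windows do not share a data point.
    return cutoff - window < timestamp <= cutoff


window = timedelta(hours=1)
cutoff = datetime(2020, 1, 1, 10, 0)           # first minute of the hour
previous_cutoff = datetime(2020, 1, 1, 9, 0)   # first minute of the previous hour

observations = [
    datetime(2020, 1, 1, 9, 0),
    datetime(2020, 1, 1, 9, 30),
    datetime(2020, 1, 1, 10, 0),
]
kept = [t for t in observations if in_training_window(t, cutoff, window)]

# The 9:00 observation belongs only to the window ending at 9:00.
in_previous = in_training_window(previous_cutoff, previous_cutoff, window)
in_current = in_training_window(previous_cutoff, cutoff, window)
```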
Warning
The next non-bugfix release of Featuretools will not support Python 3.5
Fixes
Fix ft.show_info() not displaying in Jupyter notebooks (GH#863)
Changes
Added Plugin Warnings at Entry Point (GH#850, GH#869)
Documentation Changes
Add links to primitives.featurelabs.com (GH#860)
Add source code links to API reference (GH#862)
Update links for testing Dask/Spark integrations (GH#867)
Update release documentation for featuretools (GH#868)
Testing Changes
Miscellaneous changes (GH#861)
Thanks to the following people for contributing to this release: @frances-h, @FreshLeaf8865, @jeff-hernandez, @rwedge, @thehomebrewnerd
Fixes
Fix a connection closed error when using n_jobs (GH#853)
Changes
Pin msgpack dependency for Python 3.5; remove dataframe from Dask dependency (GH#851)
Documentation Changes
Update link to help documentation page in Github issue template (GH#855)
Thanks to the following people for contributing to this release: @frances-h, @rwedge
Enhancements
Support for Pandas 1.0.0 (GH#844)
Changes
Remove dependency on s3fs library for anonymous downloads from S3 (GH#825)
Testing Changes
Added GitHub Action to automatically run performance tests (GH#840)
Thanks to the following people for contributing to this release: @frances-h, @rwedge
Fixes
Raise error when given wrong input for ignore_variables (GH#826)
Fix multi-output features not created when there is no child data (GH#834)
Removing type casting in Equals and NotEquals primitives (GH#504)
Changes
Replace pd.timedelta time units that were deprecated (GH#822)
Move sklearn wrapper to separate library (GH#835, GH#837)
Testing Changes
Run unit tests in windows environment (GH#790)
Update boto3 version requirement for tests (GH#838)
Thanks to the following people for contributing to this release: @jeffzi, @kmax12, @rwedge, @systemshift
Enhancements
Added GitHub Action to auto upload releases to PyPI (GH#816)
Fixes
Fix issue where some primitive options would not be applied (GH#807)
Fix issue with converting to pickle or parquet after adding interesting features (GH#798, GH#823)
Diff primitive now calculates using all available data (GH#824)
Prevent DFS from creating Identity Features of globally ignored variables (GH#819)
Changes
Remove python 2.7 support from serialize.py (GH#812)
Make smart_open, boto3, and s3fs optional dependencies (GH#827)
Documentation Changes
Remove Python 2.7 support and add 3.7 in install.rst (GH#805)
Fix import error in docs (GH#803)
Fix release title formatting in changelog (GH#806)
Testing Changes
Use multiple CPUs to run tests on CI (GH#811)
Refactor test entityset creation to avoid saving to disk (GH#813, GH#821)
Remove get_values() from test_es.py to remove warnings (GH#820)
Thanks to the following people for contributing to this release: @frances-h, @jeff-hernandez, @rwedge, @systemshift
The libraries used for downloading or uploading from S3 or URLs are now optional and will no longer be installed by default. To use this functionality they will need to be installed separately.
The fix to how the Diff primitive is calculated may slow down the overall calculation time of feature lists that use this primitive.
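For the optional S3 and URL libraries noted above, one way to check an environment before relying on that functionality is to test whether the packages are importable. This is a minimal sketch using only the standard library; the helper missing_optional_packages is hypothetical, and the package list is taken from the smart_open, boto3, and s3fs change (GH#827) rather than from any official extras definition.

```python
import importlib.util


def missing_optional_packages(names):
    """Return the top-level packages in names that cannot be imported (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Package names as listed in the change that made them optional.
S3_PACKAGES = ["smart_open", "boto3", "s3fs"]

missing = missing_optional_packages(S3_PACKAGES)
if missing:
    print("Install before using S3 or URL features: pip install " + " ".join(missing))
```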
Enhancements
Added First primitive (GH#770)
Added Entropy aggregation primitive (GH#779)
Allow custom naming for multi-output primitives (GH#780)
Fixes
Prevents user from removing base entity time index using additional_variables (GH#768)
Fixes error when a multioutput primitive was supplied to dfs as a groupby trans primitive (GH#786)
Changes
Drop Python 2 support (GH#759)
Add unit parameter to AvgTimeBetween (GH#771)
Require Pandas 0.24.1 or higher (GH#787)
Documentation Changes
Update featuretools slack link (GH#765)
Set up repo to use Read the Docs (GH#776)
Add First primitive to API reference docs (GH#782)
Testing Changes
CircleCI fixes (GH#774)
Disable PIP progress bars (GH#775)
Thanks to the following people for contributing to this release: @ablacke-ayx, @BoopBoopBeepBoop, @jeffzi, @kmax12, @rwedge, @thehomebrewnerd, @twdobson
Warning
The next non-bugfix release of Featuretools will not support Python 2
Enhancements
Improve how files are copied and written (GH#721)
Add number of rows to graph in entityset.plot (GH#727)
Added support for pandas DateOffsets in DFS and Timedelta (GH#732)
Enable feature-specific top_n value using a dictionary in encode_features (GH#735)
Added progress_callback parameter to dfs() and calculate_feature_matrix() (GH#739, GH#745)
Enable specifying primitives on a per column or per entity basis (GH#748)
Fixes
Fixed entity set deserialization (GH#720)
Added error message when DateTimeIndex is a variable but not set as the time_index (GH#723)
Fixed CumCount and other group-by transform primitives that take ID as input (GH#733, GH#754)
Fix progress bar undercounting (GH#743)
Updated training_window error assertion to only check against observations (GH#728)
Don’t delete the whole destination folder while saving entityset (GH#717)
Changes
Raise warning and not error on schema version mismatch (GH#718)
Change feature calculation to return in order of instance ids provided (GH#676)
Removed time remaining from displayed progress bar in dfs() and calculate_feature_matrix() (GH#739)
Raise warning in normalize_entity() when time_index of base_entity has an invalid type (GH#749)
Remove toolz as a direct dependency (GH#755)
Allow boolean variable types to be used in the Multiply primitive (GH#756)
Documentation Changes
Updated URL for Compose (GH#716)
Testing Changes
Update dependencies (GH#738, GH#741, GH#747)
Thanks to the following people for contributing to this release: @angela97lin, @chidauri, @christopherbunn, @frances-h, @jeff-hernandez, @kmax12, @MarcoGorelli, @rwedge, @thehomebrewnerd
Feature calculations will return in the order of instance ids provided instead of the order of time points instances are calculated at.
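The ordering change above can be pictured with a small sketch in plain Python. It is not Featuretools code: the helper order_by_instance_ids, the row layout, and the assumption that each instance id appears once are all hypothetical and serve only to contrast time-grouped output with output aligned to the ids as provided.

```python
def order_by_instance_ids(rows, instance_ids):
    """Reorder computed rows to match the ids as provided (hypothetical helper)."""
    # Assumes each instance id appears exactly once in rows.
    by_id = {row["instance_id"]: row for row in rows}
    return [by_id[instance_id] for instance_id in instance_ids]


# Rows as they might come out when grouped by the time point they were calculated at.
computed = [
    {"instance_id": 3, "time": "2014-01-01", "value": 10},
    {"instance_id": 1, "time": "2014-01-02", "value": 20},
    {"instance_id": 2, "time": "2014-01-03", "value": 30},
]

requested_ids = [1, 2, 3]
ordered = order_by_instance_ids(computed, requested_ids)
ordered_ids = [row["instance_id"] for row in ordered]
```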
Fixes
Fix serialized LatLong data being loaded as strings (GH#712)
Documentation Changes
Fixed FAQ cell output (GH#710)
Thanks to the following people for contributing to this release: @gsheni, @rwedge
Warning
The next non-bugfix release of Featuretools will not support Python 2
Enhancements
Give more frequent progress bar updates and update chunk size behavior (GH#631, GH#696)
Added drop_first as param in encode_features (GH#647)
Added support for stacking multi-output primitives (GH#679)
Generate transform features of direct features (GH#623)
Added serializing and deserializing from S3 and deserializing from URLs (GH#685)
Added nlp_primitives as an add-on library (GH#704)
Added AutoNormalize to Featuretools plugins (GH#699)
Added functionality for relative units (month/year) in Timedelta (GH#692)
Added categorical-encoding as an add-on library (GH#700)
Fixes
Fix performance regression in DFS (GH#637)
Fix deserialization of feature relationship path (GH#665)
Set index after adding ancestor relationship variables (GH#668)
Fix user-supplied variable_types modification in Entity init (GH#675)
Don’t calculate dependencies of unnecessary features (GH#667)
Prevent normalize entity’s new entity having same index as base entity (GH#681)
Update variable type inference to better check for string values (GH#683)
Changes
Moved dask, distributed imports (GH#634)
Documentation Changes
Miscellaneous changes (GH#641, GH#658)
Modified doc_string of top_n in encoding (GH#648)
Hyperlinked ComposeML (GH#653)
Added FAQ (GH#620, GH#677)
Fixed FAQ question with multiple question marks (GH#673)
Testing Changes
Add master, and release tests for premium primitives (GH#660, GH#669)
Miscellaneous changes (GH#672, GH#674)
Thanks to the following people for contributing to this release: @alexjwang, @allisonportis, @ayushpatidar, @CJStadler, @ctduffy, @gsheni, @jeff-hernandez, @jeremyliweishih, @kmax12, @rwedge, @zhxt95
Enhancements
Speed up groupby transform calculations (GH#609)
Generate features along all paths when there are multiple paths between entities (GH#600, GH#608)
Fixes
Select columns of dataframe using a list (GH#615)
Change type of features calculated on Index features to Categorical (GH#602)
Filter dataframes through forward relationships (GH#625)
Specify Dask version in requirements for python 2 (GH#627)
Keep dataframe sorted by time during feature calculation (GH#626)
Fix bug in encode_features that created duplicate columns of features with multiple outputs (GH#622)
Changes
Remove unused variance_selection.py file (GH#613)
Remove Timedelta data param (GH#619)
Remove DaysSince primitive (GH#628)
Documentation Changes
Add installation instructions for add-on libraries (GH#617)
Clarification of Multi Output Feature Creation (GH#638)
Miscellaneous changes (GH#632, GH#639)
Testing Changes
Miscellaneous changes (GH#595, GH#612)
Thanks to the following people for contributing to this release: @CJStadler, @kmax12, @rwedge, @gsheni, @kkleidal, @ctduffy
Enhancements
Add unit parameter to timesince primitives (GH#558)
Add ability to install optional add on libraries (GH#551)
Load and save features from open files and strings (GH#566)
Support custom variable types (GH#571)
Support entitysets which have multiple paths between two entities (GH#572, GH#544)
Added show_info function, more output information added to CLI featuretools info (GH#525)
Fixes
Normalize_entity specifies error when ‘make_time_index’ is an invalid string (GH#550)
Schema version added for entityset serialization (GH#586)
Renamed features have names correctly serialized (GH#585)
Improved error message for index/time_index being the same column in normalize_entity and entity_from_dataframe (GH#583)
Removed all mentions of allow_where (GH#587, GH#588)
Removed unused variable in normalize entity (GH#589)
Change time since return type to numeric (GH#606)
Changes
Refactor get_pandas_data_slice to take single entity (GH#547)
Updates TimeSincePrevious and Diff Primitives (GH#561)
Remove unnecessary time_last variable (GH#546)
Documentation Changes
Add Featuretools Enterprise to documentation (GH#563)
Miscellaneous changes (GH#552, GH#573, GH#577, GH#599)
Testing Changes
Miscellaneous changes (GH#559, GH#569, GH#570, GH#574, GH#584, GH#590)
Thanks to the following people for contributing to this release: @alexjwang, @allisonportis, @CJStadler, @ctduffy, @gsheni, @kmax12, @rwedge
Rename NUnique to NumUnique (GH#510)
Serialize features as JSON (GH#532)
Drop all variables at once in normalize_entity (GH#533)
Remove unnecessary sorting from normalize_entity (GH#535)
Features cache their names (GH#536)
Only calculate features for instances before cutoff (GH#523)
Remove all relative imports (GH#530)
Added FullName Variable Type (GH#506)
Add error message when target entity does not exist (GH#520)
New demo links (GH#542)
Remove duplicate features check in DFS (GH#538)
featuretools_primitives entry point expects list of primitive classes (GH#529)
Update ALL_VARIABLE_TYPES list (GH#526)
More Informative N Jobs Prints and Warnings (GH#511)
Update sklearn version requirements (GH#541)
Update Makefile (GH#519)
Remove unused parameter in Entity._handle_time (GH#524)
Remove build_ext code from setup.py (GH#513)
Documentation updates (GH#512, GH#514, GH#515, GH#521, GH#522, GH#527, GH#545)
Testing updates (GH#509, GH#516, GH#517, GH#539)
Thanks to the following people for contributing to this release: @bphi, @CharlesBradshaw, @CJStadler, @glentennis, @gsheni, @kmax12, @rwedge
NUnique has been renamed to NumUnique.
Previous behavior
from featuretools.primitives import NUnique
New behavior
from featuretools.primitives import NumUnique
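Code that has to run against releases on both sides of a rename like this one can look the class up under either name. The helper below is a minimal sketch of that idea: `import_first_available` is a hypothetical name, not part of Featuretools, and the demonstration uses the standard library's `collections` module so that it runs without Featuretools installed.

```python
import importlib


def import_first_available(module_name, candidate_names):
    """Return the first attribute in candidate_names that module_name defines.

    Hypothetical helper for illustration; not part of Featuretools.
    """
    module = importlib.import_module(module_name)
    for name in candidate_names:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(
        "none of {} found in {}".format(candidate_names, module_name)
    )


# Stdlib demonstration: "NoSuchName" does not exist in collections,
# so the helper falls back to the second candidate.
ordered_dict_cls = import_first_available(
    "collections", ["NoSuchName", "OrderedDict"]
)
print(ordered_dict_cls.__name__)
```

With Featuretools installed, the same call would presumably be `import_first_available("featuretools.primitives", ["NumUnique", "NUnique"])`, preferring the new name and falling back to the old one.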
Automatically generate feature name for controllable primitives (GH#481)
Primitive docstring updates (GH#489, GH#492, GH#494, GH#495)
Change primitive functions that returned strings to return functions (GH#499)
CLI customizable via entrypoints (GH#493)
Improve calculation of aggregation features on grandchildren (GH#479)
Refactor entrypoints to use decorator (GH#483)
Include doctests in testing suite (GH#491)
Documentation updates (GH#490)
Update how standard primitives are imported internally (GH#482)
Thanks to the following people for contributing to this release: @bukosabino, @CharlesBradshaw, @glentennis, @gsheni, @jeff-hernandez, @kmax12, @minkvsky, @rwedge, @thehomebrewnerd
Improve Entity Set Serialization (GH#361)
Support calling a primitive instance’s function directly (GH#461, GH#468)
Support other libraries extending featuretools functionality via entrypoints (GH#452)
Remove featuretools install command (GH#475)
Add GroupByTransformFeature (GH#455, GH#472, GH#476)
Update Haversine Primitive (GH#435, GH#462)
Add commutative argument to SubtractNumeric and DivideNumeric primitives (GH#457)
Add FilePath variable_type (GH#470)
Add PhoneNumber, DateOfBirth, URL variable types (GH#447)
Generalize infer_variable_type, convert_variable_data and convert_all_variable_data methods (GH#423)
Documentation updates (GH#438, GH#446, GH#458, GH#469)
Testing updates (GH#440, GH#444, GH#445, GH#459)
Thanks to the following people for contributing to this release: @bukosabino, @CharlesBradshaw, @ColCarroll, @glentennis, @grayskripko, @gsheni, @jeff-hernandez, @jrkinley, @kmax12, @RogerTangos, @rwedge
ft.dfs now has a groupby_trans_primitives parameter that DFS uses to automatically construct features that group by an ID column and then apply a transform primitive to each group. This change applies to the following primitives: CumSum, CumCount, CumMean, CumMin, and CumMax.
Previous behavior
ft.dfs(entityset=es, target_entity='customers', trans_primitives=["cum_mean"])
New behavior
ft.dfs(entityset=es, target_entity='customers', groupby_trans_primitives=["cum_mean"])
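To make "group by an ID column and then apply a transform primitive" concrete, here is a plain-Python sketch of a grouped cumulative mean. It is a conceptual illustration only, not Featuretools' implementation: `grouped_cum_mean` and the sample data are made up for this example, and missing values are not handled.

```python
from collections import defaultdict


def grouped_cum_mean(values, group_ids):
    """Running mean of values, computed separately within each group.

    Illustrative only; the function name and behavior are assumptions,
    not the library's code.
    """
    running_sum = defaultdict(float)
    running_count = defaultdict(int)
    result = []
    for value, group in zip(values, group_ids):
        running_sum[group] += value
        running_count[group] += 1
        result.append(running_sum[group] / running_count[group])
    return result


amounts = [10.0, 20.0, 30.0, 40.0]
customer_ids = ["a", "b", "a", "b"]

# Each row only sees earlier rows that share its customer id.
print(grouped_cum_mean(amounts, customer_ids))  # [10.0, 20.0, 20.0, 30.0]
```

Without the grouping step, the running mean over the same column would be 10.0, 15.0, 20.0, 25.0, because every row would be averaged with all earlier rows regardless of customer.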
Related to the above change, cumulative transform features are now defined using a new feature class, GroupByTransformFeature.
Previous behavior
ft.Feature([base_feature, groupby_feature], primitive=CumulativePrimitive)
New behavior
ft.Feature(base_feature, groupby=groupby_feature, primitive=CumulativePrimitive)
Cumulative primitives (GH#410)
Entity.query_by_values now preserves row order of underlying data (GH#428)
Implementing Country Code and Sub Region Codes as variable types (GH#430)
Added IPAddress and EmailAddress variable types (GH#426)
Install data and dependencies (GH#403)
Add TimeSinceFirst, fix TimeSinceLast (GH#388)
Allow user to pass in desired feature return types (GH#372)
Add new configuration object (GH#401)
Replace NUnique get_function (GH#434)
_calculate_idenity_features now only returns the features asked for, instead of the entire entity (GH#429)
Primitive function name uniqueness (GH#424)
Update NumCharacters and NumWords primitives (GH#419)
Removed Variable.dtype (GH#416, GH#433)
Change to zipcode rep, str for pandas (GH#418)
Remove pandas version upper bound (GH#408)
Make S3 dependencies optional (GH#404)
Check that agg_primitives and trans_primitives are right primitive type (GH#397)
Mean primitive changes (GH#395)
Fix transform stacking on multi-output aggregation (GH#394)
Fix list_primitives (GH#391)
Handle graphviz dependency (GH#389, GH#396, GH#398)
Testing updates (GH#402, GH#417, GH#433)
Documentation updates (GH#400, GH#409, GH#415, GH#417, GH#420, GH#421, GH#422, GH#431)
Thanks to the following people for contributing to this release: @CharlesBradshaw, @csala, @floscha, @gsheni, @jxwolstenholme, @kmax12, @RogerTangos, @rwedge
Primitive refactor (GH#364)
Mean ignore NaNs (GH#379)
Plotting entitysets (GH#382)
Add seed features later in DFS process (GH#357)
Multiple output column features (GH#376)
Add ZipCode Variable Type (GH#367)
Add primitive.get_filepath and example of primitive loading data from external files (GH#380)
Transform primitives take series as input (GH#385)
Update dependency requirements (GH#378, GH#383, GH#386)
Add modulo to override tests (GH#384)
Update documentation (GH#368, GH#377)
Update README.md (GH#366, GH#373)
Update CI tests (GH#359, GH#360, GH#375)
Thanks to the following people for contributing to this release: @floscha, @gsheni, @kmax12, @RogerTangos, @rwedge
Add missing dependencies (GH#353)
Move comment to note in documentation (GH#352)
Add specific error for duplicate additional/copy_variables in normalize_entity (GH#348)
Removed EntitySet._import_from_dataframe (GH#346)
Removed time_index_reduce parameter (GH#344)
Allow installation of additional primitives (GH#326)
Fix DatetimeIndex variable conversion (GH#342)
Update Sklearn DFS Transformer (GH#343)
Clean up entity creation logic (GH#336)
remove casting to list in transform feature calculation (GH#330)
Fix sklearn wrapper (GH#335)
Add readme to pypi
Update conda docs after move to conda-forge (GH#334)
Add wrapper for scikit-learn Pipelines (GH#323)
Remove parse_date_cols parameter from EntitySet._import_from_dataframe (GH#333)
Thanks to the following people for contributing to this release: @bukosabino, @georgewambold, @gsheni, @jeff-hernandez, @kmax12, and @rwedge.
Resolve bug preventing using first column as index by default (GH#308)
Handle return type when creating features from Id variables (GH#318)
Make id an optional parameter of EntitySet constructor (GH#324)
Handle primitives with same function being applied to same column (GH#321)
Update requirements (GH#328)
Clean up DFS arguments (GH#319)
Clean up Pandas Backend (GH#302)
Update properties of cumulative transform primitives (GH#320)
Feature stability between versions documentation (GH#316)
Add download count to GitHub readme (GH#310)
Fixed #297 update tests to check error strings (GH#303)
Remove usage of fixtures in agg primitive tests (GH#325)
Remove ft.utils.gen_utils.getsize and make pympler a test requirement (GH#299)
Update requirements.txt (GH#298)
Refactor EntitySet.find_path(…) (GH#295)
Clean up unused methods (GH#293)
Remove unused parents property of Entity (GH#283)
Removed relationships parameter (GH#284)
Improve time index validation (GH#285)
Encode features with “unknown” class in categorical (GH#287)
Allow where clauses on direct features in Deep Feature Synthesis (GH#279)
Change to fullargsspec (GH#288)
Parallel verbose fixes (GH#282)
Update tests for python 3.7 (GH#277)
Check duplicate rows cutoff times (GH#276)
Load retail demo data using compressed file (GH#271)
Handling time rewrite (GH#245)
Update deep_feature_synthesis.py (GH#249)
Handling return type when creating features from DatetimeTimeIndex (GH#266)
Update retail.py (GH#259)
Improve Consistency of Transform Primitives (GH#236)
Update demo docstrings (GH#268)
Handle non-string column names (GH#255)
Clean up merging of aggregation primitives (GH#250)
Add tests for Entity methods (GH#262)
Handle no child data when calculating aggregation features with multiple arguments (GH#264)
Add is_string utils function (GH#260)
Update python versions to match docker container (GH#261)
Handle where clause when no child data (GH#258)
No longer cache demo csvs, remove config file (GH#257)
Avoid stacking “expanding” primitives (GH#238)
Use randomly generated names in retail csv (GH#233)
Update README.md (GH#243)
Improve performance of all feature calculations (GH#224)
Update agg primitives to use more efficient functions (GH#215)
Optimize metadata calculation (GH#229)
More robust handling when no data at a cutoff time (GH#234)
Workaround categorical merge (GH#231)
Switch which CSV is associated with which variable (GH#228)
Remove unused kwargs from query_by_values, filter_and_sort (GH#225)
Remove convert_links_to_integers (GH#219)
Add conda install instructions (GH#223, GH#227)
Add example of using Dask to parallelize to docs (GH#221)
Remove unnecessary check no related instances call and refactor (GH#209)
Improve memory usage through support for pandas categorical types (GH#196)
Bump minimum pandas version from 0.20.3 to 0.23.0 (GH#216)
Better parallel memory warnings (GH#208, GH#214)
Update demo datasets (GH#187, GH#201, GH#207)
Make primitive lookup case insensitive (GH#213)
Use capital name (GH#211)
Set class name for Min (GH#206)
Remove variable_types from normalize entity (GH#205)
Handle parquet serialization with last time index (GH#204)
Reset index of cutoff times in calculate feature matrix (GH#198)
Check argument types for .normalize_entity (GH#195)
Type checking ignore entities. (GH#193)
Cpu count fix (GH#176)
Update flight (GH#175)
Move feature matrix calculation helper functions to separate file (GH#177)
Multiprocessing (GH#170)
Handle unicode encoding in repr throughout Featuretools (GH#161)
Clean up EntitySet class (GH#145)
Add support for building and uploading conda package (GH#167)
Parquet serialization (GH#152)
Remove variable stats (GH#171)
Make sure index variable comes first (GH#168)
No last time index update on normalize (GH#169)
Remove list of times as an option for cutoff_time in calculate_feature_matrix (GH#165)
Config does error checking to see if it can write to disk (GH#162)
Support Pandas 0.23.0 (GH#153, GH#154, GH#155, GH#159)
No EntitySet required in loading/saving features (GH#141)
Use s3 demo csv with better column names (GH#139)
more reasonable start parameter (GH#149)
add issue template (GH#133)
Improve tests (GH#136, GH#137, GH#144, GH#147)
Remove unused functions (GH#140, GH#143, GH#146)
Update documentation after recent changes / removals (GH#157)
Rename demo retail csv file (GH#148)
Add names for binary (GH#142)
EntitySet repr to use get_name rather than id (GH#134)
Ensure config dir is writable (GH#135)
Primitives as strings in DFS parameters (GH#129)
Integer time index bugfixes (GH#128)
Add make_temporal_cutoffs utility function (GH#126)
Show all entities, switch shape display to row/col (GH#124)
Improved chunking when calculating feature matrices (GH#121)
fixed num characters nan fix (GH#118)
modify ignore_variables docstring (GH#117)
More descriptive DFS progress bar (GH#69)
Convert text variable to string before NumWords (GH#106)
EntitySet.concat() reindexes relationships (GH#96)
Keep non-feature columns when encoding feature matrix (GH#111)
Uses full entity update for dependencies of uses_full_entity features (GH#110)
Update column names in retail demo (GH#104)
Handle Transform features that need access to all values of entity (GH#91)
fixes related instances bug (GH#97)
Adding non-feature columns to calculated feature matrix (GH#78)
Relax numpy version req (GH#82)
Remove entity_from_csv, tests, and lint (GH#71)
LatLong type (GH#57)
Last time index fixes (GH#70)
Make median agg primitives ignore nans by default (GH#61)
Remove Python 3.4 support (GH#64)
Change normalize_entity to update secondary_time_index (GH#59)
Unpin requirements (GH#53)
associative -> commutative (GH#56)
Add Words and Chars primitives (GH#51)
fix EntitySet.combine_variables and standardize encode_features (GH#47)
Python 3 compatibility (GH#16)
Fix variable type in demo data (GH#37)
Custom primitive kwarg fix (GH#38)
Changed order and text of arguments in make_trans_primitive docstring (GH#42)
Last time index (GH#33)
Update Scipy version to 1.0.0 (GH#31)
Add MANIFEST.in (GH#26)
Package linting (GH#7)
Custom primitive creation functions (GH#13)
Split requirements to separate files and pin to latest versions (GH#15)
Select low information features (GH#18)
Fix docs typos (GH#19)
Fixed Diff primitive for rare nan case (GH#21)
Added some missing docstrings (GH#23)
Trend fix (GH#22)
Remove as_dir=False option from EntitySet.to_pickle() (GH#20)
Entity Normalization Preserves Types of Copy & Additional Variables (GH#25)
NumTrue primitive added and docstring of other primitives updated (GH#11)
fixed hash issue with same base features (GH#8)
Head fix (GH#9)
Fix training window (GH#10)
Add associative attribute to primitives (GH#3)
Add status badges, fix license in setup.py (GH#1)
fixed head printout and flight demo index (GH#2)
Documentation improvements
New featuretools.demo.load_mock_customer function
Bug fixes
Added Percentile transform primitive
Performance improvements for approximate in calculate_feature_matrix and dfs
Added Week transform primitive
Added load_features and save_features to persist and reload features
Added save_progress argument to calculate_feature_matrix
Added approximate parameter to calculate_feature_matrix and dfs
Added load_flight to ft.demo
Windows support
Renamed feature submodule to primitives
Renamed prediction_entity arguments to target_entity
Added training_window parameter to calculate_feature_matrix
Initial release