Looks interesting, although I miss a short introduction or how-to guide. I found that one can "Create and train machine learning models to predict market values." In this context, a related project is the Intelligent Trading Bot: https://github.com/asavinov/intelligent-trading-bot which is intended for generating trading signals based on ML and feature engineering.
Having Python expressions within a declarative language is a really good idea because we can combine the low-level logic of computing values with the high-level logic of set processing.
A similar approach is implemented in the Prosto data processing toolkit. With a single Column-SQL statement we can define a new calculated column whose values are computed in Python (for example, as a sum of two columns). One advantage is that we can process data in multiple tables without joins or groupbys, which is much easier than in the existing set-oriented approaches. Another advantage is that we can combine many such statements into a workflow, in an Excel-like manner.
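The calculated-column idea can be sketched in plain Python (a minimal sketch with illustrative table and column names, not Prosto's actual API):

```python
# A table as a list of rows; a calculated column adds exactly one
# value per existing row by applying a Python function to other
# columns of the same row -- no joins and no groupbys involved.
sales = [
    {"quantity": 2, "price": 10.0},
    {"quantity": 1, "price": 5.0},
]

def calculate(table, new_column, fn):
    """Add a calculated column: one output value per existing row."""
    for row in table:
        row[new_column] = fn(row)

# The column definition itself is an ordinary Python expression.
calculate(sales, "total", lambda r: r["quantity"] + r["price"])
print([r["total"] for r in sales])  # -> [12.0, 6.0]
```

Note that the table keeps its identity: no new set of rows is produced, only a new column appears.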
Not exactly. Timing is the issue when it comes to mixed types of data sources. Try to load data via a URL (e.g., the results of an API invocation or an RSS feed), then merge it with data from a PostgreSQL table or a CSV file. It becomes challenging with the listed tools, whereas TABLUM.IO solves the issue and makes it organic and fast.
Most of the self-service or no-code BI, ETL, and data wrangling tools I am aware of (like Airtable, Fieldbook, Rowshare, Power BI, etc.) were thought of as a replacement for Excel: working with tables should be as easy as working with spreadsheets. This problem is easily solved when defining columns within one table. The real question is how to define new columns in terms of columns in other tables.
Different systems provide different answers to this question but all of them are highly specific and rather limited.
Why is it difficult to define new columns in terms of columns in other tables? The short answer is that working with columns is not the relational approach: the relational model works with sets (rows of tables), not with columns.
One generic approach to working with columns in multiple tables is provided by the concept-oriented model of data, which treats mathematical functions as first-class elements of the model. Previously it was implemented in a data wrangling tool called Data Commander. But then I decided to implement this model in the Prosto data processing toolkit, which is an alternative to map-reduce and SQL.
It defines data transformations as operations with columns in multiple tables. Since we use mathematical functions, no join and no groupby operations are needed, which significantly simplifies data transformations and makes them more natural.
Moreover, it now provides Column-SQL, which makes it even easier to define new columns in terms of other columns.
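A plain-Python sketch of what a link column and an aggregate column do (illustrative data and names, not Prosto's or Column-SQL's actual syntax):

```python
# Two tables. A link column maps each Sales row to its Products row
# (playing the role of a join), and an aggregate column accumulates
# linked Sales values per product (playing the role of a groupby).
products = [{"name": "apple"}, {"name": "pear"}]
sales = [
    {"product": "apple", "amount": 20.0},
    {"product": "pear", "amount": 5.0},
    {"product": "apple", "amount": 10.0},
]

# Link column: for each sales row, the index of the matching product row.
name_to_idx = {p["name"]: i for i, p in enumerate(products)}
for s in sales:
    s["product_link"] = name_to_idx[s["product"]]

# Aggregate column: for each product row, the total of its linked sales.
for p in products:
    p["total"] = 0.0
for s in sales:
    products[s["product_link"]]["total"] += s["amount"]

print([p["total"] for p in products])  # -> [30.0, 5.0]
```

Both tables keep their original rows; connectivity and aggregation are expressed purely as new columns.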
However, Prosto allows for data processing via column operations in many tables (implemented as pandas data frames) by providing column-oriented equivalents of join and groupby (hence it has no joins and no groupbys, which are known to be quite difficult and to require high expertise).
Prosto also provides Column-SQL, which might be simpler and more natural in many use cases.
The whole approach is based on the concept-oriented model of data, which makes functions first-class elements of the model, as opposed to the relational model, which has only sets.
> I think SQL is irritatingly non-composable, many operations require gymnastics to express
One approach to radically simplifying operations with data is to use mathematical functions (in addition to mathematical sets), which is implemented in the Prosto data processing toolkit [0] and the (new) Column-SQL [1].
One alternative to SQL (as a way of thinking) is Column-SQL [1], which is based on a new data model. This model relies on two equal constructs: sets (tables) and functions (columns). It is opposed to the relational algebra, which is based only on sets and set operations. One benefit of Column-SQL is that it does not use join and group-by for connectivity and aggregation, respectively, which are known to be quite difficult to understand and error-prone in use. Instead, many typical data processing patterns are implemented by defining new columns: link columns instead of join, and aggregate columns instead of group-by.
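The contrast between the two ways of thinking can be sketched in plain Python (hypothetical tables; names are illustrative):

```python
# Set-oriented thinking: a join materializes a new table of combined rows.
# Column-oriented thinking: the original tables stay as they are, and a
# link column is simply added to one of them.
orders = [{"cust": "a", "total": 3.0}, {"cust": "b", "total": 4.0}]
customers = [{"id": "a", "city": "NY"}, {"id": "b", "city": "LA"}]

# Join: produces a brand new set of merged rows.
joined = [
    {**o, **c} for o in orders for c in customers if o["cust"] == c["id"]
]

# Link column: each order merely points at its customer row, so other
# customer attributes become reachable without producing a new table.
idx = {c["id"]: i for i, c in enumerate(customers)}
for o in orders:
    o["customer"] = idx[o["cust"]]

cities = [customers[o["customer"]]["city"] for o in orders]
print(cities)  # -> ['NY', 'LA']
```

The join result duplicates all attributes into a new set, while the link column leaves both sets intact and only adds one function between them.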
More details about "Why functions and column-orientation" (as opposed to sets) can be found in [2]. Shortly, the problems with set-orientation and SQL arise because producing sets is not what we frequently need: we need new columns, not new tables. Hence applying set operations is a kind of workaround for the absence of column operations.
This approach is implemented in the Prosto data processing toolkit [0], and Column-SQL [1] is a syntactic way to define its operations.
Yet, here the focus is on feature engineering and rethinking how it can be combined with traditional ML. Essentially, the point is that there are no big differences between them, and it is more natural and simpler to think of them as special cases of the same concept: features can be learned, and ML models are frequently used for producing intermediate results.
The main motivation is that the conventional approaches to data processing are based on manipulating mathematical sets for all kinds of use cases: we produce a new set if we want to calculate a new attribute, we produce a new set if we want to match data from different tables, and we get a new set if we aggregate data. Yet, in many cases we actually do not need to produce new sets (tables, collections, etc.): it is enough to add a new column to an existing set. Here are more details about the motivation:
A column is an implementation of a function (similarly to how a table is an implementation of a set). Theoretically, this approach leads to a data model based on two core elements: mathematical functions (new) and mathematical sets (old).
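In plain terms (an illustrative sketch, not a formal definition of the model): a table implements a set of row identifiers, and a column implements a function from those identifiers to values.

```python
# A set (table) of row identifiers, and columns as functions on that set.
rows = {0, 1, 2}                     # the set: three row ids
price = {0: 10.0, 1: 5.0, 2: 7.5}    # a column: row id -> value
quantity = {0: 2, 1: 1, 2: 4}        # another column on the same set

# A derived column is just a new function defined on the same set --
# no new set (table) is produced.
amount = {i: price[i] * quantity[i] for i in rows}
print(sorted(amount.items()))  # -> [(0, 20.0), (1, 5.0), (2, 30.0)]
```

Deriving `amount` composes the existing functions value-by-value, which is exactly the kind of column operation that set algebra has no direct construct for.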
This approach was implemented in Prosto, a data processing toolkit that radically changes how data is processed by heavily relying on functions and operations with functions: an alternative to map-reduce and join-groupby.
> I always felt like there was some super deep & fundamental link between these mathematical concepts and relational modeling ideas.
The relational model relies on set theory (more specifically, relational algebra). An alternative view of data and data modeling is based on 1) sets and 2) functions, and is called the concept-oriented model [1, 2]. It is actually quite similar to category theory and maybe could even be described in terms of category theory. It is also quite useful for data processing, and there is one possible implementation which is an alternative to the map-reduce and join-groupby approaches [3].
[3] https://github.com/asavinov/prosto Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby