The `FunctionActor` is an actor that evaluates an expression. It can use fields from the incoming JSON object and it can collect data from other actors. The function is evaluated to either a boolean, a string, or a numeric value, depending on the expression that was entered by the user.
The `FunctionActor` has `"type": "function"`. The `params` value is a JSON object with the following fields:
field | type | required | description |
---|---|---|---|
`function` | string | yes | The expression that should be evaluated. |
`outputfield` | string | yes | The name of the output field to which the result should be written. |
Now, when the following JSON comes in:
The FunctionActor will return
Because `x` is not larger than 10.
The FunctionActor can handle simple and compound boolean expressions, string concatenations and numeric expressions. A list of example expressions is provided below:
Simple boolean expressions are supported:
Compound boolean expressions are also supported:
Parentheses are also supported:
In these examples, `x`, `y` and `z` refer to field names in the incoming JSON object. If any of these fields cannot be found, nothing will be emitted.
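The behavior described above can be approximated with a small evaluator. This is only a sketch, not the actor's implementation: it borrows Python's own expression syntax (so `and`, `or` and `not` stand in for whatever boolean operators the real expression language defines), and it assumes that the result is added to a copy of the incoming object under `outputfield`. The names `apply_function` and `MissingField` are hypothetical.

```python
import ast
import operator

# Assumption: Python's expression syntax stands in for the actor's own language.
_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_COMPARE = {ast.Gt: operator.gt, ast.Lt: operator.lt, ast.GtE: operator.ge,
            ast.LtE: operator.le, ast.Eq: operator.eq, ast.NotEq: operator.ne}


class MissingField(Exception):
    """Raised when the expression names a field the event does not contain."""


def _eval(node, event):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in event:
            raise MissingField(node.id)
        return event[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_eval(node.left, event),
                                      _eval(node.right, event))
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in _COMPARE):
        return _COMPARE[type(node.ops[0])](_eval(node.left, event),
                                           _eval(node.comparators[0], event))
    if isinstance(node, ast.BoolOp):
        # Every operand is evaluated, so a missing field is always detected.
        values = [_eval(value, event) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval(node.operand, event)
    raise ValueError(f"unsupported expression element: {ast.dump(node)}")


def apply_function(params, event):
    """Return the event extended with the result, or None if a field is missing."""
    tree = ast.parse(params["function"], mode="eval")
    try:
        result = _eval(tree.body, event)
    except MissingField:
        return None  # mirrors "nothing will be emitted"
    return {**event, params["outputfield"]: result}


params = {"function": "x > 10 and (y < 5 or z == 1)", "outputfield": "result"}
print(apply_function(params, {"x": 12, "y": 7, "z": 1}))
# {'x': 12, 'y': 7, 'z': 1, 'result': True}
print(apply_function(params, {"x": 12, "y": 7}))  # None, because z is missing
```

Evaluating every operand of a compound expression, rather than short-circuiting, is a deliberate choice in this sketch: it keeps the "any missing field suppresses the output" rule independent of operand order.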
Nested fields and array fields can also be accessed, as follows:
if the incoming JSON object looks as follows:
This expression will evaluate to `true`.
It is possible to access an array element with the following notation:
Which will access the value 42 in the following incoming JSON object:
Array indices are 0-based.
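As an illustration of nested and array lookups, the sketch below walks a path of keys and 0-based indices through a JSON value. It deliberately takes the path as a Python list instead of parsing the actor's own field notation, which is not reproduced here. The function name `resolve` and the sample event are hypothetical.

```python
def resolve(event, path):
    """Follow a path of keys (str) and 0-based indices (int) into a JSON value.

    Returns (True, value) when the path exists and (False, None) otherwise.
    """
    current = event
    for step in path:
        if isinstance(step, int) and isinstance(current, list):
            if not 0 <= step < len(current):
                return False, None
            current = current[step]
        elif isinstance(step, str) and isinstance(current, dict) and step in current:
            current = current[step]
        else:
            return False, None
    return True, current


# Hypothetical event, used only to exercise the lookup.
event = {"order": {"customer": {"age": 31}, "items": [7, 42, 9]}}
print(resolve(event, ["order", "customer", "age"]))  # (True, 31)
print(resolve(event, ["order", "items", 1]))         # (True, 42)
print(resolve(event, ["order", "items", 5]))         # (False, None)
```

Returning a found flag alongside the value distinguishes a missing field from a field whose value is `null`.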
The examples above are all examples of boolean expressions which evaluate to either `true` or `false`. It is also possible to evaluate to a numeric value, like in the following expression:
The FunctionActor supports addition, subtraction, multiplication and division.
String concatenation is supported in the following way:
If the incoming JSON is
The result will be
In the future, string functions such as replace, substring and indexOf could also be supported.
In addition to fields in the incoming JSON object, an expression can look up fields from other actors through a collect procedure. A collect definition should be provided in the constructor of the FunctionActor as follows:
An expression that uses both collect variables and JSON variables looks as follows:
When the following JSON comes in:
and `stats` returns 20 for `myvariable` and `stats3` returns 5 for `variable2`, the result will be the following (assuming the same constructor as specified above):
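The interplay between collected values and JSON fields can be sketched as follows. Everything structural here is an assumption made for illustration: the shape of a collect definition (`alias`, `from`, `field`), the aliases `a` and `b`, the incoming field `x`, and the expression itself, which is passed as a Python callable to keep the sketch short. Only the actor names `stats` and `stats3` and the values they return come from the text above.

```python
def collect_values(collect_defs, actors):
    """Resolve each collect alias by asking the named actor for a field."""
    return {d["alias"]: actors[d["from"]](d["field"]) for d in collect_defs}


def evaluate_with_collect(expression, event, collect_defs, actors):
    """Evaluate `expression` with JSON fields and collected values in one scope."""
    # Assumption: a collected alias shadows a JSON field with the same name.
    scope = {**event, **collect_values(collect_defs, actors)}
    return expression(scope)


# Hypothetical collect definitions; the real constructor format may differ.
collect_defs = [
    {"alias": "a", "from": "stats", "field": "myvariable"},
    {"alias": "b", "from": "stats3", "field": "variable2"},
]
# Stand-ins for the other actors: each answers a collect request for a field.
actors = {
    "stats": lambda field: {"myvariable": 20}[field],
    "stats3": lambda field: {"variable2": 5}[field],
}

result = evaluate_with_collect(lambda s: s["a"] + s["b"] * s["x"],
                               {"x": 2}, collect_defs, actors)
print(result)  # 30
```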
See Expression Language Syntax for the full syntax of the expression language.
When the TabulateActor receives this JSON object, it will check if the cell with coordinates `("somevalue", "othervalue")` already exists, and if it does, it executes the aggregation functions by combining the old value currently in the cell with the value 140. If it does not exist yet, it will create a new cell and it will then execute the aggregation functions.
The aggregation functions that can be calculated for each unique combination of dimension values are the following:
function | required | description |
---|---|---|
`average` | no | The average of a numeric field |
`count` | no | The number of received events |
`sum` | no | The sum of a numeric field |
`min` | no | The minimum of a certain numeric field until now |
`max` | no | The maximum of a certain numeric field until now |
At least one of these functions must be provided in the constructor of the actor. The `count` function works on all types of JSON fields, while the other aggregation functions only work on numeric JSON fields. For the count and sum aggregation functions, data in a cell can be normalized by row, column or the total of the entire table.
The TabulateActor reads the value of the field with the name specified in `value` ("field1" in the example), if it exists. If it does not exist, the actor does nothing. If the value can be found, all specified aggregation functions are executed for the cell identified by the event's combination of dimension field values.
There is no limitation on the data type of the dimension fields; even numeric values can be used. However, numeric values are rounded to 4 decimals before being used as dimension values.
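The update procedure can be sketched as a dictionary keyed by the tuple of dimension values. This is an illustrative model and not the actor's code: the dimension names `dim1` and `dim2` are hypothetical, all five aggregations are always computed instead of only the configured ones, and skipping events that lack a dimension field is an assumption the text does not state.

```python
def dimension_key(event, dimensions):
    """Build the cell coordinates; float dimension values are rounded to 4 decimals."""
    key = []
    for name in dimensions:
        value = event[name]
        if isinstance(value, float):
            value = round(value, 4)
        key.append(value)
    return tuple(key)


def update(table, event, dimensions, value_field):
    """Fold one event into the table; do nothing if the value field is absent."""
    if value_field not in event:
        return
    # Assumption: an event without all dimension fields is skipped as well.
    if any(name not in event for name in dimensions):
        return
    value = event[value_field]
    key = dimension_key(event, dimensions)
    cell = table.get(key)
    if cell is None:
        # First event for these coordinates: create a new cell.
        cell = {"count": 0, "sum": 0, "min": value, "max": value}
        table[key] = cell
    # Combine the old cell contents with the new value.
    cell["count"] += 1
    cell["sum"] += value
    cell["min"] = min(cell["min"], value)
    cell["max"] = max(cell["max"], value)
    cell["average"] = cell["sum"] / cell["count"]


table = {}
dims = ["dim1", "dim2"]
update(table, {"dim1": "somevalue", "dim2": "othervalue", "field1": 140}, dims, "field1")
update(table, {"dim1": "somevalue", "dim2": "othervalue", "field1": 60}, dims, "field1")
update(table, {"dim1": "somevalue", "dim2": "othervalue"}, dims, "field1")  # ignored
print(table[("somevalue", "othervalue")])
# {'count': 2, 'sum': 200, 'min': 60, 'max': 140, 'average': 100.0}
```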
The normalization function is only executed when the state is collected by another actor and when the aggregation function is `count` or `sum`. Internally, the TabulateActor will not keep the normalized values but it will only calculate them on request.
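Computing the normalized values on request, without storing them, can be sketched as below. The sketch assumes a two-dimensional table whose cells map `(row, column)` to a single count or sum, and it assumes that the relevant totals are non-zero. How normalization works with more than two dimensions is not covered here, and the function name `normalize` is hypothetical.

```python
def normalize(cells, by):
    """Return normalized copies of `cells`, a dict mapping (row, column) to a number.

    `by` is "row", "column" or "total". The stored values are left untouched.
    Assumes the totals used as divisors are non-zero.
    """
    if by == "total":
        total = sum(cells.values())
        return {key: value / total for key, value in cells.items()}
    index = {"row": 0, "column": 1}[by]
    totals = {}
    for key, value in cells.items():
        totals[key[index]] = totals.get(key[index], 0) + value
    return {key: value / totals[key[index]] for key, value in cells.items()}


counts = {("a", "x"): 1, ("a", "y"): 3, ("b", "x"): 4}
print(normalize(counts, "row"))
# {('a', 'x'): 0.25, ('a', 'y'): 0.75, ('b', 'x'): 1.0}
print(normalize(counts, "column"))
# {('a', 'x'): 0.2, ('a', 'y'): 1.0, ('b', 'x'): 0.8}
print(normalize(counts, "total"))
# {('a', 'x'): 0.125, ('a', 'y'): 0.375, ('b', 'x'): 0.5}
```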
The TabulateActor can be configured to pass through the original input data with the `passthrough` parameter. If `passthrough` is true, the actor emits the original input event. If it is set to false, it will not emit anything.
The TabulateActor keeps the table as state.
On a collect request of the field `table`, the actor will return the following:
The TabulateActor implements the following state functions:
Since the TabulateActor is a stateful actor, the actor does nothing when all of the state functions are turned off. The TabulateActor does not implement the collect-local procedure.
The amount of memory that this actor consumes is proportional to the size of the table. If this is undesirable behavior, an alternative is to connect a FunctionActor, a GroupByActor and a StatsActor in sequence. The FunctionActor should then concatenate all dimension values in a single string, which will be the field that the GroupByActor groups on. The GroupByActor then creates a StatsActor for each unique combination of dimension fields. The resulting behavior is the same as the `TabulateActor` except that state is now split over multiple actors.
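The alternative pipeline can be modeled in a few lines. This is a rough sketch in which plain Python objects stand in for the three actors. The dimension names `dim1` and `dim2`, the `|` separator and the `Stats` class are all hypothetical, and the stand-in tracks only a count and a sum.

```python
from collections import defaultdict


def group_key(event, dimensions, separator="|"):
    """Concatenate all dimension values into one string (the FunctionActor step)."""
    return separator.join(str(event[name]) for name in dimensions)


class Stats:
    """Stand-in for one per-group StatsActor: keeps a count and a sum."""

    def __init__(self):
        self.count = 0
        self.sum = 0

    def update(self, value):
        self.count += 1
        self.sum += value


# The defaultdict plays the role of the GroupByActor: one Stats per unique key.
groups = defaultdict(Stats)
events = [
    {"dim1": "somevalue", "dim2": "othervalue", "field1": 140},
    {"dim1": "somevalue", "dim2": "othervalue", "field1": 60},
    {"dim1": "a", "dim2": "b", "field1": 5},
]
for event in events:
    groups[group_key(event, ["dim1", "dim2"])].update(event["field1"])

print(groups["somevalue|othervalue"].count, groups["somevalue|othervalue"].sum)
# 2 200
```

The separator must not occur inside any dimension value. Otherwise two different combinations could produce the same string and be merged into one group.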
The advantage of a `TabulateActor` over this alternative setup is that it is faster to look up data in a single actor than it is to look up data in multiple actors. Also, it is easier to handle (manually submit, save, and restore) a persisted snapshot from Cassandra for a TabulateActor than to handle snapshots for multiple actors.
The `TabulateActor` does not collect state from other actors.
The syntax for the expression actor is as follows. A `:=` means “is a”: the term before the `:=` is any one of the alternatives after the `:=`. A `|` means a parsing alternative: it can match either the term before the `|` or the term after the `|`. A `*` means that the term between parentheses can be repeated any number of times. A `~` means “and”: the term consists of the part before the `~` followed by the part after the `~`. A term enclosed in `"` is a literal: it matches exactly the string between the two `"`s.
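To make the notation concrete, the sketch below models each symbol as a small parser combinator: `lit` for a quoted literal, `seq` for `~`, `alt` for `|` and `rep` for `*`. The two grammar rules in it are made up for illustration and are not taken from the actual expression grammar. The `alt` combinator tries its alternatives in order without backtracking, which is an assumption about how alternatives are resolved.

```python
def lit(text):
    """A term enclosed in quotes: matches exactly `text`."""
    def parse(s, pos):
        return pos + len(text) if s.startswith(text, pos) else None
    return parse


def seq(first, second):
    """`first ~ second`: both parts must match, one after the other."""
    def parse(s, pos):
        mid = first(s, pos)
        return None if mid is None else second(s, mid)
    return parse


def alt(first, second):
    """`first | second`: try the first alternative, then the second."""
    def parse(s, pos):
        end = first(s, pos)
        return end if end is not None else second(s, pos)
    return parse


def rep(term):
    """`(term)*`: match the term zero or more times."""
    def parse(s, pos):
        while True:
            end = term(s, pos)
            if end is None or end == pos:  # stop on failure or an empty match
                return pos
            pos = end
    return parse


# Hypothetical rules, not part of the real grammar:
#   digit := "0" | "1"
#   sum   := digit ~ ("+" ~ digit)*
digit = alt(lit("0"), lit("1"))
sum_rule = seq(digit, rep(seq(lit("+"), digit)))


def matches(rule, text):
    """True when the rule consumes the whole input."""
    return rule(text, 0) == len(text)


print(matches(sum_rule, "1+0+1"))  # True
print(matches(sum_rule, "1+"))     # False
print(matches(sum_rule, "2"))      # False
```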