!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

An experimental protocol for constraint-based configurators

Product Descriptions for CSP Problems
What can we do on product descriptions (the protocols) ?
Experimental Results

Product Descriptions for CSP Problems

What is in these Files?

They are examples of product descriptions in Aralia format. There are two of them:

big.txt,
medium.txt

hereafter colloquially called Big and Medium, respectively. They were reworked from actual vehicle ranges; by changing all variable names we effectively removed the actual semantics of the products. However it may help to know Medium was a small urban car, so you can regard it a sandbox example. Big was a utility van with serious product variability, and is the example to test your solvers & algorithms with.

We have also translated the xml CSP format:

What is the Aralia Format?

Meaning: in general? I have no idea. However the files use only a tiny subset of the format, and obey a lot of additional conventions not part of the format. Wherever whitespace is legal, there may be a language C-style comment, bracketed between /* and */. There are 2 parts in a file; the first contains lines like:

#(1,1,[v1.0, v1.1, v1.2, v1.3]);

which means "exactly one of Boolean variables v1.0, v1.1, v1.2, v1.3 is true", or

#(0,1,[v37.0, v37.1]);

which means "at most one of Boolean variables v37.0, v37.1 is true". The second part of a file consists of fully parenthesized, Boolean formulas with operators "-" (negation), "&" (conjunction), "|" (disjunction), "=>" (entailment), and using semi-colons ";" as line terminators. Everything else, i. e. the strings of characters you find between operators, are (occurrences of) (Boolean) variables.

That's all!

At least concerning the format itself. Additional syntactic constraints the files adhere to are:

every formula fits in one line,
the # lines are grouped together in the beginning of the file, then all the Boolean formulas,
every Boolean variable occurs exactly once in one of the # lines, then as many times as needed in the 2nd part of the file (the Boolean formulas),
for a given N, there is exactly one # where all the variables vN.<something> are found,
the file contains first # lines, each with P variables vN.i, 0 <= i <= P-1; then lines of the form #(0,1,[vN.i.Serie]); then similar lines where ".Serie" is replaced with ".Pack", ".Option", ".OptionPack".

What do they mean?

Each of the files represents a product description; a valid product is the assignment of a value"true" or "false", to each Boolean variable mentioned in the file, such that all the Boolean formulas are true.

In the original description, the variability of the product was represented by a finite set of discrete variables, each taking on values in a finite range of possible values. Back then, a valid product was an assignment of a value to each discrete variable (in its allowed range, of course). Since many solvers only handle binary variables, this description had to be adapted somehow; hence the 2 parts of the files: each "#" line actually describes a discrete variable by listing its possible values. The line

#(1,1,[v1.0, v1.1, v1.2, v1.3]);

(see above) effectively means: "there exists a discrete variable v1... with 4 possible values.0, 1, 2, 3, exactly one of which must be selected in a completely specified product". Similarly, the line

#(0,1,[v37.0, v37.1]);

means: "there exists a discrete variable v37... with 3 possible values 0, 1 and NotApplicable, exactly one of which must be selected in a completely specified product; furthermore, v37.NotApplicable never occurs in the Boolean formulas forming the 2nd part of the file".

This is a pretty generic method for specifiying variability, however the products in our examples are vehicles, and most car manufacturers organize the ranges of their models according to a "version/option" paradigm: a version is a label for a particular architecture, and options are a client's choices within a particular version. Versions play no particular role in the configuration task, in particular the customer is not required to commit to a version before he chooses options; however it may be useful for some solver heuristics to know that variable v0... is actually a version identifier, i. e. v0.0, v0.1, etc., label versions.

It is also an easy check for an Aralia-to-your-favorite-format converter that all Boolean variables ending in ".Serie", ".Pack", ".Option", ".OptionPack", are in fact flags attached to the version. That is, once you have asserted one of the v0.i variables (assigned "true" to it, which entails that all of the others v0.i' are negated), the values of all the .Serie, .Pack,... variables are de facto fixed. In particular, they could be removed from the product definition without loss of information, since each of them is logically equivalent to a disjunction of v0.i's. It may also help to know that, if there exists a Boolean variable vN.P.Pack, then vN.P represents a pack, that is, a combination of options sold together, and so are the vN.j for j<>P.

Pricing information

Each of the product descriptions has an associated file giving pricing information:

You guessed it: big_price is for product Big and medium_price is for product Medium. Both files have the same structure, which uses the basic Aralia format (see above) with a very slight twist. A pricing file consists of lines, and each line contains a Boolean formula in the same Boolean variables as were defined in the corresponding product description, followed by a semi-colon ";". Following the semi-colon is a number, whereas the Aralia grammar would allow any string there as a comment: that's the twist.

The semantics of the whole thing is: only completely defined products have a price, which is the total of those numbers whose associated formulas are satisfied by the product definition.

We have also translated the pricing information in the xml CSP format (the price of a configuration is the sum of the valuations provided by the soft constraints)

What can we do on product descriptions (the protocols) ?

You can test solvers of course; after all, this why we built these files in the first place. In particular, we defined 3 standard use cases: conflict generation, configuration, projection, on which it makes sense to compare solvers and algorithms. All 3 consist in adding temporary constraints, then drawing inferences from this augmented set.

The Full Configuration Protocol

BT : Basic tests

As a basic test for a format converter, you may count the total number of variables.

As a basic test for the correctness of your format converter if any, of your solver and/or your solving strategy, you may count the number of solutions (valid, complete assignments under the constraints of the file).

medium file: 148 variables, 278,744 solutions;
big file: 268 variables, 24,566,537,954,855,761,920,000 solutions;
medium.xml : 148 variables, 278,744 solutions;
big.xml : 268 variables, 24,566,537,954,855,761,920,000 solutions;

Then, the use cases themselves are defined as follows:

FCP: The Full Configuration Protocol

The procedure simulates (a part of) a session of interractive configuration, where the user assigns a configuration variable according to her preferences. After each choice, the minimal price, among all possible extensions of the current configuration, is computed, and the domains of the remaining variables are pruned so as to remove all the values that cannot lead to a solution (given the previous choices): in other words the domains are made "GIC-consistent". When the domain of the next variable does not contain the value the user wanted, the user unassigns variables (in a order that may be different from the variable order chosen during the assignment phase, until the desired value is allowed again. Gic and Min price are computed upon each unassignment.

Price compuration steps (B.2 and C.2) can be skipped for an classical unpriced configuration protocol. We call "classical configuration - FCP-U" the unpriced version ansd "full priced configuration - FCP-P" the priced one.

We propose two versions of this protocol: a statistical one (variables and values are randomly chosen) and scenarised one (it uses a set of fixed scenarii specifying what assignments and unassignments take place, and in what order)

Statistical version :

A. Get the original domains by ensuring the global inverse consistency on the domains

 B. Assigment phase:
 Pick an unassigned configuration variable v N (with currend domain of size > 1) with uniform probability
 Pick one of the values (one of the vN.P's) at random in its original domain, with uniform probabilities,
 If this value does not belongs to the current domain of the variable go to C.
 Otherwise:
 B.1 Assert it, i. e.:
 * Aralia Version: add to the set of Boolean constraints that the chosen Boolean variable is assigned "true".
 * Xcsp Version: assign value vN.P to variable vN
 B.2 Compute the minimal and maximal prices of the complete product consistent with the current configuration.
 B.3 Ensure the global inverse consistency on the domains:
 * Aralia Version: determine for all of the Boolean variables whether they are always true, always false, or possible i.e. true (resp. false) in at least one valid assignment;
 * XCSP Version: determine for each variable and each value in its domain whether it belongs to at least one valid assignment;
 B.4: If B.3 detects an inconsistency, there is a bug in your program: GIC was not correctly ensured after each choice.

 C. Value Restoration
 C.1 Pick one of the previously assigned variables at random with a uniform probability, and unassign it;
 C.2 Compute the minimaland maximal prices of the extensions consistent with the configuration after unassignement;
 C.3 Ensure the global consistency of the domains;
 C.4 If the desired value reappears in the current domain of its variable then termiinate, else go to (C.1)

Steps B.2 and C.2 can be skipped for an classical unpriced configuration protocol. We call "classical configuration" the unpriced version ansd "full priced configuration" the priced one.

Although it is theoretically possible to produce a satisfiable set of constraints after assigning each of the variables without any conflict, i.e. without ever entering part C the probability is negligible in the case of both the big/medium files, and a conflict (an unsatisfiable set) is reached after about half a dozen assignments.

As a soundness test, the outcome of a run of phases A+B is the number of explicit assignments before the conflict, in other words, the number of variables explicitely assigned the value "true" before the resulting set of constraints became unsatisfiable; it is independent of the algorithm.

A second outcome of a run of phases A+B is the number of values removed from the domain. They are independent of the algorithm used.

The outcome of a run of phase C is the number of unassignements realized and the number of values reintroduced in the domains. They are independent of the algorithm used.

Phase C can be replaced by the computation of all the explanations of the removal of the targeted value (see Variants, below).

Scenarized version

To simulate a scenarized configuration session, run your program with the scenarii described here (aralia) and here (xml,) .

 Init: Read an assignement sequence S, an unassignement sequence U, and a variable-value pair (vN, vN.P)

 A. Ensure global inverse consistency on the domains;

 B. Assigment phase: for each couple (vN, vN.P) in the assignement sequence S,
 B.1 Assert it, i. e.:
 B.2 Compute the minimal/maximal prices of the consistent extensions to the current configuration.
 B.3 Ensure the global inverse consistency on the domains.

 C. Value restoration: for each boolean variable vN in the unassignement sequence U
 C.1 Undo the assigment;
 C.2 Compute the minimal/maximal prices of the consistent extensions to the current configuration.
 C.3 Ensure the global consistency of the domains;
 C.4 If the desired value reappears in the current domain of its variable then termiinate, else go to (C.1)

 At the end of phase B (and not before) value vN.<k> should have disappeard from the current domain of vN., k being the length of the assignment sequence.
 At the end of phase C (and not before) , value vN.<k'> .O should reappear in the current domain of vN<k'>, k' being the lenght of the unassignment sequence

The outcome of a run of phases A+B is the (cumulated) number of values removed from the domains by GIC. It is independent of the algorithm used.

The outcome of a run of phase C is the (cumulated) number of values reintroduced in the domains. It is independent of the algorithm used.

Again, steps B.2 and C.2 can be skipped for a classical unpriced configuration protocol. We call "classical configuration - FCP-U" the unpriced version and "full priced configuration - FCP-P" the priced one.

Variants of the full configuration protocol

GC-P : Greedy (priced) Configuration

Greedy configuration is a simplification of the full configuration protocol ; it corresponds to phases A and B in Full configuration above. ; in this variant, the user choses her values in the current domains of the variables (we always choose assignments leading to a satisfiable set of constraints), and, the domains are assumed to be globaly coherent, so that the process never backtrack.

Statistical version :

The outcome of a run is the number of explicit assignments, that is, the number of times one of the vNs was asserted (in step 4. above), effectively restricting the product variability. It independent of the algorithm used.

Scenarized version

To simulate a scenarized configuration session, run your program with the scenarii described here (aralia) and here GP-P-scenarios-small-minimal.xml, GP-P-scenarios-medium-minimal.xml and GP-P-scenarios-big-minimal.xml-

 Init: Read an assignement sequence S, an unassignement sequence U, and a variable-value pair (vN, vN.P)

 A. Ensure global inverse consistency on the domains;

 B. Assigment phase: for each couple (vN, vN.P) in the assignement sequence S,
 B.1 Assert it, i. e.:
 B.2 Compute the minimal/maximal prices of the consistent extensions to the current configuration.
 B.3 Ensure the global inverse consistency on the domains.


Note that the scenarii proposed are consistent (no inconsistency should be detected during the run), meaning full (tonlyariables with domain sizes greater than one after the GIC step may be selected for assignment) and minimal (after the last assignement, all the domains are of size one).

The outcome of a run of phases A+B is the (cumulated) number of values removed from the domains by GIC. It is independent of the algorithm used.

GC-U : Greedy (unpriced) Configuration

Greedy configuration is a simplification of the full configuration protocol and of the greedy priced protocol ; in this variant, the user choses her values in the current domains of the variables, the domains are assumed to be globaly coherent, so that the process never backtrack. The protocol is almost the same as the previous greedy priced protocol , except that prices are not maintained. This protocol can be a basis for further variants (below).

Statistical version:

Ensure the global inverse consistency on the domains:

Aralia Version: determine for all of the Boolean variables whether they are always true, always false, or possible i.e. true (resp. false) in at least one valid assignment.
XCSP Version: determine for each variable and each value in its domain wheter it belongs to at least one valid assignment.

Make the list of the discrete variables vN that have more that one value in their domain.
If this list is empty, terminate.
Otherwise, randomly pick one of the discrete variables vN among those at have more that one value in their current domains, and pick one of the values vN.p in the current domain of this variable, with uniform probabilities, and assert it, ie.

Aralia Version: add to the set of Boolean constraints that vN.p is assigned "true".
Xcsp Version: assign this value to vN.

Go to (1).

Scenarized version

To simulate a scenarized session of configuration, it is sufficent to run your program only on the consistent scenarii here ( here (aralia) and here GP-P-scenarios-small-minimal.xml, GP-P-scenarios-medium-minimal.xml and GP-P-scenarios-big-minimal.xml- the same as those used for GC-P, simply ignore the min and max price informations.

Read an sequence s in the file
For each couple (vN,vN.P) in the sequence

Assert it , i. e.

Aralia Version: add to the set of Boolean constraints that vN.P is assigned "true".
Xcsp Version: assign value vN.P to variable vN

Ensure the global inverse consistency on the domains

Aralia Version: determine for all of the Boolean variables whether they are always true, always false, or possible i.e. true (resp. false) in at least one valid assignment
XCSP Version: determine for each variable and each value in its domain wheter in belong to at least one valid assignment

As in the statistical version, the outcome of a run is the (cumulated) number of values removed from the domains. It is independent of the algorithm used.

Note that the scenarii proposed are consistent (no inconsistency should be detected during the run), meaning full (tonlyariables with domain sizes greater than one after the GIC step may be selected for assignment) and minimal (after the last assignement, all the domains are of size one).

GC-P: Greedy configuration at minimal price

The procedure simulates an interactive configuration where, for each possible value of the next variable, the user is shown the minimal price over the extension to the current configuration that assign this value to the variable ; the user then selects one of the values for which the price is the smallest.

Statistical version:

Ensure the global inverse consistency on the domains,
Make the list of the discrete variables vN that have more that one value in their domains.
If this list is empty, terminate.
Otherwise, pick an unassigned configuration variable v N (with current domain of size > 1) with uniform probability
- For each value in its domain, compute the minimal price of the extensions to the current configuration that assign this value to the variable
- Select one of the values for which this minimal price is smallest, at random with uniform probabilities.
Go to (1).

Scenarized version

To simulate a scenarized configuration session, run your program with the scenarii described here (aralia) and here (xml, coming soon)

Ensure the global inverse consistency on the domains.
Read a sequence s in the file.
For each pair (vN,vN.P) in the sequence,

For each possible value of variable vN, compute the minimal price of the extensions to the current configuration that assign this value to the variable,
Check that vN.P is one of the least expensive values and assert it,
Ensure the global inverse consistency on the domains.

Note that the scenarii proposed are consistent (no inconsistency should be detected during the run), meaning full (the assigments deal only with variables having a domain sized greater than one after the GIC step and the value proposed by the scenario is the cheapest one) and minimal (after the last assignment, all the domains are of size one).

As a way to test your program, the outcome of this experiment is the average price proposed for the next variable.

Gic-Approx: Approximating global inverse consistency

A level of local consistency (e.g. arc consistency) can be used used to approximate GIC. The previous protocols ca nbe used to test the quality of the approximation.

Greedy protocol. In the above greedy protocol, when GIC is approximed by a local consistency, the number of values removed from the domains (resp. the number of explicit assignments) may be smaller (resp. larger), and may depend on the level of local consistency used. The comparison of these numbers with those under GIC measures the quality of the level of local consistency as an approximation of GIC. Moreover, the procedure can lead to a dead end (whereas it always reaches a valid configuration when GIC is ensured) ; the number of dead ends encountered also measures the risk of using local consistencies rather than GIC.

Full configuration protocol. In this protocol too, the (cumulated) number of values removed from the domain may be lower than the one provided by GIC in steps A+B, and the number of values reintroduced in the domains in step C may be higher than the one provided by GIC. Likewise, the number of assignments before inconsistency is detected may be larger and may depend on the level of local consistency used. The comparison of these three measures with to the ones obtained by GIC estimates the quality of the level of local consistency as an approximation of GIC.

FC-Alt, GC-Alt: Alternative Values

As already noticed, any level of local consistency can be used instead of Global Inverse Consistency in the full config and greedy config protocols. The propagation may also include the computation of alternative values. Use the unpriced variant to focus on alternative values, and give the level of local consistency used as an index (FC-Alt_gic, FC-Alt_ac, etc)

CG: Conflict generation

We call "conflict generation" the protocol consisting in realizing only steps A and B in the classical (unpriced) configuration protocol above.

GC-expl, GC-res: Configuration with explanations / restorations

In order to test explanations, run steps A and B in full configuration protocol without prices (i.e. conflict generation), then replace that C by the computation of :

* all the explanations of the removal of vN.P from the current domain of vN.. I.e. the minimal sets of user choices that are inconsistent with the assignment of this value to this variable.

* all the possible restorations for vN.P in the current domain of vN.. I.e. the maximal sets of user choices that are consistent with the assignment of thisvalue to this variable.

The outcome of this experiment is the number of explanations (resp. restorations) and their average size.

Projection

This can be regarded as an exhaustive list of assigments satisfying all the constraints, restricted to a limited number n of configuration variables (Option, Pack and Series variables are excluded)

For a specified n, pick a subset of n variables VN among the vN...'s, at random with uniform probability (or read it in the following scenario file)
Compute the projection of the valid configuration on thesevariables:

Aralia Version: Determine which of the assignments vN.j=true are valid under the Boolean constraints specified in the Aralia file.
XCSP Version: Determine which of the assignments of the vN are valid under the constraints specified in the XCSP file.

As in the statistical version, the outcome of a run is the number of (partial) assignments in the projection computed and the number of values tested (i.e. the average number of possible values for thevariables under concern). They are independent of the algorithm used.

Experimental Results

As these are performance tests, you would likely try a fair number of runs. Here are our results for batches of 100, 1,000 and 10,000 consecutive runs respectively, with the average value of the characteristic outcome (see above, the descriptions of the use cases) in addition to the total running time.

The runs were on a on an AMD Athlon II, dual-core B22 under Windows 7; keep in mind however, our solver engine is in Java so our setup is essentially 1-core. The JVM was a Win32 exe running in WoW64 (in less pedantic words, a 32 bit exe), and our running times include the JIT compilation overhead. On the other hand, they do not include a prologue to the batch of runs proper, during which the file Big.txt was compiled to a more convenient form. This took about 5 sec, including file reading + Aralia parsing. The resulting structure is very, very roughly 1MB large.

Test Outcome 10,000 runs 1,000 runs 100 runs

Full Configuration statistical

FCP-P-rand
# of explicit assignments before a conflict 5.07
5.12
5.17

# (cumulated) of values removed from the domains

average minimal price at the backtrack point
# number of unassignements

# (cumulated) of values restored in the domains

Total running time
1251.052s 123.492s 12.698s

Running time step A (avg. per scen in seconds / max)
Running time step B2+B3 (avg. per scen in seconds / max)

Running time step C2+C3 (avg. per scen in seconds / max)

Full (unpriced) configuration, statistical

FCP-U-rand

# of explicit assignments before a conflict

# (cumulated) of values removed from the domains
# number of unassignements
# (cumulated) of values restored in the domains
Running time step A (avg. per scen in seconds / max)
Running time step B2 (avg. per scen in seconds / max)

Running time steps C2 (avg. per scen in seconds / max)
Total running time
Conflict Generation statistical

CG-rand

# of assigments before conflict, avg. 5.07 5.12 5.17

# (cumulated) of values removed

running time (avg., in seconds) 0.019 0.019 0.019

Greedy (unpriced) Configuration, statistical

GC-U-rand

# (cumulated) number of values removed from the domains

# of explicit assignments, avg. 16.10 15.59 15.20

Total running time

Greedy configuration at minimal price

GC-P-rand

Average minimal price proposed for the next variable

Total running time 1078.951s 107.47s 10.64s
Projection, 8 vars.
(proj8)
# of values tested, avg. 37.533 42.90 47.95

# of partial assignments, avg. 198.61 239.25 252.08

running time (avg., in seconds) 0.0164 0.022 0.024

Projection, 7 vars.
(proj7)
# of values tested, avg. 34.39 35.41 29.39

# of partial assignments, avg. 134.92 136.24 86.01

running time (avg., in seconds) 0.0124 0.0128 0.0007

Projection, 6 vars.
(proj6)
# of values tested, avg. 28.12 25.70 17.39

# of partial assignments, avg. 86.15 76.88 42.67

running time (avg., in seconds) 0.0084 0.00695 0.0031

First Release 2013-01-18.

Latest Release 2013-11-06.