!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
An experimental protocol for constraint-based configurators
- Product Descriptions for CSP Problems
- What is in these Files?
- What is the Aralia Format?
- What do they mean?
- Pricing information
- What can we do on product descriptions (the protocols) ?
- The Full Configuration Protocol
- BT : Basic tests
- FCP: The Full Configuration Protocol
- Variants of the full configuration protocol
- GC-P : Greedy (priced) Configuration
- GC-U : Greedy (unpriced) Configuration
- GC-P: Greedy configuration at minimal price
- Gic-Approx: Approximating global inverse consistency
- FC-Alt, GC-Alt: Alternative Values
- CG: Conflict generation
- GC-expl, GC-res: Configuration with explanations / restorations
- Projection
- Experimental Results
Product Descriptions for CSP Problems
What is in these Files?
They are examples of product descriptions in Aralia format. There are two of
them:
hereafter colloquially called Big and Medium, respectively. They were reworked
from actual vehicle ranges; by changing all variable names we effectively removed
the actual semantics of the products. However it may help to know Medium was a
small urban car, so you can regard it a sandbox example. Big was a utility van
with serious product variability, and is the example to test your solvers &
algorithms with.
We have also translated the xml CSP format:
What is the Aralia Format?
Meaning: in general? I have no idea. However the files use only a tiny subset
of the format, and obey a lot of additional conventions not part of the format.
Wherever whitespace is legal, there may be a language C-style comment, bracketed
between /* and */. There are 2 parts in a file; the first contains lines like:
#(1,1,[v1.0, v1.1, v1.2, v1.3]);
which means "exactly one of Boolean variables v1.0, v1.1, v1.2, v1.3
is true", or
#(0,1,[v37.0, v37.1]);
which means "at most one of Boolean variables v37.0, v37.1 is true".
The second part of a file consists of fully parenthesized, Boolean formulas
with operators "-" (negation), "&" (conjunction), "|"
(disjunction), "=>" (entailment), and using semi-colons ";"
as line terminators. Everything else, i. e. the strings of characters you find
between operators, are (occurrences of) (Boolean) variables.
That's all!
At least concerning the format itself. Additional syntactic constraints the files adhere to are:
- every formula fits in one line,
- the # lines are grouped together in the beginning of the file, then all
the Boolean formulas,
- every Boolean variable occurs exactly once in one of the # lines, then as
many times as needed in the 2nd part of the file (the Boolean formulas),
- for a given N, there is exactly one # where all the variables vN.<something>
are found,
- the file contains first # lines, each with P variables vN.i, 0 <= i <=
P-1; then lines of the form #(0,1,[vN.i.Serie]);
then similar lines where ".Serie"
is replaced with ".Pack",
".Option", ".OptionPack".
What do they mean?
Each of the files represents a product description; a valid product is the
assignment of a value"true" or "false", to each Boolean
variable mentioned in the file, such that all the Boolean formulas are true.
In the original description, the variability of the product was represented
by a finite set of discrete variables, each taking on values in a finite range
of possible values. Back then, a valid product was an assignment of a value
to each discrete variable (in its allowed range, of course). Since many solvers
only handle binary variables, this description had to be adapted somehow; hence
the 2 parts of the files: each "#" line actually describes a discrete
variable by listing its possible values. The line
#(1,1,[v1.0, v1.1, v1.2, v1.3]);
(see above) effectively means: "there exists a discrete variable v1...
with 4 possible values.0, 1, 2, 3, exactly one of which must be
selected in a completely specified product". Similarly, the line
#(0,1,[v37.0, v37.1]);
means: "there exists a discrete variable v37... with 3 possible
values 0, 1 and NotApplicable, exactly one of which must be selected in
a completely specified product; furthermore, v37.NotApplicable never
occurs in the Boolean formulas forming the 2nd part of the file".
This is a pretty generic method for specifiying variability, however the products
in our examples are vehicles, and most car manufacturers organize the ranges
of their models according to a "version/option" paradigm: a version
is a label for a particular architecture, and options are a client's choices
within a particular version. Versions play no particular role in the configuration
task, in particular the customer is not required to commit to a version before
he chooses options; however it may be useful for some solver heuristics to know
that variable v0... is actually a version identifier, i. e. v0.0, v0.1, etc.,
label versions.
It is also an easy check for an Aralia-to-your-favorite-format converter that
all Boolean variables ending in ".Serie", ".Pack", ".Option",
".OptionPack", are in fact flags attached to the version. That is,
once you have asserted one of the v0.i variables (assigned "true"
to it, which entails that all of the others v0.i' are negated), the values of
all the .Serie, .Pack,... variables are de facto fixed. In particular,
they could be removed from the product definition without loss of information,
since each of them is logically equivalent to a disjunction of v0.i's. It may
also help to know that, if there exists a Boolean variable vN.P.Pack, then vN.P
represents a pack, that is, a combination of options sold together, and so are
the vN.j for j<>P.
Pricing information
Each of the product descriptions has an associated file giving pricing information:
You guessed it: big_price is for product Big and medium_price is for product
Medium. Both files have the same structure, which uses the basic Aralia format
(see above) with a very slight twist. A pricing file consists of lines, and
each line contains a Boolean formula in the same Boolean variables as were defined
in the corresponding product description, followed by a semi-colon ";".
Following the semi-colon is a number, whereas the Aralia grammar would allow
any string there as a comment: that's the twist.
The semantics of the whole thing is: only completely defined products have
a price, which is the total of those numbers whose associated formulas are satisfied
by the product definition.
We have also translated the pricing information in the xml CSP
format (the price of a configuration is the sum of the valuations
provided by the soft constraints)
What can we do on product descriptions (the protocols) ?
You can test solvers of course; after all, this why we built these files in
the first place. In particular, we defined 3 standard use cases: conflict generation,
configuration, projection, on which it makes sense to compare solvers and algorithms.
All 3
consist in adding temporary constraints, then drawing inferences from this augmented
set.
The Full Configuration Protocol
BT : Basic tests
As a basic test for a format converter, you may count the total number of
variables. As a basic test for the correctness of your format converter if any,
of your solver and/or your solving strategy, you may count the number of solutions
(valid, complete assignments under the constraints of the file).
- medium file: 148 variables, 278,744 solutions;
- big file: 268 variables, 24,566,537,954,855,761,920,000 solutions;
- medium.xml: 148 variables, 278,744 solutions;
- big.xml: 268 variables, 24,566,537,954,855,761,920,000 solutions;
Then, the use cases themselves are defined as follows:
FCP: The Full Configuration Protocol
The procedure simulates (a part of) a session of interractive configuration, where the user assigns a configuration
variable according to her preferences. After each choice, the minimal price,
among all possible extensions of the current configuration, is computed,
and the domains of the remaining variables are pruned
so as to remove all the values that
cannot lead to a solution (given the previous choices): in other words the domains are made "GIC-consistent".
When the domain of the next variable does not contain the value the user wanted, the
user unassigns variables (in a order that may be different from the
variable order chosen during the assignment phase, until the
desired value is allowed again. Gic and
Min price are computed upon each unassignment.
Price compuration steps (B.2 and C.2) can be skipped for an
classical unpriced configuration protocol. We call "classical
configuration - FCP-U" the unpriced version ansd "full
priced configuration - FCP-P" the priced one.
We propose two versions of this protocol: a statistical one (variables and values
are randomly chosen)
and scenarised one (it uses a set of fixed scenarii specifying what
assignments and unassignments take place, and in what order)
Statistical version :
A. Get the original domains by ensuring the global inverse consistency on the domains
B. Assigment phase:
Pick an unassigned configuration variable v
N<i> (with currend domain of size > 1) with uniform
probability
Pick one of the values (one of the vN<i>.P's) at random in its original domain, with uniform
probabilities,
If this value does not belongs to
the current domain of the variable go to C.
Otherwise:
B.1 Assert it, i. e.:
* Aralia Version: add to
the set of Boolean constraints that the chosen Boolean variable is
assigned "true".
* Xcsp Version: assign
value vN<i>.P to
variable vN<i>
B.2 Compute
the minimal and maximal prices of the complete product consistent with
the current configuration.
B.3 Ensure the global inverse
consistency on the domains:
* Aralia Version: determine
for all of the Boolean variables whether they are always true, always false, or
possible i.e. true (resp. false) in at least one valid assignment;
* XCSP Version: determine
for each variable and each value in its domain whether it belongs to at least one valid assignment;
B.4: If B.3 detects an
inconsistency, there is a bug in your program:
GIC was not correctly ensured after each choice.
C. Value
Restoration
C.1 Pick one of the previously assigned
variables at random with a uniform probability, and unassign it;
C.2 Compute the minimaland
maximal prices of the extensions consistent with the configuration
after unassignement;
C.3 Ensure the global consistency of the domains;
C.4 If the desired value
reappears in the current domain of its variable then
termiinate, else go to (C.1)
Steps B.2 and C.2 can be skipped for an
classical unpriced configuration protocol. We call "classical
configuration" the unpriced version ansd "full
priced configuration" the priced one.
Although it is theoretically possible to produce a satisfiable set
of constraints after assigning each of the variables without any
conflict, i.e. without ever entering part C the probability
is negligible in the case of both the big/medium
files, and a conflict (an unsatisfiable set) is reached after about half a dozen
assignments.
As a soundness test, the outcome of a run of phases A+B is the number
of explicit assignments before the conflict, in other words, the number of variables explicitely
assigned the value "true" before the resulting set of constraints became unsatisfiable; it is independent of the algorithm.
A second outcome of a run of phases A+B is the number of values removed
from the domain. They are independent
of the algorithm used.
The outcome of a run of phase C is the number of unassignements
realized and the number of values reintroduced in the domains. They are independent of the algorithm used.
Phase C can be replaced by the computation of all the explanations of the removal
of the targeted value (see Variants, below).
Scenarized version
To simulate a scenarized configuration session, run your program with the scenarii described here (aralia) and here (xml,)
.
Init: Read an assignement sequence S, an unassignement sequence
U, and a variable-value pair (vN<i>, vN<i>.P)
A. Ensure
global inverse consistency on the domains;
B. Assigment phase: for each couple (vN<i>,
vN<i>.P) in the assignement sequence S,
B.1 Assert it, i. e.:
B.2 Compute
the minimal/maximal prices of the consistent extensions to the current configuration.
B.3 Ensure
the global inverse consistency on the domains.
C. Value restoration: for each boolean variable vN<i>
in the unassignement sequence U
C.1
Undo the assigment;
C.2
Compute the minimal/maximal prices of the
consistent extensions to the current configuration.
C.3 Ensure the global consistency of the domains;
C.4 If the desired value reappears in the current domain of
its variable then termiinate, else go to (C.1)
At the end of phase B (and not before) value vN.<k> should
have disappeard from the current domain
of vN., k being the length of the assignment sequence.
At the end of phase C (and not before) , value vN.<k'>
.O should reappear in the current
domain of vN<k'>, k' being the lenght
of the unassignment sequence
The outcome of a run of phases A+B is
the (cumulated) number of values removed from the domains by GIC.
It is independent of the algorithm used.
The outcome of a run of phase C is the (cumulated)
number of values reintroduced
in the domains. It is independent of the algorithm used.
Again, steps B.2 and C.2 can be skipped for a classical unpriced configuration
protocol. We call "classical configuration -
FCP-U" the unpriced version and "full priced configuration - FCP-P"
the priced one.
Variants of the full configuration protocol
GC-P : Greedy (priced) Configuration
Greedy configuration is a simplification of the full configuration protocol ; it corresponds to phases A and B in Full
configuration
above. ; in this variant, the user choses her values in the current
domains of the variables (we always choose assignments leading
to a satisfiable set of constraints), and, the domains are assumed to be globaly
coherent, so that the process never backtrack.
Statistical version :
A. Get the original domains by ensuring the global inverse consistency on the domains
B. Assigment phase:
Pick an unassigned configuration variable v
N<i> (with currend domain of size > 1) with uniform
probability
Pick one of the values (one of the vN<i>.P's) at random in its original domain, with uniform
probabilities,
If this value does not belongs to
the current domain of the variable go to C.
Otherwise:
B.1 Assert it, i. e.:
* Aralia Version: add to
the set of Boolean constraints that the chosen Boolean variable is
assigned "true".
* Xcsp Version: assign
value vN<i>.P to
variable vN<i>
B.2 Compute
the minimal and maximal prices of the complete product consistent with
the current configuration.
B.3 Ensure the global inverse
consistency on the domains:
* Aralia Version: determine
for all of the Boolean variables whether they are always true, always false, or
possible i.e. true (resp. false) in at least one valid assignment;
* XCSP Version: determine
for each variable and each value in its domain whether it belongs to at least one valid assignment;
B.4: If B.3 detects an
inconsistency, there is a bug in your program:
GIC was not correctly ensured after each choice.The outcome of a run is the number of explicit assignments,
that is, the number of times one of the vNs was asserted (in step 4. above),
effectively restricting the product variability. It independent of the algorithm used.
Scenarized version
To simulate a scenarized configuration session, run your program with the scenarii described here (aralia) and here GP-P-scenarios-small-minimal.xml, GP-P-scenarios-medium-minimal.xml and GP-P-scenarios-big-minimal.xml-
Init: Read an assignement sequence S, an unassignement sequence
U, and a variable-value pair (vN<i>, vN<i>.P)
A. Ensure
global inverse consistency on the domains;
B. Assigment phase: for each couple (vN<i>,
vN<i>.P) in the assignement sequence S,
B.1 Assert it, i. e.:
B.2 Compute
the minimal/maximal prices of the consistent extensions to the current configuration.
B.3 Ensure
the global inverse consistency on the domains.
Note that the scenarii proposed are consistent (no inconsistency should be
detected during the run), meaning full (tonlyariables
with domain sizes greater than one after the GIC step may be
selected for assignment) and minimal (after the last
assignement, all the domains are of size one).
The outcome of a run of phases A+B is
the (cumulated) number of values removed from the domains by GIC.
It is independent of the algorithm used.
GC-U : Greedy (unpriced) Configuration
Greedy configuration is a simplification of the full configuration
protocol and of the greedy priced protocol ; in this variant, the user
choses her values in the current domains of the variables, the domains
are assumed to be globaly coherent, so that the process never
backtrack. The protocol is almost the same as the previous
greedy priced protocol , except that prices are not maintained.
This protocol can be a basis for further variants (below).
Statistical version:
- Ensure the global inverse consistency on the domains:
- Aralia Version: determine for all of the Boolean variables whether they
are always true, always false, or possible i.e. true (resp. false) in at
least one valid assignment.
- XCSP
Version: determine for each variable and each value in its domain
wheter it belongs to at least one valid assignment.
- Make the list of the discrete variables vN that have more that
one value in their domain.
- If this list is empty, terminate.
- Otherwise, randomly pick one of the discrete variables vN among those at have more that one value
in their current domains,
and pick one of the values vN.p in the current
domain of this variable, with uniform probabilities, and assert it, ie.
- Aralia Version: add to the set of Boolean constraints that
vN.p is assigned "true".
- Xcsp Version: assign this value to vN.
- Go to (1).
The outcome of a run is the number of explicit assignments,
that is, the number of times one of the vNs was asserted (in step 4. above),
effectively restricting the product variability, and the (cumulated)
number of values removed from the domains.They are independent of the algorithm used.
Scenarized version
To simulate a scenarized session of configuration, it is sufficent to run
your program only on the consistent scenarii here ( here (aralia) and here GP-P-scenarios-small-minimal.xml, GP-P-scenarios-medium-minimal.xml and GP-P-scenarios-big-minimal.xml- the same as those used for GC-P, simply ignore the min and max price informations.
- Read an sequence s in the file
- For each couple (vN<i>,vN<i>.P) in the sequence
- Assert it , i. e.
- Aralia Version: add to the set of Boolean constraints
that vN<i>.P is assigned "true".
- Xcsp Version: assign value vN<i>.P to variable vN<i>
- Ensure the global inverse consistency on the domains
- Aralia Version: determine for all of the Boolean variables whether they are always
true, always false, or possible i.e. true (resp. false) in at least one valid
assignment
- XCSP
Version: determine for each variable and each value in its
domain wheter in belong to at least one valid assignment
As in the statistical version, the outcome of a run is the (cumulated)
number of values removed from the domains. It is independent of
the algorithm used.
Note that the scenarii proposed are consistent (no inconsistency should be
detected during the run), meaning full (tonlyariables
with domain sizes greater than one after the GIC step may be
selected for assignment) and minimal (after the last
assignement, all the domains are of size one).
GC-P: Greedy configuration at minimal price
The procedure simulates an interactive configuration where, for each
possible value of the next variable, the user is shown the minimal
price over the extension to the current configuration that assign this
value to the variable ; the user then selects one of the values for
which the price is the smallest.
Statistical version:
- Ensure the global inverse consistency on the domains,
- Make the list of the discrete variables vN that have more
that one value in their domains.
- If this list is empty, terminate.
- Otherwise, pick an unassigned configuration variable v N<i>
(with current domain of size
> 1) with uniform probability
- For each value in its domain, compute the minimal price of the extensions to the current configuration
that assign this value to the variable
- Select one of the values for which
this minimal price is smallest, at random with uniform probabilities.
- Go to (1).
Scenarized version
To simulate a scenarized configuration session, run your program with the scenarii described here (aralia) and here (xml, coming soon)
- Ensure the global inverse consistency on the domains.
- Read a sequence s in the file.
- For each pair (vN<i>,vN<i>.P)
in the sequence,
- For each possible value of
variable vN<i>, compute the minimal price
of the extensions to the current configuration that assign this value to
the variable,
- Check that vN<i>.P is one of the least expensive values and assert it,
- Ensure the global inverse consistency on the domains.
Note that the scenarii
proposed are consistent (no inconsistency should be detected during
the run), meaning full (the assigments deal only with variables having a
domain sized greater than one after the GIC step and the value proposed by the
scenario is the cheapest one) and minimal (after the last assignment, all
the domains are of size one).
As a way to test your program, the outcome of this experiment is
the average price proposed for the next variable.
Gic-Approx: Approximating global inverse consistency
A
level of local
consistency (e.g. arc consistency) can be used used to approximate GIC.
The previous protocols ca nbe used to test the quality of the
approximation.
Greedy protocol. In the above
greedy protocol, when GIC is approximed by a local consistency, the number of
values removed from the domains (resp. the number of
explicit assignments) may be
smaller (resp. larger),
and may depend on the level of local consistency used.
The comparison of these numbers with those under GIC measures the quality of the level
of local consistency as an approximation of GIC. Moreover, the procedure can
lead to a dead end (whereas it always
reaches a valid configuration when GIC is ensured)
; the number of dead ends encountered
also measures the risk of using local consistencies rather than GIC.
Full configuration protocol. In this
protocol too, the (cumulated) number of values removed from the domain
may be lower than the one provided by GIC in steps A+B,
and the number of values reintroduced in the domains in step C may be higher
than the one provided by GIC. Likewise, the number of assignments
before inconsistency is detected may be larger
and may depend on the level of local consistency
used. The comparison of these three measures
with to the ones obtained by GIC estimates the quality
of the level of local consistency as an approximation of GIC.
FC-Alt, GC-Alt: Alternative Values
As already noticed, any level of local consistency can be used instead of Global
Inverse Consistency in the full config and greedy config protocols.
The propagation may also include the computation of alternative values.
Use the unpriced variant to focus on alternative values, and give the level
of local consistency used as an index
(FC-Alt_gic, FC-Alt_ac, etc)
CG: Conflict generation
We call "conflict generation" the protocol consisting in realizing only
steps A and B in the
classical (unpriced) configuration protocol above.
GC-expl, GC-res: Configuration with explanations / restorations
In order to test explanations, run steps A and B
in full configuration protocol without prices (i.e. conflict
generation), then replace that C by the computation of
:
* all the explanations
of the removal of vN<i>.P from the current domain of vN.<i>.
I.e. the minimal sets of user choices that are inconsistent with the assignment
of this value to this variable.
* all the possible
restorations for vN<i>.P in the current domain of vN.<i>.
I.e. the maximal sets of user choices that are consistent with the
assignment of thisvalue to this variable.
The outcome of this experiment is the
number of explanations (resp. restorations)
and their average size.
Projection
This can be regarded as an exhaustive list of assigments satisfying all the constraints,
restricted to a limited number n of configuration variables (Option, Pack and Series variables are excluded)
- For a specified n, pick a subset of n variables VN<i>
among the vN...'s, at random with uniform probability (or read it in the following
scenario file)
- Compute the projection of the valid configuration on thesevariables:
- Aralia Version: Determine which of the assignments vN<i>.j=true
are valid under the Boolean constraints specified in the Aralia file.
- XCSP Version: Determine which of the assignments of the vN<i>
are valid under the constraints specified in the XCSP file.
As in the statistical version, the outcome of a run is the number of
(partial) assignments in the projection computed and the number
of values tested (i.e. the average number
of possible values for thevariables
under concern). They are independent of the algorithm
used.
Experimental Results
As these are performance tests, you would likely try a fair number of runs.
Here are our results for batches of 100, 1,000 and 10,000 consecutive runs respectively,
with the average value of the characteristic outcome (see above, the descriptions
of the use cases) in addition to the total running time.
The runs were on a on an AMD Athlon II, dual-core B22 under Windows 7; keep
in mind however, our solver engine is in Java so our setup is essentially 1-core.
The JVM was a Win32 exe running in WoW64 (in less pedantic words, a 32 bit exe),
and our running times include the JIT compilation overhead. On the other hand,
they do not include a prologue to the batch of runs proper, during which the file Big.txt was compiled to a more convenient
form. This took about 5 sec, including file reading + Aralia parsing. The resulting
structure is very, very roughly 1MB large.
Test |
Outcome |
10,000 runs |
1,000 runs |
100 runs |
Full Configuration statistical
FCP-P-rand
|
# of explicit assignments before a conflict |
5.07
|
5.12
|
5.17
|
# (cumulated) of values removed from the domains
|
|
|
|
average minimal price at the backtrack point | | | |
# number of unassignements
|
|
|
|
# (cumulated) of values restored in the domains |
|
|
|
Total running time
|
1251.052s
|
123.492s
|
12.698s
|
Running time step A (avg. per scen in seconds / max) | | | |
Running time step B2+B3 (avg. per scen in seconds / max)
|
|
|
|
Running time step C2+C3 (avg. per scen in seconds / max) |
|
|
|
Full (unpriced) configuration, statistical
FCP-U-rand
|
# of explicit assignments before a conflict
|
|
|
|
# (cumulated) of values removed from the domains | | | |
# number of unassignements | | | |
# (cumulated) of values restored in the domains | | | |
Running time step A (avg. per scen in seconds / max) | | | |
Running time step B2 (avg. per scen in seconds / max) |
|
|
|
Running time steps C2 (avg. per scen in seconds / max) | | | |
Total running time
| | | |
Conflict Generation statistical
CG-rand
|
# of assigments before conflict, avg. |
5.07 |
5.12 |
5.17 |
# (cumulated) of values removed |
|
|
|
running time (avg., in seconds) |
0.019 |
0.019 |
0.019 |
Greedy (unpriced) Configuration, statistical
GC-U-rand
|
|
# (cumulated) number of values removed from the domains
|
|
|
|
# of explicit assignments, avg. |
16.10 |
15.59 |
15.20 |
Total running time |
|
|
|
Greedy configuration at minimal price
GC-P-rand
|
Average minimal price proposed for the next variable |
|
|
|
Total running time
| 1078.951s
| 107.47s
|
10.64s |
Projection, 8 vars.
(proj8)
|
# of values tested, avg. |
37.533 |
42.90 |
47.95 |
# of partial assignments, avg. |
198.61 |
239.25 |
252.08 |
running time (avg., in seconds) |
0.0164 |
0.022 |
0.024 |
Projection, 7 vars.
(proj7)
|
# of values tested, avg. |
34.39 |
35.41 |
29.39 |
# of partial assignments, avg. |
134.92 |
136.24 |
86.01 |
running time (avg., in seconds) |
0.0124 |
0.0128 |
0.0007 |
Projection, 6 vars.
(proj6)
|
# of values tested, avg. |
28.12 |
25.70 |
17.39 |
# of partial assignments, avg. |
86.15 |
76.88 |
42.67 |
running time (avg., in seconds) |
0.0084 |
0.00695 |
0.0031 |
|
|
|
|
|
(c) Renault SAS, IAA-SICG 2013.
First Release 2013-01-18.
Latest Release 2013-11-06.