Product Descriptions for CSP Problems

What is in these Files?

They are examples of product descriptions in Aralia format. There are two of them:

X62.txt,
X95.txt,

hereafter colloquially called X62 and X95, respectively. They were reworked from actual vehicle ranges; by changing all variable names we effectively removed the actual semantics of the products. However it may help to know X95 was a small urban car, so you can regard it a sandbox example. X62 was a utility van with serious product variability, and is the example to test your solvers & algorithms with.

We have also translated the xml CSP format: big.xml and medium.xml

What is the Aralia Format?

Meaning: in general? I have no idea. However the files use only a tiny subset of the format, and obey a lot of additional conventions not part of the format. Wherever whitespace is legal, there may be a language C-style comment, bracketed between /* and */. There are 2 parts in a file; the first contains lines like:

#(1,1,[v1.0, v1.1, v1.2, v1.3]);

which means "exactly one of Boolean variables v1.0, v1.1, v1.2, v1.3 is true", or

#(0,1,[v37.0, v37.1]);

which means "at most one of Boolean variables v37.0, v37.1 is true". The second part of a file consists of fully parenthesized, Boolean formulas with operators "-" (negation), "&" (conjunction), "|" (disjunction), "=>" (entailment), and using semi-colons ";" as line terminators. Everything else, i. e. the strings of characters you find between operators, are (occurrences of) (Boolean) variables.

That's all!

At least concerning the format itself. Additional syntaxic constraints the files adhere to are:

every formula fits in one line,
the # lines are grouped together in the beginning of the file, then all the Boolean formulas,
every Boolean variable occurs exactly once in one of the # lines, then as many times as needed in the 2nd part of the file (the Boolean formulas),
for a given N, there is exactly one # where all the variables vN.<something> are found,
the file contains first # lines, each with P variables vN.i, 0 <= i <= P-1; then lines of the form #(0,1,[vN.i.Serie]); then similar lines where ".Serie" is replaced with ".Pack", ".Option", ".OptionPack".

What do they mean?

Each of the files represents a product description; a valid product is the assignment of a value"true" or "false", to each Boolean variable mentioned in the file, such that all the Boolean formulas are true.

In the original description, the variability of the product was represented by a finite set of discrete variables, each taking on values in a finite range of possible values. Back then, a valid product was an assignment of a value to each discrete variable (in its allowed range, of course). Since many solvers only handle binary variables, this description had to be adapted somehow; hence the 2 parts of the files: each "#" line actually describes a discrete variable by listing its possible values. The line

#(1,1,[v1.0, v1.1, v1.2, v1.3]);

(see above) effectively means: "there exists a discrete variable v1... with 4 possible values v1.0, v1.1, v1.2, v1.3, exactly one of which must be selected in a completely specified product". Similarly, the line

#(0,1,[v37.0, v37.1]);

means: "there exists a discrete variable v37... with 3 possible values v37.0, v37.1 and v37.NotApplicable, exactly one of which must be selected in a completely specified product; furthermore, v37.NotApplicable never occurs in the Boolean formulas forming the 2nd part of the file".

This is a pretty generic method for specifiying variability, however the products in our examples are vehicles, and most car manufacturers organize the ranges of their models according to a "version/option" paradigm: a version is a label for a particular architecture, and options are a client's choices within a particular version. Versions play no particular role in the configuration task, in particular the customer is not required to commit to a version before he chooses options; however it may be useful for some solver heuristics to know that variable v0... is actually a version identifier, i. e. v0.0, v0.1, etc., label versions.

It is also an easy check for an Aralia-to-your-favorite-format converter that all Boolean variables ending in ".Serie", ".Pack", ".Option", ".OptionPack", are in fact flags attached to the version. That is, once you have asserted one of the v0.i variables (assigned "true" to it, which entails that all of the others v0.i' are negated), the values of all the .Serie, .Pack,... variables are de facto fixed. In particular, they could be removed from the product definition without loss of information, since each of them is logically equivalent to a disjunction of v0.i's. It may also help to know that, if there exists a Boolean variable vN.P.Pack, then vN.P represents a pack, that is, a combination of options sold together, and so are the vN.j for j<>P.

What can we do with them?

You can test solvers of course; after all, this why we built these files in the first place. In particular, we defined 3 standard use cases: conflict generation, configuration, projection, on which it makes sense to compare solvers and algorithms. All 3 are based on the set of Boolean constraints defined by the X62 file, and consist in adding temporary constraints, then drawing inferences from this augmented set.

As a basic test for a format converter, you may count the total number of discrete variables. As a basic test for the correctness of your format converter if any, of your solver and/or your solving strategy, you may count the number of solutions (valid, complete assignments under the constraints of the file).

X95 file: 148 discrete variables, 278,744 solutions;
X62 file: 268 discrete variables, 24,566,537,954,855,761,920,000 solutions;

Then, the use cases themselves are defined as follows:

Conflict Generation

Recall that each variable vN... is implicitly defined by one of the "#" lines of the Aralia file (see above), and that exactly one of its values (one of the Boolean variables vN.i) may be assigned the value "true". The procedure is iterative, each step adding one constraint to the set initially formed by the formulas in the file.

Pick a random ordering N for the discrete variables vN..., with uniform probability; then, or each of the vN...'s in turn:
Pick one of the values (one of the vN.P's) at random, with uniform probabilities, and assert it, i. e. add to the set of Boolean constraints that the chosen Boolean variable is assigned "true".
Determine whether the augmented set of constraints is satisfiable.
If it is, determine for all of the Boolean variables whether they are always true, always false, or possible i.e. true (resp. false) in at least one valid assignment, then go to (2); else terminate.

Although it is theoretically possible for this procedure to produce a satisfiable set of constraints after assigning each of the discrete variables, the probability is negligible in the case of the X62 file, and a conflict (an unsatisfiable set) is reached after about half a dozen assignments.

The outcome of a run is the number of assignments before a conflict, in other words, the number of variables explicitely assigned the value "true" before the resulting set of constraints became unsatisfiable.

Configuration

This is almost the same as Conflict Generation above, except that we always choose assignments leading to a satisfiable set of constraints. Recall that each variable vN... is implicitly defined by one of the "#" lines of the Aralia file, and that exactly one of its values (one of the Boolean variables vN.i) may be assigned the value "true".

Start with a list of all the vN.P's (the Boolean variables that stand for the values of the discrete variables in the product definition):
Check for each of them whether it is always true, always false, or possible; remove it from the list if it is always true or always false.
Pick one of the remaining vN.P's at random with uniform probabilities, and assert it; i. e. add to the set of Boolean constraints that the chosen Boolean variable is assigned "true".
If the list of vN.P's is not empty go to (2); else terminate.

The outcome of a run is twofold: the total number of values tested, that is, the number of times the vN.P's where checked, and the number of explicit assignments, that is, the number of times one of the vN.P's was assigned true (in step 3. above), effectively restricting the product variability.

Projection

This can be regarded as an exhaustive list of valid assigments in Configuration above, restricted to the first vN...'s.

For a specified n, pick a subset of size n, N 0<=i<=n-1 among the vN...'s, at random with uniform probability.
Determine which of the assignments vN.j=true are valid under the Boolean constraints specified in the file.

Statistical Results

As these are performance tests, you would likely try a fair number of runs. Here are our results for batches of 100, 1,000 and 10,000 consecutive runs respectively, with the average value of the characteristic outcome (see above, the descriptions of the use cases) in addition to the total running time.

The runs were on a on an AMD Athlon II, dual-core B22 under Windows 7; keep in mind however, our solver engine is in Java so our setup is essentially 1-core. The JVM was a Win32 exe running in WoW64 (in less pedantic words, a 32 bit exe), and our running times include the JIT compilation overhead. On the other hand, they do not include a prologue to the batch of runs proper, during which the X62 file was compiled to a more convenient form. This took about 5 sec, including file reading + Aralia parsing. The resulting structure is very, very roughly 1MB large.

Test Outcome 10,000 runs 1,000 runs 100 runs

Conflict Gen. # of assigments before conflict, avg. 5.07 5.12 5.17

running time (avg., in seconds) 0.019 0.019 0.019

Configuration # of Boolean variables tested, avg. 3499.6 3476.1 3507.5

# of explicit assignments, avg. 16.10 15.59 15.20

running time (avg., in seconds) 0.016 0.016 0.015

Projection, 8 vars. # of values tested, avg. 37.533 42.90 47.95

# of partial assignments, avg. 198.61 239.25 252.08

running time (avg., in seconds) 0.0164 0.022 0.024

Projection, 7 vars. # of values tested, avg. 34.39 35.41 29.39

# of partial assignments, avg. 134.92 136.24 86.01

running time (avg., in seconds) 0.0124 0.0128 0.0007

Projection, 6 vars. # of values tested, avg. 28.12 25.70 17.39

# of partial assignments, avg. 86.15 76.88 42.67

running time (avg., in seconds) 0.0084 0.00695 0.0031

First Release 2013-01-18.

Latest Release 2013-01-30.

Test	Outcome	10,000 runs	1,000 runs	100 runs
Conflict Gen.	# of assigments before conflict, avg.	5.07	5.12	5.17
Conflict Gen.	running time (avg., in seconds)	0.019	0.019	0.019
Configuration	# of Boolean variables tested, avg.	3499.6	3476.1	3507.5
	# of explicit assignments, avg.	16.10	15.59	15.20
	running time (avg., in seconds)	0.016	0.016	0.015
Projection, 8 vars.	# of values tested, avg.	37.533	42.90	47.95
	# of partial assignments, avg.	198.61	239.25	252.08
	running time (avg., in seconds)	0.0164	0.022	0.024
Projection, 7 vars.	# of values tested, avg.	34.39	35.41	29.39
	# of partial assignments, avg.	134.92	136.24	86.01
	running time (avg., in seconds)	0.0124	0.0128	0.0007
Projection, 6 vars.	# of values tested, avg.	28.12	25.70	17.39
	# of partial assignments, avg.	86.15	76.88	42.67
	running time (avg., in seconds)	0.0084	0.00695	0.0031