Solving a planning if finding the goal state is part of the problem

Question

I having trouble finding some starting points for solving an occupancy problem which seems like a good candidate for ai.

Assume the following situation: In a company I have n cars and m employees. Not every employee can drive any car (f. e. a special driving license is required). A car can only be used by one employee at a specific point in time.

There is a plan which states which employee must be somewhere within some time (therefor they must use a car, so the car is blocked for that amount of time).

The goal is to find a near optimal occupancy of the cars according to that plan.

This problem is easy to specify, but I'm stumped as to which methods to implement.

As it can be represented by a graph I think the right way to solve such a problem is using searching techniques, but a problem here is that I don't know the goal state (and there is no efficient way to compute it - thats the task I want the ai to do...). Neither finding the goal state is in fact part of the problem.

So my question is: What ai techniques could be used to solve such a problem ?

Edit: Some clarification:

Assmume we have two sets - one of the employees (E) and one of cars (C). |C| < |E| is most likely true. Each car has an assigned priority which corresponds to the costs of using it (for example using a Ferrari costs more than using a Dacia, therefore a Dacia has a higher priority (ex. 1) compred to the Ferrari (ex. 10))). Assume further that having employees which are not using a car at a specific time slice are a bad thing - they cost an individual penalty (you want the employeed to be at the customer and sell things etc.).
The goal is to find the occupation of employees and cars which has a low total cost.

One Example: If you assign an employee to a car at a specific time slice it may turn out that another employee gets no car within that time slice. This can be either because

a car is free, but he has no license for it
because a car is free, but the costs of using this car would be higher than having the employee staing at the head quater
because no car is free anymore

Of cause it could be better in terms of costs to change the occupation and give that employee which got no car in this solution a car and therefore having another employee getting no car or not using all cars or ...

Note: There is no need to find an exact optimal solution (=lowest total cost of all possible occupations), as this would require checking out all possible occupations of the exponential solution space. Insetad finding a more or less good approximation of a near-optimal low total cost is sufficent.

What are you optimizing? Are some cars more expensive to drive than others? Otherwise I would think any allocation that worked would be as good as any other allocation. — George White, Jan 16 '20 at 19:23
Same question as Welker. Other than having enough cars to meet maximum demand (which is a simple matter of finding the time in the plan when the most employees are out, and counting them), there does not appear to be any optimisation to do yet. Is some detail missing? — Neil Slater, Jan 16 '20 at 19:55
@GeorgeWhite I added some clarification. I hope it is more clear now. — Welker, Jan 17 '20 at 08:03
I do not think adding a bounty will help here. The question is still poorly specified. I suggest trying to forumate the problem mathematically. It will either be clearer to readers what you are asking, or more likely here I think it will expose where the incomplete/woolly thinking is occurring and we can base answers around resolving that — Neil Slater, Jan 20 '20 at 09:48
The first step would be to specify the cost function you are trying to optimize. If you cannot assess the quality of a solution, you are not going to find a good solution. — BlindKungFuMaster, Jan 20 '20 at 13:04

score 1 · Answer 1 · answered Jan 24 '20 at 01:01

Your question is still nearly perfectly unclear, but let's make a few guesses in order to make some progress.

The biggest mystery are your time slices:

How many of them you need? With more granular time, you get more unknowns and more constraints and the complexity grows quickly.
When someone gets a car, they probably use it to travel somewhere, so it may be necessary to assign the driver to the car for multiple subsequent slices depending in the distance. That's something completely missing from your description (so I'll ignore it).

As already said, you need a cost function, i.e., something to minimize. Even before that, you need to define your variables. Let's say, you need binary variables

x[t, c, e]

such that x[t, c, e] = 1 when at time slice t the car c is assigned to the employee e (otherwise, it's zero). Now, we can specify some constraints

x[t, c, e] = 0 for each t, when the employee e can't drive the car c
for each t and c, the sum of x[t, c, e] over all e is at most one (as no employee can use two cars at the same time)
for each t and e, the sum of x[t, c, e] over all c is at most one (as no car can be use twice at the same time)

I may have missed some constraints, but that's no big deal; I hope you've got the idea.

Your cost function will be a sum of a few terms.

What you wrote about priorities is unclear to me, too, but you can define a weight w[c, e] (or maybe just w[c]), so let the sum of w[c, e] * x[t, c, e] over all t, c and e be the term describing these priorities.
Let u[e] be the cost of giving the employee e no car. The expression y[t, e] = 1 - sum over all c of x[t, c, e] equals to one, when the employee e has no car at time slice t. Now, the corresponding cost term can be expressed as the sum of u[e] * y[t, e] over all e and t.

Again, there may be more of this.

What I described so far is a integer linear programming formulation of your problem. Convert my text into ILP generating code, get an ILP solver, feed it and run it....

When you problem is small, you may even get the optimum. Otherwise, you'll get some solution (this is guaranteed) and a bound on the optimum, so you can know how far you might improve.

That's more than any heuristic solver (e.g., simulated annealing) can offer. OTOH arriving at a good solution may be faster using some heuristic solver. But that's all irrelevant until you get a clear problem definition.

wyrml4 · Answer 2 · 2020-01-24T17:13:21.730

I think there are different ways to solve the problem you presented: 1) It could be seen as classical resource optimization problem that can be solved via Linear Programming setup (highly suggested taking a look, not everything needs ML). You can find some resources here and quick intro to LP here

Linear Programming (LP) is a mathematical procedure for determining optimal allocation of scarce resources. LP is a procedure that has found practical application in almost all facets of business, from advertising to production planning. Transportation, distribution, and aggregate production planning problems are the most typical objects of LP analysis. In the petroleum industry, for example a data processing manager at a large oil company recently estimated that from 5 to 10 percent of the firm's computer time was devoted to the processing of LP and LP-like models.

2) Setup as ML problem using Deep Q-learning Network (might be overkill), at least it comes to my mind as easier setup (potentially). There are probably tons of ways to set your problem and environment. With DQN you would build an simulated environment for your problem where you create an list (or dictionary) of cars with the license requirement, a list for employees and their license, another one for schedule which could simply state 0 = no meeting at that time, 1 = meeting at that time (assuming not all employees need a car at all hours, if that is not true, then you can even remove the schedule requirement).

So DQN is all about action -> reward optimization. You feed in a state, that could be a series of variables, and it spins our an array of actions with their respective scoring. For state columns, you could use cars availability (like one column per vehicle), while actions could be a 2-dimensional array of weather each employee might take a specific car. You could leave in the environment simulator the logic check of weather an employee has the correct license. I.e. if the model sends an employee to the wrong licensed car, then you can return a negative value as penalty as reward. For correct assignment, reward could be reward assigned to each employee (fixed or variable) minus cost of car.

Once trained, at any given state sent to the model, it should output a 2D matrix showing the score of each employee taking each vehicle. For each vehicle, you could select the employee that would get max score, so just numpy slicing and dicing at that point. Again this is a quick idea after 10 minutes of thoughts, so take with little pinch of salt. Also there are many DQN flavors (target-Q, dueling-Q, experience replay, rainbow DQN etc.)

3) Use the DQN structure above but simply it into a Q-learning problem. The actions set could be the same, but state could simply by period of your schedule (1,2,3,4,5...). You leave most logic to the simulated environment, like availability of cars at each period state (like at each state, based on actions return, you can dynamically set which cars will be available for next one). Make sure you set reward and penalty correctly. Create a good balance of exploration vs exploiting, maybe using epsilon-greedy policy and let it run.

Solving a planning if finding the goal state is part of the problem

2 Answers2