<p><strong>Solving Single Depot Capacitated Vehicle Routing Problem Using Column Generation with Python</strong><br />
Emrah Cimren, 2019-12-15. OR &amp; Data Science Stories: Operations Research/Data Science Explorer</p>
<p>The vehicle routing problem (VRP) is the problem of identifying the optimal set of routes for a fleet of
vehicles that must deliver to a given
set of customers. When vehicles have limited carrying capacity and
customers have time windows within which the deliveries must be made,
the problem becomes the capacitated vehicle routing problem with time windows (CVRPTW).
In this post, we discuss how to tackle the CVRPTW to get a fast and
robust solution using column generation.</p>
<p><img src="/images//2019-12-15-Solving Single Depot Capacitated Vehicle Routing Problem Using Column Generation with Python/truck_main.jpg" alt="_config.yml" /></p>
<h2 id="problem">Problem</h2>
<p>We consider a pizza restaurant chain, <strong>PPizza</strong>, in the Los Angeles, CA area with 34 stores.
Each store operates from 10am to 1am every day. <strong>PPizza</strong> offers three pizza sizes
(small, medium, large) with various toppings, as well as soft drinks. Pizzas are prepared with fresh
ingredients and baked in store on demand.</p>
<p><img src="/images//2019-12-15-Solving Single Depot Capacitated Vehicle Routing Problem Using Column Generation with Python/pizza.PNG" alt="_config.yml" /></p>
<p><strong>PPizza</strong> forecasts the weekly demand for food items at each store and identifies the required ingredients
and soft drinks. Fresh ingredients are delivered to the stores from the main depot once a day.
Soft drinks are delivered and replenished directly by suppliers.</p>
<p>Figure 1 shows the locations of the stores and the depot. Each store has a time window between
9am and 3pm within which its delivery must be
made. Unloading time varies by store depending on location and parking availability.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images//2019-12-15-Solving Single Depot Capacitated Vehicle Routing Problem Using Column Generation with Python/store_depot_map.PNG" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 1: PPizza depot and stores</em></td>
</tr>
</tbody>
</table>
<p>Trucks can leave the depot at 6am and need to return to the depot by 5pm.
Each truck can be used once and has a limited capacity of <script type="math/tex">60</script> lbs.</p>
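<p>The rules above (truck capacity, store time windows, unloading time, and the 6am to 5pm depot window) can be checked for a single candidate route with a few lines of Python. This is a minimal sketch, not the code used in this post: the store data, drive times, and field names are made up for illustration, and it assumes that unloading must start within the store's time window and that a truck arriving early waits until the window opens.</p>

```python
def route_is_feasible(route, stores, drive_minutes, capacity=60,
                      depot="DEPOT", depart=6 * 60, return_by=17 * 60):
    """Check capacity, store time windows, and the depot return deadline.

    Times are minutes after midnight. Assumption: unloading must start
    inside the store's window; a truck that arrives early waits.
    """
    load = sum(stores[s]["demand"] for s in route)
    if load > capacity:
        return False
    clock, here = depart, depot
    for s in route:
        clock += drive_minutes[(here, s)]
        clock = max(clock, stores[s]["open"])  # wait if the truck is early
        if clock > stores[s]["close"]:
            return False
        clock += stores[s]["unload"]
        here = s
    return clock + drive_minutes[(here, depot)] <= return_by


# Hypothetical data: windows are 9am (540) to 3pm (900)
stores = {
    "A": {"demand": 20, "open": 540, "close": 900, "unload": 30},
    "B": {"demand": 25, "open": 540, "close": 900, "unload": 20},
    "C": {"demand": 30, "open": 540, "close": 900, "unload": 25},
}
drive_minutes = {
    ("DEPOT", "A"): 40, ("A", "B"): 25, ("B", "C"): 30,
    ("B", "DEPOT"): 50, ("C", "DEPOT"): 45,
}

print(route_is_feasible(["A", "B"], stores, drive_minutes))
print(route_is_feasible(["A", "B", "C"], stores, drive_minutes))
```

<p>The second route carries 75 lbs, so it fails the capacity check before any timing is evaluated.</p>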
<p>Since delivery cost is a function of the number of trucks used,
minimizing the total number of trucks minimizes the total cost.
We want to identify truck operating schedules
that deliver
fresh ingredients to each store within its time window at minimum total cost
(that is, with the minimum number of trucks).</p>
<h2 id="analysis">Analysis</h2>
<p>We first formulate the problem as a mixed integer program. Then, we solve the formulation
for a range of numbers of available trucks.
Since the CVRPTW is NP-hard, we expect the model run time to increase as the number of available trucks
decreases.</p>
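<p>A natural place to stop shrinking the fleet is the capacity bound: no solution can use fewer trucks than total demand divided by truck capacity, rounded up. The sketch below computes that bound. The demand figures are invented for illustration and are not the PPizza data, and the function name is hypothetical.</p>

```python
import math


def min_trucks_by_capacity(demands, capacity):
    """Lower bound on fleet size: total demand / capacity, rounded up."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return math.ceil(sum(demands) / capacity)


# Hypothetical store demands in lbs (not the PPizza data)
example_demands = [18, 25, 12, 30, 22, 15, 28]
print(min_trucks_by_capacity(example_demands, 60))
```

<p>Time windows and drive times usually push the true minimum above this bound, so it is only a starting point for the range of fleet sizes to try.</p>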
<p>We also develop a column generation based algorithm to solve the problem.</p>
<p>Finally, we compare the performance of the two solution methodologies: the mixed integer program and
column generation.</p>
<h3 id="general-formulation">General Formulation</h3>
<p>We develop a mixed integer model for the <strong>PPizza</strong> delivery problem as follows.</p>
<p><img src="/images//2019-12-15-Solving Single Depot Capacitated Vehicle Routing Problem Using Column Generation with Python/general_model_inputs.PNG" alt="_config.yml" /></p>
<p><img src="/images//2019-12-15-Solving Single Depot Capacitated Vehicle Routing Problem Using Column Generation with Python/general_model_formulation.PNG" alt="_config.yml" /></p>
<p>We solve the mixed integer model using Python with PuLP. The following is the
implementation.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import pandas as pd
import timeit
import time
from threading import Thread
import queue
from cvrptw_optimization import single_depot_general_model_pulp as sm

# Read input data
customers = pd.read_pickle(r'data/customers.pkl')
depots = pd.read_pickle(r'data/depots.pkl')
transportation_matrix = pd.read_pickle(r'data/transportation_matrix.pkl')
vehicles = pd.read_pickle(r'data/vehicles.pkl')

# Model parameters
bigm_input = transportation_matrix.DRIVE_MINUTES.max() * 20
solver_time_limit_minutes_input = 480

# Calculate range for vehicles (truck capacity is 60)
min_vehicles = int(round(customers.DEMAND.sum() / 60) + 2)
max_vehicles = len(vehicles) + 1


# Define functions
def run_single_depot_general_model(vehicle,
                                   depots,
                                   customers,
                                   transportation_matrix,
                                   vehicles,
                                   bigm_input,
                                   solver_time_limit_minutes_input):
    """
    Run general model with the first `vehicle` vehicles and save the solution
    """
    start = timeit.default_timer()
    vehicles_sub = vehicles.head(int(vehicle))
    print(len(vehicles_sub))
    solution_objective, solution_paths = sm.run_single_depot_general_model(
        depots,
        customers,
        transportation_matrix,
        vehicles_sub,
        bigm=bigm_input,
        solver_time_limit_minutes=solver_time_limit_minutes_input)
    solution_paths['OBJECTIVE'] = solution_objective
    solution_paths['NUMBER_OF_VEHICLES'] = vehicle
    stop = timeit.default_timer()
    # default_timer() returns seconds, so divide by 60 to get minutes
    solution_paths['MODEL_RUN_TIME_MINUTES'] = (stop - start) / 60
    solution_paths.to_csv(r'general model solutions/{}_.csv'.format(str(vehicle)), index=False)
    return 'ok'


q = queue.Queue()


def worker():
    """
    Worker function to process vehicles from a queue (q).
    """
    while True:
        print("Start thread worker.")
        vehicle = q.get()
        print("Starting vehicle: {}".format(str(vehicle)))
        run_single_depot_general_model(vehicle,
                                       depots,
                                       customers,
                                       transportation_matrix,
                                       vehicles,
                                       bigm_input,
                                       solver_time_limit_minutes_input)
        print("Finishing vehicle: {}".format(str(vehicle)))
        q.task_done()
        print("End thread worker")


def create_and_process_queue(vehicle_range_list, max_num_threads):
    """
    Creates a queue of vehicles to process. Creates threads to process the queue.
    The number of threads is limited by max_num_threads.
    """
    # add the vehicles to the queue
    for vehicle in vehicle_range_list:
        print("Adding vehicle {} to queue".format(str(vehicle)))
        q.put(vehicle)
    print("Create threads")
    for i in range(max_num_threads):
        time.sleep(10)
        t = Thread(target=worker)
        t.daemon = True
        t.start()
    q.join()  # blocks until all queue items have been processed


# Solve from the largest fleet down to the smallest
vehicle_range_list = list(range(min_vehicles, max_vehicles))
vehicle_range_list.reverse()
create_and_process_queue(vehicle_range_list, 5)</code></pre></figure>
<p>You can install the cvrptw_optimization package into your conda environment with pip as follows.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip install cimren-cvrptw-optimization
</code></pre></div></div>
<p>We ran the model for total numbers of vehicles, <script type="math/tex">|K|</script>, from <strong>30</strong> down to <strong>11</strong>, with the maximum model run time
set to 480 minutes (8 hours). No solution was found for <script type="math/tex">K=11</script> and <script type="math/tex">K=12</script> because
the maximum model run time was reached.</p>
<p>Figure 2 illustrates the routing,
model objective, and run time in minutes for each number of available vehicles.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images//2019-12-15-Solving Single Depot Capacitated Vehicle Routing Problem Using Column Generation with Python/General Solution Objectives.png" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 2: General model solution</em></td>
</tr>
</tbody>
</table>
<p>As we use fewer vehicles, total delivery hours decrease by about an hour per vehicle
removed.</p>
<p>The best solution is obtained when <script type="math/tex">K=13</script> (Figure 3). The model run time is approximately 6 hours.
The model objective, which is the total drive hours, is <script type="math/tex">16.8</script>.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images//2019-12-15-Solving Single Depot Capacitated Vehicle Routing Problem Using Column Generation with Python/Best General Solution.png" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 3: Best general model solution</em></td>
</tr>
</tbody>
</table>
<p>We now implement the column generation methodology.</p>
<h3 id="column-generation">Column Generation</h3>
<p>We develop a column generation approach based on <a href="https://pubsonline.informs.org/doi/abs/10.1287/opre.8.1.101">Dantzig-Wolfe decomposition</a>.
The CVRPTW is decomposed into two problems, the master problem and the subproblem,
which provides a tighter bound when the linear relaxation of the problem is solved.</p>
<p>The master problem considers only a subset of the variables
of the original problem, while the subproblem identifies new variables to add.
The objective function of the subproblem is the
reduced cost of a new variable with respect to the current dual variables.
The outline of the column generation algorithm is
illustrated in Figure 4.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images//2019-12-15-Solving Single Depot Capacitated Vehicle Routing Problem Using Column Generation with Python/column_generation_flow_chart.PNG" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 4: Column generation algorithm</em></td>
</tr>
</tbody>
</table>
<p>In the column generation algorithm, the master problem is first solved with an initial set of columns.
Any feasible solution that meets all constraints can be used.
In this case, we start with the depot-store-depot routes.
Solving the master problem yields the dual price of each
of its constraints.
These dual prices are used to compute the reduced cost, which forms the objective function of
the subproblem. After solving the subproblem,
the variables (called columns in the master problem) with negative reduced
cost are identified.
These variables are added to the master problem, which is then re-solved.
The process is
repeated until the subproblem yields no column with negative reduced cost.
Theoretically, at that point, the
solution of the linear relaxation of the master problem is optimal.</p>
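<p>The pricing step described above can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation in the cvrptw_optimization package: it assumes the master problem has one constraint per store, so a route's reduced cost is its cost minus the dual prices of the stores it visits. The function names, dual values, and candidate routes are hypothetical.</p>

```python
def reduced_cost(route_cost, route_stores, duals):
    """Route cost minus the dual prices of the stores the route covers."""
    return route_cost - sum(duals[s] for s in route_stores)


def columns_to_add(candidates, duals, tolerance=1e-6):
    """Keep only the candidate routes with negative reduced cost."""
    return [r for r in candidates
            if reduced_cost(r["cost"], r["stores"], duals) < -tolerance]


# Hypothetical dual prices from the last master problem solve
duals = {"A": 3.0, "B": 2.5, "C": 4.0}
candidates = [
    {"stores": ("A", "B"), "cost": 4.0},  # 4.0 - 5.5 = -1.5, improves
    {"stores": ("C",), "cost": 4.0},      # 4.0 - 4.0 =  0.0, no gain
    {"stores": ("B", "C"), "cost": 7.0},  # 7.0 - 6.5 =  0.5, no gain
]

new_columns = columns_to_add(candidates, duals)
print(new_columns)
```

<p>When <code>columns_to_add</code> returns an empty list, no route can improve the relaxed master problem and the loop stops.</p>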
<h4 id="master-problem">Master Problem</h4>
<p>We consider all feasible single vehicle routes, <script type="math/tex">L</script>, with respect to vehicle capacity
that start and end at the same depot. The master problem selects the set of routes that
minimizes total transportation cost.</p>
<p><img src="/images//2019-12-15-Solving Single Depot Capacitated Vehicle Routing Problem Using Column Generation with Python/master_model_inputs.PNG" alt="_config.yml" /></p>
<p><img src="/images//2019-12-15-Solving Single Depot Capacitated Vehicle Routing Problem Using Column Generation with Python/master_model_formulation.PNG" alt="_config.yml" /></p>
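<p>To make the role of the master problem concrete, the toy example below picks the cheapest set of routes by brute force. It is only a sketch: it assumes every store must be visited exactly once (the formulation above is the reference for the actual constraints), it enumerates integer selections rather than solving a linear relaxation, and the routes and costs are invented.</p>

```python
from itertools import combinations


def best_route_set(routes, stores):
    """Brute-force the cheapest set of routes visiting each store exactly once."""
    best_cost, best_pick = float("inf"), None
    for k in range(1, len(routes) + 1):
        for pick in combinations(routes, k):
            visited = [s for r in pick for s in r["stores"]]
            if sorted(visited) == sorted(stores):
                cost = sum(r["cost"] for r in pick)
                if cost < best_cost:
                    best_cost, best_pick = cost, pick
    return best_cost, best_pick


# Hypothetical routes; r1 to r3 are the depot-store-depot starting columns
routes = [
    {"name": "r1", "stores": ("A",), "cost": 2.0},
    {"name": "r2", "stores": ("B",), "cost": 2.0},
    {"name": "r3", "stores": ("C",), "cost": 3.0},
    {"name": "r4", "stores": ("A", "B"), "cost": 2.5},
    {"name": "r5", "stores": ("B", "C"), "cost": 3.0},
]

cost, pick = best_route_set(routes, ["A", "B", "C"])
print(cost, [r["name"] for r in pick])
```

<p>Enumeration is only viable for a handful of routes, which is why the master problem is solved as a linear program over a growing subset of columns instead.</p>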
<h4 id="subproblem">Subproblem</h4>
<p>The subproblem attempts to generate feasible routes with negative reduced costs
to be added to the master problem. As the capacity, <script type="math/tex">q_k=q</script>
<script type="math/tex">\forall k\in K</script>, is the same for all vehicles, we solve the problem
for <script type="math/tex">K=\{1\}</script>.
The explicit formulation of the subproblem is given as follows.</p>
<p><img src="/images//2019-12-15-Solving Single Depot Capacitated Vehicle Routing Problem Using Column Generation with Python/sub_model_inputs.PNG" alt="_config.yml" /></p>
<p><img src="/images//2019-12-15-Solving Single Depot Capacitated Vehicle Routing Problem Using Column Generation with Python/sub_model_formulation.PNG" alt="_config.yml" /></p>
<h3 id="column-generation-algorithm-implementation">Column Generation Algorithm Implementation</h3>
<p>We run the column generation in Python as follows.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">from</span> <span class="nn">cvrptw_optimization</span> <span class="kn">import</span> <span class="n">single_depot_column_generation_pulp</span> <span class="k">as</span> <span class="n">cg</span>
<span class="c1"># read input data
</span><span class="n">customers</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_pickle</span><span class="p">(</span><span class="s">r'data/customers.pkl'</span><span class="p">)</span>
<span class="n">depots</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_pickle</span><span class="p">(</span><span class="s">r'data/depots.pkl'</span><span class="p">)</span>
<span class="n">transportation_matrix</span><span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_pickle</span><span class="p">(</span><span class="s">r'data/transportation_matrix.pkl'</span><span class="p">)</span>
<span class="n">vehicles</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_pickle</span><span class="p">(</span><span class="s">r'data/vehicles.pkl'</span><span class="p">)</span>
<span class="n">vehicle_capacity</span> <span class="o">=</span> <span class="mi">60</span>
<span class="c1"># run column generation
</span><span class="n">solution</span><span class="p">,</span> <span class="n">iteration_statistics</span> <span class="o">=</span> <span class="n">cg</span><span class="o">.</span><span class="n">run_single_depot_column_generation</span><span class="p">(</span><span class="n">depots</span><span class="p">,</span>
<span class="n">customers</span><span class="p">,</span>
<span class="n">transportation_matrix</span><span class="p">,</span>
<span class="n">vehicles</span><span class="p">,</span>
<span class="n">vehicle_capacity</span><span class="p">,</span>
<span class="n">mip_gap</span><span class="o">=</span><span class="mf">0.001</span><span class="p">,</span>
<span class="n">solver_time_limit_minutes</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
<span class="n">enable_solution_messaging</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>
<span class="n">solver_type</span><span class="o">=</span><span class="s">'PULP_CBC_CMD'</span><span class="p">,</span>
<span class="n">max_iteration</span><span class="o">=</span><span class="mi">150</span><span class="p">)</span> </code></pre></figure>
<p>In the solution, we deliver with <script type="math/tex">12</script> trucks driving a total of <script type="math/tex">16.4</script> hours (Figure 5). The algorithm runs in less than 2 minutes.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images//2019-12-15-Solving Single Depot Capacitated Vehicle Routing Problem Using Column Generation with Python/Column Generation Solution.png" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 5: Column generation solution</em></td>
</tr>
</tbody>
</table>
<p>Figure 6 shows the convergence of the algorithm. Note that the subproblem objective reached <script type="math/tex">> -1</script> within <script type="math/tex">75</script>
iterations.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images//2019-12-15-Solving Single Depot Capacitated Vehicle Routing Problem Using Column Generation with Python/Column Generation Convergence.png" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 6: Column generation algorithm convergence</em></td>
</tr>
</tbody>
</table>
<h2 id="ppizza-solution">PPizza Solution</h2>
<p>As a result, column generation uses fewer trucks than the general mixed
integer formulation (12 versus 13).</p>
<p>Figure 7 illustrates the solution for <strong>PPizza</strong> with 12 trucks. Each truck leaves the depot at 6am and
returns by 5pm.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images//2019-12-15-Solving Single Depot Capacitated Vehicle Routing Problem Using Column Generation with Python/PPizza Solution.png" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 7: PPizza solution</em></td>
</tr>
</tbody>
</table>
<h3 id="references">References</h3>
<p>Desrochers, M., Lenstra, J.K., Savelsbergh, M.W.P., Soumis, F. (1988).
Vehicle routing with time windows: Optimization and approximation.
In: Golden, B.L., Assad, A.A. (Eds.),
Vehicle Routing: Methods and Studies. North-Holland, Amsterdam, pp. 65–84.</p>
<p><strong>2019 INFORMS Annual Conference</strong><br />
Emrah Cimren, 2019-10-20</p>
<p><a href="http://meetings2.informs.org/wordpress/seattle2019/">The 2019 INFORMS Annual Meeting</a> was held in
Seattle from October 20 to October 23. There were
over 7,000 attendees, a record-breaking number.</p>
<style type="text/css">
p {
.width-half_logo {width: 10%}
}
</style>
<p><img src="/images/INFORMS 2019/2019_Seattle_Logo_White_Outlined.png" alt="" class="align-right width-half_logo" /></p>
<p>I organized the <strong>OMS/Practice Curated: Contemporary Scheduling</strong> session
at the conference. The session covered Operations Research applications and attracted
about forty people.</p>
<p><img src="/images/INFORMS 2019/informs_schedule.png" alt="_config.yml" /></p>
<p>I presented my work, <strong>Network Design with Routing Consideration</strong>, in which we developed
a network design algorithm that identifies the locations of distribution facilities
while taking store deliveries into account.</p>
<style type="text/css">
p {
.width-half {width: 30%}
}
</style>
<table>
<tbody>
<tr>
<td><img src="/images/INFORMS 2019/emrah_slide.jpg" alt="" class="align-right width-half" /></td>
<td><img src="/images/INFORMS 2019/emrah_present.jpg" alt="" class="align-right width-half" /></td>
</tr>
</tbody>
</table>
<p>The following is the slide deck.</p>
<style type="text/css">
p {
.responsive-wrap iframe{ max-width: 70%;};
}
</style>
<div class="responsive-wrap">
<!-- this is the embed code provided by Google -->
<iframe src="https://docs.google.com/presentation/d/1uKyUdQ2WzBUil71hkKUmFIzGBh_dYVwiN4GSFWfhLg4/embed?start=false&loop=false&delayms=3000" frameborder="0" width="630" height="473" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
<!-- Google embed ends -->
</div>
<p><strong>Predicting Short Term Trucking Rates with Random Forests</strong><br />
Emrah Cimren, 2019-08-30</p>
<p>In this post, we present a random forest model to predict short term trucking rates using Python.</p>
<p><img src="/images/dry_van.jpg" alt="image-center" class="align-center" /></p>
<p>Transportation rates are driven by different modes of transportation (air, road, rail, and ocean).
In this work, we focus on trucking related transportation modes, Full Truckload (FTL) and Less than Truckload (LTL).
We describe FTL and LTL modes, trailer types, and transportation services as follows.</p>
<p><strong>FTL:</strong> An entire truck is used for transportation.
In the FTL market, a dedicated truck delivers directly from the shipper’s location to the destination.
The rate is the same for the use of the full truck whether the truck is 100% full or
25% full, and it may differ depending on where the shipment starts and ends.
The capacity of a truck can be measured in total weight, total cube, or total number of pallets.
Various truck types are used, such as regular dry van, refrigerated, flatbed, tanker, and 48- and 53-foot trailers.
An FTL carrier may hold 45,000 pounds of product.</p>
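<p>Because the FTL rate is flat per truck, the effective cost per pound rises as the truck gets emptier. The short sketch below illustrates the arithmetic. The truck rate of 1,800 is a made-up figure, and the function name is hypothetical.</p>

```python
def ftl_cost_per_pound(truck_rate, shipment_pounds, truck_capacity=45000):
    """Effective cost per pound when one flat rate buys the whole truck."""
    if not 0 < shipment_pounds <= truck_capacity:
        raise ValueError("shipment must be positive and fit in the truck")
    return truck_rate / shipment_pounds


truck_rate = 1800.0  # hypothetical flat rate for one lane

full = ftl_cost_per_pound(truck_rate, 45000)     # truck 100% full
quarter = ftl_cost_per_pound(truck_rate, 11250)  # truck 25% full
print(full, quarter)
```

<p>At 25% fill the cost per pound is four times the full-truck figure, which is the gap that LTL service is meant to close for small loads.</p>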
<p><strong>LTL:</strong> Companies use LTL when they have a small load to ship to a destination.
In this case, hiring an entire truck to make the delivery is not economical.
The trucking company picks up the load, combines it with other companies’ pickups or deliveries,
and runs a route of deliveries to customer locations. An LTL carrier may
hold up to 15,000 pounds of product.</p>
<p><strong>Trailer Types:</strong> Different trailer types carry different products.
The main types include dry van, flatbed, refrigerated/temperature-controlled trailer, and tank.
Dry van is the most popular one. The flatbed trailer does not have side walls or a ceiling and is used to carry
construction materials or large machinery. Food and medicines are normally hauled in temperature-controlled trailers.
Tanks are used to haul refined oil products or chemicals in liquid form.</p>
<p><strong>Transportation Services:</strong> Two types of services exist to cover freight transportation requirements: long-term contracts and
the spot market. Contract carriers are used most frequently. The contract is typically a one-year commitment
that specifies origin/destination, service requirement, volume, and any other factors that affect the price.
The spot market is used to obtain a rate when there is no availability in the contract market (the lane does not exist or
the rate is not accepted).</p>
<h2 id="problem">Problem</h2>
<p>Consider a network of customers and distribution centers (see Figure 1).
Products are delivered from distribution centers to customers using trucks.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_customer_warehouse_network.jpg" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 1: Distribution centers and customers</em></td>
</tr>
</tbody>
</table>
<p>Our objective is to determine FTL and LTL rates from each distribution center to each customer. We develop
a model to predict these transportation rates.</p>
<h2 id="analysis">Analysis</h2>
<p>The following are the steps in the analysis, summarized by Figure 2.</p>
<ol>
<li>Clean data, remove outliers</li>
<li>Create features (feature engineering)</li>
<li>Create train and test data</li>
<li>Develop model baseline</li>
<li>Fit model and measure performance</li>
<li>Interpret model and report results</li>
<li>Persist model</li>
</ol>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_model_analysis.png" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 2: Analysis steps</em></td>
</tr>
</tbody>
</table>
<p>We now explain each analysis step as follows.</p>
<h2 id="1-clean-data">1. Clean Data</h2>
<p>The dataset consists of the European long-haul Truckload data, including
mode, distance covered per shipment, average shipment weight per truck, average shipment cost per truck,
and trailer type (see Figure 3).</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_input_data.jpg" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 3: Transportation data set</em></td>
</tr>
</tbody>
</table>
<p>Figure 4 shows the transportation data profiling by distance miles, average shipment per truck,
transportation cost per truck, mode, and trailer type. We provide statistics by number of data points,
minimum value, 25% quantile, mean, median, 75% quantile, and maximum value.
The following Python code
can be used to generate those statistics.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_input_data_profiling.jpg" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 4: Transportation data set profile</em></td>
</tr>
</tbody>
</table>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">quantile_25pct</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="s">'''
    Get 25</span><span class="si">% </span><span class="s">quantile
    '''</span>
    <span class="k">return</span> <span class="n">x</span><span class="o">.</span><span class="n">quantile</span><span class="p">(</span><span class="mf">0.25</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">quantile_75pct</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="s">'''
    Get 75</span><span class="si">% </span><span class="s">quantile
    '''</span>
    <span class="k">return</span> <span class="n">x</span><span class="o">.</span><span class="n">quantile</span><span class="p">(</span><span class="mf">0.75</span><span class="p">)</span>

<span class="n">trans_costs_melted</span><span class="o">.</span><span class="n">groupby</span><span class="p">([</span><span class="s">'VARIABLE'</span><span class="p">,</span> <span class="s">'MODE'</span><span class="p">,</span> <span class="s">'TRAILER_TYPE'</span><span class="p">],</span> <span class="n">as_index</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span><span class="o">.</span><span class="n">agg</span><span class="p">({</span><span class="s">'VALUE'</span><span class="p">:</span> <span class="p">[</span><span class="s">'count'</span><span class="p">,</span> <span class="s">'min'</span><span class="p">,</span> <span class="n">quantile_25pct</span><span class="p">,</span> <span class="s">'mean'</span><span class="p">,</span> <span class="s">'median'</span><span class="p">,</span> <span class="n">quantile_75pct</span><span class="p">,</span> <span class="s">'max'</span><span class="p">]})</span></code></pre></figure>
<p>We first analyze relationships between distance miles, shipment weight per truck,
transportation cost per truck, and transportation cost per truck per mile
for each mode and trailer type. Then, we identify and remove outliers
from the data set. <a href="https://en.wikipedia.org/wiki/Interquartile_range">The interquartile range (IQR)</a>
rule is applied to detect outliers.</p>
<p>The following Python functions
are used to generate plots and to detect outliers.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>

<span class="k">def</span> <span class="nf">plot_scatter</span><span class="p">(</span><span class="n">figure_data</span><span class="p">,</span> <span class="n">x_axis_column</span><span class="p">,</span> <span class="n">y_axis_column</span><span class="p">,</span> <span class="n">legend_column</span><span class="p">,</span> <span class="n">fig_width</span><span class="p">,</span> <span class="n">fig_height</span><span class="p">,</span> <span class="n">font_scale</span><span class="p">,</span> <span class="n">grid_column</span><span class="p">,</span> <span class="n">grid_row</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="s">'''
    Create a scatter plot
    '''</span>
    <span class="n">sns</span><span class="o">.</span><span class="nb">set</span><span class="p">(</span><span class="n">font_scale</span><span class="o">=</span><span class="n">font_scale</span><span class="p">)</span>
    <span class="n">sns</span><span class="o">.</span><span class="n">set_style</span><span class="p">(</span><span class="s">"white"</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">grid_row</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
        <span class="n">g</span> <span class="o">=</span> <span class="n">sns</span><span class="o">.</span><span class="n">FacetGrid</span><span class="p">(</span><span class="n">figure_data</span><span class="p">,</span> <span class="n">col</span><span class="o">=</span><span class="n">grid_column</span><span class="p">,</span> <span class="n">hue</span><span class="o">=</span><span class="n">legend_column</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">g</span> <span class="o">=</span> <span class="n">sns</span><span class="o">.</span><span class="n">FacetGrid</span><span class="p">(</span><span class="n">figure_data</span><span class="p">,</span> <span class="n">col</span><span class="o">=</span><span class="n">grid_column</span><span class="p">,</span> <span class="n">row</span><span class="o">=</span><span class="n">grid_row</span><span class="p">,</span> <span class="n">hue</span><span class="o">=</span><span class="n">legend_column</span><span class="p">)</span>
    <span class="n">g</span> <span class="o">=</span> <span class="p">(</span><span class="n">g</span><span class="o">.</span><span class="nb">map</span><span class="p">(</span><span class="n">sns</span><span class="o">.</span><span class="n">scatterplot</span><span class="p">,</span> <span class="n">x_axis_column</span><span class="p">,</span> <span class="n">y_axis_column</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="s">"w"</span><span class="p">)</span><span class="o">.</span><span class="n">add_legend</span><span class="p">())</span>
    <span class="n">g</span><span class="o">.</span><span class="n">fig</span><span class="o">.</span><span class="n">set_size_inches</span><span class="p">(</span><span class="n">fig_width</span><span class="p">,</span> <span class="n">fig_height</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>

<span class="k">def</span> <span class="nf">detect_outlier</span><span class="p">(</span><span class="n">data</span><span class="p">):</span>
    <span class="s">'''
    Detect outliers using the IQR rule
    '''</span>
    <span class="n">Q1</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">quantile</span><span class="p">(</span><span class="mf">0.25</span><span class="p">)</span>
    <span class="n">Q3</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">quantile</span><span class="p">(</span><span class="mf">0.75</span><span class="p">)</span>
    <span class="n">IQR</span> <span class="o">=</span> <span class="n">Q3</span> <span class="o">-</span> <span class="n">Q1</span>
    <span class="n">outliers</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">full</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="bp">False</span><span class="p">)</span>
    <span class="n">outliers</span><span class="p">[</span><span class="n">data</span> <span class="o"><</span> <span class="p">(</span><span class="n">Q1</span> <span class="o">-</span> <span class="mf">1.5</span> <span class="o">*</span> <span class="n">IQR</span><span class="p">)]</span> <span class="o">=</span> <span class="bp">True</span>
    <span class="n">outliers</span><span class="p">[</span><span class="n">data</span> <span class="o">></span> <span class="p">(</span><span class="n">Q3</span> <span class="o">+</span> <span class="mf">1.5</span> <span class="o">*</span> <span class="n">IQR</span><span class="p">)]</span> <span class="o">=</span> <span class="bp">True</span>
    <span class="k">return</span> <span class="n">outliers</span></code></pre></figure>
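<p>To make the IQR rule concrete, here is a minimal sketch on a handful of made-up
cost-per-mile values. The numbers and the variable names <code>cost_per_mile</code>,
<code>is_outlier</code>, and <code>cleaned</code> are assumptions for illustration only; they are not taken
from the data set used in this post.</p>

```python
import pandas as pd


def detect_outlier(data):
    """Flag values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]."""
    q1 = data.quantile(0.25)
    q3 = data.quantile(0.75)
    iqr = q3 - q1
    return ((data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)).to_numpy()


# Hypothetical cost-per-mile values; the last one is deliberately extreme.
cost_per_mile = pd.Series([1.8, 2.0, 2.1, 2.2, 2.3, 2.4, 9.5])

is_outlier = detect_outlier(cost_per_mile)   # boolean mask, one entry per row
cleaned = cost_per_mile[~is_outlier]         # keep only the non-outliers

print(is_outlier)
print(cleaned.tolist())
```

<p>Here Q1 = 2.05 and Q3 = 2.35, so the fences are 1.6 and 2.8, and only the value 9.5 is flagged.</p>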
<p>Figure 5 shows relationships for FTL. We observe a linear relationship
between distance traveled
and transportation cost per truck, which means that the FTL rate is the same whether the truck
is 100% full or 25% full
and depends on where the shipment starts and ends.
We also see that FTL shipments in temperature-controlled
trailers are $0.70 more expensive per mile than shipments in dry vans (Figure 4).</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_input_data_flt_scatter_distance_cost_ftl.png" alt="_config.yml" /></th>
</tr>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_input_data_flt_scatter_distance_cost_per_mile_ftl.png" alt="_config.yml" /></th>
</tr>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_input_data_flt_scatter_weight_cost_ftl.png" alt="_config.yml" /></th>
</tr>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_input_data_flt_scatter_weight_distance_ftl.png" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 5: FTL profiles</em></td>
</tr>
</tbody>
</table>
<p>There are FTL shipments where shipment weight per truck is less than 10,000 LBS. We also identified points with a large
transportation cost per truck per mile using the IQR rule.
We consider those points outliers and remove them from the data set (see Figure 6).</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_input_data_flt_scatter_distance_cost_per_mile_ftl_outliers.png" alt="_config.yml" /></th>
</tr>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_input_data_flt_scatter_distance_cost_per_mile_ftl_outliers_cleaned.png" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 6: FTL outliers</em></td>
</tr>
</tbody>
</table>
<p>Figure 7 shows relationships between distance miles, shipment weight per truck,
transportation cost per truck, and transportation cost per truck per mile for LTL.
There is no clear relationship between any of those variables.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_input_data_flt_scatter_distance_cost_ltl.png" alt="_config.yml" /></th>
</tr>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_input_data_flt_scatter_distance_cost_per_mile_ltl.png" alt="_config.yml" /></th>
</tr>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_input_data_flt_scatter_weight_cost_ltl.png" alt="_config.yml" /></th>
</tr>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_input_data_flt_scatter_weight_distance_ltl.png" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 7: LTL profiles</em></td>
</tr>
</tbody>
</table>
<p>We treat shipments
longer than 5,000 miles as outliers, since firms prefer FTL for long distances
because it is more economical than LTL. After removing these outliers, we see a clear
relationship between distance miles and transportation
cost per truck for LTL (Figure 8).</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_input_data_flt_scatter_distance_cost_no_outliers_ltl.png" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 8: LTL outliers</em></td>
</tr>
</tbody>
</table>
<p>As with FTL, there are LTL shipments with a large transportation cost per mile.
We use the IQR rule to detect
those outliers (see Figure 9).</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_input_data_flt_scatter_distance_cost_per_mile_ltl_outliers.png" alt="_config.yml" /></th>
</tr>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_input_data_flt_scatter_distance_cost_per_mile_ltl_outliers_cleaned.png" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 9: LTL outliers</em></td>
</tr>
</tbody>
</table>
<h2 id="2-create-features-feature-engineering">2. Create Features (Feature Engineering)</h2>
<p>Feature engineering is the process of using domain knowledge of the data to
create features for machine learning algorithms.</p>
<p>We use one-hot encoding to convert categorical data into a numerical format without losing any information.
Figure 10 shows how FTL data is transformed as an example.
We also provide Python code below for one-hot encoding.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_ftl_model_data.jpg" alt="_config.yml" /></th>
</tr>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_ftl_one_hot_model_data.jpg" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 10: FTL one-hot encoding</em></td>
</tr>
</tbody>
</table>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="n">trans_cost_ftl_with_one_hot</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">get_dummies</span><span class="p">(</span><span class="n">trans_cost_ftl</span><span class="o">.</span><span class="n">drop</span><span class="p">([</span><span class="s">'MODE'</span><span class="p">,</span> <span class="s">'OUTLIER'</span><span class="p">,</span> <span class="s">'TRANS_COST_PER_TRUCK_USD_PER_MILE'</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">))</span></code></pre></figure>
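<p>As an illustration of what one-hot encoding does, the sketch below applies
<code>pd.get_dummies</code> to three made-up shipments. The rows and the column name
<code>WEIGHT_LBS</code> are assumptions for the example; only the trailer type values mirror
the ones used in this post.</p>

```python
import pandas as pd

# Hypothetical shipments; the values are made up for illustration.
shipments = pd.DataFrame({
    "DISTANCE_MILES": [341, 1250, 780],
    "WEIGHT_LBS": [17404, 42000, 38500],  # hypothetical column name
    "TRAILER_TYPE": ["DRY VAN", "TEMPERATURE CONTROLLED", "DRY VAN"],
})

# Text columns are replaced by one indicator column per category;
# numeric columns pass through unchanged.
encoded = pd.get_dummies(shipments)

print(encoded.columns.tolist())
print(encoded)
```

<p>Each categorical value becomes its own indicator column named
<code>&lt;column&gt;_&lt;value&gt;</code>, which is why the baseline code later in this post refers to
columns such as <code>TRAILER_TYPE_DRY VAN</code>.</p>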
<h2 id="3-create-train-and-test-data">3. Create Train and Test Data</h2>
<p>At this step, we split data into training and testing sets to evaluate performance of the model.
We randomly select 75% of the data for training and 25% of data for testing using the following Python function.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">sklearn.model_selection</span> <span class="kn">import</span> <span class="n">train_test_split</span>

<span class="k">def</span> <span class="nf">create_train_test_splits</span><span class="p">(</span><span class="n">labels_data</span><span class="p">,</span> <span class="n">features_data</span><span class="p">,</span> <span class="n">test_size</span><span class="p">):</span>
    <span class="s">'''
    Create train and test data
    '''</span>
    <span class="n">labels</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">labels_data</span><span class="p">)</span>
    <span class="n">features_list</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">features_data</span><span class="o">.</span><span class="n">columns</span><span class="p">)</span>
    <span class="n">features</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">features_data</span><span class="p">)</span>
    <span class="n">train_features</span><span class="p">,</span> <span class="n">test_features</span><span class="p">,</span> <span class="n">train_labels</span><span class="p">,</span> <span class="n">test_labels</span> <span class="o">=</span> <span class="n">train_test_split</span><span class="p">(</span><span class="n">features</span><span class="p">,</span> <span class="n">labels</span><span class="p">,</span> <span class="n">test_size</span> <span class="o">=</span> <span class="n">test_size</span><span class="p">,</span> <span class="n">random_state</span> <span class="o">=</span> <span class="mi">42</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">train_features</span><span class="p">,</span> <span class="n">test_features</span><span class="p">,</span> <span class="n">train_labels</span><span class="p">,</span> <span class="n">test_labels</span></code></pre></figure>
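<p>A quick way to check the 75%/25% split is to run <code>train_test_split</code> on
synthetic arrays and inspect the resulting shapes. The data below is randomly generated and
serves only as a stand-in for the real features and labels.</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 shipments with 3 features each.
rng = np.random.default_rng(0)
features = rng.uniform(size=(100, 3))
labels = rng.uniform(size=100)

# test_size=0.25 keeps 75% of the rows for training; a fixed
# random_state makes the split reproducible between runs.
train_features, test_features, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.25, random_state=42
)

print(train_features.shape, test_features.shape)
```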
<h2 id="4-develop-model-baseline">4. Develop Model Baseline</h2>
<p>Before making predictions, we need a baseline against which to judge the performance of the model.
If the model cannot improve on the baseline, we need to try a different model.</p>
<p>In our problem, we set the baseline prediction as the average transportation cost
per mile by mode and trailer type.
We calculate Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE),
and accuracy as performance metrics.</p>
<p>We now define MAE, MAPE, and accuracy. Let <script type="math/tex">y_i</script> be the prediction and <script type="math/tex">x_i</script> be the actual value for <script type="math/tex">i=1, \dots, n</script>.</p>
<script type="math/tex; mode=display">MAE = \frac{1}{n}\sum_{i=1}^n|y_i-x_i|</script>
<script type="math/tex; mode=display">MAPE = 100\frac{1}{n}\sum_{i=1}^n\frac{|y_i-x_i|}{x_i}</script>
<script type="math/tex; mode=display">Accuracy = 100 - MAPE</script>
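<p>As a small worked example of the three definitions above, the sketch below computes MAE, MAPE,
and accuracy for three made-up shipments. The helper name <code>mae_mape_accuracy</code> and the
numbers are assumptions for illustration.</p>

```python
import numpy as np


def mae_mape_accuracy(predictions, actuals):
    """Return (MAE, MAPE in percent, accuracy in percent)."""
    predictions = np.asarray(predictions, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    absolute_errors = np.abs(predictions - actuals)
    mae = absolute_errors.mean()
    # Each error is scaled by its own actual value before averaging.
    mape = 100 * np.mean(absolute_errors / actuals)
    return mae, mape, 100 - mape


# Hypothetical predicted and actual costs per truck (USD).
mae, mape, accuracy = mae_mape_accuracy([1100, 1900, 3300], [1000, 2000, 3000])
print(round(mae, 2), round(mape, 2), round(accuracy, 2))
```

<p>The absolute errors are 100, 100, and 300, so MAE is about 166.67. The relative errors are 10%, 5%,
and 10%, so MAPE is about 8.33% and accuracy is about 91.67%.</p>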
<p>The following Python code is used to calculate baseline and
accuracy metrics.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>

<span class="k">def</span> <span class="nf">apply_baseline_and_calculate_performance</span><span class="p">(</span><span class="n">trans_cost_with_one_hot</span><span class="p">,</span> <span class="n">test_features</span><span class="p">,</span> <span class="n">features_list</span><span class="p">,</span> <span class="n">test_labels</span><span class="p">):</span>
    <span class="s">'''
    Calculate baseline
    '''</span>
    <span class="n">baseline</span> <span class="o">=</span> <span class="n">trans_cost_with_one_hot</span><span class="o">.</span><span class="n">groupby</span><span class="p">([</span><span class="s">'MODE'</span><span class="p">,</span> <span class="s">'TRAILER_TYPE_DRY VAN'</span><span class="p">,</span> <span class="s">'TRAILER_TYPE_TEMPERATURE CONTROLLED'</span><span class="p">],</span> <span class="n">as_index</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span><span class="o">.</span><span class="n">agg</span><span class="p">({</span><span class="s">'TRANS_COST_PER_TRUCK_USD'</span><span class="p">:</span> <span class="s">'sum'</span><span class="p">,</span> <span class="s">'DISTANCE_MILES'</span><span class="p">:</span> <span class="s">'sum'</span><span class="p">})</span>
    <span class="n">baseline</span><span class="p">[</span><span class="s">'TRANS_COST_PER_TRUCK_USD_PER_MILE'</span><span class="p">]</span> <span class="o">=</span> <span class="n">baseline</span><span class="p">[</span><span class="s">'TRANS_COST_PER_TRUCK_USD'</span><span class="p">]</span> <span class="o">/</span> <span class="n">baseline</span><span class="p">[</span><span class="s">'DISTANCE_MILES'</span><span class="p">]</span>
    <span class="n">baseline</span><span class="o">.</span><span class="n">drop</span><span class="p">([</span><span class="s">'DISTANCE_MILES'</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
    <span class="n">baseline_costs</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">test_features</span><span class="p">)</span>
    <span class="n">baseline_costs</span><span class="o">.</span><span class="n">columns</span> <span class="o">=</span> <span class="n">features_list</span>
    <span class="n">baseline_costs</span> <span class="o">=</span> <span class="n">baseline_costs</span><span class="o">.</span><span class="n">merge</span><span class="p">(</span><span class="n">baseline</span><span class="p">,</span> <span class="n">how</span><span class="o">=</span><span class="s">'left'</span><span class="p">,</span> <span class="n">on</span><span class="o">=</span><span class="p">[</span><span class="s">'TRAILER_TYPE_DRY VAN'</span><span class="p">,</span> <span class="s">'TRAILER_TYPE_TEMPERATURE CONTROLLED'</span><span class="p">])</span>
    <span class="n">baseline_costs</span><span class="p">[</span><span class="s">'BASELINE_TRANS_COST_PER_TRUCK_USD'</span><span class="p">]</span> <span class="o">=</span> <span class="n">baseline_costs</span><span class="p">[</span><span class="s">'DISTANCE_MILES'</span><span class="p">]</span> <span class="o">*</span> <span class="n">baseline_costs</span><span class="p">[</span><span class="s">'TRANS_COST_PER_TRUCK_USD_PER_MILE'</span><span class="p">]</span>
    <span class="n">baseline_costs</span><span class="p">[</span><span class="s">'ACTUAL_TRANS_COST_PER_TRUCK_USD'</span><span class="p">]</span> <span class="o">=</span> <span class="n">test_labels</span>
    <span class="n">baseline_costs</span><span class="p">[</span><span class="s">'ABSOLUTE_ERROR'</span><span class="p">]</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span><span class="n">baseline_costs</span><span class="p">[</span><span class="s">'BASELINE_TRANS_COST_PER_TRUCK_USD'</span><span class="p">]</span> <span class="o">-</span> <span class="n">baseline_costs</span><span class="p">[</span><span class="s">'ACTUAL_TRANS_COST_PER_TRUCK_USD'</span><span class="p">])</span>
    <span class="n">baseline_costs</span><span class="p">[</span><span class="s">'MAPE_PCT'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">100</span> <span class="o">*</span> <span class="p">(</span><span class="n">baseline_costs</span><span class="p">[</span><span class="s">'ABSOLUTE_ERROR'</span><span class="p">]</span> <span class="o">/</span> <span class="n">baseline_costs</span><span class="p">[</span><span class="s">'ACTUAL_TRANS_COST_PER_TRUCK_USD'</span><span class="p">])</span>
    <span class="n">mean_absolute_error</span> <span class="o">=</span> <span class="nb">round</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">baseline_costs</span><span class="p">[</span><span class="s">'ABSOLUTE_ERROR'</span><span class="p">]),</span> <span class="mi">2</span><span class="p">)</span>
    <span class="n">accuracy</span> <span class="o">=</span> <span class="nb">round</span><span class="p">(</span><span class="mi">100</span> <span class="o">-</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">baseline_costs</span><span class="p">[</span><span class="s">'MAPE_PCT'</span><span class="p">]),</span> <span class="mi">2</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">baseline_costs</span><span class="p">,</span> <span class="n">mean_absolute_error</span><span class="p">,</span> <span class="n">accuracy</span></code></pre></figure>
<p>The baseline performance metrics for FTL and LTL are as follows.
They set the goal: model performance should be better than baseline performance.</p>
<table>
<thead>
<tr>
<th> </th>
<th>FTL Baseline</th>
<th>LTL Baseline</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>MAE</strong></td>
<td>434.46</td>
<td>352.96</td>
</tr>
<tr>
<td><strong>MAPE</strong></td>
<td>12.38%</td>
<td>66.9%</td>
</tr>
<tr>
<td><strong>Accuracy</strong></td>
<td>87.62%</td>
<td>33.1%</td>
</tr>
</tbody>
</table>
<h2 id="5-fit-model-and-measure-performance">5. Fit Model and Measure Performance</h2>
<p>We use the train data to fit the random forest model and the test data to measure model performance. We calculate
MAE, MAPE, and Accuracy similar to the baseline analysis.</p>
<p>The following functions are used to fit the random forest model and measure the model performance.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">sklearn.ensemble</span> <span class="kn">import</span> <span class="n">RandomForestRegressor</span>

<span class="k">def</span> <span class="nf">fit_random_forest_model</span><span class="p">(</span><span class="n">train_features</span><span class="p">,</span> <span class="n">train_labels</span><span class="p">):</span>
    <span class="s">'''
    Fit a random forest model
    '''</span>
    <span class="n">random_forest</span> <span class="o">=</span> <span class="n">RandomForestRegressor</span><span class="p">(</span><span class="n">n_estimators</span> <span class="o">=</span> <span class="mi">1000</span><span class="p">,</span> <span class="n">random_state</span> <span class="o">=</span> <span class="mi">42</span><span class="p">)</span>
    <span class="n">random_forest</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train_features</span><span class="p">,</span> <span class="n">train_labels</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">random_forest</span>

<span class="k">def</span> <span class="nf">prediction_and_metrics</span><span class="p">(</span><span class="n">random_forest</span><span class="p">,</span> <span class="n">test_features</span><span class="p">,</span> <span class="n">test_labels</span><span class="p">):</span>
    <span class="s">'''
    Predict and measure the model performance
    '''</span>
    <span class="n">predictions</span> <span class="o">=</span> <span class="n">random_forest</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">test_features</span><span class="p">)</span>
    <span class="c1"># Per-shipment absolute errors, so that MAPE follows the definition above</span>
    <span class="n">absolute_errors</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span><span class="n">predictions</span> <span class="o">-</span> <span class="n">test_labels</span><span class="p">)</span>
    <span class="n">mae</span> <span class="o">=</span> <span class="nb">round</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">absolute_errors</span><span class="p">))</span>
    <span class="n">mape</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="mi">100</span> <span class="o">*</span> <span class="p">(</span><span class="n">absolute_errors</span> <span class="o">/</span> <span class="n">test_labels</span><span class="p">))</span>
    <span class="n">accuracy</span> <span class="o">=</span> <span class="mi">100</span> <span class="o">-</span> <span class="n">mape</span>
    <span class="k">return</span> <span class="n">predictions</span><span class="p">,</span> <span class="n">mae</span><span class="p">,</span> <span class="n">mape</span><span class="p">,</span> <span class="n">accuracy</span></code></pre></figure>
<p>The following table provides baseline and random forest model performance metrics for FTL and LTL.</p>
<table>
<thead>
<tr>
<th> </th>
<th>FTL Baseline</th>
<th>LTL Baseline</th>
<th>FTL Random Forest Model</th>
<th>LTL Random Forest Model</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>MAE</strong></td>
<td>434.46</td>
<td>352.96</td>
<td>188.0</td>
<td>123.0</td>
</tr>
<tr>
<td><strong>MAPE</strong></td>
<td>12.38%</td>
<td>66.9%</td>
<td>7.12%</td>
<td>33.10%</td>
</tr>
<tr>
<td><strong>Accuracy</strong></td>
<td>87.62%</td>
<td>33.1%</td>
<td>92.88%</td>
<td>66.90%</td>
</tr>
</tbody>
</table>
<p>Both the FTL and LTL models beat the baseline prediction.
The FTL model has about 93% accuracy. However,
the LTL model’s accuracy is about 67%, which is lower than the FTL model’s.</p>
<p>One way to improve LTL model’s performance is hyperparameter tuning
where the model settings are adjusted to improve performance. Another way is to add more features to the
data set to capture behavior better.</p>
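<p>A minimal sketch of what hyperparameter tuning could look like with scikit-learn's
<code>GridSearchCV</code> is shown below. It is not the tuning used in this post: the data is
synthetic, and the parameter grid and the 3-fold cross-validation are assumptions chosen to keep
the example small.</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for LTL features: distance (miles) and weight (lbs).
rng = np.random.default_rng(42)
features = np.column_stack([
    rng.uniform(50, 3000, size=200),
    rng.uniform(500, 15000, size=200),
])
labels = 0.5 * features[:, 0] + 0.02 * features[:, 1] + rng.normal(0, 50, size=200)

# Hypothetical grid; a real search would cover more values.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
    "min_samples_leaf": [1, 5],
}

# Every combination is scored by cross-validated MAE
# (scikit-learn reports it negated, so that higher is better).
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=3,
)
search.fit(features, labels)

best_mae = -search.best_score_
print(search.best_params_, round(best_mae, 2))
```

<p>The fitted <code>search.best_estimator_</code> can then be evaluated on the held-out test data in the
same way as the untuned model.</p>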
<h2 id="6-interpret-model-and-report-results">6. Interpret Model and Report Results</h2>
<p>We can use two methods to understand how the model calculates its predictions:</p>
<ol>
<li>Visualizing a random forest tree</li>
<li>Understanding feature importance of variables</li>
</ol>
<p><strong>Visualizing A Random Forest Tree</strong></p>
<p>The following Python code is used to visualize one of the random forest trees in the model.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">
<span class="kn">from</span> <span class="nn">IPython.display</span> <span class="kn">import</span> <span class="n">SVG</span>
<span class="kn">from</span> <span class="nn">graphviz</span> <span class="kn">import</span> <span class="n">Source</span>
<span class="kn">from</span> <span class="nn">sklearn.tree</span> <span class="kn">import</span> <span class="n">export_graphviz</span>

<span class="k">def</span> <span class="nf">create_random_forest_tree_image</span><span class="p">(</span><span class="n">random_forest_model</span><span class="p">,</span> <span class="n">tree_number</span><span class="p">,</span> <span class="n">features_list</span><span class="p">,</span> <span class="n">tree_file_name</span><span class="p">):</span>
    <span class="s">'''
    Create and save random forest tree image
    '''</span>
    <span class="n">tree_in_forest</span> <span class="o">=</span> <span class="n">random_forest_model</span><span class="o">.</span><span class="n">estimators_</span><span class="p">[</span><span class="n">tree_number</span><span class="p">]</span>
    <span class="n">graph</span> <span class="o">=</span> <span class="n">Source</span><span class="p">(</span><span class="n">export_graphviz</span><span class="p">(</span><span class="n">tree_in_forest</span><span class="p">,</span> <span class="n">out_file</span><span class="o">=</span><span class="bp">None</span>
                                    <span class="p">,</span> <span class="n">feature_names</span><span class="o">=</span><span class="n">features_list</span>
                                    <span class="p">,</span> <span class="n">filled</span> <span class="o">=</span> <span class="bp">True</span><span class="p">))</span>
    <span class="n">png_bytes</span> <span class="o">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">pipe</span><span class="p">(</span><span class="nb">format</span><span class="o">=</span><span class="s">'png'</span><span class="p">)</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">'{}.png'</span><span class="o">.</span><span class="nb">format</span><span class="p">(</span><span class="n">tree_file_name</span><span class="p">),</span><span class="s">'wb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">png_bytes</span><span class="p">)</span>
    <span class="k">return</span> <span class="s">'tree is created'</span></code></pre></figure>
<p>Figure 11 shows one of the random forest trees in the model.
Since the tree is very large, it is hard to read the relationships between model variables from it.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/ftl_random_forest_whole_tree_5.png" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 11: Random forest tree</em></td>
</tr>
</tbody>
</table>
<p>We limit the maximum depth to 3 so that a single tree can be visualized (see Figure 12).</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/ftl_random_forest_small_tree_5.png" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 12: Random forest tree where maximum depth = 3</em></td>
</tr>
</tbody>
</table>
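<p>As a rough sketch of how such a depth-limited tree can be produced, the snippet below fits a small forest
with <code>max_depth=3</code> and exports one tree as Graphviz DOT text. The synthetic data, the feature names
<code>MILES</code> and <code>WEIGHT</code>, and the forest size are assumptions made only for illustration;
they are not the data or the model used in this post.</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import export_graphviz

# Synthetic stand-in data (assumption): two numeric features and a noisy cost.
rng = np.random.default_rng(0)
features = rng.uniform(low=[100, 1000], high=[1000, 40000], size=(200, 2))
labels = 1.5 * features[:, 0] + 0.01 * features[:, 1] + rng.normal(0, 25, size=200)

feature_names = ['MILES', 'WEIGHT']  # hypothetical feature names

# max_depth=3 caps every tree in the forest at three levels of splits.
small_forest = RandomForestRegressor(n_estimators=5, max_depth=3, random_state=0)
small_forest.fit(features, labels)

# Depth of each fitted tree; none can exceed the cap.
tree_depths = [tree.get_depth() for tree in small_forest.estimators_]

# With out_file=None, export_graphviz returns the DOT description as a string.
dot_text = export_graphviz(small_forest.estimators_[0], out_file=None,
                           feature_names=feature_names, filled=True)
```

<p>The DOT text can then be rendered to an image with Graphviz, in the same way as in the function shown earlier.</p>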
<p>Let’s assume that we want to predict the transportation cost of a shipment with the following
features.</p>
<table>
<thead>
<tr>
<th>Mode</th>
<th>Trailer Type</th>
<th>Weight Load (LBS)</th>
<th>Shipment Distance (Miles)</th>
</tr>
</thead>
<tbody>
<tr>
<td>FTL</td>
<td>Dry Van</td>
<td>17,404</td>
<td>341</td>
</tr>
</tbody>
</table>
<p>We follow the path in Figure 13 and predict FTL transportation cost per shipment as $1,277.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/ftl_random_forest_small_tree_5_level0.png" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 13: Predicting the FTL rate for a Dry Van shipment of 17,404 LBS over 341 miles with the model where maximum depth = 3</em></td>
</tr>
</tbody>
</table>
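<p>Following a path through a depth-limited tree amounts to a few threshold comparisons.
The sketch below walks a hand-written tree stored as nested dictionaries. The split thresholds and leaf values
are invented for illustration and are not taken from Figure 13.</p>

```python
# Hypothetical depth-2 regression tree; thresholds and leaf values are invented.
toy_tree = {
    'feature': 'MILES', 'threshold': 500,
    'left': {
        'feature': 'WEIGHT', 'threshold': 20000,
        'left': {'value': 900.0},
        'right': {'value': 1100.0},
    },
    'right': {'value': 2000.0},
}


def predict_with_tree(node, shipment):
    '''Walk from the root to a leaf, going left when feature <= threshold.'''
    while 'value' not in node:
        if shipment[node['feature']] <= node['threshold']:
            node = node['left']
        else:
            node = node['right']
    return node['value']


shipment = {'MILES': 341, 'WEIGHT': 17404}
predicted_cost = predict_with_tree(toy_tree, shipment)
```

<p>A random forest regressor repeats this walk for every tree and averages the leaf values.</p>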
<p><strong>Understanding Feature Importance of Variables</strong></p>
<p>The relative importance of a variable is defined as how much including it improves the prediction.
We use the following code to calculate the importance of the model variables.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="k">def</span> <span class="nf">calculate_feature_importance</span><span class="p">(</span><span class="n">random_forest_model</span><span class="p">,</span> <span class="n">feature_list</span><span class="p">):</span>
    <span class="s">'''
    Calculate feature importance
    '''</span>
    <span class="c1"># Pair each feature name with its importance, rounded to two decimals</span>
    <span class="n">importances</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">random_forest_model</span><span class="o">.</span><span class="n">feature_importances_</span><span class="p">)</span>
    <span class="n">feature_importances</span> <span class="o">=</span> <span class="p">[(</span><span class="n">feature</span><span class="p">,</span> <span class="nb">round</span><span class="p">(</span><span class="n">importance</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span> <span class="k">for</span> <span class="n">feature</span><span class="p">,</span> <span class="n">importance</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">feature_list</span><span class="p">,</span> <span class="n">importances</span><span class="p">)]</span>
    <span class="c1"># Sort from most to least important</span>
    <span class="n">feature_importances</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">feature_importances</span><span class="p">,</span> <span class="n">key</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">reverse</span> <span class="o">=</span> <span class="bp">True</span><span class="p">)</span>
    <span class="n">feature_importances</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">feature_importances</span><span class="p">)</span>
    <span class="n">feature_importances</span><span class="o">.</span><span class="n">columns</span> <span class="o">=</span> <span class="p">[</span><span class="s">'FEATURE'</span><span class="p">,</span> <span class="s">'IMPORTANCE'</span><span class="p">]</span>
    <span class="k">return</span> <span class="n">feature_importances</span></code></pre></figure>
<p>Figure 14 shows feature importance for FTL and LTL. For FTL, miles travelled is the most important
factor for predicting transportation cost per shipment,
which is consistent with the earlier observation (see Figure 5). For LTL, miles travelled and weight per shipment
are the two most important variables affecting transportation cost per shipment.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_ftl_feature_importance.png" alt="_config.yml" /></th>
</tr>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_ltl_feature_importance.png" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 14: Feature importance for FTL and LTL</em></td>
</tr>
</tbody>
</table>
<h2 id="7-persist-model">7. Persist Model</h2>
<p>Model persistence means writing a trained model to disk.
Once the model is saved, it can be loaded back at any time and used to make predictions,
so the model does not have to be retrained every time it is needed.
A persisted model can also be shared with others without sharing the training data or
the steps used to train it (see Figure 15).</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/trans_rate_random_forest_model_persistance.png" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 15: Model persistence</em></td>
</tr>
</tbody>
</table>
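<p>The save-and-load round trip can be sketched with the standard library alone. The stand-in objects and the
file name <code>model.pkl</code> below are assumptions for illustration; in practice the tuple would hold the trained
random forest model and its data.</p>

```python
import os
import pickle
import tempfile

# Stand-in for a trained model and its feature list (assumption for illustration).
trained_objects = ({'kind': 'stand-in model', 'max_depth': 3}, ['MILES', 'WEIGHT'])

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'model.pkl')  # hypothetical file name

    # Persist: serialize the objects to disk.
    with open(path, 'wb') as f:
        pickle.dump(trained_objects, f)

    # Load: read the file back and rebuild the same objects.
    with open(path, 'rb') as f:
        restored_objects = pickle.load(f)
```

<p>Loading a pickle file can execute arbitrary code, so a shared model file should only be loaded when it comes from a trusted source.</p>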
<p>We use the following Python code to persist the random forest model.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">
<span class="kn">import</span> <span class="nn">pickle</span>
<span class="k">def</span> <span class="nf">persist_model</span><span class="p">(</span><span class="n">random_forest_model</span><span class="p">,</span> <span class="n">feature_list</span><span class="p">,</span> <span class="n">train_features</span><span class="p">,</span> <span class="n">train_labels</span><span class="p">,</span> <span class="n">test_features</span><span class="p">,</span> <span class="n">test_labels</span><span class="p">,</span> <span class="n">filename</span><span class="p">):</span>
    <span class="s">'''
    Persist model
    '''</span>
    <span class="n">tuple_objects</span> <span class="o">=</span> <span class="p">(</span><span class="n">random_forest_model</span><span class="p">,</span> <span class="n">feature_list</span><span class="p">,</span> <span class="n">train_features</span><span class="p">,</span> <span class="n">train_labels</span><span class="p">,</span> <span class="n">test_features</span><span class="p">,</span> <span class="n">test_labels</span><span class="p">)</span>
    <span class="c1"># Use a context manager so the file is closed after writing</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="s">'wb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">pickle</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span><span class="n">tuple_objects</span><span class="p">,</span> <span class="n">f</span><span class="p">)</span>
    <span class="k">return</span> <span class="s">'model saved to {}'</span><span class="o">.</span><span class="nb">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span></code></pre></figure>
<p>We load the saved model using the following Python function.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">
<span class="k">def</span> <span class="nf">load_model</span><span class="p">(</span><span class="n">filename</span><span class="p">):</span>
    <span class="s">'''
    Load persisted model
    '''</span>
    <span class="c1"># Read the tuple back in the same order it was saved</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="s">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">random_forest_model</span><span class="p">,</span> <span class="n">feature_list</span><span class="p">,</span> <span class="n">train_features</span><span class="p">,</span> <span class="n">train_labels</span><span class="p">,</span> <span class="n">test_features</span><span class="p">,</span> <span class="n">test_labels</span> <span class="o">=</span> <span class="n">pickle</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">random_forest_model</span><span class="p">,</span> <span class="n">feature_list</span><span class="p">,</span> <span class="n">train_features</span><span class="p">,</span> <span class="n">train_labels</span><span class="p">,</span> <span class="n">test_features</span><span class="p">,</span> <span class="n">test_labels</span></code></pre></figure>
<p>In this post, we present a random forest model to predict short-term trucking rates using Python.</p>
<p><strong><a href="https://emrahcimren.github.io/visualization/Visualizing%20Network%20Optimization%20Model%20Results%20using%20Python">Visualizing Network Optimization Model Results Using Python</a></strong> (2019-08-25)</p>
<p>A quick way to validate network optimization model results is to draw an optimal flows map,
which shows the flows between sources and destinations.
This post explains how to create such visualizations using Python.</p>
<p><a href="https://emrahcimren.github.io/Greenfield-Analysis-with-Weighted-Clustering/">The Greenfield Algorithm</a>
uses customer locations with annual demand as an input and
calculates allocation of distribution centers to customers.
Distribution center and customer locations, and optimal flows maps can be used to
visualize model inputs and outputs. Figures 1 and 2 show these maps.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/network_visualization_point_map.jpg" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 1: Locations map</em></td>
</tr>
</tbody>
</table>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/network_visualization_flows_map.jpg" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 2: Optimal flows map</em></td>
</tr>
</tbody>
</table>
<p>Python can be used to create location and optimal flows maps quickly.
This helps modelers to validate model inputs and results.</p>
<h2 id="application">Application</h2>
<p>We use results from <a href="https://emrahcimren.github.io/Greenfield-Analysis-with-Weighted-Clustering/">the Greenfield analysis</a>
to build</p>
<ul>
<li>Locations map: Consists of distribution center and customer locations.</li>
<li>Optimal flows map: Consists of distribution center, customer locations, and flows between those points.</li>
</ul>
<p>In the Python code, we first import the libraries and define lists of colors and shapes. Then,
we read the results data.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">plotly</span>
<span class="kn">import</span> <span class="nn">plotly.graph_objs</span> <span class="k">as</span> <span class="n">go</span>
<span class="n">plotly</span><span class="o">.</span><span class="n">offline</span><span class="o">.</span><span class="n">init_notebook_mode</span><span class="p">()</span>
<span class="n">colors</span> <span class="o">=</span> <span class="p">[</span><span class="s">'rgb(0, 128, 155)'</span><span class="p">,</span> <span class="s">'rgb(255, 128, 0)'</span><span class="p">,</span> <span class="s">'rgb(191, 2, 2)'</span><span class="p">,</span> <span class="s">'rgb(0, 175, 181)'</span><span class="p">,</span> <span class="s">'rgb(0, 181, 78)'</span><span class="p">,</span> <span class="s">'rgb(181, 175, 0)'</span><span class="p">,</span> <span class="s">'rgb(130, 0, 181)'</span><span class="p">,</span> <span class="s">'rgb(230, 0, 195)'</span><span class="p">,</span> <span class="s">'rgb(201, 67, 0)'</span><span class="p">]</span>
<span class="n">shapes</span> <span class="o">=</span> <span class="p">[</span><span class="s">'circle'</span><span class="p">,</span> <span class="s">'triangle-down'</span><span class="p">,</span> <span class="s">'square'</span><span class="p">,</span> <span class="s">'diamond'</span><span class="p">,</span> <span class="s">'square'</span><span class="p">,</span> <span class="s">'cross'</span><span class="p">]</span>
<span class="n">algorithm_results</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">r'https://raw.githubusercontent.com/emrahcimren/Greenfield_Bluefield_With_Weighted_Kmeans/v1.1/data/results/customers_with_clusters.csv'</span><span class="p">)</span>
<span class="n">algorithm_results</span> <span class="o">=</span> <span class="n">algorithm_results</span><span class="p">[(</span><span class="n">algorithm_results</span><span class="p">[</span><span class="s">'NUMBER_OF_CLUSTERS'</span><span class="p">]</span><span class="o">==</span><span class="mi">9</span><span class="p">)</span> <span class="o">&</span> <span class="p">(</span><span class="n">algorithm_results</span><span class="p">[</span><span class="s">'ITERATION'</span><span class="p">]</span><span class="o">==</span><span class="mi">10</span><span class="p">)]</span>
<span class="n">filter_paths</span> <span class="o">=</span> <span class="p">(</span><span class="n">algorithm_results</span><span class="p">[</span><span class="s">'CLUSTER'</span><span class="p">]</span> <span class="o">==</span> <span class="mi">3</span><span class="p">)</span> <span class="o">&</span> <span class="p">(</span><span class="n">algorithm_results</span><span class="p">[</span><span class="s">'CUSTOMER_NAME'</span><span class="p">]</span> <span class="o">==</span> <span class="s">'Customer 87'</span><span class="p">)</span>
<span class="n">algorithm_results</span> <span class="o">=</span> <span class="n">algorithm_results</span><span class="p">[</span><span class="o">~</span><span class="n">filter_paths</span><span class="p">]</span></code></pre></figure>
<h2 id="locations-map">Locations Map</h2>
<p>The locations map consists of a base map plus the distribution center and customer locations.
We create the location points in Python as follows.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">point_locations_customers</span> <span class="o">=</span> <span class="n">algorithm_results</span><span class="p">[[</span><span class="s">'CUSTOMER_NAME'</span><span class="p">,</span> <span class="s">'LATITUDE'</span><span class="p">,</span> <span class="s">'LONGITUDE'</span><span class="p">,</span> <span class="s">'DEMAND'</span><span class="p">]]</span><span class="o">.</span><span class="n">drop_duplicates</span><span class="p">()</span><span class="o">.</span><span class="n">rename</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="p">{</span><span class="s">'CUSTOMER_NAME'</span><span class="p">:</span> <span class="s">'LOCATION_NAME'</span><span class="p">,</span> <span class="s">'DEMAND'</span><span class="p">:</span> <span class="s">'LOCATION_WEIGHT'</span><span class="p">})</span>
<span class="n">point_locations_customers</span><span class="p">[</span><span class="s">'LOCATION_TYPE'</span><span class="p">]</span> <span class="o">=</span> <span class="s">'CUSTOMER'</span>
<span class="n">point_locations_customers</span><span class="p">[</span><span class="s">'LOCATION_WEIGHT_FACTOR'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">30</span>
<span class="n">point_locations_customers</span><span class="p">[</span><span class="s">'ADJUST_MARKER_SIZE'</span><span class="p">]</span> <span class="o">=</span> <span class="bp">True</span>
<span class="n">point_locations_warehouses</span> <span class="o">=</span> <span class="n">algorithm_results</span><span class="o">.</span><span class="n">groupby</span><span class="p">([</span><span class="s">'CLUSTER'</span><span class="p">,</span> <span class="s">'CLUSTER_LATITUDE'</span><span class="p">,</span> <span class="s">'CLUSTER_LONGITUDE'</span><span class="p">],</span> <span class="n">as_index</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span><span class="o">.</span><span class="n">agg</span><span class="p">({</span><span class="s">'DEMAND'</span><span class="p">:</span> <span class="nb">sum</span><span class="p">})</span><span class="o">.</span><span class="n">rename</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="p">{</span><span class="s">'CLUSTER'</span><span class="p">:</span> <span class="s">'LOCATION_NAME'</span><span class="p">,</span> <span class="s">'CLUSTER_LATITUDE'</span><span class="p">:</span> <span class="s">'LATITUDE'</span><span class="p">,</span> <span class="s">'CLUSTER_LONGITUDE'</span><span class="p">:</span> <span class="s">'LONGITUDE'</span><span class="p">,</span> <span class="s">'DEMAND'</span><span class="p">:</span> <span class="s">'LOCATION_WEIGHT'</span><span class="p">})</span>
<span class="n">point_locations_warehouses</span><span class="p">[</span><span class="s">'LOCATION_TYPE'</span><span class="p">]</span> <span class="o">=</span> <span class="s">'DISTRIBUTION CENTER'</span>
<span class="n">point_locations_warehouses</span><span class="p">[</span><span class="s">'LOCATION_WEIGHT_FACTOR'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">50</span>
<span class="n">point_locations_warehouses</span><span class="p">[</span><span class="s">'ADJUST_MARKER_SIZE'</span><span class="p">]</span> <span class="o">=</span> <span class="bp">False</span>
<span class="n">point_locations</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">concat</span><span class="p">([</span><span class="n">point_locations_customers</span><span class="p">,</span> <span class="n">point_locations_warehouses</span><span class="p">])</span></code></pre></figure>
<p>Figure 3 shows point locations data.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/network_visualization_point_locations_data.jpg" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 3: Point locations data from Python</em></td>
</tr>
</tbody>
</table>
<p>The following function adds marker size, color, and shape to each location point.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">add_shapes_and_colors_to_locations_for_visualization</span><span class="p">(</span><span class="n">locations</span><span class="p">,</span> <span class="n">colors</span><span class="p">,</span> <span class="n">shapes</span><span class="p">):</span>
    <span class="s">'''
    Function to add marker sizes, colors, and shapes to locations
    :param locations:
    :param colors: List of colors
    :param shapes: List of shapes
    :return: Updated locations
    '''</span>
    <span class="n">location_types</span> <span class="o">=</span> <span class="n">locations</span><span class="p">[</span><span class="s">'LOCATION_TYPE'</span><span class="p">]</span><span class="o">.</span><span class="n">unique</span><span class="p">()</span>
    <span class="n">locations_list</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">idx_loc</span><span class="p">,</span> <span class="n">location_type</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">location_types</span><span class="p">):</span>
        <span class="n">by_location_type</span> <span class="o">=</span> <span class="n">locations</span><span class="p">[</span><span class="n">locations</span><span class="p">[</span><span class="s">'LOCATION_TYPE'</span><span class="p">]</span> <span class="o">==</span> <span class="n">location_type</span><span class="p">]</span>
        <span class="n">maximum_weight_factor</span> <span class="o">=</span> <span class="n">by_location_type</span><span class="p">[</span><span class="s">'LOCATION_WEIGHT_FACTOR'</span><span class="p">]</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span> <span class="o">/</span> <span class="n">by_location_type</span><span class="p">[</span>
            <span class="s">'LOCATION_WEIGHT'</span><span class="p">]</span><span class="o">.</span><span class="nb">max</span><span class="p">()</span>
        <span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">location</span> <span class="ow">in</span> <span class="n">by_location_type</span><span class="o">.</span><span class="n">iterrows</span><span class="p">():</span>
            <span class="k">if</span> <span class="n">location</span><span class="p">[</span><span class="s">'ADJUST_MARKER_SIZE'</span><span class="p">]:</span>
                <span class="n">marker_size</span> <span class="o">=</span> <span class="n">location</span><span class="p">[</span><span class="s">'LOCATION_WEIGHT'</span><span class="p">]</span> <span class="o">*</span> <span class="n">maximum_weight_factor</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="n">marker_size</span> <span class="o">=</span> <span class="n">location</span><span class="p">[</span><span class="s">'LOCATION_WEIGHT_FACTOR'</span><span class="p">]</span>
            <span class="n">locations_list</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
                <span class="s">'LOCATION_NAME'</span><span class="p">:</span> <span class="n">location</span><span class="p">[</span><span class="s">'LOCATION_NAME'</span><span class="p">],</span>
                <span class="s">'LOCATION_TYPE'</span><span class="p">:</span> <span class="n">location</span><span class="p">[</span><span class="s">'LOCATION_TYPE'</span><span class="p">],</span>
                <span class="s">'HOVER_TEXT'</span><span class="p">:</span> <span class="n">location</span><span class="p">[</span><span class="s">'LOCATION_NAME'</span><span class="p">],</span>
                <span class="s">'LATITUDE'</span><span class="p">:</span> <span class="n">location</span><span class="p">[</span><span class="s">'LATITUDE'</span><span class="p">],</span>
                <span class="s">'LONGITUDE'</span><span class="p">:</span> <span class="n">location</span><span class="p">[</span><span class="s">'LONGITUDE'</span><span class="p">],</span>
                <span class="s">'LOCATION_WEIGHT'</span><span class="p">:</span> <span class="n">location</span><span class="p">[</span><span class="s">'LOCATION_WEIGHT'</span><span class="p">],</span>
                <span class="s">'MARKER_SIZE'</span><span class="p">:</span> <span class="n">marker_size</span><span class="p">,</span>
                <span class="s">'MARKER_COLOR'</span><span class="p">:</span> <span class="n">colors</span><span class="p">[</span><span class="n">idx_loc</span><span class="p">],</span>
                <span class="s">'MARKER_SHAPE'</span><span class="p">:</span> <span class="n">shapes</span><span class="p">[</span><span class="n">idx_loc</span><span class="p">]</span>
            <span class="p">})</span>
    <span class="k">return</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="o">.</span><span class="n">from_records</span><span class="p">(</span><span class="n">locations_list</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">point_locations</span> <span class="o">=</span> <span class="n">add_shapes_and_colors_to_locations_for_visualization</span><span class="p">(</span><span class="n">point_locations</span><span class="p">,</span> <span class="n">colors</span><span class="p">,</span> <span class="n">shapes</span><span class="p">)</span></code></pre></figure>
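<p>The marker size rule used above can be sketched on its own: each weight is scaled so that the largest
weight in a location type maps to that type's weight factor. The helper name <code>scale_marker_sizes</code> and the
demand values are hypothetical and serve only as an illustration.</p>

```python
def scale_marker_sizes(weights, weight_factor):
    '''Scale weights so that the largest weight maps to weight_factor.'''
    largest = max(weights)
    return [weight * weight_factor / largest for weight in weights]


demands = [50, 100, 200]  # hypothetical customer demands
sizes = scale_marker_sizes(demands, 30)
```

<p>With a weight factor of 30, the customer with the largest demand gets a marker of size 30 and the others shrink in proportion.</p>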
<p>We use the following visualization function to create maps from location points.
In the function, we define the point locations using latitudes and longitudes, and the map layout.
The resulting location map is saved to an .html file. Figure 4 shows the location map output.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">visualize_points_and_flows</span><span class="p">(</span><span class="n">point_locations</span><span class="p">,</span> <span class="n">paths</span><span class="p">,</span> <span class="n">map_title</span><span class="p">,</span> <span class="n">scope</span><span class="p">,</span> <span class="n">output_html_file</span><span class="p">):</span>
    <span class="s">'''
    Function to visualize points and flows
    :param point_locations: Point to be visualized with latitude and longitude
    :param paths: From to flows
    :param map_title: Title
    :param scope: Region name; europe, north america
    :param output_html_file: name of the output file
    :return:
    '''</span>
    <span class="n">locations</span> <span class="o">=</span> <span class="p">[</span><span class="nb">dict</span><span class="p">(</span>
        <span class="nb">type</span><span class="o">=</span><span class="s">'scattergeo'</span><span class="p">,</span>
        <span class="n">locationmode</span><span class="o">=</span><span class="s">'country names'</span><span class="p">,</span>
        <span class="n">lon</span><span class="o">=</span><span class="n">point_locations</span><span class="p">[</span><span class="s">'LONGITUDE'</span><span class="p">],</span>
        <span class="n">lat</span><span class="o">=</span><span class="n">point_locations</span><span class="p">[</span><span class="s">'LATITUDE'</span><span class="p">],</span>
        <span class="n">hoverinfo</span><span class="o">=</span><span class="s">'text'</span><span class="p">,</span>
        <span class="n">text</span><span class="o">=</span><span class="n">point_locations</span><span class="p">[</span><span class="s">'HOVER_TEXT'</span><span class="p">],</span>
        <span class="n">mode</span><span class="o">=</span><span class="s">'markers'</span><span class="p">,</span>
        <span class="n">marker</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span>
            <span class="n">size</span><span class="o">=</span><span class="n">point_locations</span><span class="p">[</span><span class="s">'MARKER_SIZE'</span><span class="p">],</span>
            <span class="n">color</span><span class="o">=</span><span class="n">point_locations</span><span class="p">[</span><span class="s">'MARKER_COLOR'</span><span class="p">],</span>
            <span class="n">symbol</span><span class="o">=</span><span class="n">point_locations</span><span class="p">[</span><span class="s">'MARKER_SHAPE'</span><span class="p">],</span>
            <span class="n">line</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span>
                <span class="n">width</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span>
                <span class="n">color</span><span class="o">=</span><span class="s">'rgba(68, 68, 68, 0)'</span>
            <span class="p">),</span>
        <span class="p">))]</span>
    <span class="n">layout</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span>
        <span class="n">title</span><span class="o">=</span><span class="n">map_title</span><span class="p">,</span>
        <span class="n">titlefont</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mi">30</span><span class="p">),</span>
        <span class="n">showlegend</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
        <span class="n">autosize</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
        <span class="n">hovermode</span><span class="o">=</span><span class="s">'closest'</span><span class="p">,</span>
        <span class="n">geo</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span>
            <span class="n">scope</span><span class="o">=</span><span class="n">scope</span><span class="p">,</span>
            <span class="n">showframe</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
            <span class="n">projection</span><span class="o">=</span><span class="n">go</span><span class="o">.</span><span class="n">layout</span><span class="o">.</span><span class="n">geo</span><span class="o">.</span><span class="n">Projection</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s">'azimuthal equal area'</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mi">15</span><span class="p">),</span>
            <span class="n">center</span><span class="o">=</span><span class="p">{</span><span class="s">'lat'</span><span class="p">:</span> <span class="n">point_locations</span><span class="p">[</span><span class="s">'LATITUDE'</span><span class="p">]</span><span class="o">.</span><span class="n">mean</span><span class="p">(),</span> <span class="s">'lon'</span><span class="p">:</span> <span class="n">point_locations</span><span class="p">[</span><span class="s">'LONGITUDE'</span><span class="p">]</span><span class="o">.</span><span class="n">mean</span><span class="p">()},</span>
            <span class="n">showland</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
            <span class="n">landcolor</span><span class="o">=</span><span class="s">'rgb(243, 243, 243)'</span><span class="p">,</span>
            <span class="n">countrycolor</span><span class="o">=</span><span class="s">'rgb(204, 204, 204)'</span><span class="p">,</span>
            <span class="n">showcountries</span><span class="o">=</span><span class="bp">True</span>
        <span class="p">),</span>
    <span class="p">)</span>
    <span class="k">if</span> <span class="n">paths</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">plotly</span><span class="o">.</span><span class="n">offline</span><span class="o">.</span><span class="n">plot</span><span class="p">({</span><span class="s">"data"</span><span class="p">:</span> <span class="n">locations</span><span class="p">,</span> <span class="s">"layout"</span><span class="p">:</span> <span class="n">layout</span><span class="p">},</span> <span class="n">filename</span><span class="o">=</span><span class="s">'{}.html'</span><span class="o">.</span><span class="nb">format</span><span class="p">(</span><span class="n">output_html_file</span><span class="p">))</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">plotly</span><span class="o">.</span><span class="n">offline</span><span class="o">.</span><span class="n">plot</span><span class="p">({</span><span class="s">"data"</span><span class="p">:</span> <span class="n">locations</span> <span class="o">+</span> <span class="n">paths</span><span class="p">,</span> <span class="s">"layout"</span><span class="p">:</span> <span class="n">layout</span><span class="p">},</span>
                                   <span class="n">filename</span><span class="o">=</span><span class="s">'{}.html'</span><span class="o">.</span><span class="nb">format</span><span class="p">(</span><span class="n">output_html_file</span><span class="p">))</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">visualize_points_and_flows</span><span class="p">(</span><span class="n">point_locations</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="s">'Distribution Center and Customer Locations with Demand'</span><span class="p">,</span> <span class="s">'europe'</span><span class="p">,</span> <span class="s">'point_visualization'</span><span class="p">)</span></code></pre></figure>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/network_visualization_output_location_maps.jpg" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 4: Location map</em></td>
</tr>
</tbody>
</table>
<h2 id="optimal-flows-map">Optimal Flows Map</h2>
<p>We visualize source-destination flows using the optimal flows map.
The source-destination flows are created from the results of
<a href="https://emrahcimren.github.io/Greenfield-Analysis-with-Weighted-Clustering/">the Greenfield analysis</a>,
as shown in Figure 5.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">flows</span> <span class="o">=</span> <span class="n">algorithm_results</span><span class="p">[[</span><span class="s">'CLUSTER'</span><span class="p">,</span> <span class="s">'CLUSTER_LATITUDE'</span><span class="p">,</span> <span class="s">'CLUSTER_LONGITUDE'</span><span class="p">,</span> <span class="s">'CUSTOMER_NAME'</span><span class="p">,</span> <span class="s">'LATITUDE'</span><span class="p">,</span> <span class="s">'LONGITUDE'</span><span class="p">,</span> <span class="s">'WEIGHTED_DISTANCE'</span><span class="p">]]</span><span class="o">.</span><span class="n">rename</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="p">{</span><span class="s">'CUSTOMER_NAME'</span><span class="p">:</span> <span class="s">'DESTINATION_NAME'</span><span class="p">,</span> <span class="s">'LATITUDE'</span><span class="p">:</span> <span class="s">'DESTINATION_LATITUDE'</span><span class="p">,</span> <span class="s">'LONGITUDE'</span><span class="p">:</span> <span class="s">'DESTINATION_LONGITUDE'</span><span class="p">,</span> <span class="s">'CLUSTER'</span><span class="p">:</span> <span class="s">'SOURCE_NAME'</span><span class="p">,</span> <span class="s">'CLUSTER_LATITUDE'</span><span class="p">:</span> <span class="s">'SOURCE_LATITUDE'</span><span class="p">,</span> <span class="s">'CLUSTER_LONGITUDE'</span><span class="p">:</span> <span class="s">'SOURCE_LONGITUDE'</span><span class="p">,</span> <span class="s">'WEIGHTED_DISTANCE'</span><span class="p">:</span> <span class="s">'PATH_WEIGHT'</span><span class="p">})</span></code></pre></figure>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/network_visualization_flows_data.jpg" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 5: Flows data</em></td>
</tr>
</tbody>
</table>
<p>Marker colors in the location data are updated using the following function.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">update_locations_colors_for_flow_visualization</span><span class="p">(</span><span class="n">flows</span><span class="p">,</span> <span class="n">locations</span><span class="p">,</span> <span class="n">colors</span><span class="p">):</span>
<span class="s">'''
Function to update colors for the flow map
:param flows: From to locations
:param locations: Point locations
:param colors: Plot colors
:return: Updated locations and the colors mapped to sources
'''</span>
<span class="n">color_base_column</span> <span class="o">=</span> <span class="s">'LOCATION_TYPE'</span>
<span class="n">color_base_value</span> <span class="o">=</span> <span class="s">'DISTRIBUTION CENTER'</span>
<span class="n">color_bases</span> <span class="o">=</span> <span class="n">locations</span><span class="p">[</span><span class="n">locations</span><span class="p">[</span><span class="n">color_base_column</span><span class="p">]</span> <span class="o">==</span> <span class="n">color_base_value</span><span class="p">]</span>
<span class="n">color_bases</span> <span class="o">=</span> <span class="n">color_bases</span><span class="o">.</span><span class="n">sort_values</span><span class="p">([</span><span class="s">'LOCATION_NAME'</span><span class="p">])</span>
<span class="n">color_bases</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span>
<span class="p">{</span><span class="s">'SOURCE_NAME'</span><span class="p">:</span> <span class="n">color_bases</span><span class="p">[</span><span class="s">'LOCATION_NAME'</span><span class="p">],</span> <span class="s">'MARKER_COLOR'</span><span class="p">:</span> <span class="n">colors</span><span class="p">[:</span><span class="nb">len</span><span class="p">(</span><span class="n">color_bases</span><span class="p">[</span><span class="s">'LOCATION_NAME'</span><span class="p">])]})</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">color_base</span> <span class="ow">in</span> <span class="n">color_bases</span><span class="o">.</span><span class="n">iterrows</span><span class="p">():</span>
<span class="n">by_flow</span> <span class="o">=</span> <span class="n">flows</span><span class="p">[</span><span class="n">flows</span><span class="p">[</span><span class="s">'SOURCE_NAME'</span><span class="p">]</span> <span class="o">==</span> <span class="n">color_base</span><span class="p">[</span><span class="s">'SOURCE_NAME'</span><span class="p">]]</span>
<span class="n">by_flow_list</span> <span class="o">=</span> <span class="n">by_flow</span><span class="p">[</span><span class="s">'DESTINATION_NAME'</span><span class="p">]</span><span class="o">.</span><span class="n">tolist</span><span class="p">()</span> <span class="o">+</span> <span class="p">[</span><span class="n">color_base</span><span class="p">[</span><span class="s">'SOURCE_NAME'</span><span class="p">]]</span>
<span class="n">filter_paths</span> <span class="o">=</span> <span class="n">locations</span><span class="p">[</span><span class="s">'LOCATION_NAME'</span><span class="p">]</span><span class="o">.</span><span class="n">isin</span><span class="p">(</span><span class="n">by_flow_list</span><span class="p">)</span>
<span class="n">locations</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">filter_paths</span><span class="p">,</span> <span class="s">'MARKER_COLOR'</span><span class="p">]</span> <span class="o">=</span> <span class="n">color_base</span><span class="p">[</span><span class="s">'MARKER_COLOR'</span><span class="p">]</span>
<span class="k">return</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="o">.</span><span class="n">from_records</span><span class="p">(</span><span class="n">locations</span><span class="p">),</span> <span class="n">color_bases</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">point_locations</span><span class="p">,</span> <span class="n">color_bases</span> <span class="o">=</span> <span class="n">update_locations_colors_for_flow_visualization</span><span class="p">(</span><span class="n">flows</span><span class="p">,</span> <span class="n">point_locations</span><span class="p">,</span> <span class="n">colors</span><span class="p">)</span></code></pre></figure>
<p>After updating marker colors in the location data, we also add marker colors to the flows data.
The flows data is then used to generate the path layer of the optimal flows map.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">flows</span><span class="p">[</span><span class="s">'SOURCE_NAME'</span><span class="p">]</span> <span class="o">=</span> <span class="n">flows</span><span class="p">[</span><span class="s">'SOURCE_NAME'</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">str</span><span class="p">)</span>
<span class="n">color_bases</span><span class="p">[</span><span class="s">'SOURCE_NAME'</span><span class="p">]</span> <span class="o">=</span> <span class="n">color_bases</span><span class="p">[</span><span class="s">'SOURCE_NAME'</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">str</span><span class="p">)</span>
<span class="n">flows</span> <span class="o">=</span> <span class="n">flows</span><span class="o">.</span><span class="n">merge</span><span class="p">(</span><span class="n">color_bases</span><span class="p">,</span> <span class="n">how</span><span class="o">=</span><span class="s">'left'</span><span class="p">,</span> <span class="n">on</span><span class="o">=</span><span class="p">[</span><span class="s">'SOURCE_NAME'</span><span class="p">])</span>
<span class="n">flows</span><span class="p">[</span><span class="s">'PATH_WEIGHT_FACTOR'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">5</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">create_paths</span><span class="p">(</span><span class="n">flows</span><span class="p">):</span>
<span class="s">'''
Path layer to visualize source-destination flows on the map
:param flows: Source-destination flows (coordinates, PATH_WEIGHT, PATH_WEIGHT_FACTOR, MARKER_COLOR)
:return: paths
'''</span>
<span class="n">maximum_weight_factor</span> <span class="o">=</span> <span class="n">flows</span><span class="p">[</span><span class="s">'PATH_WEIGHT_FACTOR'</span><span class="p">]</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span> <span class="o">/</span> <span class="n">flows</span><span class="p">[</span><span class="s">'PATH_WEIGHT'</span><span class="p">]</span><span class="o">.</span><span class="nb">max</span><span class="p">()</span>
<span class="n">paths</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">from_to_flow</span> <span class="ow">in</span> <span class="n">flows</span><span class="o">.</span><span class="n">iterrows</span><span class="p">():</span>
<span class="n">paths</span><span class="o">.</span><span class="n">append</span><span class="p">(</span>
<span class="nb">dict</span><span class="p">(</span>
<span class="nb">type</span><span class="o">=</span><span class="s">'scattergeo'</span><span class="p">,</span>
<span class="n">locationmode</span><span class="o">=</span><span class="s">'country names'</span><span class="p">,</span>
<span class="n">text</span><span class="o">=</span><span class="s">'from {} to {}'</span><span class="o">.</span><span class="nb">format</span><span class="p">(</span><span class="n">from_to_flow</span><span class="p">[</span><span class="s">'SOURCE_NAME'</span><span class="p">],</span> <span class="n">from_to_flow</span><span class="p">[</span><span class="s">'DESTINATION_NAME'</span><span class="p">]),</span>
<span class="n">lon</span><span class="o">=</span><span class="p">[</span><span class="n">from_to_flow</span><span class="p">[</span><span class="s">'SOURCE_LONGITUDE'</span><span class="p">],</span> <span class="n">from_to_flow</span><span class="p">[</span><span class="s">'DESTINATION_LONGITUDE'</span><span class="p">]],</span>
<span class="n">lat</span><span class="o">=</span><span class="p">[</span><span class="n">from_to_flow</span><span class="p">[</span><span class="s">'SOURCE_LATITUDE'</span><span class="p">],</span> <span class="n">from_to_flow</span><span class="p">[</span><span class="s">'DESTINATION_LATITUDE'</span><span class="p">]],</span>
<span class="n">mode</span><span class="o">=</span><span class="s">'lines'</span><span class="p">,</span>
<span class="n">line</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span>
<span class="n">width</span><span class="o">=</span><span class="n">from_to_flow</span><span class="p">[</span><span class="s">'PATH_WEIGHT'</span><span class="p">]</span> <span class="o">*</span> <span class="n">maximum_weight_factor</span><span class="p">,</span>
<span class="n">color</span><span class="o">=</span><span class="n">from_to_flow</span><span class="p">[</span><span class="s">'MARKER_COLOR'</span><span class="p">],</span>
<span class="p">),</span>
<span class="n">opacity</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
<span class="p">)</span>
<span class="p">)</span>
<span class="k">return</span> <span class="n">paths</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">paths</span> <span class="o">=</span> <span class="n">create_paths</span><span class="p">(</span><span class="n">flows</span><span class="p">)</span></code></pre></figure>
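<p>The line widths produced by <code>create_paths</code> are relative: the heaviest flow is drawn with a width equal to the weight factor, and every other flow is scaled down proportionally. The following is a minimal sketch of that scaling in plain Python. The function name <code>scaled_line_widths</code> and the sample weights are hypothetical, used only for illustration.</p>

```python
def scaled_line_widths(path_weights, path_weight_factor=5):
    """Scale path weights so that the heaviest path gets width `path_weight_factor`.

    Mirrors the width computation in create_paths, where PATH_WEIGHT_FACTOR is a
    constant column (so its mean equals the constant itself).
    """
    heaviest = max(path_weights)
    return [weight / heaviest * path_weight_factor for weight in path_weights]


# Hypothetical weighted distances for three flows.
widths = scaled_line_widths([120.0, 60.0, 30.0])
print(widths)
```

<p>Because the widths depend only on the ratio to the largest weight, changing the units of <code>PATH_WEIGHT</code> does not change the map.</p>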
<p>Finally, the following call creates the optimal flows map as in Figure 6.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">visualize_points_and_flows</span><span class="p">(</span><span class="n">point_locations</span><span class="p">,</span> <span class="n">paths</span><span class="p">,</span> <span class="s">'Optimal Flows'</span><span class="p">,</span> <span class="s">'europe'</span><span class="p">,</span> <span class="s">'flow_visualization'</span><span class="p">)</span></code></pre></figure>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/network_visualization_output_flow_maps.jpg" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 6: Optimal flows map</em></td>
</tr>
</tbody>
</table>Emrah Cimrencimren.1@gmail.comA quick way to validate network optimization model results is visually creating optimal flows map which shows flows between source and destination. This post explains how to create such visualizations using Python.Weighted Clustering with Minimum-Maximum Cluster Sizes, Greenfield Analysis2019-08-23T00:00:00+00:002019-08-23T00:00:00+00:00https://emrahcimren.github.io/data%20science/Greenfield%20Analysis%20with%20Weighted%20Clustering<p>This post provides a center of gravity based algorithm for a greenfield analysis.
The algorithm is based on k-means clustering enhanced with optimization.</p>
<p><img src="/images/2019-7-30_warehouse.jpg" alt="_config.yml" /></p>
<h2 id="greenfield-analysis">Greenfield Analysis</h2>
<p>Greenfield analysis is a quick way to identify optimal distribution center locations for a given demand network.
The analysis answers the following questions:</p>
<ul>
<li>Where should distribution centers be geographically located to minimize cost?</li>
<li>Which customers will be supplied from each distribution center?</li>
</ul>
<h2 id="problem">Problem</h2>
<p>We consider a network of customers where demand for each customer is satisfied by a distribution center.
Figure 1 shows the customer locations and corresponding annual demand.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/2019-7-30_customer_demand_points.jpg" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 1: Customer demand map</em></td>
</tr>
</tbody>
</table>
<p>We assume that any location can be selected as a distribution center location.</p>
<p>We answer the following questions as a result of the analysis:</p>
<ul>
<li>How many distribution centers are required?</li>
<li>Where should each distribution center be located?</li>
<li>How should customers be allocated to the distribution centers?</li>
</ul>
<h2 id="algorithm">Algorithm</h2>
<p>The algorithm is based on the center of gravity approach and selects each distribution center location such
that the total weighted distance to customers is minimized.</p>
<p>We first provide the following definitions related to the distribution network.</p>
<p>Let <script type="math/tex">n</script> be the total number of distribution centers.
Let <script type="math/tex">D=\{1,\dots,n\}</script> be the set of distribution centers and <script type="math/tex">\hat{D}_i</script> be the set of customers allocated
to the distribution center <script type="math/tex">i\in D</script>. Each
distribution center <script type="math/tex">i\in D</script> has latitude and longitude, <script type="math/tex">\phi^{d}_i</script> and <script type="math/tex">\lambda^{d}_i</script>, respectively.
Let <script type="math/tex">\alpha_i</script> be the maximum distance covered by the center <script type="math/tex">i \in D</script>.
Let <script type="math/tex">u^{-}_i</script> and <script type="math/tex">u^{+}_i</script> be the minimum and maximum number of customers that can be allocated to center <script type="math/tex">i \in D</script>,
respectively.</p>
<p>Let <script type="math/tex">C</script> be the set of customers. Similar to distribution centers, each customer <script type="math/tex">j\in C</script> has latitude and longitude,
<script type="math/tex">\phi^{c}_j</script> and <script type="math/tex">\lambda^{c}_j</script>, respectively.
Let <script type="math/tex">d_j</script> be the total demand for the customer <script type="math/tex">j\in C</script>.</p>
<p>Let <script type="math/tex">\Delta_k</script> be the total weighted distance at iteration <script type="math/tex">k</script>.</p>
<p>Let <script type="math/tex">a_{ij}</script> be the distance from the center <script type="math/tex">i \in D</script> to customer <script type="math/tex">j \in C</script>.
Each <script type="math/tex">a_{ij}</script> is computed with the haversine formula, which gives
the shortest distance along the surface of a sphere between two points specified by their
latitudes and longitudes. A more detailed explanation is available on
the <a href="https://en.wikipedia.org/wiki/Haversine_formula">Wikipedia page</a>.</p>
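<p>As an illustration, the distance <script type="math/tex">a_{ij}</script> could be computed as in the sketch below. The function name <code>haversine_distance</code> and the Earth radius value (in miles) are assumptions made for this example; the repository may use a different helper or unit.</p>

```python
import math

# Assumed mean Earth radius in miles; the post does not state the value it uses.
EARTH_RADIUS_MILES = 3958.8


def haversine_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_MILES):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    h = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error before taking the arcsine.
    return 2 * radius * math.asin(math.sqrt(min(1.0, h)))


# Distance between two points on the equator, half a circumference apart.
half_circumference = haversine_distance(0.0, 0.0, 0.0, 180.0)
```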
<p>The algorithm consists of the following steps:</p>
<p><strong>Step 0</strong>: Set <script type="math/tex">k=0</script>. For given <script type="math/tex">D=\{1, \dots, n\}</script>,
create <script type="math/tex">\hat{D}_i</script> randomly.
Let <script type="math/tex">\Delta_0 = \sum_{i\in D}\sum_{j\in \hat{D}_i}d_ja_{ij}</script>.</p>
<p><strong>Step 1</strong>: Set <script type="math/tex">k=k+1</script>. Given
<script type="math/tex">u^{-}_i</script>, <script type="math/tex">u^{+}_i</script>, <script type="math/tex">\alpha_i</script>, and <script type="math/tex">a_{ij}</script> for each <script type="math/tex">i \in D</script> and <script type="math/tex">j \in C</script>,
run the binary allocation model to determine <script type="math/tex">\hat{D}_i</script>.
Let <script type="math/tex">\Delta_k = \sum_{i\in D}\sum_{j\in \hat{D}_i}d_ja_{ij}</script>.</p>
<p><strong>Step 2</strong>: If <script type="math/tex">\Delta_k \ge \Delta_{k-1}</script>, then stop.
Otherwise, set <script type="math/tex">\phi^{d}_i = \frac{\sum_{j \in \hat{D}_i} \phi^{c}_j}{|\hat{D}_i|}</script>
and <script type="math/tex">\lambda^{d}_i = \frac{\sum_{j \in \hat{D}_i} \lambda^{c}_j}{|\hat{D}_i|}</script> <script type="math/tex">\forall i\in D</script>.
Repeat Steps 1-2 until <script type="math/tex">\Delta_k \ge \Delta_{k-1}</script>.</p>
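<p>The steps above can be sketched in plain Python as follows. This is a simplified illustration, not the repository code: Step 1 uses a nearest-center assignment as a stand-in for the binary allocation model, so the bounds <script type="math/tex">u^{-}_i</script>, <script type="math/tex">u^{+}_i</script>, and <script type="math/tex">\alpha_i</script> are ignored. It also assumes that the initial centers in Step 0 are the mean coordinates of the random groups. All function names, the Earth radius, and the sample customers are hypothetical.</p>

```python
import math
import random


def haversine(lat1, lon1, lat2, lon2, radius=3958.8):
    """Great-circle distance; the radius (miles) is an assumed value."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(min(1.0, h)))


def mean_centers(customers, allocation, n, previous=None):
    """Step 2 update: unweighted mean latitude and longitude of each cluster."""
    centers = []
    for i in range(n):
        members = [c for c, a in zip(customers, allocation) if a == i]
        if members:
            centers.append((sum(c["lat"] for c in members) / len(members),
                            sum(c["lon"] for c in members) / len(members)))
        else:
            # An empty cluster keeps its previous center.
            centers.append(previous[i])
    return centers


def weighted_distance(customers, centers, allocation):
    """Delta_k: sum of d_j * a_ij over the allocated pairs."""
    return sum(c["demand"] * haversine(c["lat"], c["lon"], *centers[a])
               for c, a in zip(customers, allocation))


def run_algorithm(customers, n, seed=0, max_iterations=100):
    rng = random.Random(seed)
    # Step 0: random allocation, dealt round-robin so no cluster starts empty.
    order = list(range(len(customers)))
    rng.shuffle(order)
    allocation = [0] * len(customers)
    for position, j in enumerate(order):
        allocation[j] = position % n
    centers = mean_centers(customers, allocation, n)
    history = [weighted_distance(customers, centers, allocation)]
    for _ in range(max_iterations):
        # Step 1 (stand-in for the allocation model): nearest center wins.
        allocation = [min(range(n), key=lambda i: haversine(c["lat"], c["lon"], *centers[i]))
                      for c in customers]
        history.append(weighted_distance(customers, centers, allocation))
        # Step 2: stop once the objective no longer improves.
        if history[-1] >= history[-2]:
            break
        centers = mean_centers(customers, allocation, n, previous=centers)
    return allocation, centers, history


# Hypothetical customers forming two well-separated groups.
customers = [
    {"lat": 34.0, "lon": -118.0, "demand": 10},
    {"lat": 34.1, "lon": -118.1, "demand": 5},
    {"lat": 33.9, "lon": -118.2, "demand": 8},
    {"lat": 40.7, "lon": -74.0, "demand": 12},
    {"lat": 40.8, "lon": -73.9, "demand": 7},
    {"lat": 40.6, "lon": -74.1, "demand": 9},
]
allocation, centers, history = run_algorithm(customers, n=2)
```

<p>Replacing the nearest-center step with the binary allocation model described in the next section restores the cluster size and coverage constraints.</p>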
<h2 id="allocation-of-customers-to-distribution-centers">Allocation of Customers to Distribution Centers</h2>
<p>Once cluster centers are determined, we apply a binary model to allocate customers to distribution centers.</p>
<p>In addition to parameters defined in the Algorithm section,
let <script type="math/tex">y_{ij}</script> be a binary variable that equals 1 if customer <script type="math/tex">j \in C</script> is allocated to center <script type="math/tex">i \in D</script>, and 0 otherwise.</p>
<p>The following is the binary program for allocating customers to distribution centers.</p>
<p><img src="/images/2019-7-30_allocation_model.jpg" alt="_config.yml" /></p>
<p>The objective of the model <script type="math/tex">(1)</script> minimizes the total weighted distance.
Constraint <script type="math/tex">(2)</script> ensures that each customer is allocated to a center.
Constraint <script type="math/tex">(3)</script> limits the maximum distance that each center can cover.
Each center has a minimum and a maximum number of allocated customers as in <script type="math/tex">(4)</script> and <script type="math/tex">(5)</script>, respectively.
The center-to-customer allocation variables are binary as in <script type="math/tex">(6)</script>.</p>
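<p>Since the model is shown above only as an image, the following is a reconstruction from the constraint descriptions and the notation of the Algorithm section. It is a sketch that may differ in detail from the figure; in particular, the exact form of constraint <script type="math/tex">(3)</script> is an assumption.</p>
<script type="math/tex; mode=display">
\begin{array}{lll}
\min & \sum_{i \in D}\sum_{j \in C} d_j a_{ij} y_{ij} & (1) \\
\text{s.t.} & \sum_{i \in D} y_{ij} = 1, \quad \forall j \in C & (2) \\
& a_{ij} y_{ij} \le \alpha_i, \quad \forall i \in D,\ j \in C & (3) \\
& \sum_{j \in C} y_{ij} \ge u^{-}_i, \quad \forall i \in D & (4) \\
& \sum_{j \in C} y_{ij} \le u^{+}_i, \quad \forall i \in D & (5) \\
& y_{ij} \in \{0, 1\}, \quad \forall i \in D,\ j \in C & (6)
\end{array}
</script>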
<h2 id="implementation">Implementation</h2>
<p>The algorithm is implemented in Python. <a href="https://developers.google.com/optimization/">Google OR Tools</a>
is used to solve the allocation problem.</p>
<p>You can find the source code at the
<a href="https://github.com/emrahcimren/Greenfield_Bluefield_With_Weighted_Kmeans/tree/v1.0">Greenfield_With_Weighted_Kmeans repository</a>
on GitHub.</p>
<h2 id="application">Application</h2>
<p>The algorithm is applied to the given problem.
We run the algorithm for <script type="math/tex">n=3,\dots,19</script>. Figure 2 shows the results.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/2019-7-30_model_cluster_objectives.jpg" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 2: Results for <script type="math/tex">n=3,\dots,19</script></em></td>
</tr>
</tbody>
</table>
<p>The total weighted distance decreases as the number of clusters increases. More specifically,
from <script type="math/tex">n=3</script> to <script type="math/tex">n=7</script>, the total weighted distance is reduced by 39%.
We find that <script type="math/tex">n=9</script> is the best configuration, since the objective function improves only slightly for <script type="math/tex">n>9</script> and
opening a new distribution center is costly.</p>
<p>Figure 3 shows the iteration steps for <script type="math/tex">n=9</script>. The algorithm converges quickly in the first three
iterations, and the total weighted distance is reduced by <script type="math/tex">62\%</script>.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/2019-7-30_allocation_map_for_4_cluster.jpg" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 3: Algorithm iterations for <script type="math/tex">n=9</script></em></td>
</tr>
</tbody>
</table>
<p>Allocation of customers to distribution centers is shown in Figure 4 for <script type="math/tex">n=9</script> and <script type="math/tex">k=10</script>.
Each distribution center covers customers within an average radius of <script type="math/tex">300</script> miles.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/2019-7-30_allocation_map.jpg" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 4: Optimal flow map for <script type="math/tex">n=9</script> and <script type="math/tex">k=10</script></em></td>
</tr>
</tbody>
</table>
<p>Figure 5 shows cluster statistics for <script type="math/tex">n=9</script>. We bound the cluster size to within <script type="math/tex">\pm</script> 20%
of the average cluster size, which gives a minimum of <script type="math/tex">16</script> and a maximum of <script type="math/tex">25</script> customers for <script type="math/tex">n=9</script>.
Cluster 8 has the lowest average weighted distance and the lowest average distance among all clusters.</p>
<table>
<thead>
<tr>
<th style="text-align: center"><img src="/images/2019-7-30_cluster_statistics.jpg" alt="_config.yml" /></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><em>Figure 5: Cluster statistics for <script type="math/tex">n=9</script></em></td>
</tr>
</tbody>
</table>
<h2 id="future-work">Future Work</h2>
<p>The algorithm can be generalized to start with given <script type="math/tex">\hat{D}_i</script> <script type="math/tex">\forall i\in D</script> instead of
creating <script type="math/tex">\hat{D}_i</script> randomly. This will help to improve solution quality as well as to cover
broader network design problems than the greenfield analysis.</p>Emrah Cimrencimren.1@gmail.comThis post provides a center of gravity based algorithm for a greenfield analysis. Algorithm is based on k-means clustering enhanced with optimization.Hello2019-08-22T00:00:00+00:002019-08-22T00:00:00+00:00https://emrahcimren.github.io/hello/hello<p>First post…</p>Emrah Cimrencimren.1@gmail.comFirst post…