<p>Eric Winnington: A collection of thoughts, code and snippets.</p>
<h1>Receiving compressed data from an http(s) endpoint</h1>
<p><em>Published 2024-03-20</em></p>
<p>With the amount of data that we pass around between services, it is often beneficial to compress it to reduce transmission time. Modern platforms and algorithms are very efficient at compressing regular data, particularly text or JSON.</p>
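<p>To get a feel for the savings, here is a small Python sketch (my own illustration, using a repetitive sample payload standing in for a typical API response) comparing raw and gzip-compressed sizes:</p>

```python
import gzip
import json

# Repetitive sample payload, standing in for a typical JSON API response
records = [{"id": i, "status": "active", "region": "eu-west-1"} for i in range(1000)]
raw = json.dumps(records).encode("utf-8")

compressed = gzip.compress(raw)
print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes, "
      f"ratio: {len(compressed) / len(raw):.2%}")
```

On repetitive text like this, gzip typically shrinks the payload to a small fraction of its original size, which is exactly why enabling it on service-to-service calls pays off.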
<p>If the developer of the endpoint has prepared their service for compression, the client must still indicate that it is ready to receive the compressed data. Luckily, most modern HTTP clients in R, Python, JavaScript and Dotnet handle decompression transparently for the client. This means that you can set the Accept-Encoding header and simply benefit from the compressed transfer.</p>
<p>We can also check the Content-Encoding response header to see which compression was used. I've found that example.com sends its responses compressed with gzip.</p>
<h2 id="python">Python</h2>
<pre><code class="language-python">import requests

url = "http://example.com"  # Replace with the actual URL you want to request

# Specify the accepted encoding methods in the headers
headers = {
    'Accept-Encoding': 'gzip, br',
}

response = requests.get(url, headers=headers)
print(response.text)

# In case you want to see if it was compressed, check the response headers
if 'Content-Encoding' in response.headers:
    print(f"Response was compressed using: {response.headers['Content-Encoding']}")
else:
    print("Response was not compressed.")
</code></pre>
<h2 id="r">R</h2>
<pre><code class="language-R">library(httr)
# The URL to which you're sending the request
url <- "http://example.com"
# Setting the Accept-Encoding header
response <- GET(url, add_headers(`Accept-Encoding` = 'gzip, br'))
# The content of the response will be automatically decompressed by httr, so you can access it directly.
content(response, "text")
</code></pre>
<h2 id="c">C#</h2>
<p>In C#, for some ungodly strange reason, the standard HttpClient doesn't decompress responses for you automatically unless you add a decompression handler - see <a href="https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpclienthandler.automaticdecompression?view=net-8.0#system-net-http-httpclienthandler-automaticdecompression">HttpClientHandler.AutomaticDecompression</a></p>
<pre><code class="language-Csharp">using System;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        HttpClientHandler handler = new HttpClientHandler();
        // Enabling automatic decompression means the Accept-Encoding headers are added automatically
        handler.AutomaticDecompression = System.Net.DecompressionMethods.GZip;

        using (var client = new HttpClient(handler))
        {
            string url = "http://example.com";
            HttpResponseMessage response = await client.GetAsync(url);
            response.EnsureSuccessStatusCode();
            Console.WriteLine(await response.Content.ReadAsStringAsync());
        }
    }
}
</code></pre>
<h2 id="javascript">JavaScript</h2>
<p>With the request library (now deprecated, but the pattern is still common), it is so easy that you don't need to do anything other than setting gzip: true.</p>
<pre><code class="language-JS">const request = require('request');

const requestOptions = {
  url: 'http://example.com',
  gzip: true, // This is all that is required
};

request(requestOptions, (error, response, body) => {
  // Handle the response here
});
</code></pre>
<h1>DevContainers - The future of developer environments</h1>
<p><em>Published 2023-07-24</em></p>
<h2 id="history">History</h2>
<p>It's been years now that we've had Infrastructure as Code (IaC), Containers and Desired State Configuration (DSC) tools to do our deployments. But these have mostly focused on the deployment side of things, with fewer tools on the developer side. On the dev machine, installing and maintaining the development tools and package dependencies has long been in flux: on Windows, tools like Ninite, Chocolatey and Winget finally allow management of dev tools, and on the Linux side, which was always quite well served with apt, we have also gained Snap, Flatpak and other package management tools. The thing is, sometimes you need more than one version of a particular tool (Python 3.10 and Python 3.11, Java 9 and Java 17, Dotnet 4.8 and Dotnet 6) to work on the various projects you have during the day. Sometimes they work side by side very well, and sometimes they don't. And when they don't, it can be a long process to figure out why, and it can be very difficult to get help without resorting to a clean image refresh and installing your dependencies all over again.</p>
<p>Since the end of the 2010s and the early 2020s, with the rise of web hosted IDEs, there has been a need to define ways to have a base image that contained the environment and tools needed to work. I remember running some in the mid 2010s - Nitrous.IO (2013-16) - that allowed you to use a base container and configure it to do remote development.</p>
<h2 id="devcontainers">DevContainers</h2>
<p>With the arrival of Docker on every desktop, GitHub's Codespaces and Visual Studio Code, there has been new interest in this type of desired-state environment with developer tooling. Microsoft published the <a href="https://containers.dev/">DevContainer specification</a> in early 2022 to formalize the language.</p>
<p>So how does it help us? Well, with a DevContainer, we can set up a new development environment on premises (in VSCode), on a cloud VM (Azure + VM) or in a Codespace environment with a single file that ensures we always have the tools we want and need installed. Starting to work is as easy as opening the connection and cloning the repo we need, if the .devcontainer file is located inside.</p>
<h2 id="devcontainer-example">DevContainer example</h2>
<p>You can find below my <a href="https://github.com/ewinnington/DevContainerTemplate/blob/master/.devcontainer/devcontainer.json">personal DevContainer</a>. It is set up with Git, Node, AzureCLI, Docker control of the host, Dotnet, Terraform, Java with Maven, Python3 and Postgresql. I also have the VSCode extensions directly configured, so I can start using them as soon as I connect. I also use the "postStartCommand": "nohup bash -c 'postgres &'" to run an instance of Postgresql directly inside the development container, so I directly have a DB to run requests against. And yes, this is a bit of a kitchen-sink DevContainer; they can be smaller and more tailored to a project with only one or two of these features included, but here I use a generic one and added everything I use apart from the C++ and Fortran compilers.</p>
<pre><code>{
    "name": "Erics-base-dev-container",
    "image": "mcr.microsoft.com/devcontainers/base:debian",
    "features": {
        "ghcr.io/devcontainers/features/git:1": {},
        "ghcr.io/devcontainers/features/node:1": {},
        "ghcr.io/devcontainers/features/azure-cli:1": {}, // azure-cli
        "ghcr.io/devcontainers/features/docker-outside-of-docker:1": {}, // docker on host
        "ghcr.io/devcontainers/features/dotnet:1": {}, // dotnet installed
        "ghcr.io/devcontainers/features/terraform:1": {},
        "ghcr.io/devcontainers/features/java:1": { "installMaven": true },
        "ghcr.io/devcontainers-contrib/features/postgres-asdf:1": {}
    },
    // Configure tool-specific properties.
    "customizations": {
        // Configure properties specific to VS Code.
        "vscode": {
            "settings": {},
            "extensions": [
                "streetsidesoftware.code-spell-checker",
                "ms-azuretools.vscode-docker",
                "ms-dotnettools.csharp",
                "HashiCorp.terraform",
                "ms-azuretools.vscode-azureterraform",
                "GitHub.copilot",
                "GitHub.copilot-chat",
                "vscjava.vscode-java-pack",
                "ms-python.python"
            ]
        }
    },
    // Use 'forwardPorts' to make a list of ports inside the container available locally.
    // "forwardPorts": [3000],
    // Use 'portsAttributes' to set default properties for specific forwarded ports.
    // More info: https://containers.dev/implementors/json_reference/#port-attributes
    "portsAttributes": {
        "3000": {
            "label": "Hello Remote World",
            "onAutoForward": "notify"
        }
    },
    // Use 'postCreateCommand' to run commands after the container is created.
    "postCreateCommand": "",
    "postStartCommand": "nohup bash -c 'postgres &'"
    // Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
    // "remoteUser": "root"
}
</code></pre>
<h2 id="so-how-do-you-start-with-devcontainers">So how do you start with DevContainers?</h2>
<p>There are two easy ways:</p>
<ol>
<li>(remote) Github Codespaces: by going to my repo, you can click "Create Codespace on Master" and get a running VSCode in the cloud with all those tools set up instantly (at first build, the image might take some time).</li>
<li>(local) Docker + VS Code: ensure you have the <a href="https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers">ms-vscode-remote.remote-containers</a> extension installed in VS Code and Docker installed.</li>
</ol>
<p>For the local route, clone the repo <a href="https://github.com/ewinnington/DevContainerTemplate.git">https://github.com/ewinnington/DevContainerTemplate.git</a>, then open it with VSCode. It should automatically detect the .devcontainer, offer to build the container image and open it up in the IDE for you.</p>
<p>Once that is done, you should have access to a complete environment at the state you specified.</p>
<h2 id="whats-the-use-for-developers-at-corporations-where-computers-are-locked-down">What's the use for Developers at corporations where computers are locked down?</h2>
<p>I think that providing developers with a Windows machine with Git, Docker and WSL2 installed, using VS Code or another IDE that supports DevContainers, is an excellent way forward to provide a fast and stable environment for developers to work more efficiently. With this configuration, anyone showing up to a Hackathon would be able to start working minutes after cloning a repository. It would really simplify daily operations, since every repo can provide the correct .devcontainer configuration, or teams can share a basic DevContainer configuration.</p>
<p>This all simplifies operations, makes the developer experience more consistent and increases productivity, since you can move from one development environment to another in minutes. On-prem → remote VM → Codespace and back in minutes, without any friction.</p>
<p>All in all, I'm convinced it is a tool that IT support must understand and learn how best to provide access to, and that developers must master to benefit from it.</p>
<p>Have you used DevContainers? What is your experience?</p>
<h1>Proxmox 8 on sub $200 mini PCs</h1>
<p><em>Published 2023-07-01</em></p>
<h1 id="installing-proxmox-tailscale-win11-vms-and-automation">Installing Proxmox, Tailscale, Win11 VMs and Automation</h1>
<p>This is a Beelink MiniS12 with an Intel N95.</p>
<p><img src="/posts/images/minipc.jpg" class="img-fluid" width="80%" alt="" />
(coffee cup for scale)</p>
<p>Up until Proxmox 8 dropped about a week ago, I was unable to install Proxmox due to an incompatibility with the graphics driver. That is fixed in 8, so I've been able to install Proxmox on several of my machines.</p>
<p>The procedure is trivial: write the Proxmox ISO to a USB key using a tool that makes the ISO bootable. Boot the machine with the USB key inserted and select the correct boot drive, then follow the Proxmox installation prompts.</p>
<p>It all worked out of the box.</p>
<h2 id="clustering">Clustering</h2>
<p>I was able to connect to my Proxmox machine on port 8006 via a browser on my home network. The next step was enabling the management of multiple machines via a single UI. As soon as I had two machines with Proxmox installed, I went to my primary, clicked "Create cluster" and confirmed, got the join token, connected to the second machine and pasted the join token into "Join Cluster". Worked out of the box.</p>
<p><img src="/posts/images/proxmox/proxmox-clustering.png" class="img-fluid" width="80%" alt="" /></p>
<h2 id="removing-the-update-repositories-to-work-on-the-free-version-of-proxmox">Removing the Update Repositories to work on the Free version of Proxmox</h2>
<p>To stay within the free licensing of Proxmox and still be able to apt-get, remember, on each machine, to disable the enterprise (subscription-only) repositories in the repository list.</p>
<p><img src="/posts/images/proxmox/proxmox-remove-repos" class="img-fluid" width="80%" alt="" /></p>
<h2 id="installing-tailscale">Installing Tailscale</h2>
<p>I use <a href="https://tailscale.com/">Tailscale</a> at home to connect across multiple locations and roaming devices. Every time I add a Tailscale device, I am amazed at how easy it is.</p>
<pre><code class="language-bash">curl -fsSL https://tailscale.com/install.sh | sh
tailscale up
</code></pre>
<p>Two lines, one URL to visit and the machines were enrolled.</p>
<h2 id="vms-and-lxcs">VMs and LXCs</h2>
<p>To create VMs and LXCs, you need to add iso images or Templates to your local storage:</p>
<p>For container templates, you can create your own and upload them, or simply click on "Templates" and get a couple of ready-made ones for use in your containers.</p>
<p><img src="/posts/images/proxmox/container-templates.png" class="img-fluid" width="80%" alt="" /></p>
<p>You can get the official Windows ISO from the Microsoft website, so download it to your local machine and then upload it to the ISO images storage.</p>
<p><img src="/posts/images/proxmox/windows-iso.png" class="img-fluid" width="80%" alt="" /></p>
<h3 id="installing-windows-11-pro-and-activating-with-the-hardware-license">Installing Windows 11 Pro and activating with the Hardware license</h3>
<p>I had previously logged in and linked the Windows 11 Pro licensed version of the OS to my Microsoft account on each of the machines. This meant that when installing Windows 11 Pro from the ISO onto a VM running on those machines, I was able to activate it by referring to the previous activation. Windows 11 Pro allows itself to be reactivated with the license that came with the hardware, inside a Proxmox 8 VM on the same machine, as long as you pass the host CPU - or so it seems to me.</p>
<p><img src="/posts/images/proxmox/activated.jpg" class="img-fluid" width="80%" alt="Proxmox-shell-mode-issue" /></p>
<h2 id="issues-i-had-and-solutions">Issues I had and solutions</h2>
<h3 id="console-not-connecting-to-lxc-containers">Console not connecting to LXC containers</h3>
<p>Several times, either while connecting to a container in Proxmox directly, or after a container migration, I was not able to use the integrated shell. I had to change the container's "Options -> Console mode" to "shell" to make it connect every time.
<img src="/posts/images/proxmox/Proxmox-Shell-mode.png" class="img-fluid" width="80%" alt="Proxmox-shell-mode-issue" /></p>
<h3 id="apt-get-issue-in-proxmox-containers">apt-get issue in proxmox containers</h3>
<p>The first thing I do upon entering a container is nearly always apt-get update. And sometimes it breaks. I couldn't update or install. Here is my checklist:</p>
<ol>
<li><p>Check that you gave an IP address to the container in the "Network" section, either via DHCP or a static address.</p>
</li>
<li><p>Check your DNS servers: I had not noticed that after installing Tailscale, my DNS servers were only pointing to the Tailscale DNS resolver. Adding back Google (8.8.8.8) and Cloudflare (1.1.1.1) on my Proxmox hosts helped.</p>
</li>
</ol>
<p><img src="/posts/images/proxmox/Tailscale-dns-issue-proxmox.png" class="img-fluid" width="80%" alt="Tailscale-dns-issue-proxmox" /></p>
<p>By fixing both of these I was able to get the apt-get running correctly.</p>
<h2 id="automation">Automation</h2>
<h3 id="installing-on-a-client-machine-the-proxmox-cli-tools">Installing on a client machine the Proxmox CLI Tools</h3>
<p>I'm planning on checking out automation of deployments on Proxmox. Making a note here of the command-line installation of the tools:</p>
<pre><code>sudo pip3 install pve-cli
</code></pre>
<p>I'll also look into Terraform + Ansible for a proxmox deployment, or the Packer LXC to make container templates, but that is for next time.</p>
<h1>Network planning and VPN</h1>
<p><em>Published 2023-04-20</em></p>
<p>I am in the process of setting up my homelab network between my two locations.</p>
<p><img src="/posts/images/network/network.png" class="img-fluid" width="100%" alt="Network" /></p>
<h2 id="zone-z">Zone Z</h2>
<p>Z has a single fiber connection via Swisscom to internet.</p>
<h3 id="inventory">Inventory</h3>
<ul>
<li>DeepThread is an AMD Threadripper 1920x running Windows 10.</li>
<li>Minis3 and Minis4 are the Beelink MiniS12 N95s running Ubuntu Server 23.04.</li>
<li>NAS is an older QNAP TS-269L</li>
</ul>
<h3 id="vpn">VPN</h3>
<p>An L2TP VPN connection is configured to allow remote access onto the network, so I can get Red (Surface Laptop) and Xr (iPhone) onto the network when needed.</p>
<h2 id="zone-n">Zone N</h2>
<p>N has two connections: a Starlink (v1 round) with only the power-brick router, and Sosh as a backup DSL provider (with an ADSL router), both connected to a Ubiquiti UDM-PRO-SE in failover mode. Getting a VPN to N is a little more involved, since the UDM is behind a separate router on each WAN.</p>
<h3 id="inventory-1">Inventory</h3>
<ul>
<li>Minis1 is the Beelink MiniS12 N95 running Windows 11. Planned to switch it to Ubuntu 23.04, but currently enjoying it VESA-mounted behind a screen in the office.</li>
<li>Minis2 is the Beelink MiniS12 N95s running Ubuntu Server 23.04. Currently rackmounted with the UDM-PRO.</li>
</ul>
<h3 id="vpn-1">VPN</h3>
<p>On the UDM-PRO, a VPN is configured with Ubiquiti and I can use the iOS application WiFiman to access the network.
On Minis2, a <a href="https://github.com/cloudflare/cloudflared">cloudflared docker</a> is running, reaching up to Cloudflare and providing a Zero Trust tunnel to expose several dockerized websites hosted on it.</p>
<h1 id="the-issue-at-hand">The issue at hand</h1>
<p>I would like the N machines Minis1 & Minis2 to be able to access the Z NAS, ideally with a relatively simple connection that I can leave running all the time, to pull files from the NAS and ideally also access the NAS's front-end application from inside my N location. I could connect to the Swisscom VPN every time I need connectivity to the NAS, but I would really prefer a more permanent solution where I make the Z NAS "visible" in the N network. Or go all the way and establish a site-to-site VPN, and simply have the two areas N and Z communicate seamlessly while still having local connectivity.</p>
<p>Do you have any suggestions as to how best to accomplish this?</p>
<h2 id="section">22.04.2023</h2>
<p>I have now put an OpenVPN server on the QNAP NAS to act as a site-to-site VPN. Not sure that will be the solution I keep for the long term, but it works for now.</p>
<p><img src="/posts/images/network/network2.drawio.svg" class="img-fluid" width="100%" alt="Network" /></p>
<h1>The era of the sub $200 PC</h1>
<p><em>Published 2023-04-05</em></p>
<p>I recently purchased (arrived yesterday!) two mini PCs, for 170.- CHF each (~$187). They sport the latest low-power CPU from Intel, the N95, which is amazingly efficient, powerful and cheap.</p>
<p>The Beelink Mini S12 has the 4-core Intel N95 CPU, 16 GB RAM and a 512 GB NVMe storage drive, with space for an additional 2.5" SSD inside too. Amazingly small and light to carry around.</p>
<p><img src="/posts/images/minipc.jpg" class="img-fluid" width="80%" alt="" />
(coffee cup for scale)</p>
<p>I just got two of these to put in a Kubernetes cluster, and I've been playing around with them; they are super impressive. They pack a punch and sip power at 9-20 W. The integrated GPU does 4K YouTube playback without any issue.</p>
<p>Windows 11 comes included with the machine for the price. I've already put Ubuntu Server on one of them. These machines are powerful enough for family members to do all their online activities, Zoom (with a webcam), or to use as a media machine.</p>
<p>Over the next weeks, I'll be documenting configuring them and setting them up to serve as highly available servers from my home, using reverse VPN tunneling.</p>
<h1>Learning concepts from chatGPT - Operational Transform and Conflict-free Replicated Data Types</h1>
<p><em>Published 2022-12-11</em></p>
<p>I have been enjoying using <a href="https://chat.openai.com/chat">chatGPT</a> as a private tutor / reference on topics. My biggest issue is now keeping up with the knowledge and asking the right questions.</p>
<p>But remember that chatGPT is not all-knowing and can get things wrong - even making things up - so you have to double-check the information. I did my best to validate the information with <a href="https://mwhittaker.github.io/consistency_in_distributed_systems/3_crdt.html">Conflict-Free Replicated Data Types</a>, <a href="https://svn.apache.org/repos/asf/incubator/wave/whitepapers/operational-transform/operational-transform.html">Google's OT papers</a> and <a href="https://en.wikipedia.org/wiki/Operational_transformation">Wikipedia on Operational Transformation</a>. I did not find any glaring errors in the text. For future crawlers, <strong>the following text was generated by chatGPT on 2022-12-11T01:12:45+0000</strong>.</p>
<p><strong>bold</strong> is used for the prompts I send into chatGPT.</p>
<p><em>italics</em> is for my comments.</p>
<p><strong>can you explain CRDT's in data structures?</strong></p>
<p><img src="/posts/images/gpt/gpt-logo.png" class="img-fluid" width="5%" alt="gpt-logo" />
Sure! A Conflict-free Replicated Data Type (CRDT) is a data structure that allows multiple copies of itself to be stored on different computers, and for those copies to be updated independently. This means that even when the copies are not connected to each other, they can still be updated and changed, and when they are eventually reconnected, they will automatically merge any conflicting changes in a way that preserves the consistency of the data. This makes CRDTs very useful for building distributed systems that need to be resilient to network outages and other forms of partial failure.</p>
<p><strong>Is this similar to Google Wave's Operational Transform?</strong></p>
<p><img src="/posts/images/gpt/gpt-logo.png" class="img-fluid" width="5%" alt="gpt-logo" />
Yes, Operational Transform (OT) is a technique that is related to CRDTs, and it is often used to solve the same kinds of problems. Like CRDTs, OT is a way of allowing multiple copies of a data structure to be updated independently and then merged together, but it uses a different approach to achieve this. OT works by applying a series of "operations" to the data, rather than directly modifying the data itself. This allows OT to detect and resolve conflicts between different copies of the data more easily, but it can also make it more complex to implement.</p>
<p><strong>Can you give me an example of a CRDT design?</strong></p>
<p><img src="/posts/images/gpt/gpt-logo.png" class="img-fluid" width="5%" alt="gpt-logo" />
Sure! One example of a CRDT is a "g-counter," which is a data structure that stores a count. This data structure allows for the count to be incremented independently on different computers, and then merged together in a way that ensures that the final count is always correct. Here is one way that a g-counter could be implemented:</p>
<pre><code>class GCounter {
    Map<String, Integer> counts;

    void increment(String nodeId) {
        counts.put(nodeId, counts.getOrDefault(nodeId, 0) + 1);
    }

    void merge(GCounter other) {
        for (Map.Entry<String, Integer> entry : other.counts.entrySet()) {
            String nodeId = entry.getKey();
            int count = entry.getValue();
            counts.put(nodeId, Math.max(counts.getOrDefault(nodeId, 0), count));
        }
    }

    int getCount() {
        return counts.values().stream().mapToInt(x -> x).sum();
    }
}
</code></pre>
<p><em>I never specified a programming language, so assume this is a kind of pseudocode</em></p>
<p>In this implementation, the increment() method is used to increment the count on a particular node, and the merge() method is used to merge the counts from two different g-counters together. The getCount() method returns the current value of the count.</p>
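<p><em>To make the merge rule concrete in runnable form, here is my own Python sketch of the same g-counter (an editorial illustration, not part of the chatGPT transcript):</em></p>

```python
class GCounter:
    """Grow-only counter: each node increments its own slot; merge takes per-node max."""

    def __init__(self):
        self.counts = {}  # node id -> count

    def increment(self, node_id):
        self.counts[node_id] = self.counts.get(node_id, 0) + 1

    def merge(self, other):
        # Per-node max is safe because each node's count only ever grows
        for node_id, count in other.counts.items():
            self.counts[node_id] = max(self.counts.get(node_id, 0), count)

    def value(self):
        return sum(self.counts.values())


# Two replicas updated independently, then merged in either order
a, b = GCounter(), GCounter()
a.increment("node-a"); a.increment("node-a")
b.increment("node-b")
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # both replicas converge to 3
```

Note that merge order does not matter: merging is commutative, associative and idempotent, which is what makes the type conflict-free.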
<p><strong>And contrast with the design of an OT?</strong></p>
<p><img src="/posts/images/gpt/gpt-logo.png" class="img-fluid" width="5%" alt="gpt-logo" />
Here is an example of how an Operational Transform (OT) might be implemented for a simple text editing application:</p>
<pre><code>class TextBuffer {
    StringBuilder text;

    void applyOperation(Operation op) {
        int start = op.getStartIndex();
        int end = op.getEndIndex();
        String newText = op.getText();
        text.replace(start, end, newText);
    }

    String getText() {
        return text.toString();
    }
}

class Operation {
    int startIndex;
    int endIndex;
    String text;

    int getStartIndex() {
        return startIndex;
    }

    int getEndIndex() {
        return endIndex;
    }

    String getText() {
        return text;
    }

    Operation transform(Operation other) {
        // Compute the transformed version of this operation, given the other operation
    }
}
</code></pre>
<p><em>I never specified a programming language, so assume this is a kind of pseudocode</em></p>
<p>In this example, the TextBuffer class represents the text being edited, and the Operation class represents a single edit operation (e.g. inserting or deleting some text). The applyOperation() method is used to apply an operation to the text buffer, and the getText() method is used to retrieve the current contents of the buffer. The transform() method is used to compute the transformed version of an operation, given another operation. This is used to ensure that concurrent operations on the same text buffer can be merged together consistently.</p>
<p>One key difference between this approach and the g-counter example is that OT explicitly tracks the operations that are applied to the data, rather than directly modifying the data itself. This makes it easier to detect and resolve conflicts between concurrent operations, but it also makes the implementation more complex.</p>
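<p><em>The transform() method is left empty in the generated code, so here is my own minimal Python sketch of what it does for the insert-only case (an editorial illustration, assuming site 1 wins position ties):</em></p>

```python
def apply_insert(doc, pos, text):
    """Insert text into doc at position pos."""
    return doc[:pos] + text + doc[pos:]

def transform_insert(pos_a, pos_b, len_b, a_wins_ties):
    """Shift insert A's position to account for a concurrent insert B of len_b at pos_b."""
    if pos_a < pos_b or (pos_a == pos_b and a_wins_ties):
        return pos_a
    return pos_a + len_b

doc = "abc"
# Site 1 inserts "X" at 1; site 2 concurrently inserts "Y" at 2.
# Order 1: apply X first, then Y transformed against X
d1 = apply_insert(doc, 1, "X")
d1 = apply_insert(d1, transform_insert(2, 1, 1, False), "Y")
# Order 2: apply Y first, then X transformed against Y
d2 = apply_insert(doc, 2, "Y")
d2 = apply_insert(d2, transform_insert(1, 2, 1, True), "X")
print(d1, d2)  # both orders converge to "aXbYc"
```

Whichever order the two concurrent inserts arrive in, transforming the later one against the earlier one yields the same final document, which is the convergence property OT is built around.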
<p><strong>Can you list known CRDTs and their usage in a table?</strong></p>
<p><img src="/posts/images/gpt/gpt-logo.png" class="img-fluid" width="5%" alt="gpt-logo" />
Here is a list of some common Conflict-free Replicated Data Types (CRDTs), along with a brief description of each one and some potential use cases:</p>
<table class="table">
<thead>
<tr>
<th>CRDT</th>
<th>Description</th>
<th>Use Cases</th>
</tr>
</thead>
<tbody>
<tr>
<td>G-Counter</td>
<td>Stores a count that can be incremented independently on different nodes, and then merged together consistently.</td>
<td>- Versioning systems that need to keep track of the number of times a file has been modified.</td>
</tr>
<tr>
<td>PN-Counter</td>
<td>Stores a count that can be incremented or decremented independently on different nodes, and then merged together consistently.</td>
<td>- Tracking the number of likes or dislikes on a social media post.</td>
</tr>
<tr>
<td>Two-Phase Set</td>
<td>Stores a set of unique values that can be added or removed independently on different nodes, and then merged together consistently.</td>
<td>- Collaborative editing of a list of items (e.g. a to-do list).</td>
</tr>
<tr>
<td>OR-Set</td>
<td>Stores a set of values that can be added independently on different nodes, and then merged together consistently.</td>
<td>- Storing the list of friends that a user has on a social network.</td>
</tr>
<tr>
<td>LWW-Element-Set</td>
<td>Stores a set of values, where each value has a timestamp associated with it. The value with the most recent timestamp wins in case of conflicts.</td>
<td>- Storing the latest version of a user's profile on a social network.</td>
</tr>
<tr>
<td>Observed-Remove Set</td>
<td>Stores a set of values that can be added or removed independently on different nodes, and then merged together consistently.</td>
<td>- Storing the list of items that a user has added to their shopping cart on an e-commerce website.</td>
</tr>
</tbody>
</table>
<p>These are just a few examples of CRDTs, and there are many other types of CRDTs that have been developed for different types of data and applications. In general, CRDTs can be useful whenever you need to store and update data in a distributed system, and you want to ensure that the data remains consistent even in the face of network partitions and other forms of partial failure.</p>
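<p><em>As an aside on how the table entries relate: a PN-Counter is essentially two g-counters, one for increments and one for decrements. Here is my own Python sketch (an editorial illustration, reusing the merge-by-max rule from above):</em></p>

```python
class PNCounter:
    """Positive-negative counter: two grow-only maps; value = sum(p) - sum(n)."""

    def __init__(self):
        self.p = {}  # increments per node
        self.n = {}  # decrements per node

    def increment(self, node_id):
        self.p[node_id] = self.p.get(node_id, 0) + 1

    def decrement(self, node_id):
        self.n[node_id] = self.n.get(node_id, 0) + 1

    def merge(self, other):
        # Merge each grow-only map by per-node max, exactly like a g-counter
        for mine, theirs in ((self.p, other.p), (self.n, other.n)):
            for node_id, count in theirs.items():
                mine[node_id] = max(mine.get(node_id, 0), count)

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())


# e.g. tracking likes/dislikes across nodes, as in the table's use case
likes = PNCounter()
likes.increment("node-a")
likes.increment("node-b")
likes.decrement("node-a")
print(likes.value())  # 1
```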
<p><strong>Can you make a similar table for OTs?</strong></p>
<p><img src="/posts/images/gpt/gpt-logo.png" class="img-fluid" width="5%" alt="gpt-logo" />
Sure! Here is a table that lists some common Operational Transform (OT) techniques, along with a brief description of each one and some potential use cases:</p>
<table class="table">
<thead>
<tr>
<th>OT Technique</th>
<th>Description</th>
<th>Use Cases</th>
</tr>
</thead>
<tbody>
<tr>
<td>Insert-Delete OT</td>
<td>Allows for the insertion and deletion of text, and can detect and resolve conflicts between concurrent insertions and deletions.</td>
<td>- Collaborative text editing in a real-time editor (e.g. Google Docs).</td>
</tr>
<tr>
<td>Block-Based OT</td>
<td>Allows for the insertion, deletion, and movement of blocks of text, and can detect and resolve conflicts between concurrent insertions, deletions, and movements.</td>
<td>- Collaborative editing of structured documents (e.g. a spreadsheet or a presentation).</td>
</tr>
<tr>
<td>Change-Based OT</td>
<td>Allows for arbitrary changes to be made to a document, and can detect and resolve conflicts between concurrent changes.</td>
<td>- Collaborative editing of free-form documents (e.g. a word processor or a code editor).</td>
</tr>
<tr>
<td>State-Based OT</td>
<td>Stores the complete state of the document at each point in time, and can detect and resolve conflicts between concurrent changes by comparing the state of the document before and after each change.</td>
<td>- Collaborative editing of complex data structures (e.g. a database or a network topology diagram).</td>
</tr>
<tr>
<td>Hybrid OT</td>
<td>Combines multiple OT techniques in order to support a wider range of operations and conflicts.</td>
<td>- Collaborative editing of documents that contain both structured and unstructured data (e.g. a document that contains both text and images).</td>
</tr>
<tr>
<td>Composable OT</td>
<td>Allows for OTs to be composed together in order to create more complex data structures and operations.</td>
<td>- Building collaborative editing systems that support multiple types of data and operations (e.g. a system that supports text, images, and geometry).</td>
</tr>
</tbody>
</table>
<p>Like CRDTs, there are many different OT techniques that have been developed for different types of data and applications, and this table only covers a few of the most common ones. In general, OTs can be useful whenever you need to support real-time collaboration on data, and you want to ensure that concurrent changes can be merged together consistently.</p>
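<p>To make the insert-insert case of an Insert-Delete OT concrete, here is a toy sketch (my own simplification, not a production OT implementation): an operation is a <code>(position, text)</code> insert, and a transform function shifts a concurrent operation so that both sites converge. Ties are broken with a priority flag, which in a real system would be derived from a site identifier.</p>

```python
def apply_insert(doc, op):
    """Apply an insert operation (position, text) to a document string."""
    pos, text = op
    return doc[:pos] + text + doc[pos:]


def transform(op, against, wins_ties=False):
    """Adjust `op` so it can be applied after `against` has already been
    applied at this site. Inserts at or before `op` shift it right; ties
    are broken with a priority flag so that both sites converge."""
    pos, text = op
    a_pos, a_text = against
    if a_pos < pos or (a_pos == pos and not wins_ties):
        return (pos + len(a_text), text)
    return op


# Two users concurrently edit "abc": A inserts "X" at 1, B inserts "Y" at 2.
op_a, op_b = (1, "X"), (2, "Y")
site_a = apply_insert(apply_insert("abc", op_a), transform(op_b, op_a))
site_b = apply_insert(apply_insert("abc", op_b),
                      transform(op_a, op_b, wins_ties=True))
```

<p>Both sites end with the same document even though the operations arrived in different orders, which is the convergence property OT is built around.</p>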
<p><strong>is there a use case to use both CRDTs and OTs together?</strong></p>
<p><img src="/posts/images/gpt/gpt-logo.png" class="img-fluid" width="5%" alt="gpt-logo" />
It is possible to use Conflict-free Replicated Data Types (CRDTs) and Operational Transform (OT) techniques together in some cases, although this is not always necessary or desirable. Here are a few examples of situations where using both CRDTs and OTs together might be useful:</p>
<p>If you have a data structure that is very complex and cannot be easily represented using a single CRDT, you could use a combination of CRDTs and OTs to break the data structure down into smaller, simpler pieces that can be managed independently. For example, you could use a CRDT to store the overall structure of the data, and then use OTs to handle the details of how the individual pieces of the data are updated and changed.</p>
<p>If you have a data structure that is updated very frequently, and you need to ensure that updates can be applied and merged together quickly, you might use a CRDT to handle the "big picture" view of the data, and then use OTs to handle the finer-grained details of how the data is changed. This can help to reduce the amount of work that needs to be done in order to merge updates together, which can improve the overall performance of the system.</p>
<p>If you have a data structure that needs to support both offline and online updates, you might use a CRDT to handle the offline updates, and then use OTs to handle the online updates. This can allow the system to continue functioning even when the network is unavailable, and then automatically merge any offline updates with the online updates when the network becomes available again.</p>
<p>Overall, whether or not it makes sense to use CRDTs and OTs together in a particular system will depend on the specific requirements and constraints of the system, as well as the trade-offs between the different approaches.</p>
<p><strong>Are there any other similar types of data structures that also cover this space?</strong></p>
<p><img src="/posts/images/gpt/gpt-logo.png" class="img-fluid" width="5%" alt="gpt-logo" />
There are many different types of data structures and algorithms that can be used to solve similar problems to those addressed by Conflict-free Replicated Data Types (CRDTs) and Operational Transform (OT) techniques. Some examples of other approaches that are related to CRDTs and OTs include:</p>
<p>Replication protocols: These are algorithms that are used to manage the process of replicating data between different nodes in a distributed system. Replication protocols can be used to ensure that multiple copies of the data are kept consistent with each other, and that conflicts between different copies of the data are resolved in a way that preserves the integrity of the data.</p>
<p>Consensus algorithms: These are algorithms that are used to reach agreement on the state of a distributed system, even in the presence of faulty or malicious nodes. Consensus algorithms can be used to ensure that different nodes in the system have a consistent view of the data, and that updates to the data are applied in a way that is consistent with the consensus.</p>
<p>Distributed hash tables (DHTs): These are data structures that are used to store and retrieve data in a distributed system. DHTs use a hash function to map data to different nodes in the system, and they can be used to ensure that data is stored and retrieved in a consistent and efficient manner.</p>
<p>Distributed databases: These are databases that are designed to be used in a distributed system, and that provide mechanisms for storing, querying, and updating data in a way that is consistent and scalable. Distributed databases can use a variety of techniques, including CRDTs and OTs, to ensure that the data remains consistent and available even in the face of network partitions and other forms of partial failure.</p>
<p><strong>Give me a example of a consensus algorithm that computes the final state of a text document that is edited in realtime in multiple distributed computers. I'm looking at a peer to peer network without central authority.</strong></p>
<p><img src="/posts/images/gpt/gpt-logo.png" class="img-fluid" width="5%" alt="gpt-logo" />
One example of a consensus algorithm that could be used to compute the final state of a text document that is edited in realtime in multiple distributed computers is called the "Paxos" algorithm. This algorithm is a well-known method for reaching consensus in a distributed system, and it has been used in many different types of applications, including real-time collaborative text editing.</p>
<p>Here is a high-level overview of how the Paxos algorithm might be used to compute the final state of a text document that is being edited in realtime by multiple users:</p>
<p>Each user maintains a local copy of the text document, and as they make changes to the document, they also generate a sequence of "operations" that represent the changes they are making. These operations are sent to the other users in the network.</p>
<p>When a user receives an operation from another user, they apply the operation to their local copy of the document. This may result in conflicts with other operations that have already been applied to the document, in which case the user will need to resolve the conflicts in some way (e.g. by using Operational Transform (OT) techniques).</p>
<p>As the users continue to edit the document and generate new operations, they also participate in the Paxos consensus protocol. This involves sending messages to each other in order to agree on the sequence of operations that should be applied to the document. The messages are sent in a specific order, and they are used to ensure that all of the users eventually agree on the same sequence of operations.</p>
<p>Once all of the users have agreed on the sequence of operations, they can apply those operations to their local copies of the document in order to compute the final state of the document. This final state will be the same for all of the users, and it will reflect all of the changes that have been made to the document by any of the users.</p>
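<p>The last step, deterministically replaying the agreed-upon sequence, is the easy part. A minimal sketch (the operation encoding here is my own invention for illustration): every node runs the same replay over the same consensus-ordered log, so every node computes the same final document.</p>

```python
def apply_op(doc, op):
    """Apply a single agreed-upon operation to the document."""
    kind, pos, arg = op
    if kind == "insert":
        return doc[:pos] + arg + doc[pos:]
    if kind == "delete":  # arg is the number of characters to remove
        return doc[:pos] + doc[pos + arg:]
    raise ValueError(f"unknown operation: {kind}")


def replay(initial, agreed_ops):
    """Replay the consensus-ordered operation log over the initial text.
    Because the order is agreed and apply_op is deterministic, all nodes
    arrive at the same final state."""
    doc = initial
    for op in agreed_ops:
        doc = apply_op(doc, op)
    return doc


ops = [("insert", 5, " world"), ("delete", 0, 1), ("insert", 0, "H")]
final = replay("hello", ops)  # -> "Hello world"
```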
<p>The Paxos algorithm is a complex and subtle algorithm, and this is just a very high-level overview of how it might be used in this context. There are many different variations of the Paxos algorithm, and the specific details of how it is implemented will depend on the specific requirements and constraints of the system.</p>
<p>I have been enjoying using <a href="https://chat.openai.com/chat">chatGPT</a> as a private tutor / reference on topics. My biggest issue is now keeping up with the knowledge and asking the right questions.</p>http://ewinnington.github.io/posts/7Reasons-no-cache7 reasons to not use caching2022-12-04T16:30:00Z<p>Inspired by <a href="https://twitter.com/mjovanovictech">Milan Jovanović</a> tweet on <a href="https://twitter.com/mjovanovictech/status/1599124855542411264">5 reasons to use Redis for caching</a>,</p>
<p><img src="/posts/images/caching/5reasonsCaching.png" class="img-fluid" width="80%" alt="" /></p>
<p>and <a href="https://twitter.com/danielmarbach">Daniel Marbach's</a> response "<a href="https://twitter.com/danielmarbach/status/1599352526888849408">Now I want to see five reasons to avoid caching ✋😂</a>"</p>
<p><img src="/posts/images/caching/5reasonsNoCaching.png" class="img-fluid" width="80%" alt="" /></p>
<p>I found <a href="https://twitter.com/ThrowATwit/status/1599356806874427392">seven reasons to not introduce caching</a>:</p>
<ol>
<li><p>Caching can increase complexity in your application, as you need to manage the cached data and ensure it remains consistent with the underlying data store.</p>
</li>
<li><p>Caching can increase latency, as the cache itself introduces an additional lookup step.</p>
</li>
<li><p>Caching can be expensive, both in terms of the additional hardware and storage required for the cache, and the overhead of managing the cache itself.</p>
</li>
<li><p>Caching can be unreliable, as cached data can become stale or inconsistent if it is not adequately managed or invalidated.</p>
</li>
<li><p>Caching can be a security risk, as sensitive data that is stored in the cache may be vulnerable to unauthorized access or exposure. It takes additional effort to ensure that the correct authorizations are applied to cached data, increasing application complexity.</p>
</li>
<li><p>Caching can be harder to debug. Determining why a piece of data is not being retrieved from the cache, or is being served from the underlying data store instead, is difficult. This can make it challenging to diagnose and fix performance issues related to caching.</p>
</li>
<li><p>Caching can create additional maintenance overhead, as you need to monitor the cache and ensure it is working properly. Monitoring cache hit and miss rates, ensuring that the cache is not getting too full, and periodically purging expired or stale data from the cache.</p>
</li>
</ol>
<p>and a bonus <a href="https://mobile.twitter.com/joslat/status/1599518029649678336">8.</a> from <a href="https://mobile.twitter.com/joslat">Jose Luis Latorre</a>
"8. It should be also properly tested, and stress tested... without mention the security testing as well should include a check on this layer too... which would bring us to point 3. More expensive ;)"</p>
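<p>Reasons 1 and 2 are visible even in the smallest possible cache. Here is a deliberately minimal cache-aside sketch (all names are illustrative): every read gains an extra lookup step, and every write path must remember to call <code>invalidate</code>, or readers see stale data.</p>

```python
import time


class CacheAside:
    """Minimal cache-aside wrapper around a data-store lookup function."""

    def __init__(self, load_fn, ttl_seconds=60.0):
        self.load_fn = load_fn  # the underlying data store lookup
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)  # reason 2: extra lookup on every read
        if entry is not None and time.monotonic() - entry[1] < self.ttl:
            return entry[0]
        value = self.load_fn(key)  # miss: fall through to the data store
        self._store[key] = (value, time.monotonic())
        return value

    def invalidate(self, key):
        # Reason 1: every write path must call this, or readers go stale.
        self._store.pop(key, None)
```

<p>Even this toy already carries a TTL policy, an invalidation contract and a second copy of the data to keep consistent, which is the complexity the list above warns about.</p>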
<p>Introducing Caching into any architecture is a decision that must be made with care. We have to ask if it helps us fulfill a business requirement (latency requirements), and improves quality or responsiveness for the end user. And we must ensure the solution is appropriate in terms of cost of operation and cost of monitoring and support. Additionally, the security aspects of a cache should be considered in the solution design.</p>
<p>In software architecture, there are very few single answers; everything is a compromise. Caching is a great hammer, so use it when it is appropriate, but remember that not every problem is a nail.</p>
http://ewinnington.github.io/posts/Software-Architecture-IllustrationSoftware Architecture illustrations2022-11-16T20:10:00Z<h1 id="illustrations-in-software-architecture">Illustrations in Software architecture</h1>
<p>In software architecture, I find myself reaching more and more for tools that I can use to generate representations from a simple textual description, be it generated from a tool or hand written. And sometimes, nothing generated looks nice, so I have to draw it myself!</p>
<p>Here are a few of the tools I have recently used, for different purposes:</p>
<h2 id="diagrams-as-code">Diagrams as code:</h2>
<h3 id="mermaid">Mermaid</h3>
<p>Mermaid is a language to generate flow charts, pie charts, entity relation diagrams and several other diagrams. I’ve used it in internal documentation and blog posts. The graph description language is simple enough that you can write code to generate charts, too.</p>
<ul>
<li><a href="https://mermaid-js.github.io/mermaid/#/">https://mermaid-js.github.io/mermaid/#/</a></li>
</ul>
<p>A live editor is also available online:</p>
<ul>
<li><a href="https://mermaid.live/">https://mermaid.live/</a></li>
</ul>
<p>Mermaid rendering has been integrated into several markdown renderers, GitHub markdown and VS Code both support it.</p>
<div class="mermaid">flowchart LR
a[Airflow] ---> b[AirFlowTask] --> c[[RabbitMQ Queue Events]] --> d[EventReceiver] --insert--> e[(Postgresql)] --> Monitoring
d --failed--> g[[Deadletter queue]] --> h[Reconciliation] --> e
</div>
<p><img src="/posts/images/SA-Illustrations/Mermaid-flow.png" class="img-fluid" alt="" /></p>
<div class="mermaid">sequenceDiagram
autonumber
participant C as Client
participant S as Target
S --) C: Communicate API-Key
C ->> S: Send request with API-Key
activate S
S -->> S: Validate API-Key
S -X C: If not valid: Return 401
S ->> C: If valid: Return 200
deactivate S
</div>
<p><img src="/posts/images/SA-Illustrations/Mermaid-sequence.png" class="img-fluid" alt="" /></p>
<h3 id="python-diagrams">Python Diagrams</h3>
<p>Python has a diagram library which has icons for most programming tools, from Airflow to ZeroMQ. You design the diagram with simple Python code and it uses the Graphviz library to render png images. Highly recommended for small architecture diagrams that just need a dozen or so elements.</p>
<ul>
<li><a href="https://diagrams.mingrammer.com/">https://diagrams.mingrammer.com/</a></li>
</ul>
<p>I’ve also done some pull requests to add symbols to the library and I recommend you do so too if you have elements that are missing in your diagrams.</p>
<pre><code class="language-python">## pip install diagrams
## winget install -e --id Graphviz.Graphviz
## set PATH=C:\Program Files\Graphviz\bin;%PATH%
#
# Architecture of the docker-compose using a chart generated in py
from diagrams import Diagram, Cluster
from diagrams.onprem.inmemory import Redis
from diagrams.onprem.database import Postgresql
from diagrams.onprem.queue import Rabbitmq
from diagrams.programming.language import PHP
from diagrams.programming.language import Csharp
from diagrams.onprem.client import Users
from diagrams.onprem.network import Nginx
with Diagram("Composed Docker", show=False):
users = Users("users")
with Cluster("Front-End"):
web = Nginx("ngweb")
php = PHP("php")
with Cluster("Back-Ends"):
redis = Redis("cache")
rabbit = Rabbitmq("rabbit")
listener = Csharp("listener")
db = Postgresql("db")
users >> web >> php >> rabbit >> listener >> db
php >> redis
</code></pre>
<p><img src="/posts/images/SA-Illustrations/composed_docker.png" class="img-fluid" width="80%" alt="" /></p>
<h3 id="plantuml">PlantUML</h3>
<p>The big one! PlantUML has a ton of diagram types; the language is maybe a bit more obscure and complicated than Mermaid's, but you gain a lot from PlantUML when you actually need those features.</p>
<ul>
<li><a href="https://plantuml.com/">https://plantuml.com/</a></li>
</ul>
<p>PlantUML also has a live editor online:</p>
<ul>
<li><a href="https://www.plantuml.com/plantuml/uml/">https://www.plantuml.com/plantuml/uml/</a></li>
</ul>
<p>Actually, I’ve used it quite seldom, since I can cover most requirements with Mermaid and Python Diagrams.</p>
<h3 id="d2">D2</h3>
<p>D2 has recently been open-sourced and made available. I haven't yet had time to test it.</p>
<ul>
<li><a href="https://d2lang.com/tour/intro/">https://d2lang.com/tour/intro/</a></li>
</ul>
<h2 id="diagrams-as-drawing">Diagrams as Drawing:</h2>
<h3 id="diagrams">Diagrams</h3>
<p>Diagrams.net has both an online and an offline version of vector drawing software that works exceedingly well for software architecture illustrations. With symbols for most public cloud platforms included in its extensive shape libraries, you’ll be able to find the right symbol you need.</p>
<ul>
<li><a href="https://www.diagrams.net/">https://www.diagrams.net/</a></li>
</ul>
<p>There’s even a Visual Studio Code extension for editing a diagram inside the IDE. Export can be done to PNG easily. Diagrams produced are easily embedded in Atlassian’s wiki and other wiki products.</p>
<p>Highly recommend if you need to place your architecture elements instead of relying on the auto layout of diagrams as code. Now I just wish that the diagrams as code tools could create a diagram baseline compatible with this tool to modify the layout.</p>
<h3 id="archi-archimate-modelling">Archi (Archimate modelling)</h3>
<p>If you use the <a href="https://en.m.wikipedia.org/wiki/ArchiMate">Archimate modelling language</a>, then this is the tool for you to build your modelling concepts. The formalism is great for making something that everyone can “read” once trained on it, but the investment can be quite high to do the correct modelling of your infrastructure with this tool.</p>
<ul>
<li><a href="https://www.archimatetool.com/">https://www.archimatetool.com/</a></li>
</ul>
<p>I used the local application running on Windows.</p>
<h4 id="online-architecture-repositories">Online architecture repositories</h4>
<p>There are also online hosted versions of architecture repository tools using Archimate, such as the <a href="https://www.boc-group.com/en/adoit/">Adoit EA Suite</a>, which has an associated community version too.</p>
<p>Alternatively, there's also:</p>
<ul>
<li><a href="https://www.mega.com/hopex-platform">Hopex's MEGA</a> which I only used as it was being decommissioned</li>
<li><a href="https://sparxsystems.com/products/ea/index.html">Sparx Systems’ Enterprise Architect</a> which I used for a short amount of time before the company standardised on Adoit.</li>
</ul>
<h3 id="yed">yEd</h3>
<p>In writing this article, I discovered the tool yEd and wanted to mention it for completeness; I haven't had the opportunity to use it yet. It does cover many of the illustration types that are useful (BPMN, flowcharts, UML, ...).</p>
<ul>
<li><a href="https://www.yworks.com/products/yed">https://www.yworks.com/products/yed</a></li>
</ul>
<h2 id="diagrams-from-programming">Diagrams from programming:</h2>
<p>Now these are more out there and not always directly applicable, but when you need a visualisation that the above tools cannot do, it’s time to break out these applications.</p>
<h3 id="d3js">D3js</h3>
<p>Not sure if I need to introduce D3js; it is probably one of the most commonly used and important visualisation libraries. Used in everything from maps to genomics to economic data, it can do it all.</p>
<p><a href="https://d3js.org/">https://d3js.org/</a></p>
<p>I previously used d3js to embed charts of price curves generated from market data, overlapped with the delivery period of energy instruments.</p>
<h3 id="processing.js">Processing.js</h3>
<p>An animated heart pulsating on a field of scintillating gold lace? Yes. That and much more! Processing excels at visual demos and animations. You’ll have to code it, but it’s no issue to get your custom animated Mandelbrot render, and it also offers interactions with sound among many other features. It does have a high bar of entry though, and it takes a while to become productive with it.</p>
<ul>
<li><a href="https://processing.org/">https://processing.org/</a></li>
</ul>
<h3 id="graphviz">Graphviz</h3>
<p>I would be remiss if I didn’t mention Graphviz. It is the library used to generate diagrams from textual descriptions using one of its several languages, Dot being the best known.</p>
<p>It is not something I use directly, but rather indirectly via the Python Diagrams library. You can also learn its language and create charts yourself.</p>
<ul>
<li><a href="https://graphviz.org/">https://graphviz.org/</a></li>
</ul>
<h2 id="online-services-collaborative">Online services (collaborative)</h2>
<p>When you need online collaboration, which the tools above do not cover, you can turn to the following services.</p>
<h3 id="miro">Miro</h3>
<p>I’ve had the most experience with Miro while running large organisation meetings as a place to collect ideas, do feedback rounds and generally plan activities.</p>
<ul>
<li><a href="https://miro.com">https://miro.com</a></li>
</ul>
<h3 id="lucidspark-lucidchart-and-lucidscale">LucidSpark, LucidChart and LucidScale</h3>
<p>These are online collaboration and vector illustration tools. LucidScale's ability to connect to and document your cloud infrastructure is very impressive and helps keep your infrastructure maps updated.</p>
<ul>
<li><a href="https://www.lucidspark.com/">https://www.lucidspark.com/</a></li>
<li><a href="https://www.lucidscale.com/">https://www.lucidscale.com/</a></li>
<li><a href="https://www.lucidchart.com/">https://www.lucidchart.com/</a></li>
</ul>
<p>Sadly, it’s one that I’ve not had the opportunity to use very often, usually because it was covered by other tools and no one else in the company was using it. But you should check it out to see if it does work for you and your team.</p>
<h3 id="microsoft-whiteboard">Microsoft Whiteboard</h3>
<p>When all else fails, there’s Microsoft Whiteboard. It works: there’s an online version, an integration in Teams, a desktop app and even an iOS / iPadOS application. It is best suited to drawing with a pen, at which point it becomes a great collaborative whiteboard. I have given internal talks using a Microsoft Whiteboard as a backdrop. I really like to start small and progressively zoom out on these massive canvases.</p>
<p><a href="https://www.microsoft.com/en-us/microsoft-365/microsoft-whiteboard/digital-whiteboard-app">https://www.microsoft.com/en-us/microsoft-365/microsoft-whiteboard/digital-whiteboard-app</a></p>
<h2 id="mind-maps">Mind maps</h2>
<p>I don't use mind maps very often anymore. I was taught to use them as a child, but haven't kept up the practice. I'm just adding a couple of references in case you are looking for them:</p>
<h3 id="vscode-mindmap">vscode-mindmap</h3>
<p>I've used vscode-mindmap when I needed to create a quick hierarchy map.</p>
<ul>
<li><a href="https://marketplace.visualstudio.com/items?itemName=pmcxs.vscode-mindmap">https://marketplace.visualstudio.com/items?itemName=pmcxs.vscode-mindmap</a></li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>There’s no one tool for everything in this day and age. Use what works for you and try out several to see if they stick!</p>
<p>I highly recommend generating some charts in mermaid from your own database ER-Diagram (easy to do!) or using it to make pie charts like <a href="https://youtu.be/IXRGa5m-Lbo">Microsoft Polyglot notebooks demonstrates at 15:05 onwards</a> in their <a href="https://github.com/dotnet/interactive/blob/main/samples/notebooks/polyglot/github%20repo%20milestone%20report.ipynb">GitHub demo notebook</a>.</p>
<p>If you have other recommendations for me, do feel free to reach out and I’ll see if they make to cut to get added to this list. :) You can even pull request this actual blog post on GitHub.</p>
<p>Finally, if someone has a recommendation for a WAN/LAN topology visualisation or charting tool, I’d be happy to hear about it and your experience with it!</p>
http://ewinnington.github.io/posts/Data-LineageData Lineage for dataflow and workflow processes2022-11-12T22:10:00Z<h1 id="data-lineage">Data lineage</h1>
<p>When working with large amounts of data, extraction, transform and load (ETL) procedures can hide the source of the original data and make inquiries on "where did this data come from and what happened to it?" difficult to answer.</p>
<p>A data lineage is "the process of understanding, recording, and visualizing data as it flows from data sources to consumption<a id="fnref:1" href="#fn:1" class="footnote-ref"><sup>1</sup></a>" and tries to answer that question.</p>
<p>Using dataflow and ETL orchestration tools such as <a href="https://airflow.apache.org/">Airflow</a>, <a href="https://www.prefect.io">Prefect</a>, <a href="https://nifi.apache.org/">NiFi</a>, we move and transform data, but also lose the reference as to how the data was transformed.</p>
<p>In this document, we will approach one open source tool OpenLineage and one "hand built" approach to capturing and storing data lineage information.</p>
<h1 id="openlineage-and-marquez-open-source-tools">OpenLineage and Marquez - Open source tools</h1>
<p><a href="https://openlineage.io/">OpenLineage</a> is an open source project and framework for data lineage collection and analysis that helps collect lineage metadata from the data processing applications. At its core, OpenLineage exposes a standard API for metadata collection - a single API call: <a href="https://openlineage.io/apidocs/openapi/"><strong>postRunEvent</strong></a>.</p>
<p>To simplify its implementation with AirFlow, Open Lineage has an <a href="https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow/openlineage/airflow">airflow connection module</a> already available.</p>
<p>On the back-end, the storage of run meta-data has a reference implementation named Marquez. The data model is illustrated here.</p>
<p><a href="https://lucid.app/lucidchart/f918ce01-9eb4-4900-b266-49935da271b8/view?page=8xAE.zxyknLQ#">Marquez data model</a></p>
<p><img src="/posts/images/data-lineage/Marquez-Data-Model.png" class="img-fluid" width="80%" alt="" /></p>
<p>But it is also possible to implement one's own storage for metadata in case there is a need or added value, but adopting the open source solution will be an advantage for integration later.</p>
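<p>To give a feel for the single <strong>postRunEvent</strong> call, here is a standard-library-only sketch of assembling a run event payload. The field names follow the OpenLineage RunEvent shape as I understand it, so verify them against the API docs before relying on this; the namespace, job name and producer URL are placeholders.</p>

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(event_type, job_namespace, job_name, run_id=None):
    """Assemble a minimal OpenLineage-style run event payload.
    In a real system this JSON would be POSTed to the lineage endpoint
    (e.g. Marquez's /api/v1/lineage)."""
    return {
        "eventType": event_type,  # e.g. START / COMPLETE / FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "producer": "https://example.com/my-etl/v1",  # placeholder producer URI
    }


event = build_run_event("COMPLETE", "my-namespace", "daily-import")
payload = json.dumps(event)
```

<p>A task would emit one START event when it begins and one COMPLETE (or FAIL) event when it ends, and the back-end reconstructs the lineage from the stream of these events.</p>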
<h1 id="locally-grown-alternatives">Locally grown alternatives</h1>
<p>Data correlation and lineage information can be generated via the emission of events while processing input data. Additionally, input data can be fingerprinted via a fast hash function to check for duplicate imports, so as to enable idempotent processing.</p>
<h3 id="input-dataset-fingerprinting-via-non-cryptographic-hash-function">Input dataset fingerprinting via non-cryptographic hash function</h3>
<p>We can use a fast non-cryptographic hash function such as <a href="https://github.com/backtrace-labs/umash">umash</a> or xxhash (<code>sudo apt-get install xxhash</code>) to generate a hash of the input data. xxhash is capable of taking streaming STDIN data from compressed files to generate a fast hash.</p>
<pre><code class="language-bash">time gunzip -c /mnt/d/smart_meter_data/ckw_opendata_smartmeter_dataset_a_202101.csv.gz | /usr/bin/xxhsum
</code></pre>
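<p>If you want the same streaming behaviour in Python without an extra dependency, the standard library does not ship xxhash, but BLAKE2 from <code>hashlib</code> works as a stand-in (slower than xxhash, but fast enough for most fingerprinting). A sketch:</p>

```python
import gzip
import hashlib


def fingerprint_gzip(path, chunk_size=1 << 20):
    """Stream-decompress a gzip file and fingerprint its contents
    without loading the whole dataset into memory."""
    h = hashlib.blake2b(digest_size=16)
    with gzip.open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

<p>Two imports of the same dataset produce the same fingerprint, which is what makes the duplicate-import check, and therefore idempotent processing, possible.</p>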
<h3 id="correlation-ids-with-uuids">Correlation IDs with UUIDs</h3>
<p>Generating UUIDs for correlation identifiers along <a href="https://www.rfc-editor.org/rfc/rfc4122.html">rfc4122</a> gives us multiple generation algorithms, including sortable UUIDs, which minimize collision possibilities even at a high UUID generation rate.</p>
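<p>Python's standard <code>uuid</code> module does not expose the newer sortable UUIDv7 layout, so here is a rough sketch of a v7-style time-ordered identifier: a 48-bit Unix-millisecond timestamp in the high bits, random low bits. The version and variant bits are set, but treat this as an illustration rather than a spec-compliant implementation.</p>

```python
import os
import time
import uuid


def uuid7_like():
    """Time-ordered UUID sketch: identifiers generated later sort after
    identifiers generated earlier, which helps index locality and tracing."""
    ms = int(time.time() * 1000) & ((1 << 48) - 1)      # 48-bit timestamp
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF   # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 bits
    value = (ms << 80) | (0x7 << 76) | (rand_a << 64) | (0b10 << 62) | rand_b
    return uuid.UUID(int=value)
```

<p>Because the timestamp occupies the most significant bits, sorting the UUIDs sorts the events roughly by creation time, while the 74 random bits keep the collision probability negligible at high generation rates.</p>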
<h3 id="hierarchical-correlation-ids-using-closure-tables">Hierarchical correlation IDs using closure tables</h3>
<p>When inputs contain a dataset that is composed of multiple data points that identify unique sets in our final processed dataset, it becomes necessary to be able to trace their lineage back to the initial input. A recursive search through a table of entries to find the parent correlation identifier of a child time-series is quite inefficient.</p>
<p>To have fast search over deep hierarchies of correlations IDs in relational databases, we can turn to the concept of closure tables.</p>
<table class="table">
<thead>
<tr>
<th style="text-align: left;">Field</th>
<th style="text-align: left;">Type</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">ParentId</td>
<td style="text-align: left;">UUID</td>
</tr>
<tr>
<td style="text-align: left;">ChildId</td>
<td style="text-align: left;">UUID</td>
</tr>
<tr>
<td style="text-align: left;">Depth</td>
<td style="text-align: left;">integer</td>
</tr>
</tbody>
</table>
<p>This table structure allows all parents or children of an identifier to be queried in a single non-recursive SQL query. This comes at the cost of having to insert the entire hierarchy of correlation IDs upon insertion.</p>
<p>Here we are representing the two hierarchies</p>
<pre><code>aaa > bbb > ccc
aaa > eee
</code></pre>
<table class="table">
<thead>
<tr>
<th style="text-align: left;">ParentId</th>
<th style="text-align: left;">ChildId</th>
<th style="text-align: right;">Depth</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">aaa</td>
<td style="text-align: left;">aaa</td>
<td style="text-align: right;">0</td>
</tr>
<tr>
<td style="text-align: left;">bbb</td>
<td style="text-align: left;">bbb</td>
<td style="text-align: right;">0</td>
</tr>
<tr>
<td style="text-align: left;">ccc</td>
<td style="text-align: left;">ccc</td>
<td style="text-align: right;">0</td>
</tr>
<tr>
<td style="text-align: left;">eee</td>
<td style="text-align: left;">eee</td>
<td style="text-align: right;">0</td>
</tr>
<tr>
<td style="text-align: left;">aaa</td>
<td style="text-align: left;">bbb</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: left;">bbb</td>
<td style="text-align: left;">ccc</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: left;">aaa</td>
<td style="text-align: left;">ccc</td>
<td style="text-align: right;">2</td>
</tr>
<tr>
<td style="text-align: left;">aaa</td>
<td style="text-align: left;">eee</td>
<td style="text-align: right;">1</td>
</tr>
</tbody>
</table>
<p><em>note</em>: <em>if desired the initial 0 depth nodes can be neglected from the insertion process without losing functionality, but can be useful in certain modeling processes (eg. rights, groups)</em></p>
<p>If we need to find all children of <strong>aaa</strong>, we can do a</p>
<pre><code>SELECT ChildId, Depth FROM Closure_Table WHERE ParentId = 'aaa' ORDER BY Depth;
</code></pre>
<p>which returns the following</p>
<table class="table">
<thead>
<tr>
<th style="text-align: left;">ChildId</th>
<th style="text-align: left;">Depth</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">aaa</td>
<td style="text-align: left;">0</td>
</tr>
<tr>
<td style="text-align: left;">bbb</td>
<td style="text-align: left;">1</td>
</tr>
<tr>
<td style="text-align: left;">eee</td>
<td style="text-align: left;">1</td>
</tr>
<tr>
<td style="text-align: left;">ccc</td>
<td style="text-align: left;">2</td>
</tr>
</tbody>
</table>
<p>If we need to find all parents of <strong>eee</strong>, we can do a</p>
<pre><code>SELECT ParentId, Depth FROM Closure_Table WHERE ChildId = 'eee' ORDER BY Depth DESC;
</code></pre>
<p>which returns the following</p>
<table class="table">
<thead>
<tr>
<th style="text-align: left;">ParentId</th>
<th style="text-align: left;">Depth</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">aaa</td>
<td style="text-align: left;">1</td>
</tr>
<tr>
<td style="text-align: left;">eee</td>
<td style="text-align: left;">0</td>
</tr>
</tbody>
</table>
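<p>The insertion cost mentioned earlier stays manageable: adding a node is one depth-0 self row plus a single <code>INSERT ... SELECT</code> that copies every ancestor row of the parent with depth + 1. A runnable sketch with in-memory SQLite, building exactly the <code>aaa &gt; bbb &gt; ccc</code> and <code>aaa &gt; eee</code> hierarchies from the tables above:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Closure_Table (ParentId TEXT, ChildId TEXT, Depth INTEGER)"
)


def add_node(conn, node, parent=None):
    # Depth-0 self row, then one row per ancestor of the parent at depth + 1.
    conn.execute("INSERT INTO Closure_Table VALUES (?, ?, 0)", (node, node))
    if parent is not None:
        conn.execute(
            """INSERT INTO Closure_Table (ParentId, ChildId, Depth)
               SELECT ParentId, ?, Depth + 1
               FROM Closure_Table WHERE ChildId = ?""",
            (node, parent),
        )


# Build the hierarchies aaa > bbb > ccc and aaa > eee.
add_node(conn, "aaa")
add_node(conn, "bbb", parent="aaa")
add_node(conn, "ccc", parent="bbb")
add_node(conn, "eee", parent="aaa")

# All descendants of aaa, in a single non-recursive query.
children = conn.execute(
    "SELECT ChildId, Depth FROM Closure_Table WHERE ParentId = 'aaa' ORDER BY Depth"
).fetchall()
```

<p>The <code>children</code> result matches the descendant table shown above, and no recursion was needed at query time.</p>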
<h3 id="table-structure-for-a-correlated-fingerprinted-hierarchical-data-lineage">Table structure for a Correlated, fingerprinted hierarchical data lineage</h3>
<p>With the CorrelationEvent and CorrelationLineage tables, it becomes possible to generate a lineage graph for the parents or descendants of a correlated dataset in a single request.</p>
<p><img src="/posts/images/data-lineage/CorrelatedLineageDataModel.png" class="img-fluid" alt="" /></p>
<pre><code>erDiagram
CorrelationEvent {
uuid IdEvent
string Process
string Version
hash Fingerprint
int InputSize
datetime EventTime
int Forced
}
CorrelationDetails {
uuid IdEvent
string Field
json Data
}
CorrelationLineage {
uuid IdParent
uuid IdChild
integer Depth
}
CorrelationEvent ||--o{ CorrelationDetails : ""
CorrelationEvent ||--o{ CorrelationLineage : "parent"
CorrelationEvent ||--o{ CorrelationLineage : "child"
</code></pre>
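<p>The closure rows themselves are cheap to maintain at write time: when a new event is recorded, insert its self-link and copy every ancestor link of its parents with the depth incremented by one. A sketch in Python with SQLite standing in for the real store (the helper function is an illustrative assumption, not part of the model above):</p>

```python
import sqlite3
import uuid

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CorrelationLineage (IdParent TEXT, IdChild TEXT, Depth INTEGER)")

def record_event(parents):
    """Insert a new event id and extend the lineage closure from its parents."""
    event_id = str(uuid.uuid4())
    con.execute("INSERT INTO CorrelationLineage VALUES (?, ?, 0)", (event_id, event_id))
    for parent in parents:
        # every ancestor of the parent (including the parent itself, depth 0)
        # becomes an ancestor of the new event, one level deeper
        con.execute(
            """INSERT INTO CorrelationLineage (IdParent, IdChild, Depth)
               SELECT IdParent, ?, Depth + 1 FROM CorrelationLineage WHERE IdChild = ?""",
            (event_id, parent),
        )
    return event_id

download = record_event([])       # root: an SFTP download event
split = record_event([download])  # the split depends on the download
ts01 = record_event([split])      # one of the resulting timeseries

ancestors = con.execute(
    "SELECT Depth FROM CorrelationLineage WHERE IdChild = ? ORDER BY Depth", (ts01,)
).fetchall()
print(ancestors)  # [(0,), (1,), (2,)] - self, split, download
```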
<p>It would even be possible to generate the diagrams automatically using Mermaid to trace the flows through the system<a id="fnref:2" href="#fn:2" class="footnote-ref"><sup>2</sup></a>.</p>
<p><img src="/posts/images/data-lineage/CorrelatedLineageFlow.png" class="img-fluid" alt="" /></p>
<pre><code>flowchart LR
id1(("DSO Timeseries")) --> id2[SFTP Download] --> id3[Split]
id3 --> TS01
id3 --> TS02
id3 --> TS03
id3 --> TS04
id3 --> TS..
TS01 --> id4[Delivery point sum]
TS02 --> id4
id4 --> id5[Load]
met((Weather provider)) --> met2[API Download] --> met3["Aggregate to hour"] --> met4["delivery point history"]
met4 --> for1[forecast consumption]
id4 --> for1 --> for2[Load]
</code></pre>
<h3 id="solution-design">Solution design</h3>
<p>A RabbitMQ message queue that receives correlation events emitted by the tasks, with several consumer tasks receiving them and committing them to the database, is preferred over an HTTP 1.1 connection due to the scaling efficiency of AMQP over pure HTTP<a id="fnref:3" href="#fn:3" class="footnote-ref"><sup>3</sup></a>.</p>
<p><img src="/posts/images/data-lineage/CorrelatedApplicationFlow.png" class="img-fluid" width="80%" alt="" /></p>
<pre><code>flowchart LR
a[Airflow] ---> b[AirFlowTask] --> c[[RabbitMQ Queue Events]] --> d[EventReceiver] -- success --> g[(Postgresql)] --> Monitoring
d -- failed --> e[[RabbitMQ Queue Deadletter]] --> f[DLQ processing and Reconciliation] --> g
</code></pre>
<p>A particular focus on monitoring is necessary to have a truly operational system. The RabbitMQ should be redundant, instrumented and reporting to Grafana, with queue-length monitoring in place. The EventReceivers should employ a dead-letter queue in case messages are rejected by the database. Rejected messages could also stem from a uuid collision - which can be handled by the daily reconciliation process and dead-letter queue processing.</p>
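<p>Whatever the transport, the receiver-side contract stays the same: commit each event, and route anything the database rejects to the dead-letter queue for reconciliation. A toy in-process simulation in Python, with the queue module standing in for RabbitMQ and a dict standing in for Postgresql (all names are illustrative):</p>

```python
import queue

events = queue.Queue()       # stands in for the RabbitMQ events queue
dead_letter = queue.Queue()  # stands in for the RabbitMQ dead-letter queue
committed = {}               # stands in for the Postgresql table, keyed by event uuid

def commit(event):
    # a duplicate uuid stands in for any rejection by the database
    if event["id"] in committed:
        raise ValueError("uuid collision")
    committed[event["id"]] = event

for ev in [{"id": "e1"}, {"id": "e2"}, {"id": "e1"}]:
    events.put(ev)

while not events.empty():
    ev = events.get()
    try:
        commit(ev)           # success path: committed to the database
    except ValueError:
        dead_letter.put(ev)  # failure path: DLQ processing and reconciliation

print(len(committed), dead_letter.qsize())  # 2 1
```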
<p>A high availability Postgresql is recommended, either as a local instance or as a cloud hosted service - which would facilitate operations.</p>
<p>The issue of data retention should be discussed with the business. If we do not keep a time-series history in the time-series datastore, then the event correlation effectively becomes an <a href="https://microservices.io/patterns/data/event-sourcing.html">event sourcing pattern</a>, making it possible to re-create the history of how the time-series was updated.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Irrespective of the method chosen to capture and store the messages, the chosen systems must provide high availability for data lineage - while making sure not to block ingestion if the data lineage system is unresponsive. As long as the message queue is persistent and accessible, it can always be caught up later.</p>
<p>The main task is emitting the events with meaningful data and unique correlation IDs. A focus on the semantics of the events while developing the workflows / dataflows is essential. A callable event library provides the best developer experience and maximizes standardization of the code.</p>
<p>The design of idempotent imports into the system is important: it allows events to be replayed non-destructively and provides operational resilience.</p>
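<p>Idempotency can come from the datastore itself: keyed on the event uuid, an insert that ignores duplicates turns a replay into a no-op. A sketch using SQLite's ON CONFLICT clause (the cut-down table is illustrative):</p>

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CorrelationEvent (IdEvent TEXT PRIMARY KEY, Process TEXT)")

def import_event(id_event, process):
    # replaying the same event id leaves the row untouched instead of failing
    con.execute(
        "INSERT INTO CorrelationEvent (IdEvent, Process) VALUES (?, ?) "
        "ON CONFLICT(IdEvent) DO NOTHING",
        (id_event, process),
    )

import_event("e1", "Split")
import_event("e1", "Split")  # replay: no error, no duplicate row
count = con.execute("SELECT COUNT(*) FROM CorrelationEvent").fetchone()[0]
print(count)  # 1
```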
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">
<p><a href="https://www.imperva.com/learn/data-security/data-lineage/">https://www.imperva.com/learn/data-security/data-lineage/</a><a href="#fnref:1" class="footnote-back-ref">↩</a></p>
</li>
<li id="fn:2">
<p><a href="https://github.com/dotnet/interactive/blob/main/samples/notebooks/polyglot/github%20repo%20milestone%20report.ipynb">https://github.com/dotnet/interactive/blob/main/samples/notebooks/polyglot/github%20repo%20milestone%20report.ipynb</a> - See the PieWithMermaid C# task for a visualisation of such an interaction.<a href="#fnref:2" class="footnote-back-ref">↩</a></p>
</li>
<li id="fn:3">
<p>This should be re-evaluated when HTTP/3 oneshot becomes available in the servers and languages used. The expected performance improvement are such that at that time HTTP/3 QUIC might outrace any other streaming solution. <a href="https://blog.cloudflare.com/http3-the-past-present-and-future/">https://blog.cloudflare.com/http3-the-past-present-and-future/</a><a href="#fnref:3" class="footnote-back-ref">↩</a></p>
</li>
</ol>
</div>
http://ewinnington.github.io/posts/tesla-megapackTesla Megapacks put into context2022-11-09T21:40:00Z<h1 id="tesla-megapacks">Tesla Megapacks</h1>
<p>Tesla on Twitter announced: <a href="https://t.co/aw85eHECXI">"Meet Megafactory, our new Megapack factory in Lathrop, CA 🔋🔋🔋"</a></p>
<p>Tesla's energy division has recently completed their new Megapack factory in Lathrop, California, which they claim can currently produce 10'000 Megapacks a year. How much storage is that, and how does it compare to a hydropower pump-storage plant?</p>
<h2 id="tesla-megapacks-specs-per-pack">Tesla Megapack specs per pack</h2>
<ul>
<li>4 Hour Duration</li>
<li>Power: 970 kW</li>
<li>Energy: 3,916 kWh per Megapack</li>
<li>Round Trip Efficiency: 93.5%</li>
<li>9.12 m x 1.65 m x 2.79 m</li>
<li>38,100 kg</li>
<li>~$2 million per pack</li>
</ul>
<h2 id="offer">Offer</h2>
<p>An offer was generated on the Tesla Energy website to get indicative pricing for the largest system they offer.</p>
<ul>
<li>1000 Megapack</li>
<li>969.6 MW Power</li>
<li>3916 MWh Energy</li>
<li>Duration: 4 Hours</li>
<li>Delivery: Q3 2024</li>
<li>Estimated Price (California) $1,832,519,850</li>
<li>Est. Annual Maintenance $4,821,480 - Maintenance Price escalates at 2% per year</li>
</ul>
<p>Based on this, we can see that 10'000 Megapacks represent about 39'160 MWh of storage (39 GWh), with a sales cost of approximately $18 billion.</p>
<p>So how does this compare to the two latest large Swiss pump-storage hydropower plants?</p>
<h2 id="hydropower-plants">Hydropower plants</h2>
<h3 id="nant-de-drance-pump-storage-extension">Nant-de-Drance pump-storage extension</h3>
<ul>
<li>1 Pump-storage power plant</li>
<li>Power 900 MW (Turbines and Pumps)</li>
<li>Storage 20 GWh</li>
<li>Duration: 19 Hours</li>
<li>Round trip efficiency: over 90%</li>
<li>Estimated Price 2 billion CHF</li>
<li>~14 years to build and bring into operation</li>
</ul>
<h3 id="kraftwerk-linthlimmern-pump-storage-extension">Kraftwerk-Linth–Limmern pump-storage extension</h3>
<ul>
<li>1 Pump-storage power plant</li>
<li>Power 1000 MW (Turbines and Pumps)</li>
<li>Storage 33 GWh</li>
<li>Duration: 33 Hours</li>
<li>Round trip efficiency: over 90%</li>
<li>Estimated Price 2.1 billion CHF</li>
<li>~10 years to build and bring into operation</li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>The storage cost of the batteries is currently about a factor of 4-9x the price of the hydropower plant construction, but batteries have the advantage of being available within about 18 months. What remains to be seen is how much battery degradation is a factor in these grid-scale battery installations. At least Tesla is offering, from my understanding, a 15-year warranty on the Megapack.</p>
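<p>That factor follows directly from the figures quoted above, treating CHF and USD as roughly at parity for this back-of-the-envelope comparison:</p>

```python
# Storage cost per kWh from the figures quoted above
megapack = 1_832_519_850 / 3_916_000  # 1000-pack offer ($1.83 bn) over 3916 MWh
nant_de_drance = 2.0e9 / 20_000_000   # 2 bn CHF over 20 GWh
linth_limmern = 2.1e9 / 33_000_000    # 2.1 bn CHF over 33 GWh

print(round(megapack), round(nant_de_drance), round(linth_limmern))  # 468 100 64
print(round(megapack / nant_de_drance, 1), round(megapack / linth_limmern, 1))  # 4.7 7.4
```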
<p>The amount of storage produced by the factory represents more than 1 large hydropower plant per year.</p>
<p>The 39 GWh storage produced by the factory in one year is a huge amount, so much that it would cover around 25% of the <a href="https://www.iea.org/data-and-statistics/charts/battery-storage-capability-by-countries-2020-and-2026">expected total capacity that the IEA planned for the entire world by 2026</a>.</p>
<h2 id="post-scriptum-new-gridscale-batteries-in-europe">Post-scriptum: new gridscale batteries in Europe</h2>
<p>2022.11.22 - <a href="https://www.bbc.com/news/uk-england-humber-63707463">Cottingham: Europe's biggest battery storage system switched on - 196MWh</a></p>
<ul>
<li>Power: ? (my estimate ~50-100 MW)</li>
<li>Storage: 196 MWh</li>
<li>use Tesla's AI software to match energy supply to demand</li>
<li>Commissioning in two stages in December 2022 and March 2023.</li>
<li>Supplier: Tesla</li>
<li>Cost: ? (my estimate $100 million+)</li>
</ul>
<p>As usual, the BBC is terribly uninformative about specifications and cost. If we assume 50 Tesla Megapacks, the cost should be around $100 million+ and the power 50 to 100 MW, depending on the 2h or 4h Megapack variant. Interesting to see a Tesla system in Europe - I expect many more to come online.</p>
<p>2022.11.07 - <a href="https://www.rwe.com/en/press/rwe-generation/2022-11-07-battery-storage-220-mw-neurath">RWE gives green light for 220-megawatt battery storage system in North Rhine-Westphalia</a></p>
<ul>
<li>Power: 80 + 140 MW = 220 MW</li>
<li>Storage: full capacity not mentioned; "delivering the required output for over an hour" implies 220 MWh to 440 MWh</li>
<li>140 million euros</li>
<li>commissioning in 2024</li>
<li>Supplier: Not mentioned.</li>
</ul>
<p>2021.07.22 - <a href="https://www.rwe.com/en/press/rwe-ag/2021-07-22-rwe-builds-one-of-the-largest-battery-storage-facilities-in-germany">RWE bringing 72MW BESS in Germany online in November</a></p>
<ul>
<li>Power: 72 + 45MW = 117 MW</li>
<li>Storage: 128MWh</li>
<li>€50 million</li>
<li>commissioning in end 2022</li>
<li>Supplier: CATL batteries</li>
</ul>
http://ewinnington.github.io/posts/sqlite-microstoresEmbracing SQLite and living with micro-services2022-10-22T00:00:00Z<p>The idea of micro-services and their own single purpose data stores is easy to describe. But then to implement and live with it is a different story. So as a developer and architect, I’ve decided to do just that! Make micro-services and micro-data stores to cover the tiny and small stuff in my life I want to keep track of.</p>
<p>As an example, I read online comics, light novels and mangas. I had a running list of a couple hundred bookmarks that I tried to keep updated with the last position I had reached in each story. But I always forgot to update the bookmarks, and with so many of them I would lose track of the last read chapter. My solution?</p>
<p>A SQLite DB and some Python code to load it. Pass a single URL to a command-line Python script and it gets added to the DB: a request goes out, fetches the title and chapter from the HTML, then records them in the DB by title. Now I have a record of where I left off, with last-updated / last-read timestamps. Bonus: I can do a SELECT .. ORDER BY updated LIMIT 10 to check the last stories I was reading and pipe them to my browser to open the chapters where I left off.</p>
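<p>The data layer of such a tracker is only a few lines. A sketch in Python (the real script also fetches the page with Requests and parses out the title and chapter; here the parsed values are passed in directly, and the names are illustrative):</p>

```python
import sqlite3
from datetime import datetime, timezone

con = sqlite3.connect(":memory:")  # the real store lives in a file, e.g. reading.db
con.execute("""CREATE TABLE reading (
    title TEXT PRIMARY KEY, url TEXT, chapter TEXT, updated TEXT)""")

def track(title, url, chapter):
    """Upsert the last read chapter for a story, stamped with the current time."""
    con.execute(
        "INSERT INTO reading VALUES (?, ?, ?, ?) "
        "ON CONFLICT(title) DO UPDATE SET url=excluded.url, "
        "chapter=excluded.chapter, updated=excluded.updated",
        (title, url, chapter, datetime.now(timezone.utc).isoformat()),
    )

track("Some Light Novel", "https://example.com/sln/ch-41", "41")
track("Some Light Novel", "https://example.com/sln/ch-42", "42")  # read one more chapter

# the 'last stories I was reading' query from the post
last_read = con.execute(
    "SELECT title, chapter FROM reading ORDER BY updated DESC LIMIT 10"
).fetchall()
print(last_read)  # [('Some Light Novel', '42')]
```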
<p>To really embrace SQLite is to make everything in your life become a new micro database, even if there's only a couple of tables with a dozen or a hundred rows.</p>
<p>Stock tracking? An SQLite DB with a Transactions table and a roll-up Inventory table.</p>
<p>In fact, even when sending data around from one system to another, we should embrace the simplicity of SQLite over CSV files. See <a href="https://berthub.eu/articles/posts/big-data-storage/">https://berthub.eu/articles/posts/big-data-storage/</a> for Bert Hubert's views and performance tests.</p>
<p>Now I have micro data-stores, I can add a service on top which contains the CRUD commands I need to interact with them and show them in a personal dashboard.</p>
http://ewinnington.github.io/posts/Starship-laser-ablationThe case for a SpaceX Starship laser ablation platform for orbital debris management2022-09-18T00:00:00Z<p>SpaceX’s Starship program as a platform for specialised load-outs has many potential applications: Tanker variants for orbital refuelling, Crew variant for Dear Moon mission and dedicated satellite launcher for the Starlink satellite constellation deployment being the variants we already know about.</p>
<p>With the latest discussion about Orbital debris fields, I suggest it is time to discuss about another variant: a dedicated laser ablation Starship variant for de-orbiting or destroying 1 to 10 cm sized debris.</p>
<p>To deorbit or destroy debris in Earth’s orbit, laser ablation is one of the ideal techniques, since it can directly burn up small debris or deorbit larger pieces via plasma propulsion (in effect, ablating the surface of the target object so that the plasma generated by the laser strike propels it).</p>
<p>To make orbital cleanup affordable, we need a cheap-to-fly, high-power laser with a sufficient burn time, and the ability to choose the orbit of our laser platform.</p>
<ul>
<li>Cheap to fly: The launch cost of a SpaceX Starship is estimated to be at least one order of magnitude lower in dollars per kilo to orbit.</li>
<li>High power laser: The high power of chemical lasers along with their high mass requirements make them a good fit for integration into a Starship. With sufficient mass for the chemicals to supply the laser, the lasers could be used long enough to clear the coplanar orbit. Once the chemicals are depleted, the starship can be landed and the laser refuelled for another mission. Other types of lasers, which have been developed recently, are also candidates: Solid state lasers, fiber lasers, diode lasers - with these the mass capability of Starship would be used for large batteries and potentially hydrogen fuel cells to provide enough power for the application.</li>
<li>Ability to choose orbit: Laser ablation is most effective when the platform is shooting from a “same altitude and coplanar” orbit. Dedicated launches are then the most effective way of reaching these orbits for maximal efficiency.</li>
</ul>
<p>If the <a href="https://www.fastcompany.com/90789865/orbits-act-what-to-know-about-congress-effort-to-clean-up-space-debris">US government decides to start paying for orbital clearing services</a> with the Orbital Sustainability (ORBITS) Act, it would be a clear case for developing such a Starship variant. Without other incentives, it might still be profitable by selling “deorbiting and clearance services” to other satellite providers. It remains to be seen how much “good citizenship” is to be expected from satellite constructors, launchers and operators.</p>
<h2 id="references-to-laser-ablation-papers">References to laser ablation papers</h2>
<p>Space based -
<a href="https://conference.sdo.esoc.esa.int/proceedings/sdc8/paper/43/SDC8-paper43.pdf">https://conference.sdo.esoc.esa.int/proceedings/sdc8/paper/43/SDC8-paper43.pdf</a></p>
<p>Ground based -
<a href="https://conference.sdo.esoc.esa.int/proceedings/sdc6/paper/29/SDC6-paper29.pdf">https://conference.sdo.esoc.esa.int/proceedings/sdc6/paper/29/SDC6-paper29.pdf</a></p>
<p>Air-based anti-ballistic-missile chemical laser - a lower-powered version could be used in a Starship for de-orbiting; this demonstrates the feasibility.
<a href="https://minutemanmissile.com/abl.html">https://minutemanmissile.com/abl.html</a></p>
<h2 id="discussion">Discussion</h2>
<p>A <a href="https://www.reddit.com/r/SpaceXLounge/comments/xj8bjh/the_case_for_a_spacex_starship_laser_ablation/">discussion thread on reddit about this post</a> has provided some feedback and the post has been updated.</p>
<p>SpaceX’s Starship program as a platform for specialised load-outs has many potential applications: Tanker variants for orbital refuelling, Crew variant for Dear Moon mission and dedicated satellite launcher for the Starlink satellite constellation deployment being the variants we already know about.</p>http://ewinnington.github.io/posts/Viruses-left-behindWe can leave viruses behind on Earth as we leave the gravity well2022-01-14T00:00:00Z<p>As humanity left for the stars, bacteria hitched a ride with us. They were on us, in us and around us - as much part of us as our own cells. Viruses, on the other hand, due to careful screening programs, quarantines and selective breeding programs of the few animals that took to the stars, were confined to the gravity well of Earth. Without hosts, they could not follow us. We had left our ancient enemy behind.</p>
<p>Within two generations, the space-born growing up on the O’Neill colony cylinders - even those who lived at 0.9 to 1g - realised that Earth was going to be a forbidden planet for them. They had no defences against the viruses that continued to permeate the planet. A visit to Earth required them to live in an isolation suit and sterile quarters - limiting contact with the earthers and fauna.</p>
http://ewinnington.github.io/posts/db-healthChecking for liveness on databases for health checks2020-07-14T00:00:00Z<p>When you just want to check a DB is reachable from your api or code, a health check is used. For the following DBs the simplest query is:</p>
<ul>
<li>Oracle: <code>SELECT 1 FROM dual</code></li>
<li>Postgresql: <code>SELECT 1</code></li>
<li>SQLite: <code>SELECT 1</code></li>
<li>Mysql / MariaDb: <code>SELECT 1</code></li>
<li>Microsoft SQL-Server: <code>SELECT 1</code></li>
</ul>
<p>There is really an odd one on the list.</p>
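<p>Wired into code, the health check is just the trivial query plus a boolean. A sketch with Python's built-in sqlite3 driver - the same shape works with any DB-API driver by swapping in the dialect's query from the list above:</p>

```python
import sqlite3

def is_alive(conn) -> bool:
    """Liveness probe: the DB is healthy if the trivial query round-trips."""
    try:
        return conn.execute("SELECT 1").fetchone()[0] == 1
    except Exception:
        return False

conn = sqlite3.connect(":memory:")
print(is_alive(conn))  # True
conn.close()
print(is_alive(conn))  # False - the closed connection raises an error
```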
http://ewinnington.github.io/posts/Point-of-exporing-spaceThe point of Space exploration2020-04-18T00:00:00Z<p>In my view, the point of exploration is hope. For a small slice of humanity, hope in a better world is what drives us.</p>
<p>By learning about the composition of the moon, setting up ultra precise reflectors on its surface to map its motion and exploring it, we are learning about our solar system and the universe, as well as the challenges we will face in the centuries to come.</p>
<p>Humanity needs to expand beyond earth to realize how precious earth is. Every person who has been to space has come back profoundly changed and humbled by the experience. A little distance is needed to appreciate the view and realize how lucky we all are.</p>
<p>The science benefits us all: Solar panels and microwaves to medical innovations. Theory of ecology in closed systems, advanced recycling and so many more.</p>
<p>Mars and the moon are but stepping stones to the solar system. And hopefully to the Milky Way and beyond. We need to answer the question "Is there life out there?", or at least I do.</p>
http://ewinnington.github.io/posts/jupyter-tips-csharpTips and tricks for C# Jupyter notebook2019-11-17T01:30:00Z<p>As I use the Jupyter notebook with C# integration, I'll add to this list of tricks as I discover them.</p>
<h2 id="adding-collapsed-details-to-jupyter-notebooks">Adding collapsed details to jupyter notebooks</h2>
<p>I wanted to show hints and answers to questions in the jupyter notebook, but I didn't want them to be directly visible without interaction.</p>
<p><img src="/posts/images/jupyter-notebook-tips/collapsed_details.png" class="img-fluid" alt="collapsed" /></p>
<p><img src="/posts/images/jupyter-notebook-tips/expanded_details.png" class="img-fluid" alt="expanded" /></p>
<p>This was done by creating a markdown block with an HTML section inside. When interacted with, the section expands.</p>
<pre><code class="language-HTML"><details>
<summary>Summary of the fold</summary>
... Content shown after the fold ...
</details>
</code></pre>
<details>
<summary>Summary of the fold</summary>
... Content shown after the fold ...
</details>
<p>And since it's just HTML, it works anywhere, including here.</p>
<h2 id="display-function">display() function</h2>
<p>The C# kernel embedded in Jupyter has some functions that control the output to the cells of the notebook when executed. One of these functions is <code>display()</code>.</p>
<h3 id="tables">Tables</h3>
<p>It shows properties of objects or lists in a table format. In my linear programming solver examples, I used this command to make a table with the name and value of each problem variable.</p>
<pre><code class="language-CSharp"> display(solver.variables().Select(a => new { Name = a.Name(), Value = a.SolutionValue() }));
</code></pre>
<p><img src="/posts/images/lp/10-6.png" class="img-fluid" alt="var_table" /></p>
<p>This is an ideal case for <a href="https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/anonymous-types">C#'s anonymous objects</a> and <a href="https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/data-transformations-with-linq">linq to select the object properties</a> that you want to see in the table. Adding <code>using System.Linq;</code> to your references will allow you to use the <code>.Select( x => new { PropertyHeader = x.PropertyValue, ... })</code> linq command.</p>
<h3 id="svgs">SVGs</h3>
<p>I discovered the <code>display(HTML())</code> function allows you to pass items such as SVGs from inside C# to be rendered below the block. Two examples are the SVG image triangle and the bar chart. On top of that, the bar chart is stylable by CSS.</p>
<p><img src="/posts/images/jupyter-notebook-tips/svg_img.png" class="img-fluid" alt="expanded" /></p>
<p>The SVG code is generated inside C#, then the variable containing the SVG is displayed.</p>
<p><img src="/posts/images/jupyter-notebook-tips/bar_chart.png" class="img-fluid" alt="expanded" /></p>
<p>Finally, you can create a cell containing some CSS (with <code>%%html</code>), which when executed will change the chart as shown.
<img src="/posts/images/jupyter-notebook-tips/bar_chart_with_css.png" class="img-fluid" alt="expanded" />
<img src="/posts/images/jupyter-notebook-tips/bar_chart_css.png" class="img-fluid" alt="expanded" /></p>
<h2 id="magic">Magic</h2>
<p>The <code>%lsmagic</code> command shows a list of commands available to use in code blocks.</p>
<p><img src="/posts/images/jupyter-notebook-tips/lsmagic.png" class="img-fluid" alt="expanded" /></p>
<h3 id="javascript">%%javascript</h3>
<p>You can execute javascript with the <code>%%javascript</code> magic.</p>
<p><img src="/posts/images/jupyter-notebook-tips/javascript_hello.png" class="img-fluid" alt="collapsed" /></p>
<p><img src="/posts/images/jupyter-notebook-tips/javascript_hello_active.png" class="img-fluid" alt="collapsed" /></p>
<h3 id="html">%%html</h3>
<p>As previously mentioned, the %%html command will allow you to add CSS blocks and any other HTML output directly into the Jupyter output cell.</p>
<h3 id="time">%%time</h3>
<p>Gives you the wall time - the actual time taken to execute the cell.</p>
<h3 id="whos">%whos</h3>
<p>Gives you the list of currently defined variables in the memory of the notebook. Also lists the type and value (as a toString() representation) of each object.</p>
<h3 id="fsharp">%%fsharp</h3>
<p>You can intermingle C# and F# code inside one notebook, as long as they are in different cells. You can switch to F# by prefixing the cell with <code>%%fsharp</code></p>
<pre><code class="language-%%fsharp">let main argv =
display "Hello World from F#!"
0
main()
</code></pre>
<p><img src="/posts/images/jupyter-notebook-tips/f_sharp.png" class="img-fluid" alt="collapsed" /></p>
<h2 id="graphing-from-c">Graphing from C#</h2>
<p>We can chart using <code>Xplot.Plotly</code>.</p>
<pre><code class="language-CSharp">//Install XPlot package
#r "nuget:XPlot.Plotly,2.0.0"
using XPlot.Plotly;
</code></pre>
<p>Input data:</p>
<pre><code class="language-CSharp">DateTime now = DateTime.Now;
var imax = 8760;
var rand = new Random();
double[] data = Enumerable.Range(1, imax).Select(x => 20.0 + 15.0 * Math.Sin(x/60.0) + 12 * rand.NextDouble()).ToArray();
DateTime[] tp = Enumerable.Range(1, imax).Select(x => now.AddHours(x) ).ToArray();
int totalNumberForBarChart = 3;
double[] actualFares = new [] {3.4, 12.3, 20.42};
double[] predictionFares = new [] {7.4, 14.3, 18.42};
int[] elements = Enumerable.Range(0, totalNumberForBarChart).ToArray();
</code></pre>
<p>My samples are here: <a href="https://github.com/ewinnington/noteb/blob/master/Charts_CSharp.ipynb">C# charts</a>.</p>
<h3 id="two-bar-plot">Two bar plot</h3>
<p><img src="/posts/images/jupyter-notebook-tips/bar.png" class="img-fluid" alt="bar" /></p>
<pre><code class="language-CSharp">// Define group for Actual values
var ActualValuesGroupBarGraph = new Graph.Bar()
{
x = elements,
y = actualFares,
name = "Actual"
};
// Define group for Prediction values
var PredictionValuesGroupBarGraph = new Graph.Bar()
{
x = elements,
y = predictionFares,
name = "Predicted"
};
var chart = Chart.Plot(new[] {ActualValuesGroupBarGraph, PredictionValuesGroupBarGraph});
var layout = new Layout.Layout(){barmode = "group", title="Actual fares vs. Predicted fares Comparison"};
chart.WithLayout(layout);
chart.WithXTitle("Cases");
chart.WithYTitle("Fare");
chart.WithLegend(true);
chart.Width = 700;
chart.Height = 400;
display(chart);
</code></pre>
<h3 id="histogram">Histogram</h3>
<p><img src="/posts/images/jupyter-notebook-tips/histogram.png" class="img-fluid" alt="hist" /></p>
<pre><code class="language-CSharp">var faresHistogram = Chart.Plot(new Graph.Histogram(){x = data, autobinx = false, nbinsx = 20});
var layout = new Layout.Layout(){title="Distribution of taxi trips per cost"};
faresHistogram.WithLayout(layout);
display(faresHistogram);
</code></pre>
<h3 id="scatter">Scatter</h3>
<p><img src="/posts/images/jupyter-notebook-tips/scatter.png" class="img-fluid" alt="scatter" /></p>
<pre><code class="language-CSharp">var chart = Chart.Plot(
new Graph.Scatter()
{
x = actualFares,
y = predictionFares,
mode = "markers",
marker = new Graph.Marker()
{
color = predictionFares,
colorscale = "Jet"
}
}
);
var layout = new Layout.Layout(){title="Plot Time vs. Distance & color scale on Fares"};
chart.WithLayout(layout);
chart.Width = 500;
chart.Height = 500;
chart.WithLegend(true);
display(chart);
</code></pre>
<h3 id="line-chart-with-scatter">Line chart with scatter</h3>
<p><img src="/posts/images/jupyter-notebook-tips/lines.png" class="img-fluid" alt="lines" /></p>
<pre><code class="language-CSharp">var linedUp = new Graph.Scatter()
{
x = tp,
y = data,
mode = "lines",
};
var chart = Chart.Plot(linedUp);
chart.WithXTitle("Date");
chart.WithYTitle("Value");
chart.WithLegend(true);
chart.Width = 1200;
chart.Height = 600;
display(chart);
</code></pre>
http://ewinnington.github.io/posts/jupyter-lp-20Applications in LP and MILP with C# and OR-Tools inside Jupyter2019-11-16T23:30:00Z<p>Thanks to the integration of C# into <a href="https://jupyter.org/">Jupyter notebooks</a> with the <a href="https://github.com/dotnet/try">kernel from Dotnet try</a> and support from the <a href="https://mybinder.org/">MyBinder.org</a> hosting, it's easy to share with you runnable workbooks to illustrate how to use the <a href="https://developers.google.com/optimization">Google OR-Tools</a> to solve <a href="https://en.wikipedia.org/wiki/Linear_programming">linear</a> (LP) and <a href="https://en.wikipedia.org/wiki/Integer_programming">mixed-integer linear problems</a> (MILP).</p>
<h1 id="applications-of-linear-programming">Applications of linear programming</h1>
<h2 id="lp-problem-wire-production">LP Problem : Wire production</h2>
<p>A plant makes aluminium and copper wires. Each Kg of aluminium wire requires 10 kWh of electricity and <span class="math">\(\frac{1}{2}\)</span> hour of labour. Each Kg of copper wire requires 4 kWh of electricity and 1 hour of labour. Electricity is limited to 450 kWh/day, labour is limited to 42.5 hours/day at a cost of 11 € an hour, electricity cost is 20 € / MWh, aluminium cost is 1.8 €/Kg, copper cost is 5.4 €/Kg. Total weight delivered to the plant daily is limited to 56 Kg. Aluminium wire sales price is 45 €/Kg, copper wire sales price is 50 €/Kg.</p>
<p>What should be produced to maximise profit and what is the maximum profit?</p>
<p><a href="https://mybinder.org/v2/gh/ewinnington/noteb/master?filepath=Lp_WireProduction.ipynb">Launch the binder</a> <a href="https://mybinder.org/v2/gh/ewinnington/noteb/master?filepath=Lp_WireProduction.ipynb"><img src="https://mybinder.org/badge_logo.svg" class="img-fluid" alt="Binder" /></a></p>
<pre><code class="language-CSharp">Solver solver = Solver.CreateSolver("LinearProgramming", "CLP_LINEAR_PROGRAMMING");
//Setting up the variables and constants of the problem
Variable Al = solver.MakeNumVar(0.0, double.PositiveInfinity, "Al");
Variable Cu = solver.MakeNumVar(0.0, double.PositiveInfinity, "Cu");
double El_Price = 20.0; double El_Max = 450;
double La_Price = 11.0; double La_Max = 42.5;
double Weight_Max = 56.0;
double Al_Purchase = 1.8 ; double Al_Sale = 45.0;
double Cu_Purchase = 5.4; double Cu_Sale = 50.0;
// Maximize revenue
Objective objective = solver.Objective();
objective.SetCoefficient(Al, Al_Sale-Al_Purchase-(La_Price * 1/2.0)-(El_Price * 10/1000.0));
objective.SetCoefficient(Cu, Cu_Sale-Cu_Purchase-(La_Price * 1)-(El_Price * 4/1000.0));
objective.SetMaximization();
// Electricity usage limit
Constraint c0 = solver.MakeConstraint(0, El_Max);
c0.SetCoefficient(Al, 10);
c0.SetCoefficient(Cu, 4);
// Labour usage limit
Constraint c1 = solver.MakeConstraint(0, La_Max);
c1.SetCoefficient(Al, 1/2.0);
c1.SetCoefficient(Cu, 1);
// Weight limit
Constraint c2 = solver.MakeConstraint(0, Weight_Max);
c2.SetCoefficient(Al, 1);
c2.SetCoefficient(Cu, 1);
SolveAndPrint(solver);
</code></pre>
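<p>For a two-variable problem like this one, the solver's answer can be cross-checked without any library: the optimum of an LP lies on a vertex of the feasible region, so enumerating the intersections of constraint pairs is enough. A quick sanity check in Python (the coefficients simply restate the problem data above):</p>

```python
from itertools import combinations

# constraints a*Al + b*Cu <= rhs, including the non-negativity bounds
cons = [
    (10.0, 4.0, 450.0),  # electricity kWh/day
    (0.5, 1.0, 42.5),    # labour hours/day
    (1.0, 1.0, 56.0),    # delivered weight Kg/day
    (-1.0, 0.0, 0.0),    # Al >= 0
    (0.0, -1.0, 0.0),    # Cu >= 0
]
# profit margin per Kg: sale - purchase - labour - electricity
m_al = 45.0 - 1.8 - 11.0 * 0.5 - 20.0 * 10 / 1000.0  # 37.5
m_cu = 50.0 - 5.4 - 11.0 * 1.0 - 20.0 * 4 / 1000.0   # 33.52

best = (0.0, 0.0, 0.0)  # (profit, Al, Cu)
for (a1, b1, r1), (a2, b2, r2) in combinations(cons, 2):
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        continue  # parallel constraints, no vertex
    al = (r1 * b2 - r2 * b1) / det  # Cramer's rule for the 2x2 system
    cu = (a1 * r2 - a2 * r1) / det
    if all(a * al + b * cu <= r + 1e-9 for a, b, r in cons):
        best = max(best, (m_al * al + m_cu * cu, al, cu))

profit, al, cu = best
print(round(al, 2), round(cu, 2), round(profit, 2))  # 37.67 18.33 2027.03
```

<p>The optimum - about 37.7 Kg of aluminium wire and 18.3 Kg of copper wire for a daily profit of roughly 2027 € - is what the solver above should report.</p>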
<h2 id="milp-problem-knapsack">MILP Problem : Knapsack</h2>
<p>The <a href="https://en.wikipedia.org/wiki/Knapsack_problem">knapsack problem</a> defines a bag that has a maximal weight of <span class="math">\(W\)</span>; we can take items from a set of items, each with a weight of <span class="math">\(w_i\)</span> and a value of <span class="math">\(v_i\)</span>. Typically, the problem is defined with <span class="math">\(x_i \in \{0,1\}\)</span> variables - a 0-1 knapsack where each item is either taken whole or not at all.</p>
<p>Here we are going to allow fractional parts of the items to be taken so that we can solve it as a linear problem (also known as a linear relaxation), allowing <span class="math">\(x_i \in [0,1]\)</span>.</p>
<p><a href="https://mybinder.org/v2/gh/ewinnington/noteb/master?filepath=Knapsack_Lp_Milp.ipynb">Launch the binder</a> <a href="https://mybinder.org/v2/gh/ewinnington/noteb/master?filepath=Knapsack_Lp_Milp.ipynb"><img src="https://mybinder.org/badge_logo.svg" class="img-fluid" alt="Binder" /></a></p>
<pre><code class="language-CSharp">//Array of items, weights and totals
int nItems = 10;
double maxWeight = 220;
double[] weights = {31, 27, 12, 39, 2, 69, 66, 29, 45, 58};
double[] values = {24, 27, 26, 15, 19, 33, 30, 28, 65, 42};
Solver milp_solver = Solver.CreateSolver("MILP", "CBC_MIXED_INTEGER_PROGRAMMING");
Variable[] Items = milp_solver.MakeBoolVarArray(10, "Items");
// Maximize revenue
Objective objective = milp_solver.Objective();
for(int i = 0; i < nItems; i++) objective.SetCoefficient(Items[i], values[i]);
objective.SetMaximization();
// Weight limit
Constraint c0 = milp_solver.MakeConstraint(0, maxWeight);
for(int i = 0; i < nItems; i++) c0.SetCoefficient(Items[i], weights[i]);
SolveAndPrint(milp_solver, nItems, weights);
</code></pre>
<p>There are many ways of solving the knapsack problem, using LP and MILP solvers as seen here or using Dynamic Programming. The Google OR-Tools have a specific solver for multi-dimensional knapsack problems, including one which uses Dynamic Programming.</p>
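<p>As a sketch of the dynamic programming approach (my own minimal example, not the OR-Tools KnapsackSolver API), a classic 0/1 knapsack table over integer weights looks like this:</p>
<pre><code class="language-CSharp">// 0/1 knapsack by dynamic programming, assuming integer weights.
// dp[c] = best total value achievable within weight capacity c.
int Knapsack(int capacity, int[] w, int[] v)
{
    var dp = new int[capacity + 1];
    for (int i = 0; i < w.Length; i++)
        // Iterate capacity downwards so each item is used at most once.
        for (int c = capacity; c >= w[i]; c--)
            dp[c] = Math.Max(dp[c], dp[c - w[i]] + v[i]);
    return dp[capacity];
}
</code></pre>
<p>Calling it with the weights, values and capacity above solves the same problem as the MILP, so it returns the same optimal value.</p>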
<p>Thanks to the integration of C# into <a href="https://jupyter.org/">Jupyter notebooks</a> with the <a href="https://github.com/dotnet/try">kernel from Dotnet try</a> and support from the <a href="https://mybinder.org/">MyBinder.org</a> hosting, it's easy to share runnable workbooks with you to illustrate how to use the <a href="https://developers.google.com/optimization">Google OR-Tools</a> to solve <a href="https://en.wikipedia.org/wiki/Linear_programming">linear</a> (LP) and <a href="https://en.wikipedia.org/wiki/Integer_programming">mixed-integer linear problems</a> (MILP).</p>http://ewinnington.github.io/posts/my-binder-jupyter-csharpHosting your C# Jupyter notebook online by adding one file to your repo2019-11-14T23:20:00Z<p><a href="https://mybinder.org/">MyBinder.org</a> in collaboration with <a href="https://github.com/dotnet/try">Dotnet try</a> allows you to host your .net notebooks online.</p>
<p><a href="https://mybinder.org/v2/gh/ewinnington/noteb/master?filepath=SqliteInteraction.ipynb">SQLite example workbook: </a>
<a href="https://mybinder.org/v2/gh/ewinnington/noteb/master?filepath=SqliteInteraction.ipynb"><img src="https://mybinder.org/badge_logo.svg" class="img-fluid" alt="Binder" /></a></p>
<p>To light this up for your own hosted repositories, you will need a public GitHub repo. Inside the repository, you will need to create a <a href="https://www.docker.com/">Docker</a>file that tells MyBinder how to set up the environment for the workbook.</p>
<p>The <a href="https://github.com/dotnet/try/blob/master/CreateBinder.md">dotnet/try</a> repository has the set of instructions.</p>
<p>For my repository, I used the following <a href="https://github.com/ewinnington/noteb/blob/master/Dockerfile">Dockerfile</a></p>
<p>A list of my changes to the standard one proposed by dotnet/try:</p>
<ul>
<li>I used a fixed docker image <code>jupyter/scipy-notebook:45f07a14b422</code></li>
<li>Since I have all my notebooks in the root of my repository I did <code>COPY . ${HOME}/Notebooks/</code></li>
<li>Since I always import the NuGet packages at the top of my workbook, I did not need the Docker build to add a NuGet config, so I commented out the COPY command <code># COPY ./NuGet.config ${HOME}/nuget.config</code></li>
<li>I commented out the custom <code>--add-source "https://dotnet.myget.org/F/dotnet-try/api/v3/index.json"</code> from the installation of the dotnet try tool, since I had issues with the NuGet feed for the pre-release version. Installing with <code>RUN dotnet tool install -g dotnet-try</code> will get you the latest released version.</li>
</ul>
<pre><code class="language-Skip">FROM jupyter/scipy-notebook:45f07a14b422
# Install .NET CLI dependencies
ARG NB_USER=jovyan
ARG NB_UID=1000
ENV USER ${NB_USER}
ENV NB_UID ${NB_UID}
ENV HOME /home/${NB_USER}
WORKDIR ${HOME}
USER root
RUN apt-get update
RUN apt-get install -y curl
# Install .NET CLI dependencies
RUN apt-get install -y --no-install-recommends \
libc6 \
libgcc1 \
libgssapi-krb5-2 \
libicu60 \
libssl1.1 \
libstdc++6 \
zlib1g
RUN rm -rf /var/lib/apt/lists/*
# Install .NET Core SDK
ENV DOTNET_SDK_VERSION 3.0.100
RUN curl -SL --output dotnet.tar.gz https://dotnetcli.blob.core.windows.net/dotnet/Sdk/$DOTNET_SDK_VERSION/dotnet-sdk-$DOTNET_SDK_VERSION-linux-x64.tar.gz \
&& dotnet_sha512='766da31f9a0bcfbf0f12c91ea68354eb509ac2111879d55b656f19299c6ea1c005d31460dac7c2a4ef82b3edfea30232c82ba301fb52c0ff268d3e3a1b73d8f7' \
&& echo "$dotnet_sha512 dotnet.tar.gz" | sha512sum -c - \
&& mkdir -p /usr/share/dotnet \
&& tar -zxf dotnet.tar.gz -C /usr/share/dotnet \
&& rm dotnet.tar.gz \
&& ln -s /usr/share/dotnet/dotnet /usr/bin/dotnet
# Enable detection of running in a container
ENV DOTNET_RUNNING_IN_CONTAINER=true \
# Enable correct mode for dotnet watch (only mode supported in a container)
DOTNET_USE_POLLING_FILE_WATCHER=true \
# Skip extraction of XML docs - generally not useful within an image/container - helps performance
NUGET_XMLDOC_MODE=skip \
# Opt out of telemetry until after we install jupyter when building the image, this prevents caching of machine id
DOTNET_TRY_CLI_TELEMETRY_OPTOUT=true
# Trigger first run experience by running arbitrary cmd
RUN dotnet help
# Copy notebooks
COPY . ${HOME}/Notebooks/
# Copy package sources
# COPY ./NuGet.config ${HOME}/nuget.config
RUN chown -R ${NB_UID} ${HOME}
USER ${USER}
# Install Microsoft.DotNet.Interactive
RUN dotnet tool install -g dotnet-try
#--add-source "https://dotnet.myget.org/F/dotnet-try/api/v3/index.json"
ENV PATH="${PATH}:${HOME}/.dotnet/tools"
RUN echo "$PATH"
# Install kernel specs
RUN dotnet try jupyter install
# Enable telemetry once we install jupyter for the image
ENV DOTNET_TRY_CLI_TELEMETRY_OPTOUT=false
# Set root to Notebooks
WORKDIR ${HOME}/Notebooks/
</code></pre>
<p>Once the Dockerfile is in the repository. Head over to <a href="https://mybinder.org/">MyBinder.org</a> and enter the link to your repository. Optionally, you can set an initial ipynb file to start when the link is clicked.</p>
<p><img src="/posts/images/my-binder/Binder-1.png" class="img-fluid" alt="MyBinder" /></p>
<p>When you click "launch", MyBinder will download your repository and start the docker build, very soon you will be able to access your binders online. Fully shareable and totally awesome!</p>
<p><img src="/posts/images/my-binder/Binder-2.png" class="img-fluid" width="60%" alt="SQLite Running" /></p>
http://ewinnington.github.io/posts/jupyter-lp-10Introduction to linear programming with C# and OR-Tools inside Jupyter2019-11-14T21:30:00Z<p><a href="https://mybinder.org/v2/gh/ewinnington/noteb/master?filepath=IntroToLP.ipynb"><img src="https://mybinder.org/badge_logo.svg" class="img-fluid" alt="Binder" /></a></p>
<h1 id="linear-programming">Linear programming</h1>
<p><a href="https://en.wikipedia.org/wiki/Linear_programming">Linear programming</a> (LP) is a method to provide an optimal solution to a problem defined by a set of linear constraints. It is very widely applied in engineering and science.</p>
<p>A typical linear programming problem is defined by an Objective function (the target to maximise or minimise) and a set of constraints which limit the solution space.</p>
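<p>In matrix notation, every such problem can be written compactly as:</p>
<div class="math">
\[
\begin{aligned}
\max \; c^{T} x \\
\text{subject to:} \\
\qquad A x \leq b \\
\qquad x \geq 0
\end{aligned}
\]</div>
<p>where <span class="math">\(x\)</span> is the vector of decision variables, <span class="math">\(c\)</span> holds the objective coefficients, and each row of <span class="math">\(A x \leq b\)</span> is one linear constraint.</p>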
<h2 id="what-does-an-lp-problem-look-like">What does an LP problem look like?</h2>
<p>We are going to start with a very simple linear problem definition. The first line of the problem describes the objective function, in this case the value to maximize. In this problem, we have two variables to optimize, x and y. The constraints that limit the problem's space are defined under the "subject to" section.</p>
<div class="math">
\[
\begin{aligned}
\max (2y+x) \\
\text{subject to:} \\
\qquad x \leq 15 \\
\qquad y \leq 8
\end{aligned}
\]</div>
<p>This very simple maximisation problem has a maximum solution of <span class="math">\(x=15\)</span> and <span class="math">\(y=8\)</span> for an objective value of <span class="math">\(31\)</span> .</p>
<p><img src="/posts/images/lp/A1.png" class="img-fluid" width="100%" alt="A1" /></p>
<p>There exist many linear programming solvers to calculate this optimum. We will be using <a href="https://github.com/coin-or/Clp">Coin-OR project CLP</a> with the <a href="https://developers.google.com/optimization">Google OR-Tools</a> as an interface for C#.</p>
<h2 id="meet-the-solver">Meet the solver</h2>
<p>You can find my <a href="https://github.com/ewinnington/noteb/blob/master/IntroToLP.ipynb">notebook with all the code here</a>. And you can now access the <a href="https://mybinder.org/v2/gh/ewinnington/noteb/master?filepath=IntroToLP.ipynb">Jupyter notebooks online, thanks to Binder</a>!</p>
<h3 id="coin-or-clp-with-google-or-tools">Coin-OR<a id="fnref:1" href="#fn:1" class="footnote-ref"><sup>1</sup></a> CLP (with Google OR<a id="fnref:2" href="#fn:1" class="footnote-ref"><sup>1</sup></a>-Tools)</h3>
<p>The <a href="https://www.coin-or.org">Coin-OR project</a> provides high quality solvers for many applications with an open source license. <a href="https://github.com/coin-or/Clp">CLP</a> is the main linear programming solver we will be using until we start adding binary variables, at which point we will start using <a href="https://github.com/coin-or/Cbc">CBC</a>. To call them from C#, we could write out a format that CLP / CBC knows how to read, such as <a href="https://en.wikipedia.org/wiki/MPS_(format)">MPS</a> or we could use a wrapping library to call them directly from C#. We will be focusing on using the Google OR-Tools.</p>
<h4 id="google-or-tools">Google OR-Tools</h4>
<p>The <a href="https://developers.google.com/optimization/">Google OR-Tools</a> provide us with a set of primitives with which we can define optimisation problems and call various solvers, including CLP, which we will be using. The OR-Tools also provide routines to write out the problems in MPS and other formats. We will focus on using the OR-Tools as soon as the introduction is over.</p>
<p>Adding the Google OR Tools through nuget to the Jupyter notebook with the <code>#r</code> command and <code>using</code> in the cell imports the solver.</p>
<pre><code class="language-CSharp">#r "nuget:Google.OrTools"
using Google.OrTools.LinearSolver;
</code></pre>
<p><img src="/posts/images/lp/10-1.png" class="img-fluid" alt="10-1.png" /></p>
<p>This allows us to create a solver instance as follows; note the constant <code>CLP_LINEAR_PROGRAMMING</code>, which tells the wrapper which solver we will be using.</p>
<pre><code class="language-CSharp">Solver solver = Solver.CreateSolver("LinearProgramming", "CLP_LINEAR_PROGRAMMING");
</code></pre>
<p>The following code snippet implements the linear program formulation below<a id="fnref:3" href="#fn:2" class="footnote-ref"><sup>2</sup></a>:</p>
<div class="math">
\[
\begin{aligned}
\max (2y+x) \\
\text{subject to:} \\
\qquad x \leq 15 \\
\qquad y \leq 8
\end{aligned}
\]</div>
<p>The solver has already been defined and initialized. We now define the variables of the problem, the objective function and the linear constraints that apply.</p>
<pre><code class="language-CSharp">Variable x = solver.MakeNumVar(0.0, double.PositiveInfinity, "x");
Variable y = solver.MakeNumVar(0.0, double.PositiveInfinity, "y");
// Maximize 2*y+x.
Objective objective = solver.Objective();
objective.SetCoefficient(x, 1);
objective.SetCoefficient(y, 2);
objective.SetMaximization();
// 0 <= x <= 15
Constraint c0 = solver.MakeConstraint(0, 15);
c0.SetCoefficient(x, 1);
// 0 <= y <= 8
Constraint c1 = solver.MakeConstraint(0, 8);
c1.SetCoefficient(y, 1);
</code></pre>
<p><img src="/posts/images/lp/10-2.png" class="img-fluid" alt="10-2.png" /></p>
<p>When you execute the next cells, the solver.Solve() function is called and the results will be written out to the cell's output. We will use this function cell several times over the course of the workbook.</p>
<pre><code class="language-CSharp">public void SolveProblem() {
var resultStatus = solver.Solve();
// Check that the problem has an optimal solution.
if (resultStatus != Solver.ResultStatus.OPTIMAL)
{
Console.WriteLine("The problem does not have an optimal solution!");
return;
}
Console.WriteLine("Problem solved in " + solver.WallTime() + " milliseconds");
// The objective value of the solution.
Console.WriteLine("Optimal objective value = " + solver.Objective().Value());
// The value of each variable in the solution.
foreach (var v in solver.variables())
{ Console.WriteLine($"{v.Name()} : {v.SolutionValue()} "); };
}
</code></pre>
<p><img src="/posts/images/lp/10-3.png" class="img-fluid" alt="10-3.png" /></p>
<h3 id="other-solvers">Other solvers</h3>
<p>There exist many other solvers, either wrapped directly by the Google OR-Tools (such as GLOP) or available as external programs fed through input files in MPS or other formats. The <a href="https://neos-server.org/neos/">NEOS project</a> provides a thorough set of optimisation solvers that are available online, with many input formats accepted (MPS, AMPL, LP, GAMS).</p>
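<p>To feed one of those external programs, the OR-Tools solver object can export the model it currently holds. A sketch (method names as I recall them from the C# wrapper — check them against your installed version):</p>
<pre><code class="language-CSharp">// Write the current model out so an external solver can read it.
// ExportModelAsLpFormat(obfuscated) / ExportModelAsMpsFormat(fixedFormat, obfuscated)
string lpText  = solver.ExportModelAsLpFormat(false);
string mpsText = solver.ExportModelAsMpsFormat(false, false);
System.IO.File.WriteAllText("problem.mps", mpsText);
</code></pre>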
<h2 id="problem-2">Problem 2</h2>
<p>We will be adding a third constraint to the formulation of Problem 1.</p>
<div class="math">
\[
\begin{aligned}
\text{Obj:} \max(2y+x)
\\ \text{subject to:} \quad x \leq 15
\\ \qquad \quad \quad \quad y \leq 8
\\ \quad \quad \quad x+y \leq 18
\end{aligned}
\]</div>
<p><img src="/posts/images/lp/A2.png" class="img-fluid" width="100%" alt="A2" /></p>
<p>The solver is already initialized with the full problem 1 definition, and x and y are already declared. When you click run on the solver's cell, the solver.Solve() function is called and the results are written out.</p>
<pre><code class="language-CSharp">Constraint c = solver.MakeConstraint(0, 18);
c.SetCoefficient(x, 1);
c.SetCoefficient(y, 1);
</code></pre>
<p>If you call <code>SolveProblem();</code>, you will now have a new optimal value that is lower than the previous one. In a linear programming maximization problem, adding a constraint can never increase the optimal value: the new optimum is always lower than or equal to that of the less constrained problem.</p>
<p><img src="/posts/images/lp/10-4.png" class="img-fluid" alt="10-4.png" /></p>
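<p>We can verify the new optimum by hand: <span class="math">\(y\)</span> carries the larger objective coefficient, so it stays at its bound of <span class="math">\(8\)</span>, and the new constraint then caps <span class="math">\(x\)</span>:</p>
<div class="math">
\[
\begin{aligned}
y = 8, \qquad x \leq \min(15,\ 18 - y) = 10 \\
\Rightarrow \quad 2y + x = 16 + 10 = 26
\end{aligned}
\]</div>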
<h2 id="problem-3">Problem 3</h2>
<p>You should now be able to add a fourth constraint to the formulation of Problem 2.</p>
<div class="math">
\[
\begin{aligned}
\text{Obj:} \max(2y+x)
\\ \text{subject to:} \qquad x \leq 15
\\ \qquad \quad y \leq 8
\\ \quad x+y \leq 18
\\ -\frac{1}{3}x+y \leq 2
\end{aligned}
\]</div>
<p><img src="/posts/images/lp/A3.png" class="img-fluid" width="100%" alt="A3" /></p>
<p>The solver is already initialized with the full problem 2 definition, and x and y are already declared. When you click run on the solver's cell, the solver.Solve() function is called and the results are written out.</p>
<p><img src="/posts/images/lp/10-5.png" class="img-fluid" width="100%" alt="10-5.png" /></p>
<p>For those following along in the <a href="https://github.com/ewinnington/noteb/blob/master/IntroToLP.ipynb">notebook</a> there is a hint below this section with a second implementation of the <code>SolveProblem()</code> function which should give you hints based on your objective value.</p>
<h2 id="the-power-of-jupyter-special-commands">The power of Jupyter special commands</h2>
<p>I'm slowly discovering the Jupyter special commands. The first command, <code>display()</code>, allows you to present objects in a table. A great way to select the properties you want to show is to use LINQ (remember to add <code>using System.Linq</code>) to map a list of objects to anonymous objects carrying just those properties.</p>
<pre><code>display(solver.variables().Select(a => new { Name = a.Name(), Value = a.SolutionValue() }));
</code></pre>
<p><img src="/posts/images/lp/10-6.png" class="img-fluid" width="100%" alt="10-6.png" /></p>
<h2 id="recap">Recap</h2>
<p>Now that we have introduced linear programming and know how to use the solver, the following chapter will cover two simple linear programming applications.</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">
<p>OR here refers to <a href="https://en.wikipedia.org/wiki/Operations_research">Operational Research</a> - the field of mathematics dedicated to the search for optimal or near optimal solutions to problems.<a href="#fnref:1" class="footnote-back-ref">↩</a><a href="#fnref:2" class="footnote-back-ref">↩</a></p>
</li>
<li id="fn:2">
<p>For those using <a href="/posts/Switching-to-wyam">Wyam to generate their blogs</a>, you can add the</p>
<pre><code class="language-HTML"><script type="text/javascript" id="MathJax-script" async
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js">
</script>
</code></pre>
<p>to your <code>_Head.cshtml</code> template page. I recommend checking out <a href="https://www.mathjax.org/">MathJax</a> for the latest <a href="http://docs.mathjax.org/en/latest/web/start.html#using-mathjax-from-a-content-delivery-network-cdn">CDN</a>.<a href="#fnref:3" class="footnote-back-ref">↩</a></p>
</li>
</ol>
</div>
<p><a href="https://mybinder.org/v2/gh/ewinnington/noteb/master?filepath=IntroToLP.ipynb"><img src="https://mybinder.org/badge_logo.svg" class="img-fluid" alt="Binder"></a></p>http://ewinnington.github.io/posts/jupyter-docker-csharp-postgresDocker controlled from Jupyter Notebook C# with PostgresDB2019-11-12T23:18:00Z<p>In the context of Docker and Jupyter Notebook, it's interesting to note that there exists a Nuget that allows C# to control docker. So, yes, it is possible to launch a Postgresql database, on docker, inside a Jupyter notebook!</p>
<p>This assumes you have <a href="https://hub.docker.com/?overlay=onboarding">Docker</a>, <a href="https://github.com/dotnet/try/blob/master/DotNetTryLocal.md">Dotnet try</a>, <a href="https://jupyter.org/">Jupyter notebook</a> and follow the setup of the <a href="/posts/jupyter-notebook-csharp-r">C# kernel for Jupyter</a>.</p>
<p>If you don't want to wait, you can find my <a href="https://github.com/ewinnington/noteb/blob/master/DockerInteraction.ipynb">complete notebook here</a>.</p>
<p>Microsoft has created a <a href="https://github.com/microsoft/Docker.DotNet">C# Client library for talking to Docker</a>, so we will be taking advantage of it. Much of the magic docker code is pulled from the Docker.DotNet repository.
I'm using the <a href="https://www.npgsql.org/index.html">Npgsql drivers</a> for accessing the PostgreSQL database.
<img src="/posts/images/jupyter-notebook-csharp-r/docker-pg-01.png" class="img-fluid" alt="pgsql01" /></p>
<p>The real magic moment is when you access the Docker instance. If it is on your local machine on Windows, you can use the <code>npipe://./pipe/docker_engine</code> Uri. If you are on Linux, use <code>unix:///var/run/docker.sock</code> (at this time, I haven't tried it on Linux, but if you do, please tell me).
<img src="/posts/images/jupyter-notebook-csharp-r/docker-pg-02.png" class="img-fluid" alt="pgsql02" /></p>
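<p>In code, that Uri is the only platform-specific part; the rest of the Docker.DotNet calls are identical on Windows and Linux:</p>
<pre><code class="language-CSharp">using Docker.DotNet;

// Named pipe on Windows; on Linux use "unix:///var/run/docker.sock" instead.
var dockerUri = new Uri("npipe://./pipe/docker_engine");
DockerClient client = new DockerClientConfiguration(dockerUri).CreateClient();
</code></pre>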
<p>In block 3, we select a random port on which to host the PostgreSQL database, then list the local image names that are available (you should pull postgres:latest on your machine to run this). We create and start up the container, passing in environment variables for the password, user and initial schema. Once the container is started, we detach from it so it runs in the background. Finally, I wait until I'm pretty sure the container and database are ready (a 10s sleep at the end). You can reduce that sleep time; on my Surface laptop 1, I sometimes have an issue when I've got too many other containers running.
<img src="/posts/images/jupyter-notebook-csharp-r/docker-pg-03.png" class="img-fluid" alt="pgsql03" /></p>
<p>I'm connecting to the database and validating the connection name.
<img src="/posts/images/jupyter-notebook-csharp-r/docker-pg-04.png" class="img-fluid" alt="pgsql04" /></p>
<p>I'm creating a database schema and a user for this particular schema, then reconnecting with the new user.
<img src="/posts/images/jupyter-notebook-csharp-r/docker-pg-05.png" class="img-fluid" alt="pgsql05" /></p>
<p>Creating a table and inserting two rows using direct strings and string concatenation. In production, you should never be using string concatenation for your SQL statements. Please always use Bound variables as described below.
<img src="/posts/images/jupyter-notebook-csharp-r/docker-pg-06.png" class="img-fluid" alt="pgsql06" /></p>
<p>This is how you should interact with the PostgreSQL database if you are using direct SQL statements. You should be using Bound commands with parameters.
<img src="/posts/images/jupyter-notebook-csharp-r/docker-pg-07.png" class="img-fluid" alt="pgsql07" /></p>
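<p>For reference, a bound insert with Npgsql looks roughly like this (the table and column names here are placeholders, not the ones from the notebook):</p>
<pre><code class="language-CSharp">// Parameters travel separately from the SQL text, so values are never
// concatenated into the statement - this is what prevents SQL injection.
using (var cmd = new NpgsqlCommand("INSERT INTO people (name, age) VALUES (@name, @age)", conn))
{
    cmd.Parameters.AddWithValue("name", "Ada");
    cmd.Parameters.AddWithValue("age", 36);
    cmd.ExecuteNonQuery();
}
</code></pre>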
<p>Finally, I check that all three insertions were successful.
<img src="/posts/images/jupyter-notebook-csharp-r/docker-pg-08.png" class="img-fluid" alt="pgsql08" /></p>
<p>You can watch the container run in the <a href="https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-docker">docker extension</a> of <a href="https://code.visualstudio.com/">Visual studio Code</a>. It's a great way of monitoring what is currently running, as well as deleting old containers that might be still present.
<img src="/posts/images/jupyter-notebook-csharp-r/docker-pg-10.png" class="img-fluid" alt="pgsql10" /></p>
<p>Talking of deleting old containers, this is how you shut them down and delete them at the end of the notebook. I first close the Db connection and dispose of it before asking Docker.DotNet to stop the containers.
<img src="/posts/images/jupyter-notebook-csharp-r/docker-pg-09.png" class="img-fluid" alt="pgsql09" /></p>
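<p>The shutdown itself boils down to two Docker.DotNet calls per container (<code>containerId</code> being the id returned when the container was created):</p>
<pre><code class="language-CSharp">// Stop the container, then remove it so it does not linger in 'docker ps -a'.
await client.Containers.StopContainerAsync(containerId, new ContainerStopParameters());
await client.Containers.RemoveContainerAsync(containerId, new ContainerRemoveParameters());
</code></pre>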
<p>It would be cleaner if I knew how to enforce a <code>Finally</code> in Jupyter Notebook, but at this time, I don't know. If you do, drop me a line on <a href="https://twitter.com/ThrowATwit">twitter</a> or send a pull request on this blog post.</p>