As a worldwide supercomputer, Golem faces its fair share of challenges. In this post, we will present some of the more difficult ones, and suggest a few ways that we might overcome them.
Creating a network
Our fundamental requirements for Golem’s network are scalability and resilience. Here, peer-to-peer (P2P) network methodologies offer a few great advantages:
- Nodes communicate more efficiently when they talk to each other directly rather than through a central server.
- With no dedicated infrastructure to maintain, censorship and meddling become fundamentally more difficult.
Currently, the Golem network consists of two separate layers: a P2P infrastructure network and a task network. The P2P layer maintains connections between nodes and is responsible for transmitting information on both tasks and node reputation. The task network, on the other hand, exchanges messages regarding task participation, results, and payments.
Golem’s network is secured by tried-and-true methods: elliptic-curve-based encryption is enabled end-to-end, peer identities are known, and their digital signatures are verified. Golem’s security and communication protocols were heavily influenced by ÐΞVp2p. We’re constantly working to improve direct communication between nodes with important features such as NAT traversal for IPv4 and fallback message relaying.
In addition to forming a reliable and performant P2P network infrastructure, Golem often needs to distribute very large resource bundles. IPFS is a decentralized, content-addressable peer-to-peer file-sharing network in which downloading resembles BitTorrent: a “swarm” of peers exchanges small parts of the same file. We’re planning to use IPFS as the primary exchange method for all Golem resource bundles. With this system in place, all participating Golem nodes can both upload and download. This relieves the requesting node from having to send large chunks of data multiple times.
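The following minimal Python sketch makes content addressing concrete with a swarm-style download. It does not use IPFS’s actual API or block format. The SHA-256 chunk addresses, the tiny chunk size, and the helper names (`build_manifest`, `download`) are illustrative assumptions. Each chunk is named by the hash of its contents. A node can therefore fetch pieces from any peer and verify every piece on its own.

```python
import hashlib

CHUNK_SIZE = 4  # deliberately tiny for illustration; real systems use much larger blocks


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    return [data[i:i + size] for i in range(0, len(data), size)]


def address(chunk: bytes) -> str:
    # Content addressing: a chunk's name is the hash of its bytes
    return hashlib.sha256(chunk).hexdigest()


def build_manifest(data: bytes) -> list:
    """Ordered list of chunk addresses describing a resource bundle."""
    return [address(c) for c in split_into_chunks(data)]


def download(manifest: list, peers: list) -> bytes:
    """Fetch each chunk from any peer that holds it, rejecting tampered data."""
    parts = []
    for cid in manifest:
        for peer in peers:  # each peer is modelled as a dict {address: chunk}
            chunk = peer.get(cid)
            if chunk is not None and address(chunk) == cid:
                parts.append(chunk)
                break
        else:
            raise LookupError(f"no peer holds chunk {cid[:8]}")
    return b"".join(parts)


# The requester seeds the bundle once; two providers each end up holding half of it
bundle = b"scene.blend + textures"
manifest = build_manifest(bundle)
chunks = list(zip(manifest, split_into_chunks(bundle)))
peer_a = dict(chunks[: len(chunks) // 2])
peer_b = dict(chunks[len(chunks) // 2:])

# A third provider assembles the bundle from the swarm, not from the requester
print(download(manifest, [peer_a, peer_b]) == bundle)
```

A peer serving corrupted data is skipped automatically, because its chunks do not hash to the requested addresses.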
Task definition and verification
Golem needs a foundation for creating distributed tasks, both for specific applications (e.g. rendering) and for general, custom ones. A task definition framework and related API calls are being developed to streamline the creation process. This system will provide templates for common applications. It will help users define task parameters and prerequisites, break tasks down into subtasks, and create verification and result collection processes. To start off, we will provide a few domain-specific verification solutions (e.g. for rendering) and a more generic one based on redundant computation. Caveat emptor: automatic verification may not be feasible for all types of tasks, but the task definition framework will allow custom tasks to be tested.
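The Python sketch below roughly illustrates how a template, subtask splitting, and redundant verification could fit together. `TaskTemplate`, `split_range`, `verify_redundant`, and the majority-vote rule are all hypothetical stand-ins for the framework described above. Providers are modelled as plain functions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TaskTemplate:
    """Hypothetical task definition; field names are illustrative, not Golem's API."""
    name: str
    split: Callable[[int], List[range]]   # break the job into subtasks
    compute: Callable[[range], int]       # the work a provider runs on one subtask
    redundancy: int = 3                   # how many providers compute each subtask


def verify_redundant(results: List[int]) -> Optional[int]:
    """Accept a result only if a strict majority of providers agree on it."""
    value, votes = Counter(results).most_common(1)[0]
    return value if votes > len(results) // 2 else None


def split_range(n: int, parts: int = 4) -> List[range]:
    step = -(-n // parts)  # ceiling division
    return [range(i, min(i + step, n)) for i in range(0, n, step)]


def run(template: TaskTemplate, n: int, providers: list) -> int:
    total = 0
    for sub in template.split(n):
        results = [p(template.compute, sub) for p in providers[: template.redundancy]]
        verified = verify_redundant(results)
        if verified is None:
            raise ValueError(f"no majority for subtask {sub}")
        total += verified  # result collection: combine verified partial results
    return total


sum_of_squares = TaskTemplate(
    name="sum-of-squares",
    split=split_range,
    compute=lambda r: sum(x * x for x in r),
)

honest = lambda compute, sub: compute(sub)
cheater = lambda compute, sub: 0  # skips the work and returns a bogus result

# One dishonest provider is outvoted by the two honest ones
print(run(sum_of_squares, 100, [honest, cheater, honest]))  # prints 328350
```

Majority voting only helps while most providers assigned to a subtask are honest, which is one reason node reputation matters as well.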
Topological sorting (effectively, job ordering and scheduling) in the P2P layer may be implemented at some point in the future. This will enable co-dependent tasks à la MapReduce. Computations are processed in parallel on multiple nodes, and each set of results is then summarized, also in parallel. The technique is used broadly in big data processing via Apache Hadoop and others. As far as we know, however, Golem’s implementation will be a unique feat for P2P networks.
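Python’s standard `graphlib` module (Python 3.9+) can illustrate the ordering problem on a single machine. The task graph below is a made-up MapReduce-style example. The sketch groups tasks into stages whose members do not depend on each other and could therefore run in parallel on different nodes. Doing the same across a P2P network is considerably harder.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical task graph: each task maps to the set of tasks it depends on
dependencies = {
    "map-0": set(),
    "map-1": set(),
    "map-2": set(),
    "reduce-a": {"map-0", "map-1"},
    "reduce-b": {"map-2"},
    "summary": {"reduce-a", "reduce-b"},
}


def parallel_stages(graph: dict) -> list:
    """Group tasks into stages; tasks within one stage can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)  # in a real network, only once results come back
    return stages


for i, stage in enumerate(parallel_stages(dependencies), start=1):
    print(f"stage {i}: {', '.join(stage)}")
```

Here all three map tasks form the first stage, the two reduce tasks the second, and the summary the third.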
Despite the trend of creating new blockchains, we didn’t really see this additional complexity as necessary for our purposes. In our research on the current state of blockchain technologies, it was clear that Ethereum’s smart contracts system would allow us to both implement the GNT token and expand its possibilities in the future.
In Golem, individual payments (“nanopayments”) are currently small, while transaction (gas) fees are relatively high. For this reason, we collect transactions into batches. You can find out more about this in Paweł’s post.
And of course, the details of the nanopayments implementation can be found in the nanopayments whitepaper.
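The following minimal Python sketch illustrates batching, under two assumptions: payments are accumulated per provider, and they are settled once a batch holds enough payees. `PaymentBatcher` and its `max_payees` threshold are hypothetical; this is not Golem’s implementation, which the post and whitepaper mentioned above describe. The point is only that many small payments collapse into a few on-chain transactions. Each of those would otherwise pay its own fixed overhead: every Ethereum transaction has an intrinsic base cost of 21,000 gas.

```python
from collections import defaultdict

BASE_TX_GAS = 21_000  # intrinsic gas every Ethereum transaction pays


class PaymentBatcher:
    """Accumulate small payments and settle them together (illustrative sketch)."""

    def __init__(self, max_payees: int = 50):
        self.max_payees = max_payees
        self.pending = defaultdict(int)  # payee -> amount owed, not yet on-chain
        self.settled = []                # each entry stands for one on-chain transaction

    def pay(self, payee: str, amount: int) -> None:
        self.pending[payee] += amount    # repeated payments to one node are merged
        if len(self.pending) >= self.max_payees:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.settled.append(dict(self.pending))
            self.pending.clear()


def base_gas_saved(n_payments: int, n_transactions: int) -> int:
    """Base-cost gas avoided versus sending every payment as its own transaction."""
    return (n_payments - n_transactions) * BASE_TX_GAS


batcher = PaymentBatcher()
for i in range(100):
    batcher.pay(f"provider-{i % 3}", 5)  # amounts in the token's smallest unit
batcher.flush()
print(batcher.settled)                   # one settlement covering three providers
print(base_gas_saved(100, len(batcher.settled)))
```

Only the fixed per-transaction overhead is counted here. A real batch transaction still pays for each transfer it contains.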
Establishing a level of trust between nodes aids computational efficiency, and Golem favors nodes with a reputation for being both fast and reliable. In decentralized networks, the problem lies in building a global reputation score without an infallible central authority. This score is usually derived from the scores reported by neighboring nodes. Those scores take time to propagate through the network and converge to a single final value for each node.
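EigenTrust-style power iteration is one well-studied way to turn local opinions into global scores. Each node’s trust in others is normalized, and global scores are repeatedly recomputed as a trust-weighted sum of everyone’s opinions until they settle. The Python sketch below uses that technique purely for illustration; it is not Golem’s reputation algorithm. The damping factor `alpha` and the choice of pre-trusted nodes are assumptions. The sketch also computes everything in one place for clarity. In a decentralized network, nodes would exchange these values with their neighbors, which is where the propagation delay comes from.

```python
def global_reputation(local: dict, pretrusted: set, alpha: float = 0.15,
                      iterations: int = 50) -> dict:
    """EigenTrust-style aggregation of local trust values (illustrative only).

    local[i][j] >= 0 is how satisfied node i was with work done by node j.
    Pre-trusted nodes anchor the computation.
    """
    nodes = sorted(local)
    # Normalise each node's opinions so they sum to 1
    norm = {}
    for i in nodes:
        total = sum(local[i].values())
        norm[i] = {j: v / total for j, v in local[i].items()} if total else {}
    p = {n: (1 / len(pretrusted) if n in pretrusted else 0.0) for n in nodes}
    t = dict(p)
    for _ in range(iterations):
        new_t = {n: alpha * p[n] for n in nodes}
        for i in nodes:
            opinions = norm[i] or p  # nodes with no opinions defer to pre-trusted ones
            for j, w in opinions.items():
                new_t[j] += (1 - alpha) * w * t[i]
        t = new_t
    return t


local = {
    "A": {"B": 4, "C": 4, "D": 1},
    "B": {"A": 4, "C": 4, "D": 1},
    "C": {"A": 4, "B": 4, "D": 1},
    "D": {"A": 1},  # D rarely delivered good results, so its peers rate it low
}
scores = global_reputation(local, pretrusted={"A"})
print({n: round(s, 3) for n, s in scores.items()})
```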
Such reputation systems are vulnerable to numerous attacks. One is the well-known sybil attack. The attacker creates a large number of identities and uses them to fake scores, gaining outsized influence over the network. Another is the re-entry attack, in which the attacker abandons an identity with a bad reputation and starts fresh with a “neutral” one. So far, we are not aware of any system that is resilient to all attacks: reputation systems in decentralized applications remain an open problem. Golem’s reputation system, too, is still a subject of research, and we plan to cooperate with the best teams in the field to come up with the best solutions.
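Both attacks are easy to reproduce against a naive scheme. The Python sketch below scores a node as the plain average of the ratings it has received, with one vote per identity. Unknown identities get a neutral default. The scoring rule, the neutral value, and the ratings are made up for illustration. This is a deliberately weak strawman, not a description of any deployed system.

```python
from statistics import mean

NEUTRAL = 0.5  # assumed default score for identities with no history


def naive_score(ratings: dict) -> float:
    """Average of all ratings a node received, one vote per rating identity."""
    return mean(list(ratings.values())) if ratings else NEUTRAL


honest_ratings = {"A": 0.2, "B": 0.1, "C": 0.3}  # real peers saw poor work

# Sybil attack: identities that cost nothing to create outvote the honest peers
with_sybils = dict(honest_ratings)
for k in range(20):
    with_sybils[f"sybil-{k}"] = 1.0

# Re-entry attack: a brand-new identity has no ratings at all
fresh_identity = {}

print(f"honest view: {naive_score(honest_ratings):.2f}")
print(f"with sybils: {naive_score(with_sybils):.2f}")
print(f"after re-entry: {naive_score(fresh_identity):.2f}")
```

In both cases, the attacker ends up with a better score than its honest history deserves.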
Executing untrusted code
Though Golem’s reputation system will help in identifying malicious and troublesome nodes, other peers can still act against expectations by submitting a task that contains harmful code. In order to protect the providers, every computation has to be executed in a controlled environment.
Thankfully, there are tools at hand, and Docker containers were our first choice. Docker containers isolate running applications, files, and hardware resources, with a significantly smaller performance hit than traditional virtual machines (VMs). In Golem’s implementation, container capabilities have been minimized as much as possible. By constraining an executed program’s privileges, we were able to further increase the containers’ security. Not all tasks have equal prerequisites, however: some computations may be performed on a GPU, while others will require network access. The constraints imposed on containers will therefore vary. In the future, a marketplace for pre-configured and digitally signed containers may be created. It would serve as a “whitelist” and improve overall security.
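The Python sketch below shows how per-task requirements might translate into container constraints. It only builds a `docker run` command line. The flags are standard Docker CLI options: `--cap-drop`, `--security-opt no-new-privileges`, `--read-only`, `--tmpfs`, `--memory`, `--cpus`, `--user`, `--network none`, and `--gpus`. The specific defaults and the `docker_run_args` helper, however, are assumptions, not Golem’s actual configuration. GPU access via `--gpus` also requires the NVIDIA container toolkit on the host.

```python
import shlex


def docker_run_args(image: str, needs_gpu: bool = False, needs_network: bool = False,
                    memory: str = "2g", cpus: float = 1.0) -> list:
    """Build a locked-down `docker run` command for one task (illustrative)."""
    args = [
        "docker", "run", "--rm",
        "--cap-drop", "ALL",                    # drop every Linux capability
        "--security-opt", "no-new-privileges",  # setuid binaries cannot gain privileges
        "--read-only",                          # immutable root filesystem...
        "--tmpfs", "/tmp",                      # ...with in-memory scratch space
        "--memory", memory,                     # cap RAM
        "--cpus", str(cpus),                    # cap CPU usage
        "--user", "1000:1000",                  # run as an unprivileged user, not root
    ]
    if not needs_network:
        args += ["--network", "none"]           # no network unless the task requires it
    if needs_gpu:
        args += ["--gpus", "all"]               # expose GPUs only to tasks that need them
    args.append(image)
    return args


# Hypothetical image names, for illustration only
print(shlex.join(docker_run_args("example/renderer", needs_gpu=True)))
print(shlex.join(docker_run_args("example/crawler", needs_network=True)))
```

Everything is locked down by default, and each requirement a task declares relaxes exactly one constraint.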
And of course, we can still do more. The most promising solution seems to be the creation of a specialized language: Designed with security and simplicity in mind, and armed with carefully designed static analysis tools, it could provide an ideal way to create “safe” applications for deployment on Golem.
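No dedicated Golem language exists yet, so here is a Python stand-in. The sketch shows the kind of check that static analysis makes possible: walk a program’s syntax tree and flag constructs outside an allowed subset. The allowed modules and forbidden calls are assumptions made up for this example. Filtering Python this way is known to be far from airtight, which is exactly the argument for a language designed from the start to be analyzable.

```python
import ast

ALLOWED_MODULES = {"math"}                                       # assumed policy
FORBIDDEN_CALLS = {"eval", "exec", "compile", "open", "__import__"}


def find_violations(source: str) -> list:
    """List constructs that a conservative 'safe code' policy would reject."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_MODULES:
                    problems.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] not in ALLOWED_MODULES:
                problems.append(f"line {node.lineno}: from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                problems.append(f"line {node.lineno}: call to {node.func.id}()")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            # dunder access is a classic route around Python-level restrictions
            problems.append(f"line {node.lineno}: dunder attribute {node.attr}")
    return problems


task_source = "import math\nimport socket\nprint(math.pi)\nexec('...')"
for problem in find_violations(task_source):
    print(problem)
```

A task that yields an empty list would be allowed to run. Anything else would be rejected or handled according to the user’s settings.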
We recognize that any sensitive or proprietary data should be respected as private at all costs. Therefore, Golem needs to provide a way of distributing task resources such that it is very difficult to copy, reproduce, or otherwise gain access to their contents.
Of course, it is well-nigh impossible to remotely restrict access to locally running software. What we can do is make extracting private data costly enough to discourage such attempts altogether. We believe the most effective way to do this is to ensure the following:
- That data and work are sufficiently broken into chunks by the task definition framework.
- That tasks and data are distributed as evenly as possible across the most reliable parts of the network, rather than going to the same nodes.
- That the task definition framework itself helps identify private data leaks in any newly created tasks.
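The first two points can be sketched in a few lines of Python. Take only the most reliable nodes, then deal chunks out round-robin so that each node holds only a small slice of the data. The reliability scores and the `pool_size` and `max_per_node` parameters are hypothetical. A real scheduler might also weigh hardware capabilities and availability.

```python
def assign_chunks(n_chunks: int, reliability: dict, pool_size: int,
                  max_per_node: int) -> dict:
    """Spread chunks over the most reliable nodes so that no node sees too much."""
    pool = sorted(reliability, key=reliability.get, reverse=True)[:pool_size]
    if n_chunks > len(pool) * max_per_node:
        raise ValueError("not enough reliable nodes to respect max_per_node")
    assignment = {node: [] for node in pool}
    for chunk in range(n_chunks):
        # Round-robin keeps the load even: no node gets more than ceil(n / len(pool))
        assignment[pool[chunk % len(pool)]].append(chunk)
    return assignment


reliability = {"n1": 0.99, "n2": 0.95, "n3": 0.90, "n4": 0.40, "n5": 0.97}
plan = assign_chunks(10, reliability, pool_size=4, max_per_node=3)
print(plan)  # n4, the least reliable node, receives nothing
```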
As we said earlier, Golem will leverage sandboxed environments in the form of Docker containers. The computation environment can be viewed as a black box, which, fed with the encrypted data, outputs encrypted results.
There are of course controversial activities which are difficult to prevent in a P2P network: sharing proprietary data, sharing unlicensed software, and even conducting DDoS attacks would probably all qualify.
Now, let’s say that a Golem node wants to perform such an attack. If its reputation is neutral or better, and the remuneration is reasonable, there may be peers willing to participate in the computation. In such a case, who is responsible and what is to be done?
First off, we believe that computing nodes should not be held liable for the attack. Tasks are assigned based on available computation environments and hardware capabilities. The context and purpose of the executed code are unknown to the provider, so any harm caused is clearly involuntary. Submitting a malicious task to the Golem network may also be viewed as exploitation of Golem’s infrastructure.
As for “solutions”, there is much work and research to be done to prevent such actions. The most promising long-term solution seems to be the custom language and its static analysis tools. With them, Golem could, to an extent, identify potentially harmful code and act according to the user’s settings, executing only valid, “safe” code. At the current development stage, however, a marketplace for signed and verified containers will probably take us furthest.
Though it may give some idea of the scale and scope of the challenges that Golem must overcome, this post by no means covers them exhaustively. As we work through possible solutions, we will keep you posted at every turn. Golem’s development is an open process, and as always, we welcome feedback from every corner. If you think you have an answer to one or more of our development challenges, or if you just want to talk, join us on Slack!