Merkle "Proofs of Inclusion"

Motivation

We would like to be able to create proofs that a transaction was indirectly referenced by another transaction without having to provide the full chain of actual transactions between the two.

Solution: Merkle Proofs

From the perspective of a transaction, the Tangle looks “like a binary tree” where every parent references two children (with the exception that here, a parent can reference the same child twice). The easiest and most straightforward way to achieve a “proof of inclusion” in such a data structure would most probably be a Merkle-tree-like approach, where a sequence of hashes can be used to prove inclusion in the referenced subtangle.

In contrast to normal Merkle trees, where only the leaves of the tree carry a payload in the form of data, here every node (transaction) has a payload which contains the transaction-specific details. To still be able to use a chain of hashes instead of the actual transaction data to “prove inclusion”, we need to modify the way the transaction hash is calculated. We first define the “payloadFingerprint” of a transaction to be the hash of all its trits excluding the trits used to encode its branch and trunk:

payloadFingerprint(tx) = hash(tx.fullTritsWithoutBranchAndTrunk)

Then we can use this payloadFingerprint in combination with the branch and trunk hashes to form the actual “transactionHash”:

transactionHash(tx) = hash(payloadFingerprint(tx) + tx.branchTransactionHash + tx.trunkTransactionHash)

This way, we can provide a chain of hashes instead of the actual data to derive a transaction hash and therefore use a similar approach as in Merkle trees to prove that a transaction was referenced by another transaction.
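To make this concrete, here is a minimal sketch in Python of how a verifier could walk such a chain of hashes. SHA-256 stands in for the actual ternary hash function, and all names (hash_fn, payload_fingerprint, verify_inclusion, the proof-step format) are illustrative assumptions, not any real IOTA API:

import hashlib

# SHA-256 stands in for the actual ternary hash function.
def hash_fn(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def payload_fingerprint(tx_data_without_refs: bytes) -> bytes:
    # Hash of all transaction data except the branch/trunk reference fields.
    return hash_fn(tx_data_without_refs)

def transaction_hash(fingerprint: bytes, branch_hash: bytes, trunk_hash: bytes) -> bytes:
    # Second hashing pass over the fingerprint plus both reference hashes.
    return hash_fn(fingerprint + branch_hash + trunk_hash)

def verify_inclusion(target_hash: bytes, proof: list, tip_hash: bytes) -> bool:
    # Each proof step carries the parent's payload fingerprint, the hash of
    # its other reference, and whether the current hash sits in the trunk slot.
    current = target_hash
    for fingerprint, other_hash, current_is_trunk in proof:
        if current_is_trunk:
            current = transaction_hash(fingerprint, other_hash, current)
        else:
            current = transaction_hash(fingerprint, current, other_hash)
    return current == tip_hash

The verifier only ever sees fingerprints and reference hashes, never the payloads themselves, yet ends up recomputing the tip's transaction hash.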

The only downside of this approach is that we need to hash twice to calculate the transaction hash of a transaction. Since hashing is an expensive operation, I would prefer to find a more efficient way to combine the three hashes (payloadFingerprint, branch, trunk) into a unique transaction hash.

In the binary world, XORing hashes is a commonly used mechanism to create such a new unique hash. XORing two numbers with a roughly random distribution results in another number that still has a roughly random distribution, but which now depends on both values. This is due to the fact that the output probability of the XOR function is 50% “0s” and 50% “1s” - see its LUT:

A   B   AxB
-----------
0   0   0
0   1   1
1   0   1
1   1   0

The XORing of hashes can be problematic because of the following special cases:

  • H(a) x H(a) = 0

  • H(a) x H(b) = H(b) x H(a)

  • H(a) x H(a) x H(a) = H(a)
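The first case is easy to see with real hashes. A quick demonstration, using SHA-256 purely as an example of a binary hash:

import hashlib

def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))

h_a = hashlib.sha256(b"some transaction").digest()
# XORing a hash with itself yields all zeros - the reference information is gone.
print(xor_bytes(h_a, h_a) == bytes(32))  # True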

Since we have a well-defined number of operands, the commutative nature of the XOR operation is not problematic here, but the fact that H(a) x H(a) = 0 actually is a problem. If branch and trunk reference the same transaction, then XORing their hashes results in a complete loss of information regarding the references.

While this is problematic for the binary version of this approach, we can define a custom XOR-like operation for the ternary world that has an equally balanced distribution of output probabilities, but that does not suffer from H(a) x H(a) = 0 and instead behaves like H(a) x H(a) = H(a), which retains the full “reference information” in these cases. The LUT for this operation looks like this:

A   B   AxB
-----------
0   0   0
0   1   T
0   T   1
1   0   T
1   1   1
1   T   0
T   0   1
T   1   0
T   T   T
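A minimal sketch of this operator in Python, encoding T as -1. One closed form that reproduces the LUT above is -(a + b), folded back into the balanced range mod 3; the assertions check the two properties we care about (idempotence and balanced outputs):

from collections import Counter

# Balanced trits: T is encoded as -1.
def tern_xor(a: int, b: int) -> int:
    # -(a + b), folded back into the balanced range {-1, 0, 1} mod 3.
    return ((-(a + b) + 1) % 3) - 1

trits = (-1, 0, 1)

# Idempotence: t(a, a) = a, so equal branch/trunk hashes lose no information.
assert all(tern_xor(a, a) == a for a in trits)

# Balance: each trit value appears exactly 3 times among the 9 LUT entries,
# so the output stays roughly uniformly distributed.
assert Counter(tern_xor(a, b) for a in trits for b in trits) == {-1: 3, 0: 3, 1: 3}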

This “custom” ternary XOR operator now allows us to calculate the transactionHash in a faster way than going through a full hashing procedure a second time:

transactionHash'(tx) = (tx.branchHash ⊕ tx.trunkHash) ⊕ payloadFingerprint(tx)
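As a sketch, this applies the operator trit-wise over three trit vectors (plain lists of balanced trits stand in for the real 243-trit hashes; tern_xor is the operator from the sketch above):

def tern_xor(a: int, b: int) -> int:  # the operator from the sketch above
    return ((-(a + b) + 1) % 3) - 1

def tern_xor_vec(x, y):
    return [tern_xor(a, b) for a, b in zip(x, y)]

def transaction_hash_fast(payload_fp, branch_hash, trunk_hash):
    # The operator is not associative, so the parentheses matter:
    # (branch ⊕ trunk) ⊕ fingerprint is evaluated exactly as written.
    return tern_xor_vec(tern_xor_vec(branch_hash, trunk_hash), payload_fp)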

Conclusion

This change will be very cheap and will at the same time provide us with a very powerful tool: we can now provide “proofs of inclusion” between two transactions without having to present the actual transactions. This will be relevant for the discussed solutions for inter-shard transactions and might also become relevant for the “selective permanodes”.


@hans.moog a question: can we use this idea to get rid of that BelowMaxDepth thingy?

That is, if there is a transaction that approves something old (maybe together with something new) but spends valid inputs, we only need proof that this “something old” existed in the Tangle?..

I don’t know about ‘BelowMaxDepth’ as a specific problem. But with how AION works right now, this would be an absolutely massive improvement for generating proofs. And as @hans.moog stated, it will be very useful for selective permanodes. I think generally, being able to verify the Tangle structure independently of the data is a very good property to have indeed.

I can see that this can be used in sharding too.
Is the hash used above Troika? Could it be rewritten by using this too?

Currently, I am thinking about changes to Compass, and I find this useful.

I have a question/observation (depending on whether I understood how to create the proof correctly):
If I understand correctly, the payload fingerprint hashes are fundamental to creating the proof.
So you have to relay them as part of the proof (correct me if I misunderstood, please 🙂).

So it looks like we will have a larger/heavier proof than in a classic Merkle tree, because we will have to send all the payload fingerprints as well. We might even have to recalculate them if we drop them after we calculate the full hash (which will be heavy).

In a Bitcoin proof of inclusion the tree seems larger because all txs are leaves. But this doesn’t affect the proof size much, because it is a function of the height.
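For a rough feel of the numbers, assuming 32-byte hashes as in Bitcoin, a classic Merkle proof over n leaves needs about log2(n) sibling hashes:

import math

for n in (1_000, 1_000_000):
    proof_hashes = math.ceil(math.log2(n))
    print(n, "leaves:", proof_hashes, "hashes =", proof_hashes * 32, "bytes")
# 1,000 leaves -> 10 hashes (320 bytes); 1,000,000 leaves -> 20 hashes (640 bytes)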

So currently I am thinking that for the Compass mainnet (again, sorry for polluting the Coordicide talk, but lately lots of Coordicide ideas are dripping into mainnet, so I need to respond here) maybe we can add a Bitcoin-like Merkle root. This way maybe we don’t have to do the breaking change in the hash calculation 🙂

For Coordicide, where we don’t have milestones and we are creating a network from scratch, my idea is not relevant, and I support the original proposal.

I also want to add a feature to this thread which I think is very important if IOTA plans to play nice with regulation. With proofs of inclusion we can create a valid Tangle state and proofs while omitting the actual data. The reasons to omit data can of course vary greatly, and I doubt that in practice many real requests to omit data will come, but having the ability to do so (especially in permanodes) will greatly help with adopting IOTA as a standard. Especially if we talk about digital identity on the Tangle, and especially because of the general notion that this is impossible with blockchain/DLT technology.

Admittedly not a technical reason, but a very strong use case nonetheless.

I was just made aware of this proposal. If we give IOTA the capability to remove, on request, parts of transactions which might contain personally identifiable information, such as the signatureMessageFragment and the Tag, this would be absolutely HUGE for our positioning for digital identity and in general towards governments. If we have an API that allows the user “the right to be forgotten”, I would have such a strong pitch for the Unified Identity Protocol. I can’t emphasize enough how important such a feature would be towards governments. Obviously, it would at some point need to support a feature which sends a request across the network to forget the transaction details in order to actually conform to the GDPR, but even the promise of that in the future, with backwards compatibility, would be amazing.

The responsibility for the data would be moved from IOTA as a network to individual node operators. If they decide to break GDPR laws by not forgetting something after a request for deletion, they could be facing a GDPR fine; this removes the responsibility from the network, which is good for IOTA.

Are you still working on trying to find a non-silly way of combining three hashes? If so, here is a possible answer: each byte represents a number 0 through 15. Then bytes can be added modulo 16 using modular arithmetic (sum the numbers, divide by 16, and take the remainder). You can then “add” the hashes bytewise in this manner.
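For illustration, a minimal sketch of that bytewise modular addition in Python, taking the post's mod-16 convention at face value:

def add_hashes_mod16(x: bytes, y: bytes) -> bytes:
    # Treat each byte as a number 0..15 and add bytewise, modulo 16.
    return bytes((a + b) % 16 for a, b in zip(x, y))

# Note: given the result and one operand, the other is trivially recoverable,
# which is exactly the "reversibility" concern raised below.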

However, this is a very “reversible” function. We need to check cryptographically that an attacker cannot exploit this to manufacture Merkle proofs.

No; with binary hashing we now don’t have to worry about hashing performance anymore and can use a standard Merkle tree, which is formed by concatenating the input bytes.

One less problem to worry about!

This idea is absolutely necessary; nearly every supply chain use case has to store data for a certain period. Imagine my article sits in a warehouse for at least 6 months while the Tangle constantly processes about 1000 TPS. There is absolutely no chance to hold the supply chain data for validation. If I do this on a selective permanode without any proof of inclusion, it wouldn’t be different from local storage, because of the lack of other parties who stored this bundle for validation. (It’s also a big problem today for developing use cases, because it’s simply too expensive to run a permanode for some years.) Especially since no one can calculate the amount of transactions/storage that will be needed in the next few months, it is a pure gamble whether you can keep the data.

I would say we need something like this ASAP.