Encrypted Data Vaults v0.1

We store a significant amount of sensitive data online, such as personally identifying information (PII), trade secrets, family pictures, and customer information. The data that we store is often not protected in an appropriate manner.

This specification describes a privacy-respecting mechanism for storing, indexing, and retrieving encrypted data at a storage provider. It is often useful when an individual or organization wants to protect data in a way that the storage provider cannot view, analyze, aggregate, or resell the data. This approach also ensures that application data is portable and protected from storage provider data breaches.

Introduction

Legislation, such as the General Data Protection Regulation (GDPR), incentivizes service providers to better preserve individuals' privacy, primarily through making the providers liable in the event of a data breach. This liability pressure has revealed a technological gap, whereby providers are often not equipped with technology that can suitably protect their customers. Encrypted Data Vaults fill this gap and provide a variety of other benefits.

Why Do We Need Confidential Storage?

Explain why individuals and organizations that want to protect their privacy and trade secrets, and to ensure data portability, will benefit from using this technology. Explain how providing a standard API for the storage of user data empowers users to "bring their own storage", giving them control of their own information. Explain how applications that are written against a standard API and assume that users will bring their own storage can separate concerns and focus on the functionality of their application, removing the need to deal with storage infrastructure (instead leaving it to a specialist service provider that is chosen by the user).

Requiring client-side (edge) encryption for all data and metadata at the same time as enabling the user to store data on multiple devices and to share data with others, whilst also having searchable or queryable data, has been historically very difficult to implement in one system. Trade-offs are often made which sacrifice privacy in favor of usability, or vice versa.

Due to a number of maturing technologies and standards, we are hopeful that such trade-offs are no longer necessary, and that it is possible to design a privacy-preserving protocol for encrypted decentralized data storage that has broad practical appeal.

Ecosystem Overview

The problem of decentralized data storage has been approached from various different angles, and personal data stores (PDS), decentralized or otherwise, have a long history in commercial and academic settings. Different approaches have resulted in variations in terminology and architectures. The diagram below shows the types of components that are emerging, and the roles they play. Encrypted Data Vaults fulfill the low-level encrypted storage role.

Figure: Confidential Storage layers. A diagram showing the roles of different technologies in the Encrypted Data Vaults and secure data store ecosystem and how they interact.

This section describes the roles of the core actors and the relationships between them in an ecosystem where this specification is expected to be useful. A role is an abstraction that might be implemented in many different ways. The separation of roles suggests likely interfaces and protocols for standardization. The following roles are introduced in this specification:

data vault controller: A role an entity might perform by creating, managing, and deleting data vaults. This entity is also responsible for granting and revoking authorization to storage agents to the data vaults that are under its control.
storage agent: A role an entity might perform by creating, updating, and deleting data in a data vault. This entity is typically granted authorization to access a data vault by a data vault controller.
storage provider: A role an entity might perform by providing a raw data storage mechanism to a data vault controller. It is impossible for this entity to see the data that it is storing due to all data being encrypted at rest and in transit to and from the storage provider.

Requirements

The following sections elaborate on the requirements that have been gathered from the core use cases.

Privacy and multi-party encryption

One of the main goals of this system is ensuring the privacy of an entity's data so that it cannot be accessed by unauthorized parties, including the storage provider.

To accomplish this, the data must be encrypted both while it is in transit (being sent over a network) and while it is at rest (on a storage system).

Since data could be shared with more than one entity, it is also necessary for the encryption mechanism to support encrypting data to multiple parties.

Sharing and authorization

It is necessary to have a mechanism that enables authorized sharing of encrypted information among one or more entities.

The system is expected to specify one mandatory authorization scheme, but also allow other alternate authorization schemes. Examples of authorization schemes include OAuth2 and [[ZCAP]]s (Authorization Capabilities).

Identifiers

The system should be identifier agnostic. In general, identifiers that are a form of URN or URL are preferred. While it is presumed that [[DID-CORE]] (Decentralized Identifiers, DIDs) will be used by the system in a few important ways, hard-coding the implementations to DIDs would be an anti-pattern.

Versioning and replication

It is expected that information can be backed up on a continuous basis. For this reason, it is necessary for the system to support at least one mandatory versioning strategy and one mandatory replication strategy, but also allow other alternate versioning and replication strategies.

Metadata and searching

Large volumes of data are expected to be stored using this system, which then need to be efficiently and selectively retrieved. To that end, an encrypted search mechanism is a necessary feature of the system.

It is important for clients to be able to associate metadata with the data such that it can be searched. At the same time, since privacy of both data and metadata is a key requirement, the metadata must be stored in an encrypted state, and service providers must be able to perform those searches in an opaque and privacy-preserving way, without being able to see the metadata.

Protocols

Since this system can reside in a variety of operating environments, it is important that at least one protocol is mandatory, but that other protocols are also allowed by the design. Examples of protocols include HTTP, gRPC, Bluetooth, and various binary on-the-wire protocols. An HTTPS API is defined in the HTTP API section.

Design goals

This section elaborates upon a number of guiding principles and design goals that shape Encrypted Data Vaults.

Layered and modular architecture

A layered architectural approach is used to ensure that the foundation for the system is easy to implement while allowing more complex functionality to be layered on top of the lower foundations.

For example, Layer 1 might contain the mandatory features for the most basic system, Layer 2 might contain useful features for most deployments, Layer 3 might contain advanced features needed by a small subset of the ecosystem, and Layer 4 might contain extremely complex features that are needed by a very small subset of the ecosystem.

Prioritize privacy

This system is intended to protect an entity's privacy. When exploring new features, always ask "How would this impact privacy?". New features that negatively impact privacy are expected to undergo extreme scrutiny to determine if the trade-offs are worth the new functionality.

Push implementation complexity to the client

Servers in this system are expected to provide functionality strongly focused on the storage and retrieval of encrypted data. The more a server knows, the greater the risk to the privacy of the entity storing the data, and the more liability the service provider might have for hosting data. In addition, pushing complexity to the client enables service providers to provide stable server-side implementations while innovation can be carried out by clients.

Architecture

Review this section for language that should be properly normative.

This section describes the architecture of the Encrypted Data Vault protocol, in the form of a client-server relationship. The vault is regarded as the server and the client acts as the interface used to interact with the vault.

This architecture is layered in nature, where the foundational layer consists of an operational system with minimal features, and where more advanced features are layered on top. Implementations can choose to implement only the foundational layer, or optionally, additional layers consisting of a richer set of features for more advanced use cases.

Server and client responsibilities

The server is assumed to be of low trust, and must have no visibility into the data that it persists. However, even in this model, the server still has a set of minimum responsibilities it must adhere to.

The client is responsible for providing an interface to the server, with bindings for each relevant protocol (HTTP, RPC, or binary over-the-wire protocols), as required by the implementation.

All encryption and decryption of data is done on the client side, at the edges. The data (including metadata) MUST be opaque to the server, and the architecture is designed to prevent the server from being able to decrypt it.

Layer 1 (L1) responsibilities

Layer 1 consists of a client-server system that is capable of encrypting data in transit and at rest.

Server: validate requests (L1)

When a vault client makes a request to store, query, modify, or delete data in the vault, the server validates the request. Since the actual data and metadata in any given request is encrypted, such validation is necessarily limited and largely depends on the protocol and the semantics of the request.

Server: Persist data (L1)

The mechanism a server uses to persist data, such as storage on a local, networked, or distributed file system, is determined by the implementation. The persistence mechanism is expected to adhere to the common expectations of a data storage provider, such as reliable storage and retrieval of data.

Server: Persist global configuration (L1)

A vault has a global configuration that defines the following properties:

Stream chunk size
Other config metadata

The configuration allows the client to perform capability discovery regarding things like authorization, protocol, and replication mechanisms that are used by the server.

Server: enforcement of authorization policies (L1)

When a client makes a request to store, query, modify, or delete data in the vault, the server enforces any authorization policy that is associated with the request.

Client: encrypted data chunking (L1)

An Encrypted Data Vault is capable of storing many different types of data, including large unstructured binary data. This means that storing a file as a single entry would be challenging for systems that have limits on single record sizes. For example, some databases set the maximum size for a single record to 16MB. As a result, it is necessary that large data is chunked into sizes that are easily managed by a server. It is the responsibility of the client to set the chunk size of each resource and chunk large data into manageable chunks for the server. It is the responsibility of the server to deny requests to store chunks larger than 1MiB, which is the maximum size for a single chunk.

Each chunk is encrypted individually using authenticated encryption. Doing so protects against attacks where an attacking server replaces chunks in a large file and requires the entire file to be downloaded and decrypted by the victim before determining that the file is compromised. Encrypting each chunk with authenticated encryption ensures that a client knows that it has a valid chunk before proceeding to the next one. Note that another authorized client can still perform an attack by doing authenticated encryption on a chunk, but a server is not capable of launching the same attack.
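
The chunking responsibilities described above can be sketched as follows. This is a minimal illustration, not a normative algorithm: the function names are hypothetical, and the authenticated encryption of each chunk is deliberately left out because it requires a vetted cryptographic library. Only the 1MiB limit is taken from the text.

```python
MAX_CHUNK_SIZE = 1024 * 1024  # 1 MiB, the maximum size for a single chunk


def split_into_chunks(data: bytes, chunk_size: int = MAX_CHUNK_SIZE) -> list[bytes]:
    """Client side: split data into pieces of at most chunk_size bytes."""
    if not 0 < chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError("chunk_size must be between 1 byte and 1 MiB")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def server_accepts_chunk(body: bytes) -> bool:
    """Server side: deny any chunk whose body is larger than 1 MiB."""
    return len(body) <= MAX_CHUNK_SIZE
```

Because encryption adds overhead (for example, an authentication tag and encoding), a client would likely pick a plaintext chunk size somewhat below the limit so that the encrypted chunk still fits.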

Client: Resource structure (L1)

The process of storing encrypted data starts with the creation of a Resource by the client, with the following structure.

Resource:

id (required)
meta
- meta.contentType MIME type
content - entire payload, or a manifest-like list of hashlinks to individual chunks

If the data is less than the chunk size, it is embedded directly into the content.

Otherwise, the data is sharded into chunks by the client (see next section), and each chunk is encrypted and sent to the server. In this case, content contains a manifest-like listing of URIs to individual chunks (integrity-protected by [[HASHLINK]]).
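
The two cases above (embedded content versus a manifest of chunk references) might look like the following sketch. The function name, the chunk_uri callback, and the manifest entry layout are assumptions made for illustration. A plain SHA-256 hex digest stands in for a real hashlink, whose encoding is defined by [[HASHLINK]] and is not reproduced here. In practice the digest would be computed over the encrypted chunk.

```python
import hashlib


def build_resource(resource_id, content_type, data, chunk_size, chunk_uri):
    """Return a Resource dict; chunk_uri(index) is a hypothetical URI builder."""
    resource = {"id": resource_id, "meta": {"contentType": content_type}}
    if len(data) < chunk_size:
        # Small payloads are embedded directly into the content.
        resource["content"] = data
        return resource
    manifest = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        manifest.append({
            "uri": chunk_uri(index),
            # Stand-in for a hashlink: hex SHA-256 of the chunk bytes.
            "sha256": hashlib.sha256(chunk).hexdigest(),
        })
    resource["content"] = manifest
    return resource
```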

Client: Encrypted resource structure (L1)

The client then creates the Encrypted Resource, which has the structure listed below. If the data was sharded into chunks, this is done after the individual chunks are written to the server.

id
index - encrypted index tags prepared by the client (for use with privacy-preserving querying over encrypted resources)
Chunk size (if different from the default in global config)
Versioning metadata - such as sequence numbers, Git-like hashes, or other mechanisms
Encrypted resource payload - encoded as a JWE [[RFC7516]]

Layer 2 (L2) responsibilities

Layer 2 consists of a system that is capable of sharing data among multiple entities, of versioning and replication, and of performing privacy-preserving searches in an efficient manner.

Client: Encrypted search indexes (L2)

To enable privacy-preserving querying (where the search index is opaque to the server), the client must prepare a list of encrypted index tags (which are stored in the Encrypted Resource, alongside the encrypted data contents).

Need details about salting and encryption mechanism of index tags.
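
Since the salting and encryption mechanism for index tags is not yet specified, the following is only one possible approach, shown to make the idea concrete. It assumes that tags are blinded with HMAC-SHA-256 under a key held by the client, so the server can match tags for equality without learning the underlying attribute names or values. The function names and the tag layout are hypothetical.

```python
import base64
import hashlib
import hmac


def blind(key: bytes, value: str) -> str:
    """Return a base64url HMAC-SHA-256 of value; opaque without the key."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def index_tags(key: bytes, attributes: dict[str, str]) -> list[dict[str, str]]:
    """Blind each attribute name and each name/value pair."""
    return [
        {"name": blind(key, name), "value": blind(key, f"{name}:{value}")}
        for name, value in sorted(attributes.items())
    ]
```

To query, a client would blind the search term with the same key and ask the server for resources carrying the matching tag. Deterministic tags of this kind reveal when two resources share an attribute, which is a privacy trade-off to weigh.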

Client: Versioning and replication (L2)

A server must support at least one versioning/change control mechanism. Replication is done by the client, not by the server (since the client controls the keys, knows about which other servers to replicate to, etc.). If an Encrypted Data Vault implementation aims to provide replication functionality, it MUST also pick a versioning/change control strategy (since replication necessarily involves conflict resolution). Some versioning strategies are implicit ("last write wins", e.g., rsync or uploading a file to a file hosting service), but keep in mind that a replication strategy always implies that some sort of conflict resolution mechanism should be involved.

Client: Sharing with other entities

An individual vault's choice of authorization mechanism determines how a client shares resources with other entities (authorization capability link or similar mechanism).

Layer 3 (L3) responsibilities

Server: Notifications (L3)

It is helpful if data storage providers are able to notify clients when changes to persisted data occur. A server may optionally implement a mechanism by which clients can subscribe to changes in the vault.

Client: Vault-wide integrity protection (L3)

Vault-wide integrity protection is provided to prevent a variety of storage provider attacks where data is modified in a way that is undetectable, such as if documents are reverted to older versions or deleted. This protection requires that a global catalog of all the resource identifiers that belong to a user, along with the most recent version, is stored and kept up to date by the client. Some clients may store a copy of this catalog locally (and include an integrity protection mechanism such as [[HASHLINK]]) to guard against interference or deletion by the server.
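
As a rough sketch of how a client could use such a catalog, the function below compares a locally held catalog against what the server currently reports. It assumes, for illustration only, that the catalog maps each resource identifier to its most recent sequence number; the function name and message format are hypothetical.

```python
def check_against_catalog(catalog: dict[str, int], served: dict[str, int]) -> list[str]:
    """Compare the client's catalog (id -> latest sequence) with the server's view."""
    problems = []
    for doc_id, expected in catalog.items():
        if doc_id not in served:
            problems.append(f"{doc_id}: missing (possibly deleted)")
        elif served[doc_id] < expected:
            problems.append(
                f"{doc_id}: reverted from sequence {expected} to {served[doc_id]}"
            )
    return problems
```

An empty result means that no deletion or rollback was detected for the cataloged resources. The catalog itself still needs to be protected from tampering, as noted above.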

Property	Description
sequence	A unique counter for the data vault in order to ensure that clients are properly synchronized to the data vault. The value is required and MUST be an unsigned 64-bit number.
controller	The entity or cryptographic key that is in control of the data vault. The value is required and MUST be a URI.
invoker	The root entities or cryptographic key(s) that are authorized to invoke an authorization capability to modify the data vault's configuration or read or write to it. The value is optional, but if present, MUST be a URI or an array of URIs. When this value is not present, the value of controller property is used for the same purpose.
delegator	The root entities or cryptographic key(s) that are authorized to delegate authorization capabilities to modify the data vault's configuration or read or write to it. The value is optional, but if present, MUST be a URI or an array of URIs. When this value is not present, the value of controller property is used for the same purpose.
referenceId	Used to express an application-specific reference identifier. The value is optional and, if present, MUST be a string.
keyAgreementKey.id	An identifier for the key agreement key. The value is required and MUST be a URI. The key agreement key is used to derive a secret that is then used to generate a key encryption key for the receiver.
keyAgreementKey.type	The type of key agreement key. The value is required and MUST be or map to a URI.
hmac.id	An identifier for the HMAC key. The value is required and MUST be or map to a URI.
hmac.type	The type of HMAC key. The value is required and MUST be or map to a URI.
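
The constraints in the table above could be checked with a small validator such as the sketch below. It is illustrative rather than normative: the function names are hypothetical, a value is treated as a URI if it parses with a scheme, and "or map to a URI" is approximated by requiring a URI for each key id and any string for each key type.

```python
from urllib.parse import urlparse

UINT64_MAX = 2**64 - 1


def is_uri(value) -> bool:
    """Approximate URI check: a string with a non-empty scheme."""
    return isinstance(value, str) and bool(urlparse(value).scheme)


def validate_config(config: dict) -> list[str]:
    """Return a list of problems found in a data vault configuration."""
    errors = []
    seq = config.get("sequence")
    if isinstance(seq, bool) or not isinstance(seq, int) or not 0 <= seq <= UINT64_MAX:
        errors.append("sequence must be an unsigned 64-bit number")
    if not is_uri(config.get("controller")):
        errors.append("controller must be a URI")
    for prop in ("invoker", "delegator"):  # optional properties
        if prop in config:
            values = config[prop] if isinstance(config[prop], list) else [config[prop]]
            if not values or not all(is_uri(v) for v in values):
                errors.append(f"{prop} must be a URI or an array of URIs")
    if "referenceId" in config and not isinstance(config["referenceId"], str):
        errors.append("referenceId must be a string")
    for key in ("keyAgreementKey", "hmac"):
        entry = config.get(key)
        if (not isinstance(entry, dict) or not is_uri(entry.get("id"))
                or not isinstance(entry.get("type"), str)):
            errors.append(f"{key} requires an id (URI) and a type")
    return errors
```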

Property	Description
id	An identifier for the structured document. The value is required and MUST be a Base58-encoded 128-bit random value.
meta	Key-value metadata associated with the structured document.
content	Key-value content for the structured document.
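
A document identifier of the required form (a Base58-encoded 128-bit random value) can be generated as sketched below. The text does not say which Base58 alphabet applies; this sketch assumes the widely used Bitcoin alphabet, and the function names are hypothetical.

```python
import secrets

# Assumption: the Bitcoin Base58 alphabet (no 0, O, I, or l).
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as Base58, preserving leading zero bytes."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    # Each leading zero byte is represented by the first alphabet character.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return BASE58_ALPHABET[0] * leading_zeros + encoded


def generate_document_id() -> str:
    """Return a Base58-encoded 128-bit value from a secure random source."""
    return base58_encode(secrets.token_bytes(16))
```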

Property	Description
meta.chunks	Specifies the number of chunks in the stream.
stream.id	The identifier for the stream. The stream identifier MUST be a URI that references a stream on the same data vault. Once the stream has been written to the data vault, the content identifier MUST be updated such that it is a valid hashlink. To allow for streaming encryption, the value of the digest for the stream is assumed to be unknowable until after the stream has been written. The hashlink MUST exist as a content hash for the stream that has been written to the data vault.

Property	Description
id	An identifier for the encrypted document. The value is required and MUST be a Base58-encoded 128-bit random value.
sequence	A unique counter for the data vault in order to ensure that clients are properly synchronized to the data vault. The value is required and MUST be an unsigned 64-bit number.
jwe	A JSON Web Encryption object that, if decoded, results in the corresponding StructuredDocument.

HTTP API

This section introduces the HTTPS API for interacting with data vaults and their contents.

Creating an Encrypted Data Vault

A data vault is created by performing an HTTP POST of a DataVaultConfiguration to the dataVaultCreationService. The following HTTP status codes are defined for this service:

HTTP Status	Description
201	Data vault creation was successful. The HTTP `Location` header will contain the URL for the newly created data vault.
400	Data vault creation failed.
409	A duplicate data vault exists.

An example exchange of a data vault creation is shown below:

  POST /edvs HTTP/1.1
  Host: example.com
  Content-Type: application/json
  Accept: application/json, text/plain, */*
  Accept-Encoding: gzip, deflate

  {
    "sequence": 0,
    "controller": "did:example:123456789",
    "referenceId": "urn:uuid:abc5a436-21f9-4b4c-857d-1f5569b2600d",
    "keyAgreementKey": {
      "id": "https://example.com/kms/12345",
      "type": "X25519KeyAgreementKey2019"
    },
    "hmac": {
      "id": "https://example.com/kms/67891",
      "type": "Sha256HmacKey2019"
    }
  }

Explain the purpose of the controller property to root authority. Explain how Authorization Capabilities can be created and invoked via HTTP signatures to authorize reading and writing from/to data vaults.

If the creation of the data vault was successful, an HTTP 201 status code is expected in return:

  HTTP/1.1 201 Created
  Location: https://example.com/edvs/z4sRgBJJLnYy
  Cache-Control: no-cache, no-store, must-revalidate
  Pragma: no-cache
  Expires: 0
  Date: Fri, 14 Jun 2019 18:35:33 GMT
  Connection: keep-alive
  Transfer-Encoding: chunked
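
For readers who prefer code to a wire trace, the exchange above might be driven from a client as in this sketch. It only builds the request and interprets the documented status codes; it does not send anything. The function names are hypothetical, and authorization (for example, via HTTP signatures) is omitted because it is not yet described in this section.

```python
import json
import urllib.request


def build_create_vault_request(service_url: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that creates a data vault."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        service_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


def vault_url_from_response(status: int, headers: dict) -> str:
    """Map the documented status codes to an outcome."""
    if status == 201:
        return headers["Location"]  # URL of the newly created data vault
    if status == 409:
        raise FileExistsError("a duplicate data vault exists")
    raise ValueError(f"data vault creation failed with status {status}")
```

The request could then be sent with urllib.request.urlopen or any other HTTP client.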

Creating a Document

A structured document is stored in a data vault by encoding a StructuredDocument as an EncryptedDocument and then performing an HTTP POST to a data vault endpoint created via the process in Creating an Encrypted Data Vault. The following HTTP status codes are defined for this service:

HTTP Status	Description
201	Structured document creation was successful. The HTTP `Location` header will contain the URL for the newly created document.
400	Structured document creation failed.

In order to convert a StructuredDocument to an EncryptedDocument an implementer MUST encode the StructuredDocument as a JWE Encrypted object. Once the document is encrypted, it can be sent to the document creation service.

A protocol example of a document creation is shown below:

  POST /edvs/z4sRgBJJLnYy/docs HTTP/1.1
  Host: example.com
  Content-Type: application/json
  Accept: application/json, text/plain, */*
  Accept-Encoding: gzip, deflate

  {
    "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044",
    "sequence": 0,
    "jwe": {
      "protected": "eyJlbmMiOiJDMjBQIn0",
      "recipients": [{
        "header": {
          "alg": "A256KW",
          "kid": "https://example.com/kms/zSDn2MzzbxmX"
        },
        "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
      }],
      "iv": "i8Nins2vTI3PlrYW",
      "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
      "tag": "pfZO0JulJcrc3trOZy8rjA"
    }
  }

If the creation of the structured document was successful, an HTTP 201 status code is expected in return:

  HTTP/1.1 201 Created
  Location: https://example.com/edvs/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz
  Cache-Control: no-cache, no-store, must-revalidate
  Pragma: no-cache
  Expires: 0
  Date: Fri, 14 Jun 2019 18:37:12 GMT
  Connection: keep-alive
  Transfer-Encoding: chunked
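
The protected member of the jwe object in the request above is a base64url-encoded JSON header, as defined by [[RFC7516]]. The short sketch below decodes it, which can be handy when debugging; the function name is hypothetical. Performing the actual encryption requires a JOSE or cryptographic library and is not shown.

```python
import base64
import json


def decode_protected_header(protected: str) -> dict:
    """Decode a base64url (unpadded) JWE protected header into a dict."""
    padded = protected + "=" * (-len(protected) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


header = decode_protected_header("eyJlbmMiOiJDMjBQIn0")
print(header)  # {'enc': 'C20P'}
```

The decoded header indicates the content encryption algorithm used for the example payload.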

Reading a Document

Reading a document from a data vault is performed by retrieving the EncryptedDocument and then decrypting it to a StructuredDocument. The following HTTP status codes are defined for this service:

HTTP Status	Description
200	EncryptedDocument retrieval was successful.
400	EncryptedDocument retrieval failed.
404	EncryptedDocument with given id was not found.

In order to convert an EncryptedDocument to a StructuredDocument an implementer MUST decode the EncryptedDocument from a JWE Encrypted object. Once the document is decrypted, it can be processed by the web application.

A protocol example of a document retrieval is shown below:

Explain that the URL path structure is fixed for all data vaults to enable portability and the use of stable URLs (such as through DID URLs) to reference certain documents while allowing users to change their data vault service providers. Explain how this enables portability.

  GET /edvs/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz HTTP/1.1
  Host: example.com
  Accept: application/json, text/plain, */*
  Accept-Encoding: gzip, deflate

If the retrieval of the encrypted document was successful, an HTTP 200 status code is expected in return:

  HTTP/1.1 200 OK
  Date: Fri, 14 Jun 2019 18:37:12 GMT
  Connection: keep-alive

  {
    "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044",
    "sequence": 0,
    "jwe": {
      "protected": "eyJlbmMiOiJDMjBQIn0",
      "recipients": [{
        "header": {
          "alg": "A256KW",
          "kid": "https://example.com/kms/zSDn2MzzbxmX"
        },
        "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
      }],
      "iv": "i8Nins2vTI3PlrYW",
      "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
      "tag": "pfZO0JulJcrc3trOZy8rjA"
    }
  }
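
A client might interpret the retrieval response along the lines of the sketch below, which maps the three documented status codes to outcomes and checks that the returned body has the members of an EncryptedDocument. The function name and the choice to return None for a missing document are assumptions; decryption of the jwe member is not shown.

```python
import json


def parse_read_response(status: int, body: bytes):
    """Return the EncryptedDocument, None if not found, or raise on failure."""
    if status == 200:
        document = json.loads(body)
        for member in ("id", "sequence", "jwe"):
            if member not in document:
                raise ValueError(f"EncryptedDocument is missing {member!r}")
        return document
    if status == 404:
        return None
    raise ValueError(f"EncryptedDocument retrieval failed with status {status}")
```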

Updating a Document

A structured document is updated in a data vault by encoding the updated StructuredDocument as an EncryptedDocument and then performing an HTTP POST to a data vault endpoint created via the process in Creating a Document. The following HTTP status codes are defined for this service:

HTTP Status	Description
200	Structured document update was successful.
400	Structured document update failed.

A protocol example of a document update is shown below:

  POST /edvs/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz HTTP/1.1
  Host: example.com
  Content-Type: application/json
  Accept: application/json, text/plain, */*
  Accept-Encoding: gzip, deflate

  {
    "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044",
    "sequence": 1,
    "jwe": {
      "protected": "eyJlbmMiOiJDMjBQIn0",
      "recipients": [{
        "header": {
          "alg": "A256KW",
          "kid": "https://example.com/kms/zSDn2MzzbxmX"
        },
        "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
      }],
      "iv": "i8Nins2vTI3PlrYW",
      "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
      "tag": "pfZO0JulJcrc3trOZy8rjA"
    }
  }

If the update to the encrypted document was successful, an HTTP 200 status code is expected in return:

  HTTP/1.1 200 OK
  Cache-Control: no-cache, no-store, must-revalidate
  Date: Fri, 14 Jun 2019 18:39:52 GMT
  Connection: keep-alive

Deleting a Document

A structured document is deleted by performing an HTTP DELETE to a data vault endpoint created via the process in Creating a Document. The following HTTP status codes are defined for this service:

HTTP Status	Description
200	Structured document was deleted successfully.
400	Structured document deletion failed.
404	Structured document was not found.

A protocol example of a document deletion is shown below:

  DELETE /edvs/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz HTTP/1.1
  Host: example.com

If the deletion of the encrypted document was successful, an HTTP 200 status code is expected in return:

  HTTP/1.1 200 OK
  Cache-Control: no-cache, no-store, must-revalidate
  Date: Fri, 14 Jun 2019 18:40:18 GMT
  Connection: keep-alive

Batch Operations

There is an ongoing debate in the Secure Data Storage Working Group regarding whether or not a batch API is necessary to achieve acceptable levels of performance. The Secure Data Storage Working Group is seeking implementer feedback regarding this feature, or alternatively, the use of parallel requests to achieve similar performance characteristics. Additionally, the operation syntax is not yet finalized. See also Issue #4.

Documents within a vault can be created, updated or deleted in a single REST call. The following HTTP status codes are defined for this service:

HTTP Status	Description
200	All operations successful.
400	One or more operations were unsuccessful.

A protocol example of a batch request is shown below. The first element in the JSON array represents a create document operation. The second element represents an update document operation. The third element represents a delete document operation.

  POST /edvs/z4sRgBJJLnYy/batch HTTP/1.1
  Host: example.com
  Content-Type: application/json
  Accept: application/json, text/plain, */*
  Accept-Encoding: gzip, deflate

  [{
    "id": "LHMjVyevZgZh7BiCjCiu7P",
    "sequence": 0,
    "jwe": {
      "protected": "eyJlbmMiOiJDMjBQIn0",
      "recipients": [{
        "header": {
          "alg": "A256KW",
          "kid": "https://example.com/kms/zSDn2MzzbxmX"
        },
        "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
      }],
      "iv": "i8Nins2vTI3PlrYW",
      "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
      "tag": "pfZO0JulJcrc3trOZy8rjA"
    }
  },
  {
    "id": "4L3jCAmMiQQeWWUv3ADoBi",
    "sequence": 1,
    "jwe": {
      "protected": "eyJlbmMiOiJDMjBQIn0",
      "recipients": [{
        "header": {
          "alg": "A256KW",
          "kid": "https://example.com/kms/zSDn2MzzbxmX"
        },
        "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
      }],
      "iv": "i8Nins2vTI3PlrYW",
      "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
      "tag": "pfZO0JulJcrc3trOZy8rjA"
    }
  },
  {
    "delete": "NyNHSQ449YcuF4JuxpvM2y"
  }]

If all operations are successful, an HTTP 200 status code is expected in return. The response body will be an array of responses - one for each operation.

  HTTP/1.1 200 OK
  Cache-Control: no-cache, no-store, must-revalidate
  Date: Fri, 14 Jun 2019 18:40:18 GMT
  Connection: keep-alive

  [{
    "status": 201,
    "message": "created",
    "location": "https://example.com/edvs/z4sRgBJJLnYy/docs/LHMjVyevZgZh7BiCjCiu7P"
  },
  {
    "status": 200,
    "message": "updated"
  },
  {
    "status": 200,
    "message": "deleted"
  }]

If one or more operations were unsuccessful, an HTTP 400 status code is expected in return. The response body will be an array of responses - one for each operation.

  HTTP/1.1 400 Bad Request
  Cache-Control: no-cache, no-store, must-revalidate
  Date: Fri, 14 Jun 2019 18:40:18 GMT
  Connection: keep-alive

  [{
    "status": 201,
    "message": "created",
    "location": "https://example.com/edvs/z4sRgBJJLnYy/docs/LHMjVyevZgZh7BiCjCiu7P"
  },
  {
    "status": 200,
    "message": "updated"
  },
  {
    "status": 404,
    "message": "document not found"
  }]
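
Because the batch response carries one entry per operation, a client can pair each operation with its result to find out which ones failed. The sketch below assembles a batch body in the shape shown in the request example and picks out the failures. The function names are hypothetical, and since the operation syntax is not yet finalized, the shape may change.

```python
def build_batch(creates=(), updates=(), deletes=()):
    """Assemble the JSON array for a batch request."""
    operations = list(creates) + list(updates)
    operations += [{"delete": doc_id} for doc_id in deletes]
    return operations


def failed_operations(operations, responses):
    """Pair each operation with its response and keep the unsuccessful ones."""
    if len(operations) != len(responses):
        raise ValueError("expected one response per operation")
    return [
        (op, resp)
        for op, resp in zip(operations, responses)
        if not 200 <= resp["status"] < 300
    ]
```

A client could retry or report only the operations returned by failed_operations instead of resubmitting the whole batch.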

Creating a Stream

This section is out of date; do not implement it.

Another design is being considered that would transform streams into a single index document and a collection of documents, each of which contains a chunk of the stream. This would be done to help prevent misuse of a decryption stream prior to its authentication. In order for this approach to be implemented in a Web browser, it also requires certain File or Blob APIs. Further investigation is needed to ensure that support of these APIs would be sufficient for this design approach, as it would be preferred to prevent data misuse and to make better use of native implementations of authenticated encryption modes.

A stream is stored in a data vault by writing a document containing metadata about the stream, encrypting the stream, writing it to a data vault, and then updating the document containing metadata about the stream. A chunk's size (as measured by the size of the HTTP body containing the chunk) MUST NOT exceed 1MiB. The following HTTP status codes are defined for this service:

HTTP Status	Description
201	Stream creation was successful. The HTTP `Location` header will contain the URL for the newly created stream.
400	Stream creation failed.

Implementations first encode the metadata associated with the stream into a StructuredDocument:

  {
    "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044",
    "meta": {
      "created": "2019-06-18",
      "contentType": "video/mpeg",
      "contentLength": 56735817
    },
    "content": {
      "id": "https://example.com/edvs/z4sRgBJJLnYy/streams/zMbxmSDn2Xzz"
    }
  }

In this case, the value of content.id is a reference to the stream located at https://example.com/edvs/z4sRgBJJLnYy/streams/zMbxmSDn2Xzz, which is the location that the stream MUST be written to. This content identifier MUST be updated to include a hashlink once the stream has been written and its digest is known.

The StructuredDocument above is then transformed to an EncryptedDocument and the procedure in Creating a Document is executed:

  POST /edvs/z4sRgBJJLnYy/docs HTTP/1.1
  Host: example.com
  Content-Type: application/json
  Accept: application/json, text/plain, */*
  Accept-Encoding: gzip, deflate

  {
    "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044",
    "sequence": 0,
    "jwe": {
      "protected": "eyJlbmMiOiJDMjBQIn0",
      "recipients": [{
        "header": {
          "alg": "A256KW",
          "kid": "https://example.com/kms/zSDn2MzzbxmX"
        },
        "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
      }],
      "iv": "i8Nins2vTI3PlrYW",
      "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
      "tag": "pfZO0JulJcrc3trOZy8rjA"
    }
  }

If the creation of the structured document was successful, an HTTP 201 status code is expected in return:

  HTTP/1.1 201 Created
  Location: https://example.com/edvs/z4sRgBJJLnYy/docs/zp4H8ekWn
  Cache-Control: no-cache, no-store, must-revalidate
  Pragma: no-cache
  Expires: 0
  Date: Fri, 14 Jun 2019 18:37:12 GMT
  Connection: keep-alive
  Transfer-Encoding: chunked

Next, in order to convert a stream to an EncryptedStream an implementer MUST encrypt the stream. Once the stream is encrypted (or as it is encrypted), it can be sent to the stream creation service.

A protocol example of a stream creation is shown below:

  POST /edvs/z4sRgBJJLnYy/streams HTTP/1.1
  Host: example.com
  Content-Type: application/octet-stream
  Transfer-Encoding: chunked
  Accept: application/json, text/plain, */*
  Accept-Encoding: gzip, deflate

  TBD

If the creation of the stream was successful, an HTTP 201 status code is expected in return:

  HTTP/1.1 201 Created
  Location: https://example.com/edvs/z4sRgBJJLnYy/streams/zMbxmSDn2Xzz
  Cache-Control: no-cache, no-store, must-revalidate
  Pragma: no-cache
  Expires: 0
  Date: Fri, 14 Jun 2019 18:37:12 GMT
  Connection: keep-alive
  Transfer-Encoding: chunked

Once a stream is created, the metadata related to the stream can be updated in the data vault using the protocol defined in Updating a Document. An example of updating a link to a video file is shown below.

Implementations update the metadata associated with the stream in its StructuredDocument:

  {
    "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044",
    "sequence": 1,
    "meta": {
      "created": "2019-06-18",
      "contentType": "video/mpeg",
      "contentLength": 56735817
    },
    "content": {
      "id": "https://example.com/edvs/z4sRgBJJLnYy/streams/zMbxmSDn2Xzz?hl=zb47JhaKJ3hJ5Jkw8oan35jK23289Hp",
      "jwe": {
        "protected": "eyJlbmMiOiJDMjBQIn0",
        "recipients": [{
          "header": {
            "alg": "A256KW",
            "kid": "https://example.com/kms/zSDn2MzzbxmX"
          },
          "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
        }],
        "iv": "i8Nins2vTI3PlrYW",
        "tag": "pfZO0JulJcrc3trOZy8rjA"
      }
    }
  }

The value of content.id MUST be updated to include a hashlink now that the stream has been written and its digest is known.
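
The update described above can be sketched as follows. This is an illustration under stated assumptions: the hashlink is taken to be a multibase (base58btc, `z` prefix) encoding of a SHA-256 multihash computed over the encrypted stream bytes, carried in an `hl` query parameter as in the example. The choice of hash function, the bytes that are hashed, and the helper names are assumptions and are not confirmed by this specification.

```python
import copy
import hashlib

_B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    """Encode bytes using the Bitcoin base58 alphabet."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = _B58_ALPHABET[remainder] + encoded
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded  # each leading zero byte becomes "1"


def hashlink_for(encrypted_stream: bytes) -> str:
    """Assumed form: multibase 'z' + base58btc(sha2-256 multihash)."""
    digest = hashlib.sha256(encrypted_stream).digest()
    multihash = bytes([0x12, 0x20]) + digest  # 0x12 = sha2-256, 0x20 = 32 bytes
    return "z" + base58btc(multihash)


def finalize_stream_metadata(document: dict, encrypted_stream: bytes) -> dict:
    """Return a copy with the sequence incremented and content.id hashlinked."""
    updated = copy.deepcopy(document)
    updated["sequence"] = document.get("sequence", 0) + 1
    stream_url = document["content"]["id"]
    updated["content"]["id"] = f"{stream_url}?hl={hashlink_for(encrypted_stream)}"
    return updated
```

The returned document would then be encrypted and submitted in the same way as any other document update.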

The StructuredDocument above is then transformed to an EncryptedDocument and the procedure in Updating a Document is executed:

  POST /edvs/z4sRgBJJLnYy/docs HTTP/1.1
  Host: example.com
  Content-Type: application/json
  Accept: application/json, text/plain, */*
  Accept-Encoding: gzip, deflate

  {
    "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76044",
    "sequence": 1,
    "jwe": {
      "protected": "eyJlbmMiOiJDMjBQIn0",
      "recipients": [{
        "header": {
          "alg": "A256KW",
          "kid": "https://example.com/kms/zSDn2MzzbxmX"
        },
        "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
      }],
      "iv": "i8Nins2vTI3PlrYW",
      "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
      "tag": "pfZO0JulJcrc3trOZy8rjA"
    }
  }

If the update of the structured document was successful, an HTTP 200 status code is expected in return:

  HTTP/1.1 200 OK
  Location: https://example.com/edvs/z4sRgBJJLnYy/docs/zp4H8ekWn
  Cache-Control: no-cache, no-store, must-revalidate
  Pragma: no-cache
  Expires: 0
  Date: Fri, 14 Jun 2019 18:37:12 GMT
  Connection: keep-alive
  Transfer-Encoding: chunked

Reading a Stream

This section is out of date; do not implement it.

Reading a stream from a data vault is performed by retrieving the associated metadata, which is encrypted as an EncryptedDocument, decoding the hashlink information, retrieving the EncryptedStream, and then decrypting it. The following HTTP status codes are defined for this service:

HTTP Status	Description
200	Encrypted stream retrieval was successful.
400	Encrypted stream retrieval failed.
404	Encrypted stream with given id was not found.

In order to convert an EncryptedStream to a stream an implementer MUST decode the EncryptedStream using the information provided in the associated EncryptedDocument. Once the stream is decrypted, it can be processed by the web application.

Implementers can perform random seeking in the stream by using the Content-Range HTTP Header.

A protocol example of a stream retrieval is shown below:

  GET https://example.com/edvs/z4sRgBJJLnYy/streams/zn2XmSDzMbxz HTTP/1.1
  Host: example.com
  Content-Range: 0-1048576
  Accept: application/octet-stream
  Accept-Encoding: gzip, deflate

If the retrieval of the encrypted stream was successful, an HTTP 200 status code is expected in return:

  HTTP/1.1 200 OK
  Date: Fri, 14 Jun 2019 18:37:12 GMT
  Content-Range: 0-1048576
  Content-Length: 1048576
  Connection: keep-alive

  ...

Deleting a Stream

This section is out of date; do not implement it.

A stream is deleted by performing an HTTP DELETE to a data vault stream endpoint created via Creating a Stream and to the corresponding metadata document created via Creating a Document. The following HTTP status codes are defined for this service:

HTTP Status	Description
200	Stream was deleted successfully.
400	Stream deletion failed.
404	Stream was not found.

A protocol example of a stream deletion is shown below:

  DELETE /edvs/z4sRgBJJLnYy/streams/zMbxmSDn2Xzz HTTP/1.1
  Host: example.com

If the deletion of the encrypted stream was successful, an HTTP 200 status code is expected in return:

  HTTP/1.1 200 OK
  Cache-Control: no-cache, no-store, must-revalidate
  Date: Fri, 14 Jun 2019 18:40:18 GMT
  Connection: keep-alive

Once the stream is deleted, implementations MUST also delete the corresponding metadata document:

  DELETE /edvs/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz HTTP/1.1
  Host: example.com

If the deletion of the metadata document was successful, an HTTP 200 status code is expected in return:

  HTTP/1.1 200 OK
  Cache-Control: no-cache, no-store, must-revalidate
  Date: Fri, 14 Jun 2019 18:40:18 GMT
  Connection: keep-alive

Requesting History of Changes

To facilitate use-cases such as efficiently replicating changes between different EDV instances, it is advantageous for a client to be able to request the history of changes from an EDV server. This section details the definition of an API that allows an EDV client to request from an EDV server a list of changes that have occurred, organized by the documents the changes pertain to.

A request for the history of changes to an EDV server is performed by submitting an HTTP GET request to the EDV server's /history endpoint.

The following HTTP status codes are valid for this endpoint:

HTTP Status	Description
200	Query for history was successful.

An example request is shown below:

  GET https://example.com/edvs/z4sRgBJJLnYy/history HTTP/1.1
  Host: example.com
  Accept: application/json
  Accept-Encoding: gzip, deflate

If the retrieval of the history was successful, a response with an HTTP 200 status code is expected in return.

An example response is shown below:

  [
    {
      "documentId": "https://example.com/edvs/z4sRgBJJLnYy/docs/z4sAsDFrGYh",
      "sequence": 1
    },
    {
      "documentId": "https://example.com/edvs/z4sRgBJJLnYy/docs/zaCfHeKsNesS",
      "sequence": 2
    }
  ]

Filtering History

In certain cases, an EDV client may want to filter the history returned by an EDV server. To facilitate this, the following query parameters are defined, which can be added to a request to an EDV server's history API:

Query Parameter	Data Type	Description
afterSequence	Unsigned Integer	Query for changes to documents that have occurred after the supplied sequence value.

An example request filtering returned changes based on the afterSequence query parameter:

  GET https://example.com/edvs/z4sRgBJJLnYy/history?afterSequence=4 HTTP/1.1
  Host: example.com
  Accept: application/json
  Accept-Encoding: gzip, deflate

An example response is shown below:

  [
    {
      "documentId": "https://example.com/edvs/z4sRgBJJLnYy/docs/zyUtJ67Jm",
      "sequence": 5
    },
    {
      "documentId": "https://example.com/edvs/z4sRgBJJLnYy/docs/zaCHeJtUcNu",
      "sequence": 6
    },
    {
      "documentId": "https://example.com/edvs/z4sRgBJJLnYy/docs/zaasdChfrJsn",
      "sequence": 7
    }
  ]
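
A replicating client could use the afterSequence parameter to request only the changes it has not yet seen. The sketch below shows one way to build the request URL and to track the highest sequence received so far; `history_url` and `latest_sequence` are hypothetical helper names, and the approach is an illustration rather than a required client behavior:

```python
from typing import Optional
from urllib.parse import urlencode


def history_url(vault_url: str, after_sequence: Optional[int] = None) -> str:
    """Build the /history URL, optionally filtered by afterSequence."""
    url = vault_url.rstrip("/") + "/history"
    if after_sequence is not None:
        if after_sequence < 0:
            raise ValueError("afterSequence is an unsigned integer")
        url += "?" + urlencode({"afterSequence": after_sequence})
    return url


def latest_sequence(changes: list, current: int) -> int:
    """Return the highest sequence seen, to use as the next afterSequence."""
    return max([current] + [change["sequence"] for change in changes])


# Usage with the example response shown above.
changes = [
    {"documentId": "https://example.com/edvs/z4sRgBJJLnYy/docs/zyUtJ67Jm", "sequence": 5},
    {"documentId": "https://example.com/edvs/z4sRgBJJLnYy/docs/zaCHeJtUcNu", "sequence": 6},
    {"documentId": "https://example.com/edvs/z4sRgBJJLnYy/docs/zaasdChfrJsn", "sequence": 7},
]
cursor = latest_sequence(changes, current=4)
print(history_url("https://example.com/edvs/z4sRgBJJLnYy", cursor))
```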

Creating Encrypted Indexes

It is often useful to search a data vault for structured documents that contain specific metadata. Efficient searching requires the use of search indexes and local access to data. This poses an interesting challenge as the search has to be performed on the storage provider without leaking information that could violate the privacy of the entities that are storing information in the data vault. This section details how encrypted indexes can be created and used to perform efficient searching while protecting the privacy of entities that are storing information in the data vault.

When creating an EncryptedDocument, blinded index properties MAY be used to perform efficient searches. An example of the use of these properties is shown below:

  {
    "id": "urn:uuid:698f3fb6-592f-4d22-9e04-462cc4606a23",
    "sequence": 0,
    "indexed": [{
      "sequence": 0,
      "hmac": {
        "id": "https://example.com/kms/z7BgF536GaR",
        "type": "Sha256HmacKey2019"
      },
      "attributes": [{
        "name": "CUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
        "value": "RV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro",
        "unique": true
      }, {
        "name": "DUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
        "value": "QV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro"
      }]
    }],
    "jwe": {
      "protected": "eyJlbmMiOiJDMjBQIn0",
      "recipients": [
        {
          "header": {
            "alg": "A256KW",
            "kid": "https://example.com/kms/z7BgF536GaR"
          },
          "encrypted_key":
            "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
        }
      ],
      "iv": "i8Nins2vTI3PlrYW",
      "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
      "tag": "pfZO0JulJcrc3trOZy8rjA"
    }
  }

The example above demonstrates the use of unique index values as well as non-unique indexes.

The example above enables the storage provider to build efficient indexes on encrypted properties while enabling storage agents to search the information without leaking information that would create privacy concerns.

Document the following in this section:

The `equals` filter is an object with key-value attribute pairs. Any document that matches *all* given key-value attribute pairs will be returned. If `equals` is an array, it may contain multiple such filters, in which case the results will be all documents that matched any one of the filters. If the document's value for a matching key is an array and the array contains a matching value, the document will be considered a match (provided that other key-value attribute pairs also match).

Here are some examples:

  // for the query:
  {equals: {"content.foo": "bar"}}
  // this will match documents that look like this:
  {"content": {"foo": "bar"}}
  {"content": {"foo": ["bar"]}}
  {"content": {"foo": ["bar", "other"]}}

  // for the query:
  {equals: [{"content.foo": "bar"}, {"content.foo": "baz"}]}
  // this will match documents that look like this:
  {"content": {"foo": "bar"}}
  {"content": {"foo": ["bar"]}}
  {"content": {"foo": ["bar", "other"]}}
  {"content": {"foo": "baz"}}
  {"content": {"foo": ["baz"]}}
  {"content": {"foo": ["baz", "other"]}}

  // for the query:
  {equals: {"content.foo": ["bar", "baz"]}}
  // this will match documents that look like this:
  {"content": {"foo": ["bar", "baz"]}}
  {"content": {"foo": [["bar", "baz"]]}}
  {"content": {"foo": [["bar", "baz"], "other"]}}

  // for the query:
  {equals: {"content.https://schema\\.org/": "bar"}}
  // this will match documents that look like this:
  {"content": {"https://schema.org": "bar"}}
  {"content": {"https://schema.org": ["bar"]}}
  {"content": {"https://schema.org": ["bar", "other"]}}

  // for the query:
  {equals: {"content.foo": {"a": 4, "b": 5}}}
  // this will match documents that look like this:
  {"content": {"foo": {"a": 4, "b": 5}}}
  {"content": {"foo": [{"a": 4, "b": 5}]}}
  {"content": {"foo": [{"a": 4, "b": 5}, "other"]}}
  {"content": {"foo": {"b": 5, "a": 4}}} // note key order does not matter
  {"content": {"foo": [{"b": 5, "a": 4}]}}
  {"content": {"foo": [{"b": 5, "a": 4}, "other"]}}
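
The matching rules illustrated by these examples can be expressed over plaintext documents as in the sketch below. It is only a model of the intended semantics, since a real storage server never sees plaintext and compares blinded values instead. The function names are hypothetical, and the treatment of `\.` as an escaped dot inside a key is inferred from the examples.

```python
import re


def split_path(path: str) -> list:
    """Split a dotted path on unescaped dots, then unescape '\\.' to '.'."""
    parts = re.split(r"(?<!\\)\.", path)
    return [part.replace("\\.", ".") for part in parts]


def lookup(document: dict, path: str):
    """Return (value, found) for a dotted path into a nested dict."""
    current = document
    for part in split_path(path):
        if not isinstance(current, dict) or part not in current:
            return None, False
        current = current[part]
    return current, True


def value_matches(actual, expected) -> bool:
    """Match on equality, or on membership when the document value is an array."""
    if actual == expected:
        return True
    return isinstance(actual, list) and any(item == expected for item in actual)


def matches(document: dict, equals) -> bool:
    """OR across filters, AND across the key-value pairs inside a filter."""
    filters = equals if isinstance(equals, list) else [equals]
    for subfilter in filters:
        results = []
        for path, expected in subfilter.items():
            actual, found = lookup(document, path)
            results.append(found and value_matches(actual, expected))
        if all(results):
            return True
    return False
```

Python dictionary comparison ignores key order, which mirrors the note above that key order does not matter for object values.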

The HMAC blinding process is very close to what @OR13 described above. There are two minor differences that are important:

Before a value is HMAC'd, it is namespaced to its key to prevent leaking information about same values across different keys. This is done by doing `HMAC({key: value})` instead of just `HMAC(value)`.
The input to HMAC for values is run through the JSON canonicalization algorithm, [JCS - RFC8785](https://tools.ietf.org/html/rfc8785) to ensure that property insertion order in the value will not matter. This matters when the value is not a simple primitive such as a string, but instead it is an object such as `{a: 4, b: 5}`.

By way of example, for `equals: [{"content.foo": "bar"}]`, the process is:

  1. Set `blinded` to an empty array `[]`.
  2. For every element (`{"content.foo": "bar"}`) in the `equals` array:
  2.1. For every key (`"content.foo"`) and its value (`"bar"`) in the object:
  2.1.1. Set `value` to an object with `key` and its value (`{"content.foo": "bar"}`).
  2.1.2. Canonicalize `value` using [JCS](https://tools.ietf.org/html/rfc8785).
  2.1.3. Append the object `{[HMAC(key)]: HMAC(value)}` to `blinded`.
  3. Return `{equals: blinded}`.
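
The steps above can be sketched as follows, under several assumptions. The HMAC key is held locally as raw bytes, whereas the examples in this document reference a key held in a key management service. HMAC-SHA-256 is assumed from the `Sha256HmacKey2019` type. `json.dumps` with sorted keys is used as an approximation of JCS that agrees with RFC 8785 only for simple strings and integers. The sketch also groups all pairs of one `equals` element into a single blinded object so that the AND semantics within an element are preserved, which is an interpretation of step 2.1.3. All function names are illustrative.

```python
import base64
import hashlib
import hmac
import json


def canonicalize(value) -> str:
    """Approximate JCS: sorted keys, no insignificant whitespace."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def blind(hmac_key: bytes, text: str) -> str:
    """HMAC-SHA-256 the text and return unpadded base64url."""
    digest = hmac.new(hmac_key, text.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def blind_attribute(hmac_key: bytes, key: str, value) -> dict:
    """Blind the key, and blind the value namespaced to its key."""
    namespaced = canonicalize({key: value})
    return {"name": blind(hmac_key, key), "value": blind(hmac_key, namespaced)}


def blind_equals(hmac_key: bytes, equals: list) -> dict:
    """Blind every key-value pair of every element of an equals array."""
    blinded = []
    for element in equals:
        entry = {}
        for key, value in element.items():
            attribute = blind_attribute(hmac_key, key, value)
            entry[attribute["name"]] = attribute["value"]
        blinded.append(entry)
    return {"equals": blinded}


# Usage with a throwaway key; real deployments would not hard-code key material.
query = blind_equals(b"\x01" * 32, [{"content.foo": "bar"}])
```

Because the value is namespaced to its key before hashing, the same plaintext value stored under two different keys produces two unrelated blinded values.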

Note that the HMAC output is base64url-encoded so it can be treated as a string. Also note that indexes may be marked as "unique", enabling storage servers to reject documents that include certain duplicate attribute values. Additionally, an index can be "compound" and unique, allowing storage servers to reject documents that include certain duplicate attribute values within some other group. For example, you can create a compound, unique index on `["content.type", "content.name"]`. This would ensure that only one document with a given combination of `"content.type"` and `"content.name"` could be inserted into storage. Many documents with the same `"content.type"` can still be inserted, provided that they do not have the same `"content.name"` for that `"content.type"`. This is a very useful feature for storing different collections of items in a single EDV.
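
A storage server could enforce uniqueness without learning any plaintext, because it only compares blinded names and values. The sketch below is a simplified in-memory model: the class and exception names are hypothetical, uniqueness is scoped per HMAC key identifier as an assumption, and stale claims left behind by document updates or deletions are not handled.

```python
class UniqueIndexError(Exception):
    """Raised when a document repeats a blinded value marked as unique."""


class InMemoryUniqueIndex:
    def __init__(self) -> None:
        # (hmac id, blinded name, blinded value) -> id of the owning document
        self._claims = {}

    def insert(self, document: dict) -> None:
        """Record the document's unique attributes or reject a duplicate."""
        claims = []
        for index in document.get("indexed", []):
            hmac_id = index["hmac"]["id"]
            for attribute in index["attributes"]:
                if attribute.get("unique"):
                    claims.append((hmac_id, attribute["name"], attribute["value"]))
        # Check every claim before recording any, so a rejection changes nothing.
        for claim in claims:
            owner = self._claims.get(claim)
            if owner is not None and owner != document["id"]:
                raise UniqueIndexError(f"duplicate unique attribute {claim[1]}")
        for claim in claims:
            self._claims[claim] = document["id"]
```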

Compound index values are computed by HMACing together every blinded value for each attribute. For example, for this unique, compound index: `["content.type", "content.country", "content.region"]`, a document like this:

  {content: {type: "Location", country: "AU", region: "NSW"}}

would be indexed by first blinding `{"content.type": "Location"}`, `{"content.country": "AU"}`, and `{"content.region": "NSW"}` just like above. Then index entries would be created for the blinded entry for "content.type", the combination of the blinded entries for "content.type" and "content.country", and finally, the combination of the blinded entries for "content.type", "content.country", and "content.region". The combinations are built by HMACing the blinded attribute names concatenated using a colon (`:`) and HMACing the blinded attribute values concatenated using a colon (`:`). Note that a colon (`:`) was selected because it is not a character in the base64url alphabet. In pseudo code, blinded compound index entries look like:

  key = HMAC(blinded1.name):HMAC(blinded2.name):...
  value = HMAC(blinded1.value):HMAC(blinded2.value):...
  Return {key, value}.

For clarity, the above would be repeated twice for the type, country, region example: once for the type+country combination and once for the type+country+region combination. The first time it would use blinded entries 1 (type) and 2 (country), and the second time it would use 1 (type), 2 (country), and 3 (region).
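
One possible reading of the compound index construction is sketched below. It assumes that a compound entry is produced by joining the already blinded names (and, separately, the blinded values) with a colon and HMACing the joined string once more; the pseudo code above could also be read differently, so this is an interpretation and not a normative algorithm. A local raw HMAC key, HMAC-SHA-256, and a `json.dumps`-based approximation of JCS are further assumptions, and all function names are illustrative.

```python
import base64
import hashlib
import hmac
import json


def blind(hmac_key: bytes, text: str) -> str:
    """HMAC-SHA-256 the text and return unpadded base64url."""
    digest = hmac.new(hmac_key, text.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def blind_attribute(hmac_key: bytes, key: str, value) -> dict:
    """Blind a single attribute, namespacing the value to its key."""
    namespaced = json.dumps({key: value}, sort_keys=True, separators=(",", ":"))
    return {"name": blind(hmac_key, key), "value": blind(hmac_key, namespaced)}


def compound_entries(hmac_key: bytes, attributes: list) -> list:
    """Build index entries for each prefix of an ordered compound index.

    attributes is an ordered list of (key, value) pairs with at least one pair.
    """
    blinded = [blind_attribute(hmac_key, key, value) for key, value in attributes]
    entries = [blinded[0]]  # the first attribute is indexed on its own
    for end in range(2, len(blinded) + 1):
        prefix = blinded[:end]
        entries.append({
            "name": blind(hmac_key, ":".join(item["name"] for item in prefix)),
            "value": blind(hmac_key, ":".join(item["value"] for item in prefix)),
        })
    return entries


# Usage: the type, country, region example from the text above.
entries = compound_entries(b"\x02" * 32, [
    ("content.type", "Location"),
    ("content.country", "AU"),
    ("content.region", "NSW"),
])
print(len(entries))  # 3
```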

This same process is repeated when building a query that targets a compound index. The server sees no difference between a compound index and a regular index, but it does have to be made aware of whether or not an index is unique.

Index entries are stored along with a document in an index field that is identified by an identifier for the HMAC used. A document can have any number of such indexes, each using a different HMAC key (and access to those keys may differ).

Provide instructions and examples for how indexes are blinded using an HMAC key.

Explain that multiple entities can maintain their own independent indexes (using their own HMAC key) provided they have been granted this capability. Explain that indexes can be sparse/partial. Explain that indexes have their own sequence number and that it will match the document's sequence number once it is updated.

Add a section showing the update index endpoint and how it works.

Searching Encrypted Documents

The working group is in the process of deciding on pagination syntax for queries. Be careful when using queries right now, since there is currently no limit to how many documents can be returned in a single call. See also Issue #14.

The contents of a data vault can be searched using encrypted indexes created using the processes described in Creating Encrypted Indexes. There are two primary ways of searching for encrypted documents. The first is to search for a specific value associated with a specific index. The second is to search to see if a specific index exists on a document.

When doing a search for a specific value associated with a specific index, the syntax for the equals filter is as follows: equals is an array of one or more subfilters. Each subfilter consists of one or more key-value attribute pairs. For a document to be matched, it MUST match at least one of those subfilters. For a document to match a subfilter, it MUST contain all the given attribute pairs within that subfilter. In other words, key-value attribute pairs within a subfilter indicate an AND operation with those pairs, with the final result being an OR operation between the subfilters. If there is only one subfilter needed, then equals MAY be directly set to it instead of being an array of one element.

The examples below demonstrate how to query for documents within a vault using the syntax described above.

  POST https://example.com/edvs/z4sRgBJJLnYy/query HTTP/1.1
  Host: example.com
  Content-Type: application/json
  Accept: application/json, text/plain, */*
  Accept-Encoding: gzip, deflate

  {
    "index": "https://example.com/kms/z7BgF536GaR",
    "equals":
      {"DUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ":
        "RV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro"
      },
    "returnFullDocuments": false
  }

The above example would match any document containing the following attribute:

  {
    "name": "DUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
    "value": "RV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro"
  }

  POST https://example.com/edvs/z4sRgBJJLnYy/query HTTP/1.1
  Host: example.com
  Content-Type: application/json
  Accept: application/json, text/plain, */*
  Accept-Encoding: gzip, deflate

  {
    "index": "https://example.com/kms/z7BgF536GaR",
    "equals":
      {"DUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ":
        "QV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro",
       "AarngVIZLl0kIp2xEHUH5o5uVc-470roQaOIbqMUD7DFQQypWQ==":
        "AYubg9VnEitQBxlhjVFnYRlfQ5UHWe3ia4aMiQ6srhcrXtEK2Q=="}
    ,
    "returnFullDocuments": false
  }

The above example would match all documents containing BOTH of the following attributes:

  {
    "name": "DUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
    "value": "QV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro"
  }

  {
    "name": "AarngVIZLl0kIp2xEHUH5o5uVc-470roQaOIbqMUD7DFQQypWQ==",
    "value": "AYubg9VnEitQBxlhjVFnYRlfQ5UHWe3ia4aMiQ6srhcrXtEK2Q=="
  }

  POST https://example.com/edvs/z4sRgBJJLnYy/query HTTP/1.1
  Host: example.com
  Content-Type: application/json
  Accept: application/json, text/plain, */*
  Accept-Encoding: gzip, deflate

  {
    "index": "https://example.com/kms/z7BgF536GaR",
    "equals": [
      {"DUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ":
        "QV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro"},
       {"AarngVIZLl0kIp2xEHUH5o5uVc-470roQaOIbqMUD7DFQQypWQ==":
        "AYubg9VnEitQBxlhjVFnYRlfQ5UHWe3ia4aMiQ6srhcrXtEK2Q=="}
    ],
    "returnFullDocuments": false
  }

The above example would match any document containing EITHER of the following attributes:

  {
    "name": "DUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
    "value": "QV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro"
  }

  {
    "name": "AarngVIZLl0kIp2xEHUH5o5uVc-470roQaOIbqMUD7DFQQypWQ==",
    "value": "AYubg9VnEitQBxlhjVFnYRlfQ5UHWe3ia4aMiQ6srhcrXtEK2Q=="
  }

  POST https://example.com/edvs/z4sRgBJJLnYy/query HTTP/1.1
  Host: example.com
  Content-Type: application/json
  Accept: application/json, text/plain, */*
  Accept-Encoding: gzip, deflate

  {
    "index": "https://example.com/kms/z7BgF536GaR",
    "equals": [
      {"DUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ":
        "QV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro",
       "PatVwcVAKLjrp86oXvHFnAJ3ss1EdJY7gp3gBoaRd5g-zIFE6g==":
         "WatVwcXiIlQUPT2OiTsPRGWvOyJAXctYo2T97fHvkl0_03g9Jw=="},
       {"AarngVIZLl0kIp2xEHUH5o5uVc-470roQaOIbqMUD7DFQQypWQ==":
        "AYubg9VnEitQBxlhjVFnYRlfQ5UHWe3ia4aMiQ6srhcrXtEK2Q=="}
    ],
    "returnFullDocuments": false
  }

The above example would match any document satisfying EITHER one of the following two conditions:

Condition 1. It contains BOTH of the following attributes:

  {
    "name": "DUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
    "value": "QV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro"
  }

  {
    "name": "PatVwcVAKLjrp86oXvHFnAJ3ss1EdJY7gp3gBoaRd5g-zIFE6g==",
    "value": "WatVwcXiIlQUPT2OiTsPRGWvOyJAXctYo2T97fHvkl0_03g9Jw=="
  }

Condition 2. It contains the following attribute:

  {
    "name": "AarngVIZLl0kIp2xEHUH5o5uVc-470roQaOIbqMUD7DFQQypWQ==",
    "value": "AYubg9VnEitQBxlhjVFnYRlfQ5UHWe3ia4aMiQ6srhcrXtEK2Q=="
  }
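
The AND/OR behavior shown in the examples above can be modeled on the server side as a comparison of blinded name-value pairs. The sketch below is illustrative only: it assumes that the `index` member of the query selects the `indexed` entry whose `hmac.id` matches, and the function names are hypothetical.

```python
def indexed_pairs(document: dict, index_id: str) -> set:
    """Collect the (name, value) pairs stored under the given HMAC identifier."""
    pairs = set()
    for index in document.get("indexed", []):
        if index["hmac"]["id"] == index_id:
            for attribute in index["attributes"]:
                pairs.add((attribute["name"], attribute["value"]))
    return pairs


def matches_query(document: dict, query: dict) -> bool:
    """OR across subfilters, AND across the pairs inside one subfilter."""
    equals = query["equals"]
    subfilters = equals if isinstance(equals, list) else [equals]
    pairs = indexed_pairs(document, query["index"])
    return any(
        all((name, value) in pairs for name, value in subfilter.items())
        for subfilter in subfilters
    )
```

The server never needs the HMAC key for this comparison, since both the stored attributes and the query arrive already blinded.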

If returnFullDocuments was set to false, a successful query will result in a standard HTTP 200 response with a list of identifiers for all encrypted documents that match the query:

  HTTP/1.1 200 OK
  Cache-Control: no-cache, no-store, must-revalidate
  Date: Fri, 14 Jun 2019 18:45:18 GMT
  Connection: keep-alive

  ["https://example.com/edvs/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz"]

If returnFullDocuments was set to true, a successful query will result in a standard HTTP 200 response with a list of all encrypted documents that match the query:

  HTTP/1.1 200 OK
  Cache-Control: no-cache, no-store, must-revalidate
  Date: Fri, 14 Jun 2019 18:45:18 GMT
  Connection: keep-alive

  [
    {
      "id": "zMbxmSDn2Xzz",
      "sequence": 0,
      "indexed": [{
        "sequence": 0,
        "hmac": {
          "id": "https://example.com/kms/z7BgF536GaR",
          "type": "Sha256HmacKey2019"
        },
        "attributes": [{
          "name": "CUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
          "value": "RV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro",
          "unique": true
        }, {
          "name": "DUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
          "value": "QV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro"
        }]
      }],
      "jwe": {
        "protected": "eyJlbmMiOiJDMjBQIn0",
        "recipients": [
          {
            "header": {
              "alg": "A256KW",
              "kid": "https://example.com/kms/z7BgF536GaR"
            },
            "encrypted_key":
              "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
          }
        ],
        "iv": "i8Nins2vTI3PlrYW",
        "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
        "tag": "pfZO0JulJcrc3trOZy8rjA"
      }
    }
  ]

The contents of a data vault can also be searched to see if a certain attribute name is indexed by using the has keyword.

  POST https://example.com/edvs/z4sRgBJJLnYy/query HTTP/1.1
  Host: example.com
  Content-Type: application/json
  Accept: application/json, text/plain, */*
  Accept-Encoding: gzip, deflate

  {
    "has": ["CUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ"],
    "returnFullDocuments": false
  }

If returnFullDocuments was set to false, a successful query will result in a standard HTTP 200 response with a list of identifiers for the EncryptedDocuments that contain the attribute:

  HTTP/1.1 200 OK
  Cache-Control: no-cache, no-store, must-revalidate
  Date: Fri, 14 Jun 2019 18:45:18 GMT
  Connection: keep-alive

  ["https://example.com/edvs/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz"]

If returnFullDocuments was set to true, a successful query will result in a standard HTTP 200 response with a list of the EncryptedDocuments that contain the attribute:

  HTTP/1.1 200 OK
  Cache-Control: no-cache, no-store, must-revalidate
  Date: Fri, 14 Jun 2019 18:45:18 GMT
  Connection: keep-alive

  [
    {
      "id": "zMbxmSDn2Xzz",
      "sequence": 0,
      "indexed": [{
        "sequence": 0,
        "hmac": {
          "id": "https://example.com/kms/z7BgF536GaR",
          "type": "Sha256HmacKey2019"
        },
        "attributes": [{
          "name": "CUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
          "value": "RV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro",
          "unique": true
        }, {
          "name": "DUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
          "value": "QV58Va4904K-18_L5g_vfARXRWEB00knFSGPpukUBro"
        }]
      }],
      "jwe": {
        "protected": "eyJlbmMiOiJDMjBQIn0",
        "recipients": [
          {
            "header": {
              "alg": "A256KW",
              "kid": "https://example.com/kms/z7BgF536GaR"
            },
            "encrypted_key":
              "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
          }
        ],
        "iv": "i8Nins2vTI3PlrYW",
        "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
        "tag": "pfZO0JulJcrc3trOZy8rjA"
      }
    }
  ]

For streams, only their document IDs will be returned, regardless of whether returnFullDocuments is set to true or not.
