This document collects use cases, deployment scenarios, requirements, and design goals to guide the work of the DIF Secure Data Storage Working Group (SDS WG).
This is an experimental specification and is undergoing regular revisions. It is not fit for production deployment.
The purpose of this document is to collect use cases, deployment scenarios, requirements, and design goals in order to guide the work of the DIF SDS WG. At present, we are collecting information. After the information-gathering phase, we will hold a ranked-choice vote in order to prioritize each use case, scenario, requirement, and design goal.
The following four use cases have been identified as representative of common usage patterns (though they are by no means the only ones).
I want to store my information in a safe location. I don't want the storage provider to be able to see any data I store. This means that only I can see and use the data. I don’t even want the storage provider to be able to see metadata like file names, file sizes, directory topology, or who I’ve granted access to something.
Over time, I will store a large amount of data. I want to search the data, but don't want the service provider to know what I'm storing or searching for.
I want to share my data with other people and services. I can grant other entities access to data in my storage area either when I first save the data or at a later stage. The storage should only give access to others when I have explicitly given consent for each item.
I want to be able to revoke the access of others at any time. When sharing data, I can include an expiration date for a third party's access to my data.
I want to back up my data across multiple storage locations in case one fails. These locations can be hosted by different storage providers and can be accessible over different protocols. One location could be local on my phone, while another might be cloud-based. The locations should be able to synchronize with each other in a masterless fashion so that data is up to date everywhere regardless of where I create or update it, and this should happen automatically, with as little intervention from me as possible.
Alice sets up replication to a backup instance so that she does not lose her data in case her primary instance is damaged/lost etc.
Alice has multiple instances, and would like to synchronize their contents among all of them. When her local instance is offline, Alice wants to still be able to make updates to it. When that instance reconnects, it is expected to hand off all of the changes that occurred while it was disconnected (and receive changes from the other instances). This is useful for many different situations:
Notes
An organization has two geographically distributed instances. Updates to one instance are expected to propagate as soon as possible to the other instance. If one of those goes down, read and write traffic can be redirected to the remaining one, with minimal interruption in data availability.
Alice wants to be able to store data that is publicly visible in plaintext and readable anonymously, but authenticated (signed so that readers can verify it came from her). Alice does not want anyone to be able to prevent her from publishing public plaintext data.
Alice wants to be able to store plaintext and plaintext metadata on servers that she does not operate.
As a user, I would like a mechanism where other entities can send me actionable message objects that are synced across my SDS instances, so that I can act on them in whatever way is appropriate for the type of message. If subsequent messages are exchanged as a result of an initiating message, the subsequent messages should be referentially tied together (e.g. thread back to the parent).
Event types, event subscriptions, event handlers, data, and programs should be storable in an SDS. This is similar to systems that watch for file changes and transpile, or that forward messages ("IF this THEN that"); it could take the form of a set of predefined workflows or of executable code. Examples of similar systems: GitHub Actions, Zapier, IFTTT, OpenFaaS.
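As a non-normative illustration of the subscription model described above, the sketch below registers handlers against event types and dispatches payloads to them. All names here (`EventBus`, `"document.updated"`) are hypothetical, not drawn from any SDS specification; a real deployment would persist subscriptions in the SDS itself and deliver events across instances.

```python
# Minimal "IF this THEN that" event registry sketch. Names are
# illustrative only and not part of any SDS specification.

class EventBus:
    def __init__(self):
        self._handlers = {}  # event_type -> list of handler callables

    def subscribe(self, event_type, handler):
        """Register a handler to run whenever event_type fires."""
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        """Invoke every handler subscribed to event_type; collect results."""
        return [h(payload) for h in self._handlers.get(event_type, [])]

bus = EventBus()
bus.subscribe("document.updated", lambda doc: f"re-indexed {doc['id']}")
assert bus.publish("document.updated", {"id": "doc-1"}) == ["re-indexed doc-1"]
```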
Ensure users have the ability to terminate a relationship with one or more service providers in a manner that does not disclose anything about the future state or location of the user’s data. Ensure the user can verify, or obtain a verifiable certification from the service provider, that their data has been purged from the provider's system if requested (confirming the right to be forgotten).
Ensure users have the ability to shard their data, or create M of N key scenarios across one or multiple providers. Provide options to the user which allow for either the service provider to manage sharding or allow sharding that is completely hidden from a service provider.
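One well-known way to realize the M-of-N key scenario above is Shamir secret sharing, where any M of N shares reconstruct the key but fewer reveal nothing. The sketch below is a minimal illustration over a prime field, not a vetted implementation; a production system would use an audited library and handle byte encoding of keys.

```python
# Illustrative M-of-N secret sharing (Shamir's scheme) over a prime field.
import secrets

_PRIME = 2**127 - 1  # Mersenne prime, large enough for a 16-byte secret

def split(secret: int, m: int, n: int):
    """Split `secret` into n shares; any m of them can reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(_PRIME) for _ in range(m - 1)]
    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial
            acc = (acc * x + c) % _PRIME
        return acc
    return [(x, poly(x)) for x in range(1, n + 1)]

def combine(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % _PRIME
                den = (den * (xi - xj)) % _PRIME
        total = (total + yi * num * pow(den, -1, _PRIME)) % _PRIME
    return total

key = secrets.randbelow(_PRIME)
shares = split(key, m=3, n=5)          # 5 shares across 5 providers
assert combine(shares[:3]) == key      # any 3 shares suffice
assert combine(shares[2:]) == key
```

Whether the sharding is managed by a provider or hidden from providers entirely (as the requirement allows for) is a matter of where `split` and `combine` run: client-side execution keeps the shares opaque to every provider.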
Virtual asset custody, by analogy to a safe deposit box in the physical world. This use case is intentionally vague about the mechanisms of delegation, multisig, authentication, and authorization; it should work across multiple versions of all of those.
Each jurisdiction is different, but Alice is American and Bob works for an American bank so we can assume:
Bank customer Alice has some valuable digital assets (already-encrypted files containing blueprints for a missile) that she wants securely put into bank custody. She needs to know that the bank cannot access them in clear text. Bank manager Bob will be helping her set up her digital "safe deposit box". Alice needs to give access to her wife Carroll. Alice authenticates herself (with KYC) and signs an agreement for her custodial service, which includes assuming costs for key rotations/replacements. Alice also delegates access to Carroll, who is also authenticated and KYCed before receiving access as well.
Once a month, Carroll checks the contents out for an hour, decrypts them, and sometimes adds or subtracts content before re-encrypting and filing them back in storage. Each time, she has to be authenticated. These events are timestamped and logged, but otherwise privacy rights are preserved. There may be other restrictions on access in the contract, by analogy to real-world safe deposit boxes [1], or arbitrary frictions or delays having to do with relevant banking regulations [2].
Due to a severe fire, every electronic copy of Alice's key is lost. As per the contract, her access (and Carroll's) can be restored for a fee.
Carroll's key needs to be revoked without affecting any other aspects or assumptions. Carroll opens a separate account and places her own assets in it.
Special Agent Dave has been investigating Alice's work at the missile plant, and decides her possession of those blueprints is a criminal act of industrial espionage. He gets a court order to seize the assets, "drilling into" the virtual safe deposit box and extracting the contents. After this point, Alice cannot get access to her files, although a court order compelling the key material needed to decrypt them is out of scope.
Without her ex's financial support, Carroll cannot stay on top of her banking bills. As per her contract, after a grace period, the contents are seized and become property of the bank, auctioned off to pay outstanding dues, encrypted or not.
The following company names and scenarios are hypothetical.
In order to benefit from the revised rules of NAFTA (North American Free Trade Agreement), auto manufacturers will now be required to demonstrate that seventy percent of their steel and aluminum purchases originated in North America. They need to make this information available to regulators and customers without providing unnecessary visibility to competitors. US-based steel manufacturer Steel Inc can generate a mill test report as evidence of steel origins across all processing steps. This mill report (potentially structured as a verifiable credential issued by Steel Inc.’s decentralized identifier keys) can be saved in a secure data store (stored in a discoverable way, in one or more locations). Steel Inc can specify that only keys held by their customer Advanced Automotive, as well as audit and regulatory bodies in the US Government (such as the USTR), will be accepted to decrypt and access the plaintext mill certificate data (sharing with explicit consent). Other parties will be required to request access, which Steel Inc may choose to grant depending on the audience. Some attributes included in the mill certificate may be more publicly available, such as the type of steel product, the specification/grade, and the location where it was melted and poured. These data fields are still securely stored with the rest of the mill certificate, but they do not require authentication to view.
Super Fresh Market had several customers get sick after consuming a cabbage and romaine lettuce mix. The company is worried about another E. coli scare and wants to quickly identify where the product came from and what other stores it may be in. Their staff member Carlos scans a QR code on the salad mix and authenticates as an employee. After logging in, he can see origin and transport details not available to the general public, such as the fact that the lettuce came from Sunbright Farms in California, and that it was transported in refrigerated vehicles the prior week by two different third-party logistics providers. Carlos can reach out to request that these logistics companies provide more detailed information about their cold storage temperature readings and transit routes. Given the potential recall situation, each logistics company chooses to grant Carlos access to view this information. Carlos is able to identify a spike in produce temperature during the final leg of distribution, meaning that product in three of their stores may be impacted. The salad mix is immediately removed from the three stores as the investigation continues. In this scenario Carlos, acting as a Super Fresh Market employee, is already able to authenticate and view some information in a secure data store. He can also quickly be granted access to more information based on the urgency of the recall scenario. However, on a daily basis he does not need to see trade route information for logistics companies, which is typically considered a trade secret.
by Adrian Gropper, edited by Juan Caballero
Note: This is extracted from a longer, more detailed use case including helpful flow diagrams and an enrollment section which is stored here
Alice’s health report is a short narrative impression of as little as one session of care, signed by Dr. Bob, to be presented to her employer and/or filed in her electronic health record for access by other care providers. Alice's records are held in trust by a "personal record service" recommended to her by her local public library. This service consists of an SDS and an authorization server she uses to access records in that SDS, as well as to grant read and/or write access to others. This server does not need to use SSI, but it MUST be interoperable so that she can exchange it for another at any time. (This swappability between SSI and non-SSI authorization servers may well be a crucial investment in SSI migration for health records.) For the sake of familiarity, we could use UMA for the authorization server and OAuth as the interoperability standard for this server, but other forms of prior art might be more relevant in some contexts. [1]
This use case's main virtue for the purposes of the SDS specification is that it has complex authorization requirements and data privacy requirements, some of which are best met by a separation of concerns between authorization and storage, which would ensure maximum prevention of vendor lock-in by keeping authorization and storage separately interchangeable/competitive.
Note: These steps are clarified greatly with an adoption-oriented flowchart included in the longer version of this use case [3].
Library: The Library installs Issuer/Verifier software and displays a list of compatible mobile wallets and secure data stores. Secure data stores that enable **independent choice** of control agent and vice versa are called out as preferable by the library. For more detail and a flow diagram, see the more detailed use-case linked from the summary section above.
Dr. Bob installs a mobile identity wallet and is thus able to sign documents, post timestamps to a public blockchain or other timestamping service, and satisfy legal retention of documents and signature proofs by sending them to his secure email address. The Library’s issuer software enables Dr. Bob to self-assert his medical license and use a public notary to countersign the credential. The notary’s record can be verified online. Dr. Bob adds his notarized credential to his mobile wallet.
Alice: Alice uses her mobile phone number to sign up for a secure data store chosen from the Library’s list. If she doesn’t have one already, Alice receives, via SMS, a link to her authorization agent as suggested by the secure data store and picks a password.
Alice contacts Dr. Bob (out of band), gets various tests and diagnoses (out of band) and asks Dr. Bob to issue her a health report via the Library Issuer service for self-sovereign storage and presentation. Dr. Bob accesses an Issuer service using his credentials, dictates the health report, adds Alice’s SMS number and signs the report, leaving a timestamp and a proof. Document and proof are sent to Dr. Bob’s secure email for legal retention. Alice gets an SMS or secure email from the Library with a link to the report and proof. Alice clicks the link which opens as a form on the Library site. Alice enters her authorization server endpoint onto the form. The Library server contacts Alice’s authorization server. Alice may have to sign-in. The health report and proof are sent to Alice’s secure data store. The Library deletes the documents that Dr. Bob had stored temporarily for Alice. Alice receives an SMS confirmation with a QR code that links to the document in the secure data store.
Alice decides to keep her new data store but change the authorization server. Alice opens a new authorization server account. Alice goes to the new data store via its current authorization server and specifies her new authorization server. The employer wants to check Alice’s health report again. The old QR code points to the old authorization server. Alice has to create a new QR code that points to the new authorization server. This may need to be a manual operation, but either way is beyond the scope of this specification.
A year later, Dr. Bob wants to see the old report before dictating a new report. Dr. Bob enters Alice’s SMS into a form at the Library. The Library sends Alice a message asking for her current authorization server and a request for the old report. Alice agrees and replies with links to her current authorization server and to the specific document in question. Dr. Bob, using the Library as a verifier, presents his credentials to Alice’s authorization server and retrieves the document.
Based on the use cases, we consider the following deployment topologies:
Multi-/Cross-cloud: Some use cases (IoT / machine-to-machine / Skynet / guardianship) require a non-human or non-functioning actor to delegate KMS/key control to a cloud vault for oversight or human intervention. In the case of some password-manager architectures or biometrically accessed/deployed key material storage, as well as some multi-cloud/hybrid-cloud architectures, key material will need to be retrieved from at least one other vault before accessing the vault being specified here.
Keys in control of such an entity might still need to securely store signed credentials or data in a separate vault. Additional diagramming or specifications will be needed to show how this 2-vault solution could be constrained to be secure and feasible, even if non-normative.
The following sections elaborate on the requirements that have been gathered from the core use cases.
One of the main goals of this system is ensuring the privacy of an entity's data so that it cannot be accessed by unauthorized parties, including the storage provider.
To accomplish this, the data must be encrypted both while it is in transit (being sent over a network) and while it is at rest (on a storage system).
Since data could be shared with more than one entity, it is also necessary for the encryption mechanism to support encrypting data to multiple parties.
As much metadata as possible should also be protected, including filenames, file sizes and directory topology.
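The requirements above combine naturally in an envelope ("hybrid") encryption pattern: the payload is encrypted once under a random content-encryption key (CEK), and the CEK is wrapped separately for each authorized party, as in JWE-style multi-recipient envelopes. The sketch below shows only this structure; its XOR keystream cipher built from SHA-256 is a toy for illustration, NOT secure, and every name in it is hypothetical. A real vault would use a vetted AEAD cipher and real key-wrapping.

```python
# Structural sketch of encrypting one payload to multiple recipients.
# The "cipher" below is a toy SHA-256 XOR keystream -- illustration only,
# NOT secure. A real system would use an AEAD (e.g. as in JWE envelopes).
import hashlib, secrets

def _keystream_xor(key: bytes, data: bytes) -> bytes:
    out, counter = bytearray(), 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))  # XOR is its own inverse

def encrypt_to_many(payload: bytes, recipient_keys: dict) -> dict:
    cek = secrets.token_bytes(32)                   # one CEK per document
    return {
        "ciphertext": _keystream_xor(cek, payload),
        "recipients": {                             # CEK wrapped per recipient
            rid: _keystream_xor(rk, cek)
            for rid, rk in recipient_keys.items()
        },
    }

def decrypt(envelope: dict, rid: str, rk: bytes) -> bytes:
    cek = _keystream_xor(rk, envelope["recipients"][rid])
    return _keystream_xor(cek, envelope["ciphertext"])

alice_key, bob_key = secrets.token_bytes(32), secrets.token_bytes(32)
env = encrypt_to_many(b"mill certificate", {"alice": alice_key, "bob": bob_key})
assert decrypt(env, "bob", bob_key) == b"mill certificate"
```

Note that adding a recipient only requires wrapping the 32-byte CEK again, never re-encrypting the payload, which matters for the sharing and revocation requirements discussed next.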
It is necessary to have a mechanism that enables authorized sharing of encrypted information among one or more entities.
The system is expected to specify one mandatory authorization scheme, but also allow other alternate authorization schemes. Examples of authorization schemes include OAuth2, Web Access Control, and ZCAPs (Authorization Capabilities).
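To make the capability-style option concrete, the sketch below checks a simplified "ZCAP-like" token that names an invoker, a set of actions, and a target resource. It is a hedged approximation under stated assumptions: real ZCAPs use DIDs and public-key signature chains with delegation, whereas this toy uses a shared-secret HMAC held by the vault controller, and all field names are invented for illustration.

```python
# Simplified capability-style authorization check. Field names and the
# shared-secret HMAC proof are illustrative assumptions, not ZCAP itself.
import hashlib, hmac, json

def issue_capability(controller_key: bytes, invoker: str, actions, target: str):
    cap = {"invoker": invoker, "actions": sorted(actions), "target": target}
    payload = json.dumps(cap, sort_keys=True).encode()
    cap["proof"] = hmac.new(controller_key, payload, hashlib.sha256).hexdigest()
    return cap

def authorize(controller_key: bytes, cap: dict,
              invoker: str, action: str, target: str) -> bool:
    body = {k: v for k, v in cap.items() if k != "proof"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(controller_key, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, cap.get("proof", ""))
            and cap["invoker"] == invoker
            and action in cap["actions"]
            and cap["target"] == target)

key = b"controller-secret"
cap = issue_capability(key, "did:example:carroll", ["read"], "/docs/doc-1")
assert authorize(key, cap, "did:example:carroll", "read", "/docs/doc-1")
assert not authorize(key, cap, "did:example:carroll", "write", "/docs/doc-1")
```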
The system should be identifier agnostic. In general, identifiers that are a form of URN or URL are preferred. While it is presumed that [DID-CORE] (Decentralized Identifiers, DIDs) will be used by the system in a few important ways, hard-coding the implementations to DIDs would be an anti-pattern.
It is expected that information can be backed up on a continuous basis. For this reason, it is necessary for the system to support at least one mandatory versioning strategy and one mandatory replication strategy, but also allow other alternate versioning and replication strategies.
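One common building block for a masterless replication strategy of the kind described above is the version vector, in which each instance tags its writes with its own counter. The sketch below shows the merge and dominance checks under assumed, illustrative names; it is one possible technique, not the strategy the specification will mandate.

```python
# Version-vector bookkeeping sketch for masterless replication.
# Instance names ("phone", "cloud") and function names are illustrative.

def merge_vectors(a: dict, b: dict) -> dict:
    """Element-wise max: the state after two replicas exchange updates."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}

def dominates(a: dict, b: dict) -> bool:
    """True if `a` has seen every update `b` has (b is obsolete)."""
    return all(a.get(k, 0) >= v for k, v in b.items())

phone = {"phone": 3, "cloud": 1}   # phone wrote offline while cloud advanced
cloud = {"phone": 1, "cloud": 4}
assert not dominates(phone, cloud) and not dominates(cloud, phone)  # concurrent
assert merge_vectors(phone, cloud) == {"phone": 3, "cloud": 4}
```

When neither vector dominates, the same document was edited concurrently, and the conflict must be surfaced to the client (or resolved by a CRDT), which aligns with the "push complexity to the client" design goal below.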
Large volumes of data are expected to be stored using this system, which then need to be efficiently and selectively retrieved. To that end, an encrypted search mechanism is a necessary feature of the system.
It is important for clients to be able to associate metadata with the data such that it can be searched. At the same time, since privacy of both data and metadata is a key requirement, the metadata must be stored in an encrypted state, and service providers must be able to perform those searches in an opaque and privacy-preserving way, without being able to see the metadata.
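One way a server could match metadata without seeing it is a blind index: the client HMACs each keyword under a key the server never holds, so the server stores and compares only opaque tokens. This is a sketch of one candidate technique (equality search only, and it leaks access frequency), not the mechanism the specification will mandate; all names are illustrative.

```python
# Blind-index sketch: the server matches equality queries over HMAC tokens
# without ever learning the underlying keywords. Illustrative names only.
import hashlib, hmac

def blind(index_key: bytes, keyword: str) -> str:
    """Client-side: derive an opaque search token from a keyword."""
    return hmac.new(index_key, keyword.lower().encode(),
                    hashlib.sha256).hexdigest()

# Client side: index documents before upload.
index_key = b"client-held-index-key"          # never sent to the server
server_index = {}                             # token -> document ids (server)
for doc_id, tags in {"doc-1": ["invoice", "2023"],
                     "doc-2": ["invoice"]}.items():
    for tag in tags:
        server_index.setdefault(blind(index_key, tag), set()).add(doc_id)

# Query: the client sends only the token; the server matches blindly.
token = blind(index_key, "invoice")
assert server_index[token] == {"doc-1", "doc-2"}
```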
Since this system can reside in a variety of operating environments, it is important that at least one protocol is mandatory, but that other protocols are also allowed by the design. Examples of protocols include HTTP, gRPC, Bluetooth, and various binary on-the-wire protocols. At least an HTTPS API must be defined by the specification.
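Because the specification has not yet defined the HTTPS API, the fragment below only illustrates the protocol-agnostic idea: a client assembles an encrypted document envelope and a request description that could be carried over HTTP, gRPC, or Bluetooth alike. The path, header, and JSON field names are hypothetical placeholders, not the API the specification will define.

```python
# Hypothetical shape of a "store document" request; every endpoint path,
# header, and field name below is a placeholder, not a specified API.
import json

def build_store_request(vault_id: str, doc_id: str, jwe: dict) -> dict:
    """Describe a transport-agnostic request to store one encrypted doc."""
    return {
        "method": "POST",
        "path": f"/vaults/{vault_id}/docs",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"id": doc_id, "jwe": jwe}),
    }

req = build_store_request("vault-1", "doc-1",
                          {"protected": "...", "ciphertext": "..."})
assert req["path"] == "/vaults/vault-1/docs"
```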
The specification should define at least one mandatory format for encrypted data, while also allowing optional alternative formats.
This section elaborates upon a number of guiding principles and design goals that shape Encrypted Data Vaults.
A layered architectural approach is used to ensure that the foundation for the system is easy to implement while allowing more complex functionality to be layered on top of the lower foundations.
For example, Layer 1 might contain the mandatory features for the most basic system, Layer 2 might contain useful features for most deployments, Layer 3 might contain advanced features needed by a small subset of the ecosystem, and Layer 4 might contain extremely complex features that are needed by a very small subset of the ecosystem.
This system is intended to protect an entity's privacy. When exploring new features, always ask "How would this impact privacy?". New features that negatively impact privacy are expected to undergo extreme scrutiny to determine if the trade-offs are worth the new functionality.
Servers in this system are expected to provide functionality strongly focused on the storage and retrieval of encrypted data. The more a server knows, the greater the risk to the privacy of the entity storing the data, and the more liability the service provider might have for hosting data. In addition, pushing complexity to the client enables service providers to provide stable server-side implementations while innovation can be carried out by clients.
Peergos has many of the same requirements as SDS (in fact, stronger privacy requirements, especially against a quantum computer) and is built on top of IPFS. In summary, its properties include: