This report is part of the outputs of the ODI-MS PLN program. It was written by the Open Data China team, led by Dr Feng Gao, on behalf of the Fintech Innovation Data Collaboration. Any questions regarding this report should be sent to [email protected]
The ODI has partnered with Microsoft to launch its Open Data Campaign, which aims to ‘help address the looming “data divide” and help organisations of all sizes to realise the benefits of data and the new technologies it powers’. More details are available on the ODI website.
The Fintech Innovation Data Collaboration is a conceptual project explored by a group of finance institutions and coordinated by Open Data China, with the aim of unlocking and maximizing the value of data by allowing SMEs to access and benefit from it.
Open Data China, founded in 2014, is a social enterprise that aims to create an open digital future. Having started as an Open Knowledge Local Group, Open Data China has expanded its focus from Open Data to Open Governance, Digital Rights and Open Economy.
The Fintech Innovation Data Collaboration is a conceptual data collaboration currently under design and is part of the ODI-MS Data Collaboration Peer Learning Work.
This report provides readers with an introduction to the collaboration: why it was set up and how it is currently designed. The report also summarizes the key challenges and lessons learnt from the journey of designing and building this data collaboration.
This report also offers a reflection on our participation in the ODI-MS PLN program, with a focus on what impact the PLN has had on the data collaboration and what lessons we learnt from peers and the ODI-MS teams during our time in the network.
Finally, the report advises readers on how best to work on a data collaboration:
Be sure to have inside champions to win the trust of key stakeholders
Combine technologies with the right governance strategy to win trust
Act early and build upon failure
Find peers who can support your work
At Open Data China, we have long-standing experience in building different types of data access initiatives. One of our successful projects is SODA, a challenge-based data collaboration in which government agencies offer selected groups of SMEs archived urban data (e.g. taxi GPS data) to solve urban issues. Building on the success of SODA, we were also involved in an experimental telecommunications data collaboration, which tried to set up a company to represent the collective interest and convert data contributions into company shares. That collaboration failed because it lacked a clear purpose and got lost in the search for possible usage scenarios.
In late 2020, Open Data China was engaged by AI Space, an AI incubator, to work with a group of finance institutions to explore different options for unlocking the value of data. AI Space had been tasked by the government with operating an AI challenge program called AIWIN, which partnered with this group of finance institutions and started the conversation on jointly exploring the value of data.
One of the shared interests among the finance institutions is how to build better machine learning models by combining data. A finance institution such as a bank may hold only one piece or facet of the data about a client (individual or organization); by sharing and combining data, it becomes possible to assess a client's credit more accurately or to improve other services.
In addition to this internal objective, the finance institutions are also interested in supporting SMEs, whether they already work in the finance sector or beyond it. Here, the finance institutions are interested in how they can make a profit by enabling SME innovation.
Another possible objective discussed by the finance institutions is to create a shared data pool for education, as many universities lack a channel for accessing real finance data for skills training.
The framework of the data collaboration can be divided into three parts:
The green part is the group of data providers. In this Fintech Innovation Data Collaboration, the participants are mainly state-owned institutions, although two private banks are also involved. Data providers plan to sign an exclusive contract with the data collaboration to allow the collaboration to operate their data assets. The collaboration mainly covers two types of data: the first is data held by finance institutions about clients (individuals or companies) for assessing credit, and the second is data about the product purchase history between clients and finance institutions.
The orange part is the infrastructure of the collaboration. We focus on three layers: the legal layer, where we are discussing how to create a license or agreement framework that defines the relationship between the data collaboration and data users; the tech layer, where we are exploring different options for privacy-preserving techniques; and the government-support layer, where we are building a strong relationship with government, which matters a great deal to state-owned institutions.
Finally, the blue part proposes a possible operating model in which AI Space, as an incubator, is responsible for interacting with SMEs to understand their needs and negotiating business terms with them. To provide seed funding for daily operation, we also plan to engage an industry park that may be interested in funding the collaboration in order to benefit the SMEs residing in the park.
The most important question asked by data providers is what benefits this data collaboration can bring them and how those benefits can repay their investments, such as operating costs and data costs.
To answer this question, our original approach was to explore a tiered charging model:
Aggregate data could be free of charge
Data that is anonymized and provided through an API can be charged according to how much is consumed
Data that is highly sensitive and can only be accessed and used within a sandbox or safe environment can be charged under terms agreed one-to-one. Possible options include treating the data as an investment in the SME in exchange for an equity share, or exchanging data for services that benefit the finance institutions.
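The three tiers above can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the names `Tier`, `AccessRequest` and `fee`, and the unit price, are hypothetical and do not come from the collaboration's actual design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    AGGREGATE = "aggregate"        # aggregate data, free of charge
    ANONYMIZED_API = "api"         # anonymized data via API, charged by consumption
    SANDBOX = "sandbox"            # highly sensitive data, terms agreed one-to-one


@dataclass
class AccessRequest:
    tier: Tier
    api_calls: int = 0             # only meaningful for the API tier


# Hypothetical unit price; the report does not state any actual prices.
PRICE_PER_API_CALL = 0.01


def fee(request: AccessRequest) -> Optional[float]:
    """Return the fee for a request, or None when terms must be negotiated."""
    if request.tier is Tier.AGGREGATE:
        return 0.0
    if request.tier is Tier.ANONYMIZED_API:
        return request.api_calls * PRICE_PER_API_CALL
    # Sandbox access has no list price: equity or services may be exchanged.
    return None
```

Returning `None` for the sandbox tier reflects that this tier is settled by individual agreement rather than by a price list.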
However, the ODI sustainable data access workbook offered us a new perspective. The tool asks us to reflect on the strengths, weaknesses, opportunities and threats of the data collaboration and to build possible revenue streams from that reflection.
In our case, we realized that our early exploration of data collaboration can be considered a strength, which means we may be able to offer our experience and lessons learnt as consulting services to other organizations. We also realized that we may hold rich log data about who accesses our data and for what purpose, so we could analyze that log data to generate new insights about the market for data applications and products, and report them to the government as regulator or to marketing consulting firms that produce market reports. Another possible revenue stream is certification: the data collaboration could certify potential data users as offering trustworthy data products, and charge them for the certification.
These are the possible new revenue streams we identified in a time-limited workshop session. We can of course use this tool together with the data ecosystem mapping tool to identify more stakeholders and the revenue streams associated with them.
The second key challenge we dealt with is how to create a license or agreement framework that can clearly define what data users can do with our data including how they may commercialize their models or products.
We studied a set of new data licenses made available by the Linux Foundation, Microsoft and Element AI. One of the key best practices we learnt is to define so-called ‘computational use’ in the license, which means “activities necessary to enable the use of Data (alone or along with other material) for analysis by a computer”, and to break it down into specific sub-actions such as training models, creating representations, and outputting predictions from models, so that it is clear what data users can do and what they can commercialize.
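As a rough sketch of what such a breakdown could look like in machine-readable form, the table below maps each sub-action to whether it is allowed and whether its results may be commercialized. All names and all flag values here are hypothetical placeholders for illustration, including the `redistribute_raw_data` entry; they are not terms of any actual license or of our agreement framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    allowed: bool       # may the data user perform this action at all?
    commercial: bool    # may the result be commercialized?


# Hypothetical permission table; real values would come from the license text.
COMPUTATIONAL_USE = {
    "train_model": Permission(allowed=True, commercial=True),
    "create_representations": Permission(allowed=True, commercial=True),
    "output_predictions": Permission(allowed=True, commercial=True),
    "redistribute_raw_data": Permission(allowed=False, commercial=False),
}


def may(action: str, commercially: bool = False) -> bool:
    """Check whether an action is permitted, optionally for commercial use."""
    permission = COMPUTATIONAL_USE.get(action)
    if permission is None or not permission.allowed:
        return False    # unlisted actions are treated as not permitted
    return permission.commercial or not commercially
```

Treating unlisted actions as not permitted is a design choice in this sketch: it keeps the license explicit about everything a data user may do.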
The last and most complicated challenge, which we are still trying to address, is how to design a governance structure that individuals can trust.
In our early design of the data collaboration, we did not include individuals directly in our data ecosystem, even though individuals' data is covered by the collaboration. The reason was that we believed finance institutions have their own ways of handling individual rights, and that as a data collaboration we did not necessarily need to interfere.
The new draft of China's privacy protection law released in May 2021 changed the situation, as the new law requires large data processing platforms to set up an independent body to represent individual rights and deal with privacy problems. At the current stage, we are exploring different ways in which individuals might be involved in governing our data collaboration, and we appreciate the ODI sharing its internal research with us to guide us in developing the strategy.
The following are key questions we are still working on:
Should the data collaboration set up such an independent body itself, or should it interact with the independent bodies set up by the finance institutions?
What is the purpose of the independent body: to engage individuals in auditing the collaboration and holding it accountable, or to engage individuals in making decisions about data access alongside the collaboration?
How could we better use technology to facilitate the interaction between the collaboration and independent bodies?
We feel grateful to be part of this PLN program. In particular, we had the chance to be exposed to many interesting and useful tools created by the ODI and MS. For instance, the data ecosystem mapping tool helped us to explore stakeholder groups we had never thought about before, and the sustainable data access workbook drove us to think about revenue streams beyond simply charging for data. These tools are valuable because they not only ask key questions that enlighten you but also offer scaffolding and frameworks to help you explore the answers.
During our journey of exploring data collaboration, we felt quite lonely, as we know of no other similar initiatives in China. The PLN put us in touch with other data collaborations, among which we found similarities. The regular workshop sessions and open forums created a space where we could feel a sense of belonging and seek not only knowledge exchange but also emotional support from peers.
By joining the PLN and sharing our learnings publicly through blog posts and public talks, we have also been approached by other organizations that are considering building a data collaboration or that currently operate a data access initiative. For instance, one national data initiative that offers application-based, research-only data access approached us about the data ecosystem mapping tool, to understand how to use it to expand its targeted user group and cover other possible use cases beyond research.
Throughout the workshops conducted during the PLN, trust was the core theme. As participants in the PLN, we now have a better understanding of why trust is important in building and operating a data collaboration. Through sharing what we learnt and experimenting with the tools, we and our finance institution partners all agree that we should put trust at the core of our data collaboration. That means we should not only define a clear strategy for building trust with all kinds of stakeholders but also work out how our business model can build upon such trustworthy relationships.
The Data Ecosystem Mapping tool created by the ODI is the most interesting tool we used during our journey of building the collaboration. The tool offers a simple set of instruments that prompt you to think about all kinds of stakeholders and the possible exchanges of data and value between them. From our practice, we recommend that you revisit your old ecosystem map often to reflect on whether any stakeholders or links between them are missing, and that you invite new groups of partners or stakeholders to look at your map and give you new advice.
A great lesson we learnt from the sustainable data collaboration workshop is that, as early pioneers in exploring data collaboration, our experience and the lessons learnt during our journey could be valuable and may become commercial services that other organizations are willing to pay for.
In the last section, we would love to offer our key recommendations to readers who may have interest in building their own data collaboration:
If you are considering building a data collaboration, please ensure you have inside champions within the key stakeholders, especially the data owner organizations. Such inside champions should clearly understand why a data collaboration is needed and should be able to mobilize resources for experiments before C-level leaders give the green light to formally build the data collaboration.
Do not overestimate the importance of technologies in building a data collaboration. A common misunderstanding is that by using so-called privacy-preserving technologies, such as a safe computing environment or federated computing, a data collaboration can safely handle all privacy issues. This is not true, and many of these technologies are still at an early stage with few successful use cases. Setting the right governance strategies around the core theme of trust and trustworthiness is the most important work in setting up a data collaboration. We recommend that you refer to the ODI Trustworthy Data Stewardship Guidebook for more tools and strategies.
In our case, we are dealing with several state-owned institutions, which means that any decision about the data collaboration, whether about its design or its actual operation, goes through a very long and complicated process. We have already recognized the downside of this situation and have started acting on some small experiments before formally asking for approval from senior leadership. We suggest that other collaborations think about how to take a lean startup approach: act early with small experiments and build upon failures.
Finally, whenever possible, you should find peers who are working on data collaborations to support your journey. It is important to have peers with whom to exchange experience and from whom to get emotional support. ODI-MS may launch a second call for the PLN in the future, and we strongly encourage any data collaboration initiative to apply.