News Release

Apr. 16, 2020 TSE

TSE to launch Limited Public Distribution Proof of Concept Testing for TSE Timely Disclosure Corpus

Tokyo Stock Exchange, Inc. (TSE) is launching Proof of Concept (PoC) testing for the limited public distribution of corpus data created from timely disclosure documents and other documents. A corpus is a collection of digitized natural language text used for research in natural language processing and related fields; in recent years, corpora have been used especially for machine translation. The PoC testing will provide samples of both a parallel corpus and a monolingual corpus created from timely disclosure documents and other documents. TSE will use feedback from PoC participants to verify the usability and possible applications of the data and, based on the results of the PoC, will also consider developing a service to distribute the data.

Name: Timely Disclosure Document Monolingual Corpus (Japanese or English)
Data Outline: A Japanese corpus and an English corpus, each constructed by mechanically extracting text from disclosure documents and other documents (PDF format) from a specific period

Name: Timely Disclosure Document Parallel Corpus (Japanese and English)
Data Outline: Japanese and English parallel corpus constructed from the above monolingual corpora of timely disclosure documents
(Note) The data provided is from 2019 disclosure documents.

For those interested in applying to participate in the PoC

The PoC testing participants must be trading participants of TSE or Osaka Exchange, Inc., clearing participants of Japan Securities Clearing Corporation, or any other corporation deemed appropriate by TSE.
Prospective participants must submit two separate applications: one for the PoC Program for Utilizing Securities Data and one for this PoC testing program.
For information on how to apply, please use the contact details below.

Contact

Tokyo Stock Exchange, Inc., Information Services Department, Service Development Group
E-mail: inf_dev@jpx.co.jp