August 5 Meeting Minutes
Enclosed is a draft of the August 5 meeting minutes, based on my notes. Please add to them, suggest revisions, or send other comments. As an addendum: an idea raised by Ms. Patricia Bauman in a subsequent conversation was that any consortium formed to disseminate EDGAR data after Nov. 1995 may want to consider bidding for the dissemination contract. Since our meeting, I have been working to publicize EDGAR with the National Public Telecomputing Network (which coordinates FreeNets), the Academy of Management, and A-Net for accounting professionals.
Draft Internet Edgar Project Advisory Committee Meeting August 5, 1994
This note provides a summary of the Internet Edgar Project Advisory Committee meeting held on the afternoon of August 5, 1994 at NYU's Stern School of Business. The committee met to review progress and provide advice on the NSF-sponsored project (NCR-9319331) to investigate ways of posting large government databases onto the Internet.
Members present at the meeting were:
* Edward Stohr, Chairman, Information Systems Department, NYU
* Henry Perritt Jr., Professor, Villanova Law School
* Richard Mackinnon, State Street Bank Fellow
* Ajit Kambil / Carl Malamud - The Project Team
* Gary Bass, OMB Watch
* George Sadowsky, Director of Academic Computing, NYU
* Scott Armstrong, Information Trust
* Nancy Kranich, American Libraries Association
Observers included:
* Dan van Bellegham, National Science Foundation
* Mark Ginsburg, Ph.D. candidate, New York University
* N. Ranganathan, Ph.D. candidate, New York University
* D. Bodoff, Ph.D. candidate, New York University
Members Absent:
* Ronald Plesser, Information Industries Association
* Mike Roberts, Vice President, EDUCOM
* Patricia Bauman, The Bauman Foundation
* Gerald Yung, Mead Data Central
The meeting began with an introduction to the current project status by Ajit Kambil and Carl Malamud, a summary of what was learned in the first nine months, and a demonstration of the current system. The discussion of various issues among all members is organized and summarized below.
Project Status: Ajit Kambil began by noting the key accomplishments since the last meeting in February. First, all basic modes of Internet access have been implemented: electronic mail, file transfer protocol, gopher, and world wide web access through Mosaic and Lynx. In addition, a newsgroup, alt.edgar, is now available. Internet Multicasting substantially improved the delivery of documents to users by implementing WAIS searches on the index files.
Second, at NYU a series of test applications was developed to give users better access to the data. These applications served two purposes: to provide new services to users, and to build skills among students and project staff at NYU. The WWW technology also provides a tremendous educational opportunity to the school at large. EDGAR is now one of several NYU servers.
The applications include:
* Custom forms retrieval - provides users with a pick list of forms and options for the date range over which to search for the form. This way the user gets a precise answer to their request and does not have to scan many choices for the desired form.
* Mutual funds look-up - as many individuals invest in mutual funds, we built an application that looks at where mutual funds take stock positions of 5% or more.
* Mutual funds batch process - identifies all major purchases of stock by mutual funds over a specific period of time. In this application we extract the target company from the files. This can now be used to send a daily news report on major mutual fund investments.
* Extraction of executive compensation - this test application pulls the executive compensation table and footnotes from a proxy statement while maintaining a pointer to the source document.
* Schedule 13d - 5% acquisition reports.
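As an illustration of the custom forms retrieval idea described above, the following is a minimal Python sketch of a lookup by form type and date range. It is not the project's code: the Filing record, the find_filings function, and the sample index entries are all hypothetical and chosen only to show the shape of the query.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Filing:
    company: str      # filer name (placeholder values below)
    form_type: str    # form chosen from the pick list, e.g. "10-K"
    filed_on: date    # filing date


def find_filings(index: List[Filing], form_type: str,
                 start: date, end: date) -> List[Filing]:
    """Return filings of one form type filed within [start, end], oldest first."""
    matches = [f for f in index
               if f.form_type == form_type and start <= f.filed_on <= end]
    return sorted(matches, key=lambda f: f.filed_on)


# Hypothetical index entries, for illustration only.
index = [
    Filing("Example Corp A", "10-K", date(1994, 3, 31)),
    Filing("Example Corp B", "10-Q", date(1994, 5, 16)),
    Filing("Example Corp C", "10-K", date(1994, 6, 30)),
    Filing("Example Corp D", "10-K", date(1993, 12, 31)),
]

results = find_filings(index, "10-K", date(1994, 1, 1), date(1994, 6, 30))
for f in results:
    print(f.filed_on.isoformat(), f.form_type, f.company)
```

Because the form type and date range are fixed before the search runs, the user receives only the matching filings rather than a long list to scan.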
Third, substantial time was spent interviewing users of EDGAR data. While useful, these interviews often identified only narrow applications. However, the NYU team has now built a working relationship with R.R. Donnelley, a financial printer and filer of EDGAR documents. This is crucial, as the R.R. Donnelley staff have expert domain knowledge about the structure of documents and can help us identify and extract critical information for higher level applications.
Fourth, an electronic user survey was designed and implemented over the different access mechanisms. The team is still receiving and processing results of the survey. The survey highlights key uses of the data and identifies the demographics of users. Statistical analysis of the data will be deferred until September. Ideas for applications will be incorporated in our work immediately.
Fifth, from tracking usage of the system, we estimate that 4000 files are transferred daily. Most users use ftp and electronic mail, the access methods identified in the press articles. We are trying to encourage them to use gopher, Lynx and Mosaic, which are much more efficient from a processing perspective. We had 1500 requests daily to the NYU server in May and June, but this dropped to 600 in July. We suspect this is due to people going on vacation and expect usage to begin growing again in September.
Sixth, many sites are beginning to point to our WWW servers, which in itself is very useful and will generate new users. These sites include EINet, FinWeb, MIT, Carnegie Mellon, etc. Others are beginning to point to our gopher server. These include Babson, Delphi Internet Services, etc.
Related research work underway at NYU is focused on different tagging and retrieval mechanisms for large text databases, assessing and identifying different types of data quality problems, and identifying criteria for evaluating network dissemination projects.
Next Development Steps: In the first part of the project different document types were served to users. In the next phase the project will design and provide higher level search capabilities.
As was identified in the first phase of the project, a small subset of the information is most frequently accessed by users. Based on this the research team is in the process of creating a management brief or a corporate profile that incorporates this information from multiple documents in a semi-automatic and timely way. This information will be indexed or tagged to facilitate higher level retrieval capabilities.
In addition, we are working on indexing by industry and other categories to support search and retrieval of information. To support ad hoc queries, certain documents will be WAIS indexed.
Efforts are also underway to use the news groups to send abstracted information from filings.
Demonstration of the Current System
The current systems were demonstrated in an NYU classroom. Carl Malamud demonstrated services at the IMS site, including the WAIS indexing on the patent database and the experimental Harvest system. The latter can be used to link across databases.
Mark Ginsburg illustrated the programs developed at the NYU site. Both Nancy Kranich and Scott Armstrong suggested the need to index data that would be of use for cross company or industry analysis.
Other Issues Raised by the Project Team
The project is working with rather lean resources. While the team has been as responsive as possible in providing help to users, this has cut into development time. More of the help will have to be outsourced by pulling together a users group or some other mechanism that will use the news group to advise new users.
* Continuation: Second, while many businesses already use the Internet EDGAR filings, many companies and entrepreneurs are concerned about the continuation of the project. They would like to see the project go forward and to be assured of a means of continuation before building their own value-added applications. Some entrepreneurs are willing to share in the cost of continuation.
* Publicity for the system: Many potential users are not yet aware of the system and its current functionality. On surveys, some users reported using ftp when they had gopher and Mosaic available to them. The team needs to better inform users about improvements.
The team members noted that there is still a need to improve relations with the SEC and establish a working relationship. While specific individuals at the SEC have been helpful on certain general queries, the NYU-IMS team has neither been invited to nor informed about any EDGAR-related meetings, such as the one held by the SEC earlier this year.
Questions and Discussion:
The EDGAR team encouraged the use of gopher and world wide web browsers such as Mosaic or Lynx. This was because these services were much more efficient than electronic mail or ftp access from a processing and connect time perspective.
Hank Perritt and Gary Bass sought further clarification on why this was more efficient. Carl Malamud and Mark Ginsburg explained that with an anonymous ftp session a user establishes and holds the connection even while it is not being used. In contrast a world wide web connection is a stateless connection. A connection is established between a client and a server machine in the web only during the transfer of a message or file.
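The difference explained above can be put in rough numbers with a toy model. The Python sketch below is only an illustration, not a measurement: the Step record, both functions, and the timings are hypothetical. It compares how long a server connection is held when one connection stays open for a whole session (as with anonymous ftp) and when a connection exists only during each transfer (as with the web).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    transfer_s: float  # seconds spent actually moving a message or file
    idle_s: float      # seconds the user then spends reading or deciding


def session_held_seconds(steps: List[Step]) -> float:
    """Connection time when one connection stays open for the whole session."""
    return sum(s.transfer_s + s.idle_s for s in steps)


def per_request_held_seconds(steps: List[Step]) -> float:
    """Connection time when a connection exists only during each transfer."""
    return sum(s.transfer_s for s in steps)


# Made-up timings for one user retrieving three items.
steps = [Step(2.0, 30.0), Step(5.0, 60.0), Step(1.0, 0.0)]

print("held for whole session:", session_held_seconds(steps), "seconds")
print("held per transfer only:", per_request_held_seconds(steps), "seconds")
```

In this model the idle time between transfers counts against the server only in the session-style case, which is the source of the efficiency difference the team described.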
Electronic mail was also expensive to process, as a user typically had to send multiple messages, each of which had to be processed, to retrieve a file. In addition, many electronic mail users require files to be broken down into specific sizes suitable for e-mail, which adds to processing. George Sadowsky and others suggested that we still offer this community some of the higher level applications currently being developed.
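The extra work of breaking a file into e-mail sized pieces can be sketched as follows. This is a minimal Python illustration under assumed values: the split_for_email function and the 100-byte limit are hypothetical, and real mail gateways of the period set their own limits.

```python
from typing import List


def split_for_email(data: bytes, max_part_bytes: int) -> List[bytes]:
    """Split data into consecutive parts no larger than max_part_bytes."""
    if max_part_bytes <= 0:
        raise ValueError("max_part_bytes must be positive")
    return [data[i:i + max_part_bytes]
            for i in range(0, len(data), max_part_bytes)]


# A 250-byte stand-in for a filing, split under an assumed 100-byte limit.
document = b"x" * 250
parts = split_for_email(document, 100)

for number, part in enumerate(parts, start=1):
    print(f"part {number}/{len(parts)}: {len(part)} bytes")
```

Each part must then be sent as its own message and reassembled by the recipient, so a single file request turns into several units of processing on the server.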
PROJECT CONTINUATION: Much of the discussion focused on project continuation. Carl Malamud and Ajit Kambil stated they would make available the code developed in the project for someone else to run the project at the end of 1995.
Ajit Kambil proposed that steps be taken to develop a consortium of private companies, universities and other organizations (e.g., libraries) to continue funding this resource and keeping it in the public domain. Such a consortium would continue to purchase the data feed and would continue providing this data as a public and free resource. Based on the current project, Carl and Ajit will look at estimating the costs required to operate such a service. Many companies and individuals have expressed their interest in assuring the continued availability of this data at low cost. Others see opportunities in adding value to this data for their clients.
Various other models were proposed in the course of the discussion:
One suggestion was for the SEC to provide a basic service to the public. Ajit felt a key concern here is that the SEC's primary incentives, funding, and other system development priorities may lead it to focus less on this system. Design would also require broad public outreach.
A second suggestion was for the SEC to outsource and fund such a service which would be provided by a third party. As Gerald Yung was not present we could not find out Mead Data Central's position here and the exact date when the Mead contract with the SEC expires.
Both of the above would depend on the cooperation of the SEC. Nancy Kranich was concerned a consortium would not necessarily better serve the public. Consortium partners could unduly influence outcomes and not necessarily be responsive to the broader public. She recommended that this data set be provided by government free and electronically to the public. Gary Bass was concerned that it could be hard to get legislative support for this given opposition from the traditional information vendors.
The group felt they should continue this discussion electronically.
DISSEMINATION OF INFORMATION ABOUT THE PROJECT:
While press articles have been the primary way in which information about the project has been disseminated, members of the committee suggested greater outreach to groups such as librarians, the FreeNet organizations, etc., to make users aware of the data.
Ajit Kambil August 19, 1994