Content of the Problem
Napster was created by Shawn Fanning and Sean Parker and launched in 1999. Within a year, it had over 100,000 users, many of whom were sharing copyrighted music for free. Napster soon became involved in a lengthy legal battle with the Recording Industry Association of America (RIAA) and other parties, and it was eventually forced to shut down in 2001. The courts ruled against Napster, ordering it to stop allowing its users to upload and share copyrighted files. This decision had a significant impact on the file-sharing community and led to a broader discussion about intellectual property rights in the digital age.
The centralization of the Napster index also causes problems. Because the protocol requires the index server for clients to initiate connections, the Napster service is easy to shut down. The central index server becomes a single point of failure, and its removal causes all Napster clients to cease communication. The ease of shutdown is less a flaw in the protocol itself than a matter of exposure to third-party intervention, whether for legal or other reasons.
Napster was an online platform used to download and access compressed digital music files in MP3 format, and it became one of the most popular peer-to-peer services after its launch in 1999. It was created by Shawn Fanning, then an 18-year-old freshman in computer science at Northeastern University. Napster consisted of a central service that indexed its users and the files available on their machines, thus creating a list of the music files available on Napster’s network. Napster gained immense popularity because it was one of the easiest services to use. In December 1999, several record companies, including Sony Music Entertainment, Atlantic Records, MCA Records, Island Records, Motown Records, Capitol Records, and BMG Music, collectively filed a lawsuit against Napster. The lawsuit, A&M Records, Inc. v. Napster, Inc., was an intellectual property case decided in 2001 by the United States Court of Appeals for the Ninth Circuit, which affirmed in part the ruling of the United States District Court for the Northern District of California. The Court held that Napster could be held liable for contributory and vicarious infringement of the plaintiffs’ copyrights. The case drew wide attention because it explored the intersection of copyright law and peer-to-peer file-sharing systems.
Literature Review
Napster was a peer-to-peer (P2P) file sharing service that used a centralized index server to connect users to digital audio files:
- Index server
Users could search the index by song title or artist name. If the index found the song on another connected computer, the user could download a copy and also share their own files in response to other users’ searches.
- P2P model
The files themselves were not stored on a central server; instead, the server indexed links to files on users’ computers. This is what gives the technology its name, “peer to peer”.
- Peers
Peers acted as both clients and servers. When a peer needed a file, it would look up the file in the index server, which would provide a list of peers that had the file. The peer could then connect to one of those peers to download the file, while also waiting for requests from other peers and sending them the files they requested.
Napster’s file-sharing protocol is built around a central index server that lists the files being shared by Napster clients. To find a file, a client queries this central index, finds the file it wants, and learns which machine holds it. The client then initiates a connection with that machine and downloads the file directly from it. The central index server provides a number of useful features in the Napster system. First, it allows quick and easy searches of the data that resides on the network. Second, it reduces the network bandwidth consumed by searching, because search requests do not need to be forwarded among clients. Finally, the central server makes it easy to connect to the Napster network: because there is a single known address for the central server, the connection can be set up automatically by the installation software.
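The bandwidth point can be made concrete with a small message count. The sketch below is only an illustration: the flooding model (each peer forwards a query to its neighbors until a hop limit runs out) is an assumed contrast case that the text above does not describe, and the function names and the four-peer ring are hypothetical.

```python
from collections import deque


def central_lookup_messages() -> int:
    """A search against a central index costs one request and one response."""
    return 2


def flooding_messages(neighbors: dict, start: str, ttl: int) -> int:
    """Count query messages when peers forward a search among themselves.

    Assumed model: a peer that sees the query for the first time forwards it
    to every neighbor except the one it came from, until `ttl` hops are used.
    """
    seen = {start}
    queue = deque([(start, None, ttl)])
    messages = 0
    while queue:
        peer, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue
        for nxt in neighbors[peer]:
            if nxt == sender:
                continue
            messages += 1  # one query message sent over this link
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, peer, hops_left - 1))
    return messages


# Hypothetical overlay: four peers connected in a ring.
RING = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
```

Even on this tiny ring, `flooding_messages(RING, "a", 3)` counts more messages than the two needed for a central lookup, and the gap widens as the overlay grows.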
- A central indexing server. This server indexes the contents of all of the peers that register with it. It also provides a search facility to peers. In our simple version, you don’t need to implement sophisticated searching algorithms; an exact match will be fine. Minimally, the server should provide the following interface to the peer clients:
- register (peer id, file name, …) — invoked by a peer to register all its files with the indexing server. The server then builds the index for the peer. More sophisticated mechanisms such as automatic indexing are not required, but feel free to do whatever is reasonable. You may provide optional information to the server to make it more ‘real’, such as the client’s bandwidth.
- lookup (file name) — this procedure should search the index and return all the matching peers to the requestor.
- A peer. A peer is both a client and a server.
As a client, the user specifies a file name with the indexing server using “lookup”. The indexing server returns a list of all other peers that hold the file. The user can pick one such peer and the client then connects to this peer and downloads the file.
As a server, the peer waits for requests from other peers and sends the requested file when receiving a request.
Minimally, the peer server should provide the following interface to the peer client:
- download (file name) — invoked by a peer to download a file from another peer.
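The register, lookup, and download interface described above can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not a full implementation: there is no networking, lookup uses exact matching only, and the class names `IndexServer` and `Peer`, the `fetch` helper, and the choice of the first listed holder are all hypothetical.

```python
from collections import defaultdict


class IndexServer:
    """Central index: maps each file name to the peers that registered it."""

    def __init__(self):
        self._index = defaultdict(set)  # file name -> set of peer ids

    def register(self, peer_id, file_names):
        """Invoked by a peer to register all of its files."""
        for name in file_names:
            self._index[name].add(peer_id)

    def lookup(self, file_name):
        """Return every peer holding an exact match for file_name."""
        return sorted(self._index.get(file_name, ()))


class Peer:
    """A peer is both a client (fetch) and a server (download)."""

    def __init__(self, peer_id, files=None):
        self.peer_id = peer_id
        self.files = dict(files or {})  # file name -> file contents (bytes)

    def register_with(self, server):
        server.register(self.peer_id, self.files.keys())

    def download(self, file_name):
        """Server side: hand the requested file to another peer."""
        return self.files[file_name]

    def fetch(self, server, peers, file_name):
        """Client side: look the file up, pick a holder, and download it.

        `peers` maps peer ids to Peer objects and stands in for opening a
        network connection to the chosen peer (an assumption of this sketch).
        """
        holders = [p for p in server.lookup(file_name) if p != self.peer_id]
        if not holders:
            return None
        data = peers[holders[0]].download(file_name)
        self.files[file_name] = data
        return data
```

A peer that has registered a file can then serve it to any other peer that finds it through `lookup`; the index server itself never touches the file contents.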
The Napster Crawler
Because we did not have direct access to the indexes maintained by the central Napster servers, the only way we could discover the set of peers participating in the system at any given time was by issuing queries for files and keeping a list of the peers referenced in the queries’ responses. To discover the largest possible set of peers, we issued queries with the names of popular song artists drawn from a long list downloaded from the web. Based on our experience and observations, the Napster server cluster consists of approximately 160 servers; each peer establishes a connection with only one server. When a peer issues a query, the server the peer is connected to first reports files shared by “local users” on the same server, and later reports matching files shared by “remote users” on other servers in the cluster. For each crawl, we established a large number of connections to a single server and issued many queries in parallel; this reduced the time taken to gather data to 3-4 minutes per crawl, giving us a nearly instantaneous snapshot of the peers connected to that server.
For each peer that we discovered during the crawl, we then queried the Napster server to gather the following metadata: (1) the bandwidth of the peer’s connection as reported by the peer herself, (2) the number of files currently being shared by the peer, (3) the current number of uploads and the number of downloads in progress by the peer, (4) the names and sizes of all the files being shared by the peer, and (5) the IP address of the peer. The Napster protocol indicates which peers connect to the same server as our crawler (local peers) and which peers connect to other Napster servers (remote peers).
To get an estimate of the fraction of the total user population we captured, we separated the local and remote peers returned in our queries’ responses, and compared them to statistics periodically broadcast by the particular Napster server that we queried. From these statistics, we verified that each crawl typically captured between 40% and 60% of the local peers on the crawled server. Furthermore, this 40-60% of the peers that we captured contributed between 80% and 95% of the total (local) files reported to the server.
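The discovery step described above (query by artist name, collect the peers named in the responses, and separate local from remote peers) might be sketched as follows. This is an illustration only: the `Hit` record, the `crawl` function, and the canned `FAKE_RESPONSES` data are hypothetical stand-ins, and the real Napster wire protocol is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, NamedTuple


class Hit(NamedTuple):
    """One search result (hypothetical shape, not the real wire format)."""
    peer_id: str
    file_name: str
    is_local: bool  # True if the peer is on the same server as the crawler


def crawl(query: Callable[[str], list], artists: Iterable[str], workers: int = 8):
    """Issue one query per artist in parallel and collect the peers seen.

    Returns two sets of peer ids: (local_peers, remote_peers).
    """
    local, remote = set(), set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for hits in pool.map(query, artists):
            for hit in hits:
                (local if hit.is_local else remote).add(hit.peer_id)
    return local, remote


# Canned responses standing in for a live server (made-up data).
FAKE_RESPONSES = {
    "artist a": [Hit("peer1", "a - one.mp3", True),
                 Hit("peer2", "a - two.mp3", False)],
    "artist b": [Hit("peer1", "b - one.mp3", True),
                 Hit("peer3", "b - two.mp3", True)],
}


def fake_query(artist: str) -> list:
    """Hypothetical query function; a real crawler would contact the server."""
    return FAKE_RESPONSES.get(artist, [])
```

Collecting peer ids into sets means a peer that appears in many responses is counted once, which is what a snapshot of the connected population needs.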
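The coverage estimate is a simple ratio of what a crawl captured to what the server reported. The sketch below shows that arithmetic; the function name `coverage` and all of the numbers are made up for illustration and are not measurements from the study.

```python
def coverage(captured: int, reported: int) -> float:
    """Fraction of the server-reported total that a crawl captured."""
    if reported <= 0:
        raise ValueError("reported total must be positive")
    return captured / reported


# Hypothetical crawl: 520 of 1,000 reported local peers were discovered,
# and those peers shared 43,500 of the 50,000 reported local files.
peer_coverage = coverage(520, 1_000)
file_coverage = coverage(43_500, 50_000)
print(f"peers: {peer_coverage:.0%}, files: {file_coverage:.0%}")
```

With these made-up inputs, roughly half of the peers account for most of the files, which is the same pattern the reported 40-60% and 80-95% ranges describe.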