TCP Socket Text Streaming in Spark

Spark Streaming has built-in support for reading text from a TCP socket and processing it in a distributed manner.

It’s all about the socketTextStream() function on the StreamingContext. It returns a ReceiverInputDStream, whose base class is DStream. DStreams are essentially the linchpin of the whole Spark Streaming system: each one encapsulates a series of RDDs, which we use to extract and operate on data in our code.

The Spark distro comes with several good examples that demo TCP streaming with socketTextStream():
– NetworkWordCount
– StatefulNetworkWordCount
– RecoverableNetworkWordCount
– PageViewStream and PageViewGenerator

We can use netcat to test our streaming apps:
– How To Use Netcat to Establish and Test TCP and UDP Connections on a VPS
– Netcat – Linux man page

The Java networking API is also important:
– ServerSocket class (Java API docs)
– Lesson: All About Sockets – writing network client/server (Java API docs tutorial)
– Java – Networking (Tutorials Point)
– Sockets programming in Java: A tutorial (Java World)
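
As a rough illustration of how the Java networking API ties in, the sketch below uses ServerSocket to play the role netcat would play in a test: it listens on a port, waits for one client, and sends it a few newline-terminated lines. The class name LineFeeder, the method serveOnce, port 9999, and the sample lines are all made up for this example. It also assumes the receiver behind socketTextStream() reads newline-delimited UTF-8 text.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark: a tiny stand-in for netcat in listen mode.
final class LineFeeder {

    // Blocks until one client connects, sends each line terminated by '\n',
    // then closes the connection.
    static void serveOnce(ServerSocket server, List<String> lines) throws IOException {
        try (Socket client = server.accept();
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.print(line);
                out.print('\n'); // explicit '\n' rather than the platform line separator
            }
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        int port = 9999; // assumed port; use whatever the streaming app connects to
        try (ServerSocket server = new ServerSocket(port)) {
            System.out.println("Listening on port " + port);
            serveOnce(server, Arrays.asList("hello spark", "hello streaming"));
        }
    }
}
```

With this running, a streaming app that calls socketTextStream("localhost", 9999) should receive the two lines in one of its batches. The feeder closes the connection after sending, so for longer interactive tests netcat is still the handier tool.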