Random generation

In certain use cases it is useful to generate random graphs, for example, for testing or benchmarking purposes. For that reason the Neo4j Graph Algorithm library comes with a set of built-in graph generators. The generator stores the resulting graph in the graph catalog. That graph can be used as input for any algorithm in the library.

This feature is in the beta tier. For more information on feature tiers, see API Tiers.

It is currently not possible to persist these graphs in Neo4j. Running an algorithm in write mode on a generated graph will lead to unexpected results.

The graph generation is parameterized by three dimensions:

node count - the number of nodes in the generated graph
average degree - describes the average out-degree of the generated nodes
relationship distribution function - the probability distribution method used to connect generated nodes

Syntax

The following describes the API for running the graph generation procedure

CALL gds.graph.generate(
    graphName: String,
    nodeCount: Integer,
    averageDegree: Integer,
    configuration: Map
})
YIELD name, nodes, relationships, generateMillis, relationshipSeed, averageDegree, relationshipDistribution, relationshipProperty

Table 1. Parameters
Name	Type	Default	Optional	Description
`graphName`	String	`null`	no	The name under which the generated graph is stored.
`nodeCount`	Integer	`null`	no	The number of generated nodes.
`averageDegree`	Integer	`null`	no	The average out-degree of generated nodes.
`configuration`	Map	`{}`	yes	Additional configuration, see below.

Table 2. Configuration
Name	Type	Default	Optional	Description
`relationshipDistribution`	String	`UNIFORM`	yes	The probability distribution method used to connect generated nodes. For more information see Relationship Distribution.
`relationshipSeed`	Integer	`null`	yes	The seed used for generating relationships.
`relationshipProperty`	Map	`{}`	yes	Describes the method used to generate a relationship property. By default no relationship property is generated. For more information see Relationship Property.
`aggregation`	String	`NONE`	yes	The relationship aggregation method cf. Relationship Projection.
`orientation`	String	`NATURAL`	yes	The method of orienting edges. Allowed values are NATURAL, REVERSE and UNDIRECTED.
`allowSelfLoops`	Boolean	`false`	yes	Whether to allow relationships with identical source and target node.

Table 3. Results
Name	Type	Description
`name`	String	The name under which the stored graph was stored.
`nodes`	Integer	The number of nodes in the graph.
`relationships`	Integer	The number of relationships in the graph.
`generateMillis`	Integer	Milliseconds for generating the graph.
`relationshipSeed`	Integer	The seed used for generating relationships.
`averageDegree`	Float	The average out degree of the generated nodes.
`relationshipDistribution`	String	The probability distribution method used to connect generated nodes.
`relationshipProperty`	String	The configuration of the generated relationship property.

Relationship Distribution

The relationshipDistribution parameter controls the statistical method used for the generation of new relationships. Currently there are three supported methods:

UNIFORM - Distributes the outgoing relationships evenly, i.e., every node has exactly the same out degree (equal to the average degree). The target nodes are selected randomly.
RANDOM - Distributes the outgoing relationships using a normal distribution with an average of averageDegree and a standard deviation of 2 * averageDegree. The target nodes are selected randomly.
POWER_LAW - Distributes the incoming relationships using a power law distribution. The out degree is based on a normal distribution.

Relationship Property

The graph generator is capable of generating a relationship property. This can be controlled using the relationshipProperty parameter which accepts the following parameters:

Table 4. Configuration
Name	Type	Default	Optional	Description
`name`	String	null	no	The name under which the property values are stored.
`type`	String	null	no	The method used to generate property values.
`min`	Float	0.0	yes	Minimal value of the generated property (only supported by `RANDOM`).
`max`	Float	1.0	yes	Maximum value of the generated property (only supported by `RANDOM`).
`value`	Float	null	yes	Fixed value assigned to every relationship (only supported by `FIXED`).

Currently, there are two supported methods to generate relationship properties:

FIXED - Assigns a fixed value to every relationship. The value parameter must be set.
RANDOM - Assigns a random value between the lower (min) and upper (max) bound.

Relationship Seed

The relationshipSeed parameter allows the user to specify the seed used to generate the random graph manually. When specified, the procedure will produce the same relationships between nodes regardless of whether the generated graph is going to be created as weighted or unweighted. This can be helpful if one wants to examine the behavior or performance of an algorithm under weight conditions.

Examples

All the examples below should be run in an empty database.

In the following we will demonstrate the usage of the random graph generation procedure.

Generating unweighted graphs

The following will produce a graph with unweighted relationships

CALL gds.graph.generate('graph',5,2, {relationshipSeed:19})
YIELD name, nodes, relationships, relationshipDistribution

Table 5. Results
name	nodes	relationships	relationshipDistribution
"graph"	5	10	"UNIFORM"

A new in-memory graph called graph with 5 nodes and 10 relationships has been created and added to the graph catalog. We can examine its topology with the gds.graph.relationships procedure.

The following will show the produced relationships

CALL gds.graph.relationships.stream('graph')
YIELD sourceNodeId,targetNodeId
RETURN  sourceNodeId as source, targetNodeId as target
ORDER BY source ASC,target ASC

Table 6. Results
source	target
0	1
0	2
1	0
1	4
2	1
2	4
3	0
3	1
4	0
4	3

Generating weighted graphs

To generated graphs with weighted relationships we must specify the relationshipProperty parameter as discussed above.

The following will produce a graph with weighted relationships

CALL gds.graph.generate('weightedGraph',5,2, {relationshipSeed:19,
  relationshipProperty: {type: 'RANDOM', min: 5.0, max: 10.0, name: 'score'}})
YIELD name, nodes, relationships, relationshipDistribution

Table 7. Results
name	nodes	relationships	relationshipDistribution
"weightedGraph"	5	10	"UNIFORM"

The produced graph, weightedGraph, has a property named score containing a random value between 5.0 and 10.0 for each relationship. We can use gds.graph.relationshipProperty.stream to stream the relationships of the graph along with their score values.

The following will show the produced relationships

CALL gds.graph.relationshipProperty.stream('weightedGraph','score')
YIELD sourceNodeId, targetNodeId, propertyValue
RETURN  sourceNodeId as source, targetNodeId as target, propertyValue as score
ORDER BY source ASC,target ASC, score

Table 8. Results
source	target	score
0	1	6.791408433596591
0	2	8.662453313014902
1	0	6.258381821615686
1	4	9.711806397654765
2	1	9.469695236791349
2	4	6.519823445755963
3	0	8.747179900968224
3	1	7.752117836610726
4	0	8.614858979680758
4	3	5.060444167785128

Notice that despite as graph and weightedGraph have been created with the same seed their relationship topology is equivalent.