Clique Estimation Software
We make available two standalone Python scripts to demonstrate the estimators described in our paper Estimating Clique Composition and Size Distributions from Sampled Network Data.- EstimateCliques.py: Provides a standalone script that implements estimators for clique size distributions. It implements two types of unbiased estimators, one of which exploits labeling of sampled nodes neighbors and one of which does not require this information. Additionally, it supports the compositions of cliques by node attributes. This version only supports binary node attributes (such as gender). It receives as input a Python pickle that describes the egocentric sample and contains the clique calculations for each egonet separately. It outputs the estimation results.
- SampleEgonets_CalculateCliques.py: Provides a standalone script that demonstrates how to prepare the data for the EstimateCliques.py script. More specifically, it receives as input a known graph, sampling parameters (sampling method, sampling size, replacement type), and clique distribution preferences (labeling, attributes). It then appropriately samples egonets from the given graph and calculates the maximal clique distribution for each sampled egonet. The output of this script is an appropriately formatted Python pickle that can be used as input to EstimateCliques.py
If you use our software, please cite our paper using the below bibtex entry:
@article{gjoka13_CliqueEstimation, title= {{Estimating Clique Composition and Size Distributions from Sampled Network Data}}, author= {Minas Gjoka and Emily Smith and Carter T. Butts}, journal = {arXiv:cs.SI:1308.3297}, year = {2013} }
EstimateCliques
Download here. Instructions of usage follows below.Input
The script EstimateCliques receives two parameters.- The filename of a Python pickle that contains (i) the clique distribution calculation for each egonet in an egocentric network sample (ii) sampling parameters of the egocentric network sample (iii) clique distribution preferences used in the clique distribution calculation. Specifically, the Python pickle is expected to have a Python dictionary with the following keys (and corresponding values)
- 'N' => A number that determines the graph size
- 'n' => A number that determines the egonet sample size
- 'sample_type' => A value that determines the used sampling method. Takes values from 'uis', 'wis', 'rw'
- 'replacement' => A value that determines the used replacement method. Takes values from 'with', 'wout'
- 'nsamples' => A dictionary that determines the number of sample repetitions for each ego. It is relevant when the egocentric network sample was collected with replacement. Here is an example: {'user1':3, 'user2':1, ..}. In this example, 'user1' was sampled three times.
- 'weight' => A dictionary that determines the used sampling weight for each ego. It is relevant when the egocentric network sample was collected with 'wis' or 'rw'.
Here is an example: {'user1':4, 'user2':1, ..}. In this example, 'user1' has sampling weight four. - 'labeling' => A boolean value that determines whether the neighbors of the sampled egos are considered to be labeled. Controls whether the Clique Degree Sum (CDS) estimators (value 0) or the Clique Counting (CC) estimators are used (value 1)
- 'attributes' => A boolean value (0 or 1) that determines whether node attributes are used in the clique composition.
- 'cliques' => A dictionary of lists that contains for each egonet the exact calculation of the clique distribution. In each list, the entry i contains the number of order-i cliques of the corresponding egonet. It is used when 'labeling'==0 and 'attributes'==0. Here is an example:
{'user1':[0,0,1,51,32,14,8],'user2',[0,0,0,89,73,2],..}
In this example, 'user1' has 1 order-2 clique, 51 order-3 cliques, 32 order-4 cliques etc. - 'cliques_label' => A dictionary of lists that contains for each egonet the exact calculation of the clique distribution with labeling. A list consists of tuples with the clique size and a unique identifier of the clique composition for each clique. A simple method we use to get a unique identifier of a clique composition is to hash the string that is produced by concatenating the sorted node ids that make up the clique. It is used when 'labeling'==1 and 'attributes'==0. Here is an example:
{'user1':[(6,'497406'), (5,'155082'),..],'user2':[(3,'193628')],..}
In this example, the egonet of 'user1' has one clique of size 6 with unique identifier '497406', another clique of size 5 with unique identifier '155082' etc. - 'cliques_attributes' => A dictionary of dictionaries of dictionaries that contains for each egonet the exact calculation of the clique distribution with node attributes. It currently supports only binary node attributes i.e. gender. The first level of dictionaries is used for the userid, the second level for the number of nodes in the clique that have as a node attribute the value "0" and the third level for the number of nodes in the clique that have as a node attribute the value "1". The value of the third level dictionary determines the number of cliques. It is used when 'labeling'==0 and 'attributes'==1. Here is an example:
{'user1':{2:{4:8},3:{2:7},..},'user2':{3:{0:1},..}}
In this example, the egonet of 'user1' has 8 cliques that consist of 2 nodes with node attribute "0" and 4 nodes with node attribute "1" (total clique size 6). - 'cliques_attributes_label' => A dictionary of dictionaries of dictionaries of lists that contains for each egonet the exact calculation of the clique distribution with node attributes and labeling. It currently supports only binary node attributes i.e. gender. The first level of dictionaries is used for the userid, the second level for the number of nodes in the clique that have as a node attribute the value "0" and the third level for the number of nodes in the clique that have as a node attribute the value "1". The list, which is the value of the third level dictionary, consists of the unique identifiers of the clique composition (with attributes) for each clique. It is used when 'labeling'==1 and 'attributes'==1. Here is an example:
{'user1':{2:{4:['497406',..]},3:{2:['155082',..]},..},'user2':{3:{0:['193628']}}}
In this example, the egonet of 'user1' has one clique with unique identifier '497406' that consists of 2 nodes with node attribute "0" and 4 nodes with node attribute "1" (total clique size 6). Also, the egonet of 'user2' has one clique with unique identifier '193628' that constists of 3 nodes with node attribute "0" and 0 nodes with node attribute "1" (total clique size 3) etc.
- The filename where the estimation result will be stored in a Python pickle format. The estimation result will be stored as a dictionary.
Output
The script EstimateCliques.py outputs the estimation result at the filename given as the second parameter. Additionally it prints out the estimation result.Example
> ./EstimateCliques.py soc-Slashdot0811.edges.gz_sample_calculation.pickle slashdot_clique_estimation.pickle File 'soc-Slashdot0811.edges.gz_sample_calculation.pickle' successfully loaded Estimation result dumped at file 'slashdot_clique_estimation.pickle' Estimation result printout: {2: 175057.43, 3: 73891.52, 4: 104121.84, ..., 26: 1573.08}
The printout is interpreted as follows. The topology soc-Slashdot is estimated to have 175057.43 order-2 cliques, 73891.52 order-3 cliques , ... , and 1573.08 order-26 cliques
SampleEgonets_CalculateCliques
Download here. Instructions of usage follows below.Input
The script SampleEgonets_CalculateCliques receives six parameters in the following order.- Filename of the graph to be loaded and sampled. The script accepts edgelist (edges.gz) format. The graphs used in the paper and referenced in the examples are conveniently located at http://www.minasgjoka.com/cliques/graphs/
- Egonet sample size 'n'
- Sampling method type. Takes values from 'uis', 'wis'
- Replacement method. Takes values from 'with', 'wout'
- A boolean value that determines whether the neighbors of the sampled egos are considered to be labeled. Controls whether the Clique Degree Sum (CDS) estimators (value 0) or the Clique Counting (CC) estimators are used (value 1)
- Filename of node attributes that will be used in the clique composition. Set 'none' if node attributes are NOT used.
Output
The output is Python pickle file that can be used as an input in the EstimateCliques script. The Python pickle stores a dictionary with the following keys (and corresponding values). See the Input section of EstimateCliques for additional information for each entry below.- 'N'
- 'n'
- 'sample_type'
- 'replacement'
- 'nsamples'
- 'weight'
- 'labeling'
- 'attributes'
- 'cliques'
- 'cliques_label'
- 'cliques_attributes'
- 'cliques_attributes_label'
Example
Here is an example that loads soc-Slashdot graphs, samples 100 egonets uniformly at random with replacement, and calculates for each egonet clique distributions without binary attributes and makes use of labeling. It prepares a Python pickle output that is ready to be used with Clique Counting (CC) estimators.> ./SampleEgonets_CalculateCliques.py soc-Slashdot0811.edges.gz 100 uis with 1 none Loading graph soc-Slashdot0811.edges.gz.. [uis] 100 egonet samples n:100 N:77360 sample_type:uis replacement:with labeling:1 attributes:0 Storing clique calculation for sampled egonets in file 'socSlashdot0811.edges.gz_sample_calculation.pickle'
© 2013 Sept.