Minas Gjoka

Clique Estimation Software

We provide two standalone Python scripts that demonstrate the estimators described in our paper, "Estimating Clique Composition and Size Distributions from Sampled Network Data".

If you use our software, please cite our paper using the BibTeX entry below:

 
@article{gjoka13_CliqueEstimation,
  title= {{Estimating Clique Composition and Size Distributions from Sampled Network Data}},
  author= {Minas Gjoka and Emily Smith and Carter T. Butts},
  journal = {arXiv:cs.SI:1308.3297},
  year = {2013}
}

EstimateCliques

Download here. Usage instructions follow below.

Input

The script EstimateCliques takes two parameters: the filename of the input Python pickle and the filename to which the estimation result is written. The script SampleEgonets_CalculateCliques (see below) demonstrates how to prepare a Python pickle that can be used as input to the EstimateCliques script.

Output

The script EstimateCliques.py writes the estimation result to the file named by the second parameter. It also prints the estimation result.

Example

> ./EstimateCliques.py soc-Slashdot0811.edges.gz_sample_calculation.pickle slashdot_clique_estimation.pickle

File 'soc-Slashdot0811.edges.gz_sample_calculation.pickle' successfully loaded

Estimation result dumped at file 'slashdot_clique_estimation.pickle'

Estimation result printout:
{2: 175057.43,  3: 73891.52, 4: 104121.84, ..., 26: 1573.08}

The printout is interpreted as follows. The soc-Slashdot topology is estimated to have 175057.43 order-2 cliques, 73891.52 order-3 cliques, ..., and 1573.08 order-26 cliques.
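
The result can also be read back in Python. The following is a minimal sketch, assuming the output pickle holds the same dictionary that is printed (clique order mapped to estimated count). The function name summarize_estimates and the stand-in demo file are illustrative and are not part of the script.

```python
import os
import pickle
import tempfile


def summarize_estimates(path):
    """Load an estimation pickle and return (order, estimate) pairs sorted by order.

    Assumption: the pickle stores a dict {clique order: estimated count},
    matching the printout shown above.
    """
    with open(path, "rb") as fh:
        estimates = pickle.load(fh)
    return sorted(estimates.items())


# Stand-in file holding three values copied from the printout above,
# so the sketch runs without the real estimation output.
demo = {2: 175057.43, 3: 73891.52, 26: 1573.08}
demo_path = os.path.join(tempfile.mkdtemp(), "demo_estimation.pickle")
with open(demo_path, "wb") as fh:
    pickle.dump(demo, fh)

rows = summarize_estimates(demo_path)
for order, count in rows:
    print("order-%d cliques: %.2f" % (order, count))
```

If the pickle was written under Python 2 and is read under Python 3, pickle.load may need the argument encoding='latin1'.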

SampleEgonets_CalculateCliques

Download here. Usage instructions follow below.

Input

The script SampleEgonets_CalculateCliques receives six parameters in the following order.
  1. Filename of the graph to be loaded and sampled. The script accepts the edgelist (edges.gz) format. The graphs used in the paper and referenced in the examples are available at http://www.minasgjoka.com/cliques/graphs/
  2. Egonet sample size 'n'
  3. Sampling method type. One of 'uis' or 'wis'.
  4. Replacement method. One of 'with' or 'wout'.
  5. A boolean value (0 or 1) that determines whether the neighbors of the sampled egos are treated as labeled. Value 0 selects the Clique Degree Sum (CDS) estimators; value 1 selects the Clique Counting (CC) estimators.
  6. Filename of the node attributes to be used in the clique composition. Set to 'none' if node attributes are not used.
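
To illustrate parameters 2 to 4, here is a minimal sketch of uniform ego sampling ('uis') with and without replacement, followed by egonet extraction. It is not the script's implementation: the function names sample_egos and egonet, the seed argument, and the toy adjacency dictionary are assumptions made for this example, and the 'wis' method is not covered.

```python
import random


def sample_egos(nodes, n, replacement="with", seed=None):
    """Draw n ego nodes uniformly at random, with or without replacement."""
    rng = random.Random(seed)
    nodes = list(nodes)
    if replacement == "with":
        # The same node may be drawn more than once.
        return [rng.choice(nodes) for _ in range(n)]
    if replacement == "wout":
        # Each node is drawn at most once; requires n <= len(nodes).
        return rng.sample(nodes, n)
    raise ValueError("replacement must be 'with' or 'wout'")


def egonet(adjacency, ego):
    """Return the ego plus its neighbors, and the edges among those nodes."""
    members = {ego} | adjacency[ego]
    edges = {
        (u, v)
        for u in members
        for v in adjacency[u]
        if v in members and u < v
    }
    return members, edges


# Toy undirected graph with edges 0-1, 0-2, 1-2, 2-3 (hypothetical data).
adjacency = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

egos_with = sample_egos(adjacency, 10, "with", seed=1)
egos_wout = sample_egos(adjacency, 4, "wout", seed=1)
members, edges = egonet(adjacency, 0)
print(sorted(members), sorted(edges))
```

In the toy graph, the egonet of node 0 is a triangle, so it contributes one order-3 clique and three order-2 cliques.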

Output

The output is a Python pickle file that can be used as input to the EstimateCliques script. The pickle stores a dictionary with the following keys (and corresponding values). See the Input section of EstimateCliques for additional information on each entry below.
  1. 'N'
  2. 'n'
  3. 'sample_type'
  4. 'replacement'
  5. 'nsamples'
  6. 'weight'
  7. 'labeling'
  8. 'attributes'
  9. 'cliques'
  10. 'cliques_label'
  11. 'cliques_attributes'
  12. 'cliques_attributes_label'
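
A quick way to check a pickle before passing it to EstimateCliques is to confirm that all twelve keys are present. The sketch below assumes only the key names listed above. The helper missing_keys is hypothetical, the four scalar values are copied from the example run on this page, and the remaining entries are None placeholders because their layout is not described here.

```python
import pickle

EXPECTED_KEYS = [
    "N", "n", "sample_type", "replacement", "nsamples", "weight",
    "labeling", "attributes", "cliques", "cliques_label",
    "cliques_attributes", "cliques_attributes_label",
]


def missing_keys(record):
    """Return the expected keys that are absent from the loaded dictionary."""
    return [key for key in EXPECTED_KEYS if key not in record]


# Stand-in record: every key present, most values left as None placeholders.
record = dict.fromkeys(EXPECTED_KEYS)
record.update({"N": 77360, "n": 100, "sample_type": "uis", "replacement": "with"})

# Round-trip through pickle to mimic writing and reloading the file.
restored = pickle.loads(pickle.dumps(record))
print(missing_keys(restored))
```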

Example

The following example loads the soc-Slashdot graph, samples 100 egonets uniformly at random with replacement, and calculates the clique distribution of each egonet, with labeling and without binary attributes. The resulting Python pickle is ready to be used with the Clique Counting (CC) estimators.
> ./SampleEgonets_CalculateCliques.py soc-Slashdot0811.edges.gz 100 uis with 1 none
Loading graph soc-Slashdot0811.edges.gz.. 

[uis] 100 egonet samples 
n:100 N:77360 sample_type:uis replacement:with
labeling:1 attributes:0

Storing clique calculation for sampled egonets in file 'soc-Slashdot0811.edges.gz_sample_calculation.pickle'

© 2013 Sept.