Commit 0900675d authored by anil

Final version of HERD code for APT cluster

Clone the repository below and follow the steps:
https://gitlab.flux.utah.edu/anilmr/cs693-final-project/tree/master
Steps :-
========
1. Clone this repository to your local machine.
2. cd into the cloned directory.
3. Edit the servers file in that directory so that it contains the IP address of the first node (i.e. node 0) of the experiment you set up. The address is always of the form 10.10.1.x.
4. Edit prepare.sh: replace the nodes array with all the nodes listed on your experiment's CloudLab web page. (This step is manual. I considered automating it but did not find a reliable way to extract the first part of each hostname, e.g. apt162.)
5. Edit run_client.sh: copy the nodes array from prepare.sh into the client_nodes array, then remove the node that acts as the server (i.e. node 1).
6. Execute prepare.sh.
7. In a separate terminal, SSH to node 1 (i.e. the server node) and run
$ sudo bash run_server.sh
8. On your local machine, run bash run_client.sh.
9. Wait until there is no more activity in the server's terminal opened in step 7.
10. Run $ bash generate.sh in your local directory. It collects the output from every client, computes the average throughput, and prints it to the screen.
11. Repeat the experiment with a different PUT_PERCENT in the common.h file of the HERD repository. I changed the value of PUT_PERCENT, committed it to my repository, and then repeated from step 3. (This saved me from changing PUT_PERCENT by hand on every node.)
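Steps 4 and 5 can be partly scripted. The sketch below assumes you paste the fully qualified hostnames from the experiment page into a hypothetical `fqdns` array. It strips everything after the first dot with Bash parameter expansion, then builds `client_nodes` by dropping the server entry. The hostnames, the `short_name` helper, and the choice of which entry is the server are illustrative assumptions, not taken from prepare.sh or run_client.sh.

```bash
#!/usr/bin/env bash
# Hypothetical input: hostnames copied from the experiment web page.
fqdns=(apt162.apt.emulab.net apt163.apt.emulab.net apt170.apt.emulab.net)

# Print the first label of a hostname, e.g. apt162.apt.emulab.net -> apt162.
short_name() {
  printf '%s\n' "${1%%.*}"
}

# Build the nodes array (assumed to hold short names, as in prepare.sh).
nodes=()
for h in "${fqdns[@]}"; do
  nodes+=("$(short_name "$h")")
done

# Assumption: the first entry is the server node; adjust to match your setup.
server="${nodes[0]}"

# client_nodes is the nodes array without the server entry.
client_nodes=()
for n in "${nodes[@]}"; do
  if [ "$n" != "$server" ]; then
    client_nodes+=("$n")
  fi
done

echo "nodes=(${nodes[*]})"
echo "client_nodes=(${client_nodes[*]})"
```

The two printed lines can then be copied into prepare.sh and run_client.sh respectively.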
Steps to Install Mellanox OFED
------------------------------
1. Go to http://www.mellanox.com/page/products_dyn?product_family=26
2. Choose the current stable OFED release, agree to the terms, and download the
tar.gz file of Mellanox OFED.
3. Untar the file using $# tar xvzf Mellanox-OFED-2.4.1.xyz.tar.gz
4. cd into the extracted Mellanox OFED directory.
5. Run $# sudo ./mlnxofedinstall
6. Once the installation has completed, reboot the machine.
7. Run $# ofed_info | head -3 to verify that Mellanox OFED is installed.
Steps to get RDMA working on APT :-
-----------------------------------
1. After installing Mellanox OFED, load the following modules.
sudo modprobe rdma_cm
sudo modprobe ib_uverbs
sudo modprobe rdma_ucm
sudo modprobe ib_ucm
sudo modprobe ib_umad
sudo modprobe ib_ipoib
sudo modprobe mlx4_ib
sudo modprobe mlx4_en
2. Once the modules have loaded successfully, the ib0 device appears in the
output of ifconfig.
3. By default, on APT the 1st port is in InfiniBand mode and the 2nd one is in
Ethernet mode.
4. Verify by assigning an IP address to the IB interface on 2 machines, as below.
$# ifconfig ib0 10.10.1.1/24 up
5. Ping the other machine to verify connectivity.
6. Run " $# rping -sVvd " on one machine.
7. On the other machine, run the following command.
$# rping -c -a <servers IP address> -C10 -Vvd
8. If there are no errors, InfiniBand is working.
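The module-loading step can be wrapped in a loop with a check afterwards. This is only a sketch: the module list is the one given in step 1, while the function names `load_modules` and `missing_modules` are made up for illustration. `missing_modules` reads `lsmod`-style output on standard input, so it can be tried without RDMA hardware.

```bash
#!/usr/bin/env bash
# Modules listed in step 1 above.
modules=(rdma_cm ib_uverbs rdma_ucm ib_ucm ib_umad ib_ipoib mlx4_ib mlx4_en)

# Load every module in order; stop at the first failure. Requires sudo.
load_modules() {
  local m
  for m in "${modules[@]}"; do
    sudo modprobe "$m" || { echo "failed to load $m" >&2; return 1; }
  done
}

# Read lsmod-style text on stdin and print each expected module that is absent.
missing_modules() {
  local loaded m
  loaded="$(awk '{print $1}')"   # first column holds the module names
  for m in "${modules[@]}"; do
    if ! grep -qxF "$m" <<<"$loaded"; then
      echo "$m"
    fi
  done
}

# Typical use on a node (not executed by this sketch):
#   load_modules && lsmod | missing_modules
```

An empty result from `missing_modules` suggests that all eight modules are loaded; any name it prints is worth loading again by hand to see the error.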
https://www.apt.emulab.net/image_metadata.php?uuid=841c55ca-e9f1-11e4-8b63-2f7555356a5c
CFLAGS := -O3 -Wall -Werror -Wno-unused-result
LD := gcc
LDFLAGS := ${LDFLAGS} -lrdmacm -libverbs -lrt -lpthread
main: common.o conn.o main.o
${LD} -o $@ $^ ${LDFLAGS}
.PHONY: clean
clean:
rm -f *.o main
HERD
====
A Highly Efficient key-value system for RDMA
This version of HERD has been tested for the following configuration:
1. Software
* OS: Ubuntu 12.04 (kernel 3.2.0)
* RDMA drivers: `mlx4` from MLNX OFED 2.2. I suggest using the MLNX OFED version for Ubuntu 12.04.
2. Hardware
* RNICs:
* ConnectX-3 353A (InfiniBand)
* ConnectX-3 313A (RoCE)
* ConnectX-3 354A (InfiniBand)
Initial setup:
-------------
* I assume that the machines are named: `node-i.RDMA.fawn.apt.emulab.net` starting from `i = 1`.
* The experiment requires at least `(1 + (NUM_CLIENTS / num_processes))` machines.
`node-1` is the server machine.
`NUM_CLIENTS` is the total number of client processes, defined in `common.h`.
`num_processes` is the number of client processes per machine, defined in
`run-machine.sh`.
* To modify HERD for your machine names:
* Make appropriate changes in `kill-remote.sh` and `run-servers.sh`.
* Change the server's machine name in the `servers` file. Clients use this file to
connect to server processes.
* Make sure that ports 5500 to 5515 are available on the server machine. Server process `i`
listens for clients on port `5500 + i`.
* Execute the following commands at the server machine:
```bash
cd ~
git clone https://github.com/anujkaliaiitd/HERD.git
export PATH=~/HERD/scripts:$PATH
cd HERD
sudo ./shm-init.sh # Increase shmmax and shmall
sudo hugepages-create.sh 0 4096 # Create hugepages on socket 0. Do for all sockets.
```
* Mount the HERD folder on all client machines via NFS.
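The two formulas in this section are simple enough to check in the shell. The sketch below restates them as functions. `port_for` and `min_machines` are names chosen for this example, and the sample values 51 and 3 are placeholders, not the values in `common.h` or `run-machine.sh`. It uses integer division, so it assumes `NUM_CLIENTS` is a multiple of `num_processes`, and it assumes server process indices run from 0 to 15 to match the port range given above.

```bash
#!/usr/bin/env bash
# Port on which server process i listens: 5500 + i.
port_for() {
  echo $(( 5500 + $1 ))
}

# Minimum machines: 1 server + NUM_CLIENTS / num_processes client machines.
min_machines() {
  local num_clients=$1 num_processes=$2
  echo $(( 1 + num_clients / num_processes ))
}

# List the ports that must be free on the server machine (5500 to 5515).
for i in $(seq 0 15); do
  port_for "$i"
done | tr '\n' ' '
echo

# Placeholder values: 51 client processes, 3 per machine.
echo "machines needed: $(min_machines 51 3)"
```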
Quick start:
-----------
* Run `make` on the server machine to build the executables.
* To run the clients automatically along with the server:
```bash
# At node-1 (server)
./run-servers.sh
```
* If you do not want to run clients automatically from the server, delete the
2nd loop from `run-servers.sh`. Then:
```bash
# At node-1 (server)
./run-servers.sh
# At node-2 (client 0)
./run-machine.sh 0
# At node-i (client i - 2)
./run-machine.sh (i - 2)
```
* To kill the server processes, run `local-kill.sh` at the server machine. To kill the
client processes remotely, run `kill-remote.sh` at the server machine.
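When starting clients by hand, the client index passed to `run-machine.sh` on `node-i` is `i - 2`. The following dry-run sketch only prints the commands. It assumes the hostname pattern from the initial setup, working SSH access to each client, and that HERD is available at `~/HERD` on every client; that path and the function names are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Client index passed to run-machine.sh on node-i (node-2 is client 0).
client_index() {
  echo $(( $1 - 2 ))
}

# Print, without executing, the command that would start the client on node-i.
client_command() {
  local i=$1
  echo "ssh node-$i.RDMA.fawn.apt.emulab.net 'cd ~/HERD && ./run-machine.sh $(client_index "$i")'"
}

# Dry run for node-2 through node-4.
for i in 2 3 4; do
  client_command "$i"
done
```

Printing the commands first makes it easy to confirm the index mapping before anything is launched on the cluster.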
License
-------
Copyright 2014 Carnegie Mellon University
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
<!---
Algorithm details:
====
SERVER's ALGORITHM (one iteration)
1. Poll for a new request. The polling must be done on the last byte
of the request area slot. We must check (char) key != 0 and not just
key != 0. The latter can lead to a situation where the request is
detected before the key is written entirely by the HCA (for example,
only the first 4 bytes have been written).
If no new request is found in FAIL_LIM tries, go to 2.
2. Move the pipeline forward and get a pipeline item as the return
value. The pipeline item contains the request type, the client
number from which this request was received, and the request area
slot (RAS) from which this request was received.
2.1. If the request type is a valid type (GET_TYPE or PUT_TYPE),
send a response to the client. Otherwise, do nothing.
3. Add the new request to the pipeline. The item that we're adding
is the one that was polled in step 1.
We zero out the polled field of the request and store it into the
pipeline item. This is a must do. Here's what happens if we don't
zero out the polled field. Although the client will not write
into the same request slot till we send a response for the slot, the
server's round-robin polling will detect this request again.
We also zero out the len field of the request. This is useful because
clients do not WRITE to the len field for GETs. So, when a new
request is detected in (1), len == 0 means that the request is a
GET, otherwise it's a PUT.
OUTSTANDING REQUESTS / RESPONSES:
----
The number of outstanding responses from a server is WS_SERVER.
A server polls for SEND completions once per WS_SERVER SENDs.
The number of outstanding requests from a client is WINDOW_SIZE.
A client polls for a RECV completion WINDOW_SIZE iterations after
a request was posted. The client polls for SEND completions *very*
rarely: once every S_DEPTH iterations. This is because the RECV
completions, which are polled frequently, give an indication of
SEND completions.
The client uses parameters CL_BTCH_SZ and CL_SEMI_BTCH_SZ to post
RECV batches.
--->
/**
* Copyright (c) 2010 Yahoo! Inc. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License"); you
* may not use this file except in compliance with the License. You
* may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
* implied. See the License for the specific language governing
* permissions and limitations under the License. See accompanying
* LICENSE file.
*/
package com.yahoo.ycsb;
import java.util.HashMap;
import java.util.Properties;
import java.util.Set;
import java.util.Enumeration;
import java.util.Vector;
/**
* Basic DB that just prints out the requested operations, instead of doing them against a database.
*/
public class BasicDB extends DB
{
public static final String VERBOSE="basicdb.verbose";
public static final String VERBOSE_DEFAULT="true";
public static final String SIMULATE_DELAY="basicdb.simulatedelay";
public static final String SIMULATE_DELAY_DEFAULT="0";
boolean verbose;
int todelay;
public BasicDB()
{
todelay=0;
}
void delay()
{
if (todelay>0)
{
try
{
Thread.sleep((long)Utils.random().nextInt(todelay));
}
catch (InterruptedException e)
{
//do nothing
}
}
}
/**
* Initialize any state for this DB.
* Called once per DB instance; there is one DB instance per client thread.
*/
@SuppressWarnings("unchecked")
public void init()
{
verbose=Boolean.parseBoolean(getProperties().getProperty(VERBOSE, VERBOSE_DEFAULT));
todelay=Integer.parseInt(getProperties().getProperty(SIMULATE_DELAY, SIMULATE_DELAY_DEFAULT));
if (verbose)
{
System.out.println("***************** properties *****************");
Properties p=getProperties();
if (p!=null)
{
for (Enumeration e=p.propertyNames(); e.hasMoreElements(); )
{
String k=(String)e.nextElement();
System.out.println("\""+k+"\"=\""+p.getProperty(k)+"\"");
}
}
System.out.println("**********************************************");
}
}
/**
* Read a record from the database. Each field/value pair from the result will be stored in a HashMap.
*
* @param table The name of the table
* @param key The record key of the record to read.
* @param fields The list of fields to read, or null for all of them
* @param result A HashMap of field/value pairs for the result
* @return Zero on success, a non-zero error code on error
*/
public int read(String table, String key, Set<String> fields, HashMap<String,ByteIterator> result)
{
delay();
if (verbose)
{
System.out.print("READ "+table+" "+key+" [ ");
if (fields!=null)
{
for (String f : fields)
{
System.out.print(f+" ");
}
}
else
{
System.out.print("<all fields>");
}
System.out.println("]");
}
return 0;
}
/**
* Perform a range scan for a set of records in the database. Each field/value pair from the result will be stored in a HashMap.
*
* @param table The name of the table
* @param startkey The record key of the first record to read.
* @param recordcount The number of records to read
* @param fields The list of fields to read, or null for all of them
* @param result A Vector of HashMaps, where each HashMap is a set of field/value pairs for one record
* @return Zero on success, a non-zero error code on error
*/
public int scan(String table, String startkey, int recordcount, Set<String> fields, Vector<HashMap<String,ByteIterator>> result)
{
delay();
if (verbose)
{
System.out.print("SCAN "+table+" "+startkey+" "+recordcount+" [ ");
if (fields!=null)
{
for (String f : fields)
{
System.out.print(f+" ");
}
}
else
{
System.out.print("<all fields>");
}
System.out.println("]");
}
return 0;
}
/**
* Update a record in the database. Any field/value pairs in the specified values HashMap will be written into the record with the specified
* record key, overwriting any existing values with the same field name.
*
* @param table The name of the table
* @param key The record key of the record to write.
* @param values A HashMap of field/value pairs to update in the record
* @return Zero on success, a non-zero error code on error
*/
public int update(String table, String key, HashMap<String,ByteIterator> values)
{
delay();
if (verbose)
{
System.out.print("UPDATE "+table+" "+key+" [ ");
if (values!=null)
{
for (String k : values.keySet())
{
System.out.print(k+"="+values.get(k)+" ");
}
}
System.out.println("]");
}
return 0;
}
/**
* Insert a record in the database. Any field/value pairs in the specified values HashMap will be written into the record with the specified
* record key.
*
* @param table The name of the table
* @param key The record key of the record to insert.
* @param values A HashMap of field/value pairs to insert in the record
* @return Zero on success, a non-zero error code on error
*/
public int insert(String table, String key, HashMap<String,ByteIterator> values)
{
delay();
if (verbose)
{
System.out.print("INSERT "+table+" "+key+" [ ");
if (values!=null)
{
for (String k : values.keySet())
{
System.out.print(k+"="+values.get(k)+" ");
}
}
System.out.println("]");
}
return 0;
}
/**
* Delete a record from the database.
*
* @param table The name of the table
* @param key The record key of the record to delete.
* @return Zero on success, a non-zero error code on error
*/
public int delete(String table, String key)
{
delay();
if (verbose)
{
System.out.println("DELETE "+table+" "+key);
}
return 0;
}
/**
* Short test of BasicDB
*/
/*
public static void main(String[] args)
{
BasicDB bdb=new BasicDB();
Properties p=new Properties();
p.setProperty("Sky","Blue");
p.setProperty("Ocean","Wet");
bdb.setProperties(p);
bdb.init();
HashMap<String,String> fields=new HashMap<String,String>();
fields.put("A","X");
fields.put("B","Y");
bdb.read("table","key",null,null);
bdb.insert("table","key",fields);
fields=new HashMap<String,String>();
fields.put("C","Z");
bdb.update("table","key",fields);
bdb.delete("table","key");
}*/
}
/**
* Copyright (c) 2010 Yahoo! Inc. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License"); you
* may not use this file except in compliance with the License. You
* may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
* implied. See the License for the specific language governing
* permissions and limitations under the License. See accompanying
* LICENSE file.
*/
package com.yahoo.ycsb;
public class ByteArrayByteIterator extends ByteIterator {
byte[] str;
int off;
final int len;
public ByteArrayByteIterator(byte[] s) {
this.str = s;
this.off = 0;
this.len = s.length;
}
public ByteArrayByteIterator(byte[] s, int off, int len) {
this.str = s;
this.off = off;
this.len = off + len;
}
@Override
public boolean hasNext() {
return off < len;
}
@Override
public byte nextByte() {
byte ret = str[off];
off++;
return ret;
}
@Override
public long bytesLeft() {
return len - off;
}
}
/**
* Copyright (c) 2010 Yahoo! Inc. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License"); you
* may not use this file except in compliance with the License. You
* may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
* implied. See the License for the specific language governing
* permissions and limitations under the License. See accompanying
* LICENSE file.
*/
package com.yahoo.ycsb;
import java.util.Iterator;
import java.util.ArrayList;
/**
* YCSB-specific buffer class. ByteIterators are designed to support
* efficient field generation, and to allow backend drivers that can stream
* fields (instead of materializing them in RAM) to do so.
* <p>
* YCSB originally used String objects to represent field values. This led to
* two performance issues.
* </p><p>
* First, it leads to unnecessary conversions between UTF-16 and UTF-8, both
* during field generation, and when passing data to byte-based backend
* drivers.
* </p><p>
* Second, Java strings are represented internally using UTF-16, and are
* built by appending to a growable array type (StringBuilder or
* StringBuffer), then calling a toString() method. This leads to a 4x memory
* overhead as field values are being built, which prevented YCSB from
* driving large object stores.
* </p>
* The StringByteIterator class contains a number of convenience methods for
* backend drivers that convert between Map&lt;String,String&gt; and
* Map&lt;String,ByteBuffer&gt;.
*
* @author sears
*/
public abstract class ByteIterator implements Iterator<Byte> {
@Override
public abstract boolean hasNext();
@Override
public Byte next() {
throw new UnsupportedOperationException();
//return nextByte();
}
public abstract byte nextByte();
/** @return byte offset immediately after the last valid byte */
public int nextBuf(byte[] buf, int buf_off) {
int sz = buf_off;
while(sz < buf.length && hasNext()) {
buf[sz] = nextByte();
sz++;
}
return sz;
}
public abstract long bytesLeft();
@Override
public void remove() {
throw new UnsupportedOperationException();
}
/** Consumes remaining contents of this object, and returns them as a string. */
public String toString() {
StringBuilder sb = new StringBuilder();
while(this.hasNext()) { sb.append((char)nextByte()); }
return sb.toString();
}
/** Consumes remaining contents of this object, and returns them as a byte array. */
public byte[] toArray() {
long left = bytesLeft();
if(left != (int)left) { throw new ArrayIndexOutOfBoundsException("Too much data to fit in one array!"); }
byte[] ret = new byte[(int)left];
int off = 0;
while(off < ret.length) {
off = nextBuf(ret, off);
}
return ret;
}
}
/**
* Copyright (c) 2010 Yahoo! Inc. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License"); you
* may not use this file except in compliance with the License. You
* may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
* implied. See the License for the specific language governing
* permissions and limitations under the License. See accompanying
* LICENSE file.
*/
package com.yahoo.ycsb;
import java.util.HashMap;
import java.util.Properties;
import java.util.Set;
import java.util.Vector;
/**
* A layer for accessing a database to be benchmarked. Each thread in the client
* will be given its own instance of whatever DB class is to be used in the test.
* This class should be constructed using a no-argument constructor, so we can